STILL LEARNING: INTRODUCING THE LEARNING TRANSFER MODEL, A FORMAL MODEL OF TRANSFER

By

Jeffrey David Olenick

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Psychology - Doctor of Philosophy

2020

ABSTRACT

STILL LEARNING: INTRODUCING THE LEARNING TRANSFER MODEL, A FORMAL MODEL OF TRANSFER

By

Jeffrey David Olenick

Although training has been a key topic of study in organizational psychology for over a century, a century which has seen great progress in our understanding of what a quality training program entails, a substantial gap persists between what is trained and what is transferred to the job. Reduction of the training-transfer gap has driven research on transfer-focused interventions, which have proven effective. However, although we know a great deal about how individuals learn new material, and about correlates of whether they transfer that material back to their work environment, we know very little about how individuals choose whether to apply their new knowledge to typically previously encountered situations in their work environment, and how those decisions unfold over time. Improving our knowledge of how individuals transfer learned material will lead to new insights on how to support the transfer of organizationally directed training, or any learning event, back to the work environment. Thus, the present paper introduces a formal model of the transfer process, the Learning Transfer Model (LTM), which proposes a process for how transfer unfolds over time and gives rise to many of the findings accumulated in the transfer literature. This is accomplished by reconceptualizing transfer as its own learning process, one affected by the dual nature of self-regulatory processes. The LTM was then instantiated in a series of computational models for virtual experimentation. Findings and implications for research and practice are discussed throughout.

Copyright by
JEFFREY DAVID OLENICK
2020

This dissertation is dedicated to my loving family, without whom I could not have made it this far.

ACKNOWLEDGEMENTS

I would like to thank all those who have helped me over the years to get to this point. To my professors: Dr. Steve W.J. Kozlowski for being my primary mentor through my doctoral career and always pushing me further in my thinking; Dr. J. Kevin Ford for being a good mentor and friend, and for providing the developmental opportunities through which the ideas expressed within this dissertation were formed; Drs. Richard DeShon and Zachary Neal for being a part of my committee and introducing me to the kind of theory and models which provide the basis for my thinking in this paper; Dr. Ann Marie Ryan for guiding me when I wanted to switch careers and had no experience in this field; and Drs. Michael Stamm and Matthew Pauly for believing in me as a struggling undergraduate searching for my place in the world. Thank you to my friends who have always been there to provide a distraction from the pressures of life. A special thank you to my family, especially my mother and my grandparents, for providing me with the foundation I needed to reach the success I have. And my father: although you left my life far too soon, you have provided a lifetime of inspiration. Thank you to my son for providing a daily dose of motivation and levity; although it seems like a weird way to show it, this paper is very much a labor of love for you. Finally, thank you to my wonderful wife, Catherine.
You pulled me out of the darkest time of my life and helped get me back on track. All my accomplishments would be impossible without you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
Introduction
Review of Transfer Literature
Computational Modeling and the Modeling Cycle
Transfer Findings for Which to Account
Practice and Overlearning
Utility Reactions
Work Environment
Implementation intentions
Maintenance Curves
Self-efficacy
Skill type
Near versus Far Transfer, Adaptive Transfer and Adaptive Performance
Study 1: Base Learning Transfer Model
Dual Process Models and Habits
Reinforcement Learning
The Learning Transfer Model
Study 1: Method
Model outcome metrics
Analysis
Study 1: Simulation and Results
Model verification
Logical Consistency
Parameter Effects Check
Simulation Length
Policy Value
Policy Value Estimates
Exploration Rate
Generative Sufficiency, Sensitivity and Robustness
True policy values
Timing of interventions
Type 2 Processing
Practice and Overlearning
Utility reactions
Transfer trajectories
Implementation Intentions
Exploration rates
Exploratory experimentation
Study 1: Discussion
Theoretical Implications
Practical Implications
Conclusion
Study 2A: Adding Social Learning to the LTM
Social Learning Theory
The Formal Transfer Model with Social Learning
Study 2A: Method, Simulation and Results
Virtual Experimentation
Model verification
Number of Trainees
Connectedness
Interaction between Trainees and Connectedness
Study 2A: Discussion and Conclusion
Study 2B and 2C: Rethinking Social Learning Model
Model 2B Overview
Model 2C Overview
Study 2B: Method, Simulation and Results
Trainees Versus Imitation Experiment
Study 2C: Method, Simulation and Results
Trainees Versus Conformity Experiment
Study 2B and 2C: Discussion and Conclusion
Implications for Theory
Future modeling of social learning
Other modeling possibilities
Implications for Practice
Conclusion
Study 3A: Adding Self-Regulation to the Transfer Process Model
Self-Regulation
Hierarchical goal pursuit
Self-regulatory negative feedback systems
Self-Efficacy
The LTM with Self-Regulation
Study 3A: Method, Simulation, and Results
Virtual Experimentation
Study 3A: Discussion
Study 3B: Tweaking Goal Seeking
Model 3B-1
Model 3B-2
Study 3B: Methods, Simulation, and Results
Model 3B-1
Model 3B-2
Study 3B: Discussion
Study 3C: Engagement Thresholds
Discontinuous Self-Efficacy in the LTM
Study 3C: Methods, Simulation, and Results
Causal Effects of Self-Efficacy on Transfer and Performance
Effects of Engagement Threshold
Study 3C: Discussion
Theoretical and Research Implications
Practical Implications
Conclusion
Study 4: Exploring the Full LTM Model
Experiment 4A: Engagement Thresholds, Value Changes and Implementation Intentions
Methods
Results
Discussion
Experiment 4B: Number of Trainees, Conformity, and Goal Levels
Methods
Results
Discussion
Experiment 4C: Value Change, Conformity, and Goal Levels
Methods
Results
Discussion
Experiment 4D: Type 2 Likelihood, Conformity, and Goal Levels
Methods
Results
Discussion
Overall Discussion
Overall Discussion
Theoretical Implications and Future Research Directions
Practical Implications
Conclusion
APPENDICES
Appendix A: Study 1 Environment and Code
Appendix B: Study 2A Environment and Code
Appendix C: Study 2B Environment and Code
Appendix D: Study 2C Environment and Code
Appendix E: Study 3A Environment and Code
Appendix F: Studies 3B-1 and 3B-2 Environment and Code
Appendix G: Study 3C Environment and Code
REFERENCES

LIST OF TABLES

Table 1. Model 1 Variables.
Table 2. Model 1 Equations.
Table 3. Overall results for practice effect on behavioral transfer and performance change in Model 1.
Table 4. Experimental comparisons of practice conditions to control for behavioral transfer and performance change in Model 1.
Table 5. Initial policy value estimate effects on behavioral transfer and performance change in Model 1.
Table 6. Implementation level effects on behavioral transfer and performance change in Model 1.
Table 7. Model 2 Variables.
Table 8. Model 2 Equations.
Table 9. Effects of number of trainees on behavioral transfer and pre-post performance change in Model 2A.
Table 10. Connectedness effects on behavioral transfer and pre-post performance change in Model 2A.
Table 11. Model 3 Variables.
Table 12. Model 3 Equations.
Table 13. Three-way interaction models for Experiment 4A.
Table 14. Three-way interaction models for Experiment 4B.
Table 15. Three-way interaction models for Experiment 4C.
Table 16. Three-way interaction models for Experiment 4D.

LIST OF FIGURES

Figure 1. Conceptual model for initial LTM.
Figure 2. Behavioral Transfer for exploration of policy values in Model 1.
Figure 3. Performance change for exploration of policy values in Model 1.
Figure 4. Behavioral Transfer for exploration of policy value changes in Model 1.
Figure 5. Performance change for exploration of policy value changes in Model 1.
Figure 6. Behavioral Transfer for exploration of burn-in and transfer times in Model 1.
Figure 7. Performance change for exploration of burn-in and transfer times in Model 1.
Figure 8. Predicting behavioral transfer from type 2 processing likelihood in Model 1.
Figure 9. Predicting performance change from type 2 processing likelihood in Model 1.
Figure 10A-D. Example transfer trajectories for Model 1.
Figure 11. Exploration rate effect on behavioral transfer in Model 1.
Figure 12. Exploration rate effect on performance change in Model 1.
Figure 13. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1.
Figure 14. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1.
Figure 15. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1 heat map.
Figure 16. Type 2 likelihood vs implementation intention experimental effect on post training performance in Model 1 heat map.
Figure 17. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1 heat map.
Figure 18. Proposed conceptual model for LTM with Social Learning.
Figure 19. Heatmap of interaction effect of number of trainees and connectedness on behavioral transfer in Model 2A.
Figure 20. Number of trainees and level of imitation predicting behavioral transfer in Model 2B (replication level).
Figure 21. Number of trainees and level of imitation predicting post training performance in Model 2B (replication level).
Figure 22. Number of trainees and level of imitation predicting pre-post training performance in Model 2B (condition level).
Figure 23. Heatmap of trainees and imitation predicting behavioral transfer in Model 2B.
Figure 24. Heatmap of trainees and imitation predicting post training performance in Model 2B.
Figure 25. Heatmap of trainees and imitation predicting pre-post performance change in Model 2B.
Figure 26. Number of trainees and level of conformity predicting behavioral transfer in Model 2C (replication level).
Figure 27. Number of trainees and level of conformity predicting post training performance in Model 2C (replication level).
Figure 28. Number of trainees and level of conformity predicting pre-post performance change in Model 2C (condition level).
Figure 29. Heat map of number of trainees and level of conformity predicting behavioral transfer in Model 2C.
Figure 30. Heat map of number of trainees and level of conformity predicting post training performance in Model 2C.
Figure 31. Heat map of number of trainees and level of conformity predicting pre-post performance change in Model 2C.
Figure 32. Conceptual model for LTM including self-regulation.
Figure 33. Goal level and exploration rate change predicting post training performance in Model 3A (replication level).
Figure 34. Goal level and exploration rate change predicting behavioral transfer in Model 3A (replication level).
Figure 35. Goal level and exploration rate change predicting pre-post performance change in Model 3A (condition level).
Figure 36. Heat map of goal level and exploration rate change predicting behavioral transfer in Model 3A.
Figure 37. Heat map of goal level and exploration rate change predicting post training performance in Model 3A.
Figure 38. Heat map of goal level and exploration rate change predicting pre-post performance change in Model 3A.
Figure 39. Observed post training performance by goal level in Model 3B-1.
Figure 40. Observed behavioral transfer by goal level in Model 3B-1.
Figure 41. Observed pre-post performance change by goal level in Model 3B-1.
Figure 42. Goal level and policy value change predicting behavioral transfer in Model 3B-1 (replication level).
Figure 43. Goal level and policy value change predicting post training performance in Model 3B-1 (replication level).
Figure 44. Goal level and policy value change predicting pre-post performance change in Model 3B-1 (condition level).
Figure 45. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-1.
Figure 46. Heat map of goal level and policy value change predicting post training performance in Model 3B-1.
Figure 47. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-1.
Figure 48. Observed post training performance by goal level in Model 3B-2.
Figure 49. Observed behavioral transfer by goal level in Model 3B-2.
Figure 50. Observed pre-post performance change by goal level in Model 3B-2.
Figure 51. Goal level and policy value change predicting behavioral transfer in Model 3B-2 (replication level).
Figure 52. Goal level and policy value change predicting post training performance in Model 3B-2 (replication level).
Figure 53. Goal level and policy value change predicting pre-post performance change in Model 3B-2 (condition level).
Figure 54. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-2.
Figure 55. Heat map of goal level and policy value change predicting post training performance in Model 3B-2.
Figure 56. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-2.
Figure 57. Observed and predicted behavioral transfer from threshold level in Model 3C.
Figure 58. Observed and predicted post training performance from threshold level in Model 3C.
Figure 59. Observed and predicted pre-post performance change from threshold level in Model 3C.
Figure 60. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 61. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 62. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 63. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 64. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 65. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 66. Three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 67. Three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 68. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 69. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 70. Three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 71. Three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 72. Three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 73. Heat map of three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 74. Heat map of three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 75. Heat map of three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 76. Three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 77. Three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 78. Three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 79. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 80. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 81. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 82. Snapshot of the modeling environment for Study 1 in NetLogo.
Figure 83. Snapshot of the modeling environment for Study 2A in NetLogo.
Figure 84. Snapshot of the modeling environment for Study 2B in NetLogo.
Figure 85. Snapshot of the modeling environment for Study 2C in NetLogo.
Figure 86. Snapshot of the modeling environment for Model 3A in NetLogo.
Figure 87. Snapshot of the modeling environment for Models 3B-1 and 3B-2 in NetLogo.
Figure 88. Snapshot of the modeling environment for Model 3C in NetLogo.

LIST OF ALGORITHMS

Algorithm 1. Value Estimate Calculation
Algorithm 2. Type 1 Process Equation
Algorithm 3. Probability of Choosing Type 2 Processes
Algorithm 4. Type 1 Process with Implementation Intentions
Algorithm 5. Other Agent Value Estimation
Algorithm 6. Weighted Value Estimate
Algorithm 7. Agent Performance
Algorithm 8. Goal Discrepancy
Algorithm 9. Effector Mechanism 1
Algorithm 10. Effector Mechanism 2
Algorithm 11. Effector Mechanism 3
Algorithm 12. Effector Mechanism 4
Algorithm 13. NetLogo Code for Study 1 Model
Algorithm 14. NetLogo Code for Study 2A Model
Algorithm 15. NetLogo Code for Study 2B Model
Algorithm 16. NetLogo Code for Study 2C Model
Algorithm 17. NetLogo Code for Model 3A
Algorithm 18. NetLogo Code for Model 3B-1
Algorithm 19. NetLogo Code for Model 3B-2
Algorithm 20. NetLogo Code for Model 3C

Introduction

Continuous learning is a mantra of organizations, often directed towards employees to emphasize the need to continually improve their knowledge and skills so they can maintain or increase their ability to perform their roles and advance their careers (e.g., London, 2012). To achieve continuous learning, organizations spend ever-increasing amounts of money on training programs, averaging more than $1,296 per employee in 2017 (American Society of Training and Development, 2018). Thankfully, organizations benefit from this spending. For example, spending on training programs aids the development of knowledge, skills, attitudes, and other characteristics (KSAOs) that feed into the emergence of human capital resources for an organization (Ployhart & Moliterno, 2011), which in turn supports organizational profitability (Kim & Ployhart, 2014). Unfortunately, there remains a gap between what is taught in training programs and what gets transferred back to the work environment, sometimes referred to as the training-transfer gap (e.g., Vermeulen, 2002). Typical statements that only 10 percent of trained material is transferred to the job are not generally based in fact (Ford, Yelon, & Billington, 2011), but few would argue that no such gap exists. Transfer is the extent to which learning that results from a training experience transfers to the job and leads to meaningful changes in work performance (Blume, Ford, Baldwin, & Huang, 2010, p. 1066). Thus, any rate of transfer less than 100 percent theoretically results in wasted money on the part of the organization, as no meaningful changes in performance occur. What, then, can we do to improve rates of transfer and reduce the training-transfer gap?

Over the last 100 years, research has taken a multi-pronged approach to this question, seeking to improve training programs at each of their three stages of pre-training, training, and post-training (e.g., Jaidev & Chirayath, 2012), which has led to a great deal of knowledge regarding the functioning of training programs (Bell, Tannenbaum, Noe, & Kraiger, 2017). That knowledge has improved programs largely by introducing principles to the learning event that can improve knowledge retention (e.g., Donovan & Radosevich, 1999; Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). Though fewer principles exist within the post-training transfer stage of the training process, several consistent findings have emerged, including the use of implementation intentions (e.g., Gollwitzer, 1999) and the perceptions learners hold of the utility of their newly gained knowledge (e.g., Blume et al., 2010), among others.

Unfortunately, most studies on the transfer of learning in organizations are essentially correlational and/or cross-sectional in nature. Studies of training interventions typically measure learning and individual difference variables at the end of training, and then measure the transfer of learning at a single time point in the future. These tendencies can be seen in the types of studies available for meta-analyses of transfer (e.g., Blume et al., 2010; Blume, personal communication).
With scientific emphasis on understanding causal mechanisms, it is tempting to interpret findings with time lags as causal in nature, but temporal precedence is only one precondition for establishing causality. Unfortunately, the effects of variables in transfer environments are often hard to isolate, especially in real organizations where random assignment is often difficult or impossible to achieve, though such isolation is possible (see Hanges & Wang, 2012, for a discussion of causal models). Thus, even though we are interested in the mechanisms that explain transfer, we are generally only studying correlates of transfer, and correlation does not equal causation; put differently, prediction does not equal explanation (Muthukrishna & Henrich, 2019).

To better understand the causal mechanisms that lead to transfer, we must advance our understanding of transfer as it occurs over time and seek to discover the dynamic process that gives rise to what we currently observe as transfer. The study of such dynamic relationships is gaining increasing interest in our field (e.g., DeShon, 2012), and the study of person-level processes within the training context has been described as a frontier for research in this arena (Salas & Kozlowski, 2010); transfer-specific research must follow. A dynamic process here refers to the interactions of lower levels of analysis that give rise to a higher-level observed variable in a process of emergence (e.g., Grand, Braun, Kuljanin, Kozlowski, & Chao, 2016; Kozlowski & Klein, 2000), those levels in this paper being the cognitive processes of an individual and their output behaviors. Other researchers are interested in studying dynamic processes of transfer and are attempting to unpack them. This can be seen in the increase of longitudinal designs studying transfer (Baldwin, Ford, & Blume, 2009) and, for example, the use of within-person analyses to understand the interplay of motivational changes over time with changes in transfer (Huang, Ford, & Ryan, 2017). However, most studies that claim to be interested in such dynamics do not really study dynamic relationships, and are largely restricted to cross-sectional designs or versions of growth modeling with few time points (e.g., Ford, Bhatia, & Yelon, in press; Gist, Stevens, & Baveta, 1991; Cheng, 2016; Dierdorff & Surface, 2008; Zerres, Huffmeier, Freund, Backhaus, & Hertel, 2013). Even when longitudinal designs are utilized, the mere study of change over time does not constitute the study of explanatory dynamic processes, because such designs rely on time as a predictor and time is not explanatory (Dishop, Olenick, & DeShon, in press; Ployhart & Vandenberg, 2010). Training-transfer studies are motivated to show that change does occur, and thus that the program of interest is successfully affecting outcomes of interest. However, explaining change when it occurs is only half the battle; any process model should also be able to demonstrate when change will not occur, as a lack of change is still likely to be driven by a dynamic process and does not imply the absence of one (Dishop et al., in press).
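The contrast between describing change and explaining it can be made concrete with generic, purely illustrative notation (this is not the model developed later in this paper). A growth model treats time itself as the predictor of a person's observed transfer,

$y_{it} = \beta_{0} + \beta_{1} t + e_{it},$

whereas a dynamic process model specifies how the next state is generated from the current state and the inputs acting on it, for example

$y_{i,t+1} = y_{i,t} + f(y_{i,t}, x_{i,t}) + e_{i,t+1},$

where $y$ is an individual's transfer behavior, $t$ indexes performance episodes, $x$ stands for situational inputs such as feedback from a transfer attempt, and $f$ is a hypothesized generating mechanism. In the first equation time carries no mechanism; in the second, $f$ carries the explanatory claim, and depending on its form it can produce growth, decline, or exactly the stable lack of change just described.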
Moves to better understand the dynamic process of transfer are being made, but only a few existing models treat transfer as an iterative process that unfolds via repeated attempts which dynamically affect future attempts. Existing models of the training process that do consider time generally treat transfer as an outcome that does not explicitly feed into future attempts (e.g., Baldwin, Magjuka, & Loher, 1991; Bell & Kozlowski, 2009; Cannon-Bowers, Salas, Tannenbaum, & Mathieu, 1995a; Cheng & Hampson, 2008; Colquitt, Lepine, & Noe, 2000; Thayer & Teachout, 1995), though some consider transfer as an input to future training cycles (e.g., Salas, Weaver, & Shuffler, 2012; Goldstein, 1986). A more dynamic view of transfer can be found in Chen, Thomas, and Wallace (2005), who proposed a multi-level model of training outcomes that discusses the episodic nature of the post-training environment. Additionally, Blume, Ford, Surface, and Olenick (2019) introduced the Dynamic Transfer Model (DTM), which describes how trainees decide what to retain from their learning experience and apply their new KSAOs to their work environment in an iterative way. Their model is already shaping emerging research on how trainees transfer their new KSAOs over time (e.g., Vignoli & Depolo, 2019). However, the DTM has multiple weaknesses. One is that it is a verbal model rather than a mathematical formal model; although such models are good for describing processes, they generally lack specificity and can struggle to generate testable hypotheses (e.g., Vancouver, 2008). A second weakness is that the DTM relies heavily on self-regulation (e.g., Carver & Sheier, 1998) and person-situation interactionism (Hattrup & Jackson, 1996). Though these are good bases from which to begin building a dynamic theory of transfer, they leave out much of what we know about human cognition and social influence and therefore do not tell the whole story. Thus, further work is required to improve our theorizing regarding transfer as its own dynamic process.

To address these gaps, this paper is guided by two research questions. First, what is the learning and decision-making process through which individuals go when attempting to transfer new knowledge to an old situation? Second, can a single, relatively simple, formal model of the dynamic transfer process account for our current findings in the transfer literature? In addressing these two questions, the present paper makes four key contributions. First, a process-oriented theory of learning transfer is introduced by building a formal mathematical model of that process, called the Learning Transfer Model (LTM). The LTM begins to build a unifying theory of transfer in the workplace, partially answering calls for psychological science to move towards more unifying theories that improve the explanation of human behavior (Muthukrishna & Henrich, 2019). Second, the LTM integrates several disparate but related theories, specifically reinforcement learning (e.g., Sutton & Barto, 2018), Social Learning Theory (Bandura, 1977), and Control Theory (e.g., Carver & Sheier, 1998), within a dual process cognitive framework (e.g., Kahneman, 2011). Third, computational approaches to reinforcement learning and dual process models are more fully brought into the organizational literature.
Finally, that integrative formal theory is instantiated in a computational model, allowing for virtual experimentation to explore the implications of the theory, the formation of testable predictions that may later be evaluated against real-world data, and potentially novel insights into the transfer process from which to build future practical interventions.

Review of Transfer Literature

Prior to building a process theory of transfer, we must take stock of the current transfer literature. This review occurs in two parts. The first is an overview, not meant to be exhaustive, of where the field stands, particularly regarding our knowledge of transfer as a process. The primary points are, first, that despite calls for viewing transfer as a process (e.g., Foxon, 1997), transfer has largely been treated as an outcome or a product. Second, because of the transfer-as-outcome view, transfer is typically measured at one or very few time points, which largely forgoes the ability to study transfer as a process, with few exceptions. Third, the nature of existing research is largely correlational and cross-sectional, resulting in a field of inquiry that can be characterized as a set of potentially useful but unrelated empirical findings. Fourth, the existing longitudinal transfer research does not generally examine dynamic processes, even when the authors state they are interested in them. Fifth, emerging theory on the training and transfer process is moving in the right direction to unpack within-person transfer processes but has far to go. The discussion then moves towards introducing a transfer process theory by more specifically describing key concepts and findings that are critical to consider in the early stages of theory development.

Before diving into the review, we must define some terms. Broadly, this paper is interested in the learning process experienced by employees. Learning has been given many definitions (Salas, Weaver, & Shuffler, 2012); for example, learning can be viewed as a permanent change in the range of possible behaviors for an organism (Huber, 1991). Training, more specifically, is an organizationally directed learning experience aimed at introducing new knowledge, skills, attitudes, or other characteristics (KSAOs) that expand the range of possible behaviors an employee may exhibit on the job. Training experiences are typically divided into three phases: pre-training, training, and post-training, or variations thereof (e.g., Beier & Kanfer, 2010). The present paper is focused specifically on the processes within the post-training phase. However, not all learning by employees is organizationally directed. Instead, employees learn much about how to accomplish their jobs and navigate their work environments while engaging with the relevant tasks and environment through informal learning processes (Tannenbaum, Beard, McNall, & Salas, 2010). This paper is concerned with all learning events, formal or informal, so the terms learners and learning are used interchangeably with trainees and training, and the model is suggested to apply to the transfer of learning from either formal or informal learning events.

The primary outcome of interest in the post-learning phase is the transfer of trained material back to the work environment. According to Baldwin and Ford (1988), transfer consists of generalization and maintenance.
Generalization is taking that which was gained through training and applying it to more or less similar situations as experienced in training once back on the job. Maintenance is the continued application of those new KSAOs to the job over time. The goal of this paper is to unpack how, not merely whether, learners transfer new KSAOs to their work environment. To begin studying the how of transfer, we assume learners exit their learning experience with the ability to generalize that learning to their work, and must now find a way to actually commit to that transfer and maintain it over time. Therefore, the present paper is focused more directly on the maintenance portion of transfer than on the precondition of being able to generalize knowledge at all.

Historically, transfer has been treated as an outcome, or a product, instead of something that unfolds over time driven by a process. Foxon (1997) noted this tendency and its effects on our understanding of transfer, arguing that measuring transfer as a one-dimensional product, rather than assessing it in process terms, may have led practitioners to underestimate the extent of transfer that occurs (p. 43). Unfortunately, calls for more process-oriented transfer research have gone largely unheeded until recently. Theoretical models of the training process still almost universally treat transfer as a single outcome. Part of this problem may be traced to the way researchers have traditionally treated the organizing model utilized by Baldwin and Ford in their classic (1988) review. In their model, training inputs lead simply to transfer, mediated by training outputs. However, instead of using the model to organize a disparate literature, researchers in part treated it as something to be tested. Although much useful knowledge arose from research inspired by Baldwin and Ford (1988), the path that research took may have limited progress on understanding certain aspects of the training process by largely ignoring transfer over time. The treatment of transfer as a product or outcome is the key limitation in understanding transfer specifically as a process.

Interpreting transfer as an outcome or product led to the tendency to collect transfer measures at only one or a very limited number of time points. The typical study on transfer effects measures covariates of interest before, during, or at the end of a training event, or creates its experimental manipulations during training, and then measures transfer at some later point in time. This tendency can be seen in the studies available for meta-analyses focused on transfer effects (e.g., Blume et al., 2010; Blume, personal communication), even though transfer includes both generalization and maintenance (Baldwin & Ford, 1988). Generalization could be considered something that either happens or does not, and therefore could be measured at a single time point, but maintenance implies the continuance of transfer over time and thus requires multiple measurements to study. Baldwin, Ford, and Blume (2009), in their updated review, found that the number of time points examined in the transfer environment had improved, but that the number was still limited. The lack of stronger longitudinal designs, in which all measures of interest are measured at multiple time points, limits the analyses and knowledge we may gain.
For example, single measurements are inadequate for cross - lagged designs that can start to unpack dynamic process - like relationsh ips u nderlying phenomen a (e.g., Kenny, 2005 ). Thus, transfer research is not examining dynamic processes, even when researchers are interested in them. A recent example is a measurement piece by Ford, Bhatia and Yelon (2019) which reports a multidimensiona l mea sure of transfer as use. The authors state they are interested in t he dynamics of transfer, but their data collection is a single time point for each participant and conduct no dynamic analyses. Transfer research also lacks a general guiding theory. The l ack of a guiding theory or framework has resulted in a large set of potentially useful but largely unrelated empirical findings. Existing models of the training process are not scientific theories in that they do not posit universal mechanisms underlying t he process, especially in a way that can be applied directly to tra nsfe r . Instead, existing models are generally tools for organizing the vast set of empirical findings in a coherent way ; they do not tie those findings together into a unified whole. This c an be seen in any of a number of reviews which have expanded overti me to include more detail because the extent of empirical findings has also greatly expanded over the past three decades, but the essential structure remains highly similar (e.g., Baldwin & Ford, 1988; Salas et al., 2012). Within single studies , on the oth er hand , theory can be found to guide hypothesizing. For example, as r eviewed by Beier and Kanfer (2010), common theories of motivation utilized to study transfer include goal choice and se lf - efficacy from self - regulation (Bandura, 1977), expectancy theory ( a.k.a. Valence, Instrumentality, Expectancy (VIE) Theory; Vroom, 196 4), 10 individual differences such as the Big Five personality traits ( McCrae & Costa, 1987 ), and transfer of training cli mates (Rouiller & Goldstein, 1993). The predictions made based on t hese theories independently show positive effects on transfer outcomes (e.g., Blume et al., 2010), but are not integrated into any comprehensive whole, leaving the empirical findings scatte red and in need of some underlying scientific framework to unify th em. Such work also answers calls for scientific frameworks to enhance the rigor of psychological science (Muthukrishna & Henrich, 2019). As mentioned by Baldwin et al . (2009), the number of studies examining multiple time points has improved over the yea rs. In many ways, these studies are applications of more typical learn ing or performance studies which examine learning curves on a task of interest. A few examples will suffice. Gist, Stev ens, and Baveta (1991) tested post - training interventions to improv e maintenance and transfer, finding that pre - training self - efficacy relates to both initial and delayed performance on a target test. They also found that the effect of efficacy on maintena nce was moderated by th e type of training the learner received. Van couver and Kendall (2008) made the important point that relationships may differ when examined at the within - instead of between - person level, when they showed efficacy can be negatively re lated to performance an d motivation in some learning contexts at th e within - person level. Th eir finding opposes the common view that efficacy and performance are positively related, but which is typically studied at the between - person level. 
Dierdorff and Surface (2008) showed t hat skill - based pay was related to skill mai ntenance over a seven - year period with multiple measurement points. F inally, Scholz, Nagy, Schuz, and Ziegelmann (2008) studied 30 initially untrained runners for a year, taking 11 measurem ents of their running t endencies as they prepared for a marathon. A t the between - person level they found trend in efficacy predicted trend in the amount 11 of running, and that fluctuations in efficacy predicted fluctuations in running. At the within - person l evel, controlling for b etween - person trends, amount of running was predicted by efficacy and intentions , among other variables. Recent studies aim to unpack within - person effects specifically on training transfer. The best examples may be those of Huang a nd colleagues. Huang, B lume, Ford, and Baldwin (2015) showed in a m eta - analysis that maximal and typical transfer are weakly related, and that predictors of the two forms of transfer differ . Specifically, maximal transfer was better predicted by abilities , while motivational mea sures were better predictors of typical tran sfer. The ir findings suggest more research is necessary to unpack why those factors differentially predict aspects of transfer. Huang, Ford, and Ryan (2017) then studied within - person varia bility in transfer in a multi - wave design. They showed that initial attempts to transfer were best predicted by post - training self - efficacy and that motivation to transfer better predicted rates of change in transfer. Unfortunately , the basis for this stud y relies on growth modeling so is not truly dynamics (DiShop et al. , in press ), but it represents a significant step forward conceptually in understanding the within - person nature of transfer. However, a ll is not lost, and research ers are making theoretic al advances regarding the process underlying the transfer of learni ng. R epeated calls are being made to study the training process, including transfer, from a multi - level perspective. Such arguments center on not only the need to be tter understand higher l evel organizational effects on training and transfer, but also to c onsider the within - person nature of the training and transfer processes (e.g., Mathieu & Tesluk, 2010; Sitzman & Weinhardt, 2019 ) . Such calls have in part manifested in micro - level research , such as on preventing knowledge and skill decay (Cascio, 2019). T hese advances in part emphasize that transfer is an episodic process and theory aimed at unpacking that process is 12 emerging. Blume, Ford, Surface and Olenick (2019) described transfer as a self - regulat ory process where learners proceed through episodes of deciding to retain or discard new KSA O s in favor of their existing repertoire, attempt to apply those KSA O s, receive feedback on their attempts, and reiterate the de cision process. That pro cess interacts with organizational factors to determine how it unfo lds over time. Surface and Olenick (forthcoming) , are developing a mechanistic model of transfer seeking to unpack the cognitive processes underlying the general pro cess described by Blume et al . (2019). This new model (Surface & Olenick, forthcoming) desc ribes how the transfer process relies on cognitive processes and the overriding of automatic responses to transfer new, non - automatic KSA O s, and how the individual d evelops in this process over time. Both theories make substantial strides in describing tra nsfer as a process. 
However, they remain limited by their informal linguistic nature. Further work is required to build on these models to enhance formalization, increase the precision of predictions, and improve falsifiability. All these advances are important and have provided a wealth of useful information. However, much work remains to provide a process-oriented explanation for when and why learners transfer to their work environments. This paper argues that advancement may be made by reconceptualizing transfer as another learning process, rather than something theoretically removed from the processes which drive a learning event. By reframing transfer as learning, we can draw on existing process-oriented theories of learning to provide a strong foundation from which to begin, including both informal natural-language theories and more formal mathematical and computational approaches. For example, Tannenbaum et al. (2010) described a dynamic model of informal learning on the job where employees learn over time through an iterative process of intent, experience, feedback, and reflection, which is affected by organizational and individual factors. More formal conceptualizations of learning through experience can be found, such as reinforcement learning (e.g., Sutton & Barto, 2018). Some of the basic mechanisms of these theories and others, such as experience, will be evident in the model explicated below. However, the primary point is that transfer theory may be advanced by approaching transfer as the process by which individuals learn whether a new KSAO is a good fit for their job. The model presented here will be called the Learning Transfer Model (LTM) for the double meaning of transferring learning to a target job environment, and individuals going through what amounts to a process of learning to transfer their new KSAO to the target environment, or not. This conceptualization emphasizes the individualized nature of the transfer process, where learners' eventual transfer outcomes are largely a function of the ability of their training to fit the needs of their job, and of the learners discovering through experience the fit between their training and their needs.

Computational Modeling and the Modeling Cycle

Before beginning, it is important to set expectations regarding the approach to theory building undertaken in this paper and to discuss the implications that approach has for the theory outlined below. The present paper takes a computational approach to theory building. Computational modeling is a useful tool for building new process-oriented theory for multiple reasons. First, formalization forces the theorist to make the logic of the theory explicit, especially as it evolves over time (e.g., Vancouver, 2008; Vancouver & Weinhardt, 2015). Second, computational modeling allows for the exploration of the theory in a low-risk environment. Third, those virtual experiments allow for better understanding of phenomena of interest, and can, but do not have to, lead to novel insights that would not arise from informal theorizing or unguided data collections (e.g., Miller & Page, 2012), though such insights can then be tested using targeted data collections on real subjects (e.g., Vancouver, Weinhardt, & Vigo, 2012). Importantly, a formal theory can also provide specific point estimates for the effect sizes one would expect to observe in the real world.
Although making such specific predictions is not the historical norm in psychology, doing so is a stronger form of science in which we can support or refute an underlying theory by assessing the fit of observed effects to predicted ones using Bayesian inference (Dienes, 2019). More generally, when one makes the mechanisms of a theory explicit, as computational modeling requires, one can be certain of what has led to the outcomes of the model in a way not typically achieved in traditional theory building. That is, when we collect empirical data in our field, we often propose hypotheses regarding the direction of relationships between constructs of interest which we believe follow from the logic of some underlying theory we are drawing upon. For example, we might predict that self-efficacy and performance are positively related while drawing on Social Cognitive Theory (Bandura, 1977) to discuss why we should expect such a relationship. However, when we only measure self-efficacy and performance and find the predicted relationship, we have not actually tested the underlying mechanisms driving that relationship, such as effort (e.g., Vancouver & Kendall, 2006), and therefore cannot be certain that our underlying theory is the actual explanation for the relationship; we can only be sure that the relationship is consistent with our expectations. By contrast, when using a computational model of the type used in the present research, one can be sure that the specified mechanisms led to the relationships between any higher-level emergent properties of interest, because they are the only mechanisms in play. Finally, a computational approach to theory building allows for an iterative process whereby a relatively simple form of a theory can be built, explored, and then expanded over time as necessary to account for phenomena of interest. Researchers have argued that this approach is the direction in which our field should be evolving (e.g., Kozlowski & Chao, 2012), and it may be of particular use for studying the training process (Salas & Kozlowski, 2010). The iterative approach to theory building and modeling was described by Railsback and Grimm (2012) as the Modeling Cycle. The Modeling Cycle is composed of six steps: 1) formulate the question, 2) assemble hypotheses, 3) choose model structure, 4) implement the model, 5) analyze the model, and 6) communicate the model. The process is iterative in that step five feeds back to step one, except when the author decides the time has come for communication. Over time, the theory and associated model are developed and explored, becoming increasingly sophisticated and more representative of the phenomenon of interest. By starting simple, this paper acknowledges that the resulting theory will not be a perfect picture of the transfer process, but that is not the intent. Rather, this model provides a starting point for future development while hopefully providing useful insights into the transfer process. This approach stays true to the principle of theoretical parsimony as outlined by Box (1976, p. 792), who wrote that since all models are wrong, the scientist cannot obtain a "correct" one by excessive elaboration.
On the contrary, following William of Occam, the scientist should seek an economical description of natural phenomena; that is, a theory should be kept as simple as its explanatory purpose allows.

Transfer Findings for Which to Account

Along with his admonition for parsimony, Box (1976) also warned that since all models are wrong, the scientist must be alert to what is importantly wrong. In this section, potentially important concepts and findings will be discussed, with reasoning for why or why not they need to be included in the initial steps of building a transfer process theory. The discussions here are not meant to be in-depth reviews of each topic. The goal is to define the concept and general findings, based in meta-analytic evidence where possible. This approach is deliberate in considering the initial stages of development for the present theory. An overemphasis on examining nuance can inhibit the development of sound theories of human behavior because it stands in the way of the abstraction on which good theory depends (e.g., Healy, 2017). Within psychology, researchers are incentivized to focus on theoretical contributions in their work (e.g., Olenick, Walker, Bradburn, & DeShon, 2017), which for most studies means extending an existing theory by examining a new application or moderation of that theory. However, with no incentive to replicate findings, the supposed nuance gained by such studies can long go unchallenged and cloud the development of a core theory to unify those findings. Further, relying on single studies to build informal theory is treacherous at best, because interpretations and conclusions from single studies can differ greatly depending on who does the analysis and interpretation (Starns et al., 2019). Thus, it is imperative that a potentially unifying theory account for general findings before exploring more nuanced findings which may be misleading. To this end, the meta-analytic effects discussed here are not to be treated as precise targets for replication in the models explored in this paper. Instead, the meta-analytic effects are general guides for the patterns of relationships expected from the LTM, as there are limitations to the use of meta-analyses as exact targets, such as variability in the contexts in which their underlying studies were conducted, the measures used, and their theoretical underpinnings, among other between-study differences that are aggregated across when estimating meta-analytic effects. The model presented in this paper is meant to be a general theory of training transfer and should therefore represent the general findings of applicable meta-analyses, but the exact point estimates from those meta-analyses may be overly restrictive targets for a theory in the initial stages of development, as the LTM is, and future work should look to refine the LTM to better target precise effects in their applicable research contexts.

Practice and Overlearning

The effects of practice on important training outcomes are well established. Practice on tasks is related to important performance outcomes, as individuals tend to improve over time with exposure to a task. For example, Hausknecht, Di Paolo, and Moriarty Gerrard (2007) found that test scores increase upon retesting, with a meta-analytic effect of .26. Such practice effects are critical when considering personal outcomes, such as employment decisions (e.g., Olenick, Bhatia, & Ryan, 2016).
Similarly, practice is critical within learning contexts for improving important outcomes and is considered one of the best strategies for improving learning and retention (e.g., Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). Within the transfer environment, practice of skills is also essential for maintenance; Arthur, Bennett, Stanush, and McNelly (1998) found in a meta-analysis that skills deteriorated significantly over time without use. Relatedly, researchers have explored the use of overlearning as a design feature of training. Overlearning is essentially the use of extreme levels of practice to develop automaticity before the learner leaves the learning event. The development of automaticity is a key outcome in training and in the development of expertise (e.g., Ericsson, 2006; Goldstein & Ford, 2002). Meta-analytic investigation of the effects of overlearning on retention shows an uncorrected relationship between overlearning and retention of .298 (Driskell, Willis, & Copper, 1992). Thus, it is important for the transfer process theory to account for improvement in transfer when practice and overlearning are part of the training design, before the learner even enters the transfer environment.

Utility Reactions

Utility reactions are trainees' evaluations of the usefulness of their learning experience (e.g., Ruona, Leimbach, Holton, & Bates, 2002), typically collected via an affective reaction measure at the end of a training session. Researchers predict that when adult learners see new information as useful to them, they are more likely to utilize that information in the future. This prediction fits with training principles regarding the need to improve trainee motivation to learn or transfer by connecting the material to personal outcomes (e.g., Bauer, Orvis, Ely, & Surface, 2016). Interestingly, relatively few studies actually examine utility reactions despite their demonstrated strength in predicting transfer outcomes. In Blume et al.'s (2010) meta-analysis, only nine studies were found that met their inclusion parameters, but those studies demonstrated a corrected relationship with transfer of .46, making utility reactions one of the strongest overall predictors of transfer and important to account for in the present model.

Work Environment

Work environmental factors have long been considered an important driver of transfer, often referred to as transfer climate. Transfer climate includes aspects of supervisor and peer support, opportunity to use, supervisor sanctions, positive and negative personal outcomes, and resistance to change (Nijman et al., 2006; Rouiller & Goldstein, 1993; Holton et al., 1997; Holton et al., 2000). This paper focuses on supervisor and peer support and opportunity to use. Supervisor and peer support are important antecedents of training success (e.g., Baldwin & Ford, 1988). These two factors are part of social support, which is the ability to draw on the emotional and task resources of others (Steele-Johnson, Narayan, Delgado, & Cole, 2010). Social support has important effects on the stress and well-being of individuals, with perceptions of support being potentially more important than actual support (e.g., Kessler, 1992). The importance of support for the transfer of training has been confirmed via meta-analysis, with Blume et al. (2010) finding a corrected relationship of .21 between support and transfer.
Several studi es on the effects of supervisor support , specifically, are interested in exploring the mechanisms through which support operates to affect training outcomes . For example, Nijman et al . (2006) found that support affects transfer through p erceptions of trans fer climate and motivation to transfer. However, most studies in this area are cross - sectional in nature. Even Foxon (1997) , who argued for examining transfer as a proces s, examin es supervisor support but collected measures at a single t ime point in the transfer environment. Similarly, Nijman et al . (2006) develop a process model of transfer but are limited to a small sample and a cross - sectional design. Thus, it is import ant to consider the effects of support for transfer o f a new KSAO, but the development of support effects over time need further examination. The situations in which learners find themselves attempting to apply their new KSAOs also impact transfer. One im portant way situations differ is the degree to which they are weak or strong. Situations are strong to the extent they provide clear context clues on the appropriate courses of action to take (Meyer, Dalal, & Hermida, 2010). Strong situations dictate the a ctions that must be taken while weak ones allow more room for indiv idual differences to influence how to proceed, and thus affect related outcomes. For example, Judge and Zapata (2015) showed that the effects of personality traits on performance were highe r in weak contexts than in strong contexts. In transf er environment s situation strength manifest s in various ways, such as if the received training is the organizationally required way to carry out a task transfer would be more likely. Or , in the relations hip between supervisor and trainee, closer and less - a utonomous supe rvision should create a stronger situation and lead the trainee to transfer their new KSAOs in a way more consistent with the desires of their supervisor (e.g., Yelon & Ford, 1999). 20 All su ch higher - level factors fit with calls for multi - level investigatio ns of training and transfer effects (e.g., Mathieu & Tesluk, 2010; Sitzman & Weinhardt, 2019). Multi - level theory (Kozlowski & Klein, 2000) emphasizes the nested nature of phenomena in orga ni zational psychology. Namely, measurements across time are nested within individuals, individuals within teams, teams in organizations, and so on. N esting has implications for both how we study phenomena, and how phenomena are likely to manifest. It has b ee n argued that research should examine target phenomen a from a bra cketed perspective, including effects of both one level above and one level below the target phenomenon ( Hackman, 2003 ). In the present study this includes explication of an individual proc es s which occurs over time, and higher - level effects on that proces s imposed by such concepts as situation, opportunity, and climate . Overall, environmental effects including transfer climate, support, as well as constraints or opportunities for us e have a m eta - analytic relationship of .22 with transfer (Blume et al., 201 0). Implementation intentions Psychologists in several areas of inquiry have studied the potential for implementation intentions to reduce the intention - behavior gap (e.g., Schniehotta, Sh olz & Schwarzer, 2005). 
Implementation intentions link situational cues to intended responses, specifying that when situation X arises the person will respond by doing Y (Gollwitzer, 1999), and have been shown to have a substantial meta-analytic effect on goal attainment (Gollwitzer & Sheeran, 2006). Their benefits also appear to depend on the strength of the underlying goal intention (Sheeran, Webb, & Gollwitzer, 2005). Wieber, Thürmer, and Gollwitzer (2015) recently described the mechanisms underlying the functioning of implementation intentions. Implementation intentions form a strong association between the mental representation of the goal-relevant situation and the goal-directed action, delegating action control to a lower-order cognitive process and changing the normal top-down processing approach of goal attainment into a more automatic and efficient bottom-up process. Health psychologists have utilized implementation intentions to improve the effects of patient education programs. For example, Harris and colleagues showed that both implementation intentions and self-affirmation increased fruit and vegetable consumption at seven-day and four-month follow-ups (Harris et al., 2014). Kendzierski, Ritter, Stump, and Anglin (2015) showed the moderating effect of self-schemas on implementation intentions. In two studies they showed that implementation intentions increased healthy eating habits among individuals who already held a self-schema of being healthy eaters, meaning implementation intentions work better for individuals who already see themselves as approximating the end goal. A recent systematic review suggested that implementation intentions have a small but reliable effect on healthy eating behaviors (Turton, Bruidegom, Cardi, Hirsch, & Treasure, 2016). Thus, although not ubiquitous in organizational training studies, implementation intentions show important effects and should be accounted for in a model of transfer.

Maintenance Curves

Maintenance is one of the two primary aspects of transfer outlined by Baldwin and Ford (1988) and Baldwin, Ford, and Blume (2009). Baldwin and Ford (1988) describe possible trajectories a learner may take in displaying transfer, which are labeled maintenance curves. These potential trajectories range from an initial lack of transfer with later increases in transfer rates, to initially high levels of transfer that decrease over time. Such trajectories can be studied using growth modeling techniques, as accomplished in the study by Dierdorff and Surface (2008) on the effects of skill-based pay on maintenance. Unfortunately, because the study of maintenance curves requires several waves of data collection, they are rarely studied in primary research. A transfer process model should be able to explain why an individual may take any one of the potential general transfer trajectories. One advantage of using computational modeling to explore the present theory lies in the ability to explore such curves in an environment that does not necessitate large-scale data collections.

Self-efficacy

Self-efficacy is the belief of an individual in their ability to execute desired behaviors in the pursuit of some outcome (Bandura, 1977). Efficacy is a central variable in self-regulation theory, which will be more thoroughly introduced below. Importantly, according to Bandura, efficacy is the primary way in which individuals show agency in affecting their personal environments.
Within the learning context , efficacy drives outcomes through the amount of eff ort the individual is willing to place into the task i n question (e .g., Vancouver & Kendall, 2006). In examining the effect of efficacy on transfer, it is common to collect feelings of efficacy at the end of a training event to predict future use. Across s tudies efficacy has been shown to be a moderate predic tor of transf er (Blume et al., 2010). Given the centrality of efficacy to the key theory of self - regulation and the demonstrated effect of efficacy on transfer, efficacy is another variable which holds importance for the LTM . Skill type A potentially crit ical aspect t o consider is the nature of the skill targeted for transfer. A typical delineation between skill types is open versus closed. Closed skills have a relatively strictly defined way in which t hey may be applied, for example there may be only one way to succes sfully operate a machine. Open skills are those over which the trainee has more discretion regarding how they are applied to their job, for example how to handle an interpersonal interactio n (e.g., Yelon & Ford, 1999). Similarly , Laker (2011) introduce d so ft and hard skills. 23 Hard skills are technical skills or those that define how to do a given task. Soft skills are those that have a more inter or intrapersonal focus. These categories are l ike open and closed skills but are argued to go furthe r in differen tiating the skill types in question. The type of skill studied may have important implications for transfer on its own and affect parts of the LTM. For examp le, Laker (2011) argues that sof t skills are less likely to transfer because the trainee is more li kely to have prior experience that needs to be overcome, and that feedback is more difficult to receive accurately. In addition, the level of support for transfer may matter more for open a nd soft skills than closed and hard. For exam ple, Salas, Milham, an d Bowers (2003) argued that as the military moves towards more open - skills training programs a more supportive environment would be required to enhance transfer as trainees would have great er discretion over the implementation of thei r new skills. Yelon an d Ford (1999) further discuss the interplay between closed versus open skills and the level of autonomy a trainee has from their supervisor in determining transfer outcomes. However, it i s important to begin building explanatory the ories as simple as pos sible and later iterations may build in complexity. For that reason, the initial LTM will be more directly applicable to hard or closed skills because they are more straightforward. T his do es not mean the proposed theory is inapplicab le to more open - type s kills as the underlying process driving transfer is likely the same and future investigations will be required to unpack any nuance required to account for differences in transfer outcomes between the various skill types. Near versus Far Transfer , Adaptive Transfer and Adaptive Performance Near and far represent a key distinction in describing the nature of the transfer task . Near transfer is when tasks in the transfer environment closely r esemble those on which the learner received i nstruction , allowing m ore direct application of what was learned to the transfer 24 environment. 
Far transfer is when the task in the transfer environment is different in some larger degree from the task on which t he learner received instruction , requiring gr eater adaptation on th e ir part (Beier & Kanfer, 2010). The type of transfer has potentially differential effects on other important variables. For example, it was originally demonstrated that self - efficacy was o nly related to transfer when near transfer wa s required (e.g., Math ieu, Tannenbaum, & Salas, 1992; Martocchio, 1992). However, it was later shown that self - efficacy is important in determining far transfer as well , though potentially to a different degree (e.g., Kozlowski et al., 2001). Related to fa r transfer is adaptive transfer. Adaptive transfer occurs when knowledge from training is applied to a task which is not identical to that which was trained but is instead a n adaptation of that task. Adaptive tr ansfer can also involve the generation of nov el approaches to probl em solving (e.g., Beier & Kanfer, 2010; Smith, Ford, & Kozlowski, 1997). More broadly, Baard, Rench, and Kozlowski (2014) reviewed research on adaptation and adaptive performance, which ar e related to generalization. Based on their review, the field of ad aptive performance is largely unorganized, characterized by multiple appro aches which are not in agreement with one another. To provide some structure, the authors introduce a taxonomy of p erformance adaptation. The most relevant category they define for t he present purposes is that of domain - specificity, which is based in train ing and skill development. They write that key assumption of this approach is that specific capabilities underl ying performance adaptation can be learned and that their applicati on is specific to a knowledge and skill domain rather than general across a range of work situations. The primary target for this work is to develop knowledge, skills, and capabilities via training or other developmental experiences that can increase perfo rmance in 25 a task context that shifts in novelty, difficulty, and/or comple ; emphasis in original ) . Further, within adaptation research, decision - making and learning are importa nt topics of study, which are primary foci of the LTM . Examples of research in this domain include decision - making tasks (e.g., TANDEM), and how individuals adapt their decision making in changing situations which drives adaptive performance . However, ada ptation is more concerned with applying existing knowledge to new a nd changing situations, not with applying new knowledge to old situations, which is more the domain of transfer. This is a close but important distinction. The adaptation of existing knowle dge is important and interesting , but a large portion of actions un dertaken by typical employees are relatively routine, even in complex jobs (Susskind & Susskind, 201 7 ) . F urther, estimates based on experience samples are that 45 - percent of behaviors are r epeated in the same location every day ( Neal, Wood, & Quinn, 2006; Wood, Quinn, & Kashy, 2002). This paper most directly concern s situation s where the encountered situation is stable enough that the same general approach to the task may be applied , thus avoiding the complications of adaptation, skill type, near or far t ransfer, etc., for the time being. This is directly applicable to types of jobs that are very consistent in their nature but is also in line with the idea that teaching principles they can apply to a broad range of situations is beneficial . 
The argument is made that the same basic process of learning about the potential uses of a newly trained KSAO will be applicable to both situations. However, it is agreed that this process is complicated by attempts to apply training to more adaptive tasks. Thus, the initial LTM should be interpreted as directly applicable to transfer tasks which are broadly definable as near transfer, but with potential insights for the processes underlying far transfer as well.

Study 1: Base Learning Transfer Model

To investigate the noted gaps in the transfer literature, the remainder of this paper will be dedicated to introducing and exploring a formal model of the transfer process called the Learning Transfer Model (LTM). The complete model will be described and tested in multiple iterations, drawing on existing work in fields other than organizational psychology to form the basis of the proposed transfer process. The first model is primarily based on theories of Dual Process Cognition (e.g., Kahneman, 2011) and reinforcement learning (e.g., Sutton & Barto, 2018), and is informed by work on habit formation and change (e.g., Neal, Wood, & Quinn, 2006).

Dual Process Models and Habits

I argue that a primary shortcoming in the existing training and transfer literature for the study of transfer as a process is a lack of basis in established cognitive theory. One particularly underutilized framework, not just in the training literature but across organizational psychology more broadly, is that of Dual Process Cognition. By drawing on existing dual process theories we can provide an overarching framework from which to explain how learners may process their transfer situations and make decisions regarding how to respond. Once established, we can discuss how other important theories may further explicate key mechanisms within the dual processing framework. One thorough and accessible explanation of dual process theory comes from Nobel Laureate Daniel Kahneman (2011), though other versions exist (e.g., Pennycook, Fugelsang, & Koehler, 2015; Bago & De Neys, 2017). Kahneman (2011) explains that humans have two separate information processing and decision-making systems. The first system, conveniently labeled System 1, is characterized by fast, automatic information processing which requires little effort and makes decisions based on heuristics learned over time which tend to result in an acceptable level of success, whatever success may be. Automatic decisions allow humans to carry out most of their daily information processing and decision making without becoming cognitively overloaded, but these decisions also tend to be biased and suboptimal. On the other hand, System 2 is an effortful processing system which moves slower and requires conscious cognitive effort. System 2 tends to make more nuanced decisions but may lead to the same conclusion which would be made by System 1. Kahneman (2011) also argues that humans are lazy cognitive processors and will default to the use of their System 1 processing whenever possible. This approach to cognition and decision making has the added benefit of arising from behavioral economics, which tends to be more formal in its theorizing and replicates more frequently than traditional psychological research. It has been suggested that behavioral economics and dual processing theories show promise for the building of unifying, but falsifiable, psychological theory (Muthukrishna & Henrich, 2019; Popper, 1959).
Criticisms of dual processing theories have b een levied by many researchers. Evans and Stanovich (2013) outlined and responded to the five most common criticisms. Those criticisms include 1) dual process theorists have offered multiple and vague definition s of those processes, 2) proposed attribute c lusters are not reliably aligned, 3) the existence of a continuum o f processing styles and not discrete types, 4) single - process accounts may be offered for dual - process phenomena, and 5) evidence for dual proce ssing is ambiguous or unconvincing. Evans and Stanovich (2013) respond to each of these in turn, but generally s uch criticisms are levied against dual process theories en masse instead of against single theories, ignoring specific developments within dual process theories. Their points include that c haracterizing cognitive processing as strictly dichotomous is overs implified and processing should be viewed as more varied, with some processes being more automatic and others less so. Such a view overcomes the 28 continuing charge of unreliable alignment of attribute clusters (Melnikoff & Bargh, 2018a, Melnikoff & Bargh, 2 018b; Pennycook, De Neys, Evans, Stanovich, & Thompson, 2018). Evans and Stanovich (2013) further outlined that a dual process conceptual approa ch better fits the data patterns of cognition than any other explanation, such as a single process model, and th at it is largely nuances within the field of dual processing itself that remain to be fleshed out rather than disregarding the framework as a wh ole. The view of dual processing as the essen tial framework for cognition becomes stronger when organizing it fr om a default - interventionist perspective. The default - interventionist perspective views processing as being essentially automatic in nature for most instances, where we generate automatic r esponses and it is then up to the more deliberate processes to inte rvene or not. Finally, clarity may be brought by referring to these two processes as type 1 and type 2, which is meant to overcome the shortcomi ngs of using the system terminology that give s the false impression that there are two clearly identifiable proc essing systems. The current paper cannot clarify the nature of dual processes. Instead, this paper argues that the dual process framework, thou gh imperfect, is a useful dichotomization for forming parsimonious explanations for meso - level processes which a re driven by underlying cognitive systems. The dichotomy used here will refer to type 1 and type 2, with type 1 processes being generally more a utomatic and unconscious and type 2 being gen erally more deliberate and conscious, though it is understood this is not necessarily a perfect characterization. In addition, this paper adopts the view of Evans and Stanovich (2013) that the two processing typ es occur in a default - interventionist, sequen tial, fashion. Approaching dual processing from this general perspe ctive will provide a framework from which to approach the transfer process, representing an imperfect but significant step forward in understand ing that process. 29 As previously stated, this paper is fundamentally about learning, and r esearchers have previou sly described dual process models of knowledge and learning. For example, Dienes and Perner (1999) distinguished between implicit and explicit k nowledge. Implicit knowledge largely, but not exclusively, being that which is automatic, unconscious, nonverbal ized, and declarative. 
Implicit knowledge underlies explicit knowledge as knowing something explicitly implies you know the information underlyi ng it but knowing something implicitly does n ot necessitate being able to make it explicit. Sun, Slusarz, and Te rry (2005) built on the distinction between implicit and explicit knowledge by explicating the CLARION model of learning which includes both imp licit, bottom - up, and explicit, top - down, for ms of learning in skill acquisition. In implicit learning individua ls gain knowledge through direct experience, which a more unconscious form of learning and may not lead to knowledge which the individual can di rectly articulate. Such knowledge occurs in t he development of learning patterns in complex recognition tasks, o r in the learning of grammatical rules in real or made - up languages. Explicit knowledge acquisition can be delivered directly from the outside e nvironment, such as being told the decision r ules required for a given task. Over time, implicit knowledge can w ork its way up to become explicit where the learner can refine rules in a more conscious way. This exemplifies the split between more unconsciou s type 1 and more conscious type 2 processing in a learning environment. However, their model is directly concer ned with skill acquisition, so is more directly applicable to the training event itself in an organizational training process, and not to the pr ocess of transferring that skill. The automat ic nature of type 1 processing is of major importance for the prese nt paper. Successful training interventions have long attempted to develop a degree of automaticity in skills that are being targeted. Intervent ions have been able to develop automaticity p articularly 30 through overlearning approaches (e.g., Arthur, Bennett, Stanush, & McNelly, 1998), which effectively have the learner repeat a process until they are engrained to the point of an automated response. However, what we do not appreciate enough is that when we introduce a new KSAO to an employee there is likely so me existing KSAO that the new one must override which, at the very least, has a head start on the development of automaticity. We do study exper tise development where the essential process is the breaking of old automatic processes and replacing them with better processes (e.g., Ericsson, 2006). However, this process is covered at a very high level and does not reach the granularity of deciding to apply some new given approach over the old. In addition, the expertise literature is tangential to the training and transfer literature. Within the more traditional training literature we appreciate that adult learners come to their learning with a person al history (Knowles, 1984), and that this aff ects their outcomes from the learning event. This essential process of overcoming an existing automatic behavior to implement a new one is a central focus of the LTM. Focusing on overcoming existing automatic b ehaviors fits with a broader trend in psychol ogy: the re - emergence of interest in habits and habit change. Habit s are conceptualized in many ways in the literature, but can be categorized as tics, neural networks, conditioned responses, everyday activities , routines, customs or rituals, character, or habitus (Clark, Sanders, Carlson, Blanche, & Jackson, 2007). 
Of th ese, the most important forms of habits for the present discussion are 1) conditioned responses actions learned through reinforcement and cond itioning, 2) everyday activities things we do every day with little or no conscious thought, and 3) routines more complex than single activities, involving sequences and combinations to create order. A second typology of habits by Southerton (2013) def ines habits as either 1) 31 dispositions, 2) pro cedures, or 3) sequences. Dispositions are the most important here, which are propensities to act in a particular manner when suitable circumstances arise. Whether one takes the more macro, dispositional, or mo re micro response approach to habits, they fi t well with the dual process model. If one takes the broader , dispo sitional, any instantiation of an actio n could be driven by either type 1 or type 2 processes. This broader conceptualization of habits works because e ven an effortful process may arrive at the same conclusion as an automatic process. Thus, a habitual reaction in any given situation could be du e to an automatic reaction, or due to a more On the other hand, viewing habits specifically as behaviors which are in some way automatic firmly places habits as the outputs of type 1 processes. From thi s view, any habitual behavior in an organizat ion is that which an employee may default to through automatic proc esses. It may seem that few work behaviors would fall under such habitual responses with the increasing complexity of the work world, but Susski nd and Susskind (2017) argue most work acts, even by individuals in relatively complex jobs, are fairly repetiti ve and mundane, making them ripe for habituation. Further, the development of automaticity is essentially the development of habitual responses. It is that developed habitual response I arg ue we must overcome which we do not always account for explicitly i n training research, and especially in considering if newly trained KSAOs will transfer back to the work environment. Difficulties in overcomin g existing automatic responses of adult learn ers is evident in some topics of study within the training literatu re. A primary example can be seen in attempts to train for implicit (automatic) racial attitudes to reduce racially biased attitudes and connect ed behaviors, and this point is worth some ex ploration as it has implications for the LTM. 32 Greenwald and Banaji (1995) explained that much of social behavior is driven by implicit or unconscious processes which allow individuals to take the correct actions in social situations without effortful proce ssing. However, it makes changing social behavior difficult because many decisions in those situations occur outside of direct cognitive control. Relatedly, Wilson, Lindsey and Schooler (2000) proposed a dual mo del of attitudes specifying the relationship between implicit and explicit attitudes held towards an object or g roup. Specifically, individuals hold both implicit and explicit attitudes that do not necessarily agree with each other. Wilson and colleagues a rgue implicit attitudes are the product of lo ng - term learning processes and are usually rooted in childhood expe riences. Explicit attitudes may agree but are more susceptible to learning in adulthood. Which attitude determines behavioral outcomes is driven by the dual processes of cognition such that the implicitly learned attitude will drive behavior unless the ind ividual is given the opportunity and resources to call on their explicit attitudes. 
This model also explains why the average correlation between implicit and explicit measures of attitudes tends to be low (e.g., Brauer, Wasel, & Niedenthal, 2000). The dual nature of attitudes and cognitive processes poses problems when we attempt to change attitudes through training. That implicit attitudes are automatic judgements learned over long periods of time makes them habitual, suggesting there are deeply ingrained cognitive processes and structures which must be altered or overcome to cause lasting change. Changing existing habits is possible but difficult, and the longer one uses a given KSAO successfully, the harder it will be to change it. In the case of implicit attitudes, such as racial attitudes, an employee is likely coming to the learning event with decades of experience using that attitude. Organizations then attempt to affect such attitudes through diversity training, which often lasts four hours or less (Kalinoski, Steele-Johnson, Peyton, Leas, Steinke, & Bowling, 2013). Thus, it should be no surprise that training initiatives to change racial attitudes generally fail to cause lasting change in explicit and implicit attitudes, as well as in the outcome behaviors to which those attitudes lead (Lai, Hoffman, & Nosek, 2013; Lai et al., 2016). In most cases, at least pertaining to racial attitudes, individuals are not exposed to a strong enough shock to fundamentally alter their beginning set point; they fail to maintain the hoped-for change over time and merely return to their baseline tendency, in this case a habit, after some period (Olenick et al., in press; Baldwin & Ford, 1988). A similar, though potentially less extreme, effect likely occurs for many KSAOs, and the LTM can account for such an effect.

Reinforcement Learning

As mentioned above, one way to view habits is as the product of the reinforcement of actions through their past successful application. This framing makes reinforcement learning a natural place to look for an existing learning theory to explain learning mechanisms within the LTM. Reinforcement learning has been thoroughly researched by both psychologists and computer scientists, is informative regarding how individuals learn, and has the benefit of the level of formality and thoroughness required to form overarching theoretical frameworks (Muthukrishna & Henrich, 2019). Psychological study of reinforcement learning dates to at least the studies of Ivan Pavlov (1927) and what is now known as classical conditioning, in which the pairing of a stimulus and a reward could result in the later excitation of a response which had previously not been associated with the stimulus. For example, the initial presentation of a bell does not cause a dog to salivate. However, if over time food is presented in tandem with the bell, the dog will begin to salivate at the ringing of the bell alone. More formally, an initial unconditioned response (salivation) is normally paired with a natural trigger (unconditioned stimulus), but later can become a predictable response (conditioned response) to an unnatural trigger (conditioned stimulus). Over time, the dog comes to expect food at the ringing of a bell because of previous experience. Formalized versions of classical conditioning exist, such as the Rescorla-Wagner model (Wagner, 2008), which proposes, in part, that the weighting of stimulus-response connections is updated when animals are surprised by outcomes (e.g., Kamin, 1969).
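To give a flavor of the kind of formalization these conditioning models provide, the following is a minimal, illustrative sketch (not code from the dissertation) of a single-cue, prediction-error update in the spirit of the Rescorla-Wagner model; the learning-rate value, reward magnitude, and trial structure are arbitrary assumptions chosen only for demonstration.

```python
import random

def rescorla_wagner(trials, alpha=0.3, reward_magnitude=1.0):
    """Minimal prediction-error learning: associative strength v moves toward
    the obtained outcome by a fraction (alpha) of the surprise on each trial."""
    v = 0.0                              # associative strength (expected outcome)
    history = []
    for rewarded in trials:              # trials: sequence of True/False (reward present?)
        outcome = reward_magnitude if rewarded else 0.0
        prediction_error = outcome - v   # "surprise": obtained minus expected
        v += alpha * prediction_error    # update the weight in proportion to surprise
        history.append(v)
    return history

# Example: a stimulus rewarded on roughly 80% of 20 trials; v climbs toward ~0.8.
random.seed(1)
print(rescorla_wagner([random.random() < 0.8 for _ in range(20)]))
```

The key property the sketch illustrates is that learning is driven by surprise: when outcomes match expectations, the prediction error shrinks and the associative weight stops changing.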
Another model of reinforcement learning can be found in the operant conditioning approach of Skinner (1938, 1963), or instrumental conditioning in the language of Thorndike (1898). Both study behavior-contingent reinforcement and the subsequent effects of that reinforcement on future behaviors. Classic experiments by Thorndike include the use of puzzle boxes in which cats were placed and required to escape, for example by pushing a lever or pulling a string. Initially the cats would struggle and often only escaped by solving the puzzle by chance, but their ability to escape increased as they gained more practice at performing the required action and were reinforced by being able to escape their confinement. Computer science drew inspiration from the original research on animal learning completed by psychologists to develop reinforcement learning algorithms (Sutton & Barto, 2018). The essential function of a learning agent in a reinforcement problem is to identify the best behavioral strategy, labeled a policy, to apply in a given situation to maximize the reward it receives from its environment. As an agent encounters its environment, it applies some policy available to it and receives rewards based on the success of that policy. Over time, the agent estimates the expected value of that policy and can compare the expected values of multiple policies. The agent thus applies increasingly valuable policies to its task and improves its performance. Through this iterative action, feedback, and learning process, agents can develop novel and powerful solutions to complex problems which are often more efficient and complete than those which humans develop on their own. Examples include robots navigating an environment (Sutton & Barto, 2018), and games as varied as checkers (e.g., Samuel, 1967), Jeopardy! (Tesauro, Lechner, Fan, & Prager, 2013), and backgammon (e.g., Tesauro, 2002). Algorithms of varying complexity for reinforcement learning exist depending on the type of learning problem (Sutton & Barto, 2018). Regardless of the complexity of the chosen algorithm, some of their essential features can be directly tied to the types of psychological conditioning described previously. For example, one of the laws of learning discovered by Thorndike (1898) was the Law of Effect, which states that behaviors which produce satisfying outcomes are more likely to occur again when the same situation is presented, and those which produce unsatisfying outcomes are less likely to occur again in that situation. Sutton and Barto (2018) connect reinforcement algorithms to the Law of Effect: reinforcement learning algorithms are selectional, meaning that they try alternatives and select among them by comparing their consequences, and they are associative, meaning that the alternatives found by selection become linked to the situations in which they were tried. As described by the Law of Effect, reinforcement learning "is not just the process of finding actions that produce a lot of reward, but also of connecting" those actions to the situations in which they pay off (pp. 358-359, emphasis in original). Although computer science applications of reinforcement learning are designed for agents in idealized environments, their algorithms are useful for understanding and modeling animal learning in psychology (Sutton & Barto, 2018), and may hold the key for understanding transfer as a learning process.
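As a loose, hypothetical illustration of those two properties (again, not code from the dissertation): selection happens by comparing estimated consequences, and association happens by keeping those estimates separate for each situation. The action names, exploration rate, and update rule below are assumptions made for the sake of the example.

```python
import random
from collections import defaultdict

# values[situation][action]: estimated payoff of an action *within* a situation,
# so rewarded actions become more likely specifically where they paid off.
values = defaultdict(lambda: defaultdict(float))
counts = defaultdict(lambda: defaultdict(int))

def choose(situation, actions, epsilon=0.1):
    """Selectional: pick the action with the best estimated consequence,
    exploring at random a small fraction of the time."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: values[situation][a])

def learn(situation, action, reward):
    """Associative: update the estimate for this action in this situation only."""
    counts[situation][action] += 1
    n = counts[situation][action]
    values[situation][action] += (reward - values[situation][action]) / n
```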
The Learning Transfer Model

Using the background of dual process theory and reinforcement learning, I propose the Learning Transfer Model (LTM) as a process theory which may account for common effects observed in transfer research. The LTM proposes that learners exit training with a new KSAO, and they must then learn whether it is a better fit for their work tasks than their previously used KSAOs. Once in the transfer environment, learners encounter relevant tasks and must choose which of their available KSAOs to apply. Based on dual processing, the learner will have an initial automatic response based on how habitual that KSAO is at that time. Once this initial automatic response occurs, it may be intervened upon by more deliberate decision processes if the learner can engage in such processes. However, even in cases where more deliberate processing is possible, the learner may still apply their old KSAO instead of their new one. Over time, the learner gains experience which will inform their future transfer decisions and, with many applications, develop their new KSAO into a new automatic response. The basic outline of the LTM can be found in Figure 1. This description represents the general form of the LTM, but to build strong theory for testing and future development, a key point of this paper is to develop a formal model. The rest of this section will be dedicated to explicating that formal model. The backbone of the formal LTM is based on the algorithms of k-armed bandit problems, and unless otherwise noted all information presented in the following discussion is based on Sutton and Barto's (2018) introductory text on reinforcement learning. In k-armed bandit problems, a learning agent, synonymous with an individual transferring knowledge in the current theory, attempts to choose the optimal solution from a number (k) of pre-defined behavioral options. That choice is made by estimating the long-run value of each available policy through an iterated sampling and feedback process. K-armed bandits have four important components. First, each behavioral option available to the agent is called a policy. In the LTM, each agent has access to two policies representing their pre-training KSAO (Policy A) relevant to the theoretical work situation targeted by the training intervention, and the organizationally introduced KSAO relevant to that situation (Policy B). The assumption that only two policies are of interest for transfer questions makes the approach used here a 2-armed bandit problem. Second, each policy has a reward function, or true value, which dictates the distribution of rewards the agent receives when it chooses to apply that policy. Third, the agent maintains an estimate of the value of each policy, representing the predicted reward of that policy according to the agent's experiences applying it. Thus, the agent is estimating the reward of each policy and attempting to discover the best policy to apply at each time step. Fourth, the agent does not always exploit the policy which it currently deems the most valuable, and sometimes explores other potential policies instead. The inclusion of a minor amount of exploration gives rise to methods such as E-greedy, where the agent greedily exploits the currently most valued policy but explores with some rate of error. Several important aspects of this approach to reinforcement learning are worth mentioning. First, agents learn based on the evaluation of actual actions they take, not from instruction by outside entities.
This is one point which separates the current model from the CLARION model (Sun et al., 2005) previously discussed. Second, learning in such agents is limited to a single, unchanging situation. That is, the value of each policy is fixed because the environment to which the policies are applicable is unchanging. Sutton and Barto (2018) describe such approaches as non-associative, because the agent does not need to choose which policy to use in different situations. There are more sophisticated reinforcement learning approaches that can be applied to changing situations, but these are more complex than necessary at this stage of developing the LTM; they could be utilized in the future to study adaptive transfer. For now, we will assume the transfer situation is stable enough for learners to apply their newly learned policy. Third, the k-armed bandit approach assumes that the goal of the agent is to maximize the long-term value of its actions. Fourth, events in bandit problems are episodic as opposed to continuous. Finally, the reward received by the agent at each episode is randomly chosen from a stationary distribution of the rewards associated with that policy.
The application of k-armed bandits to humans in transfer environments requires at least three other assumptions. First, individuals/agents exit their learning experience with the ability to apply the targeted KSAO represented in their new policy. This assumption suggests that this model is currently more applicable to maintenance than to generalization within the transfer space. Second, the learner will not alter the given policy to fit their own needs once that policy is created. Third, the agent must possess perfect recall of their experiences when attempting to apply the available policies in order to accurately calculate the expected value of each policy.
Given this background, we can fully describe the formal LTM and outline how a computational instantiation of that model would operate. Agents, synonymous with learners from here forward, are presented with an abstract task at each time point. For our purposes, the task itself does not matter and will remain undefined, other than that the agent can only be successful or unsuccessful on it. The probability of success on any given attempt is defined by the policy which the agent chooses for that attempt and is equal to the true value of that policy. For example, if a given policy has a true value of .80, the agent will have an 80-percent chance of succeeding on the task when applying that policy. In the computational version of the model, success on an attempt is determined by a random draw from a uniform distribution from 0 to 1, with any number below the true value of the policy being considered successful. If successful, the agent is rewarded with 1 point; otherwise it receives 0. In this way, the mean of a large enough sample of rewards received by the agent will approximate the true value of the policy. The true values of Policies A and B will be represented by the variables R_a and R_b respectively. The random component here adds a crucial stochastic element to the model (e.g., Railsback & Grimm, 2012), making the model non-deterministic and making Monte Carlo simulation important for exploration. This stochastic component represents the idea that any single attempt at a task is essentially a random draw from all possible attempts of that task. As an agent attempts its task, it must estimate the value of its policies.
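Before turning to value estimation, the stochastic task attempt just described can be sketched in a few lines. The sketch below is in Python for illustration only (the actual model was implemented in NetLogo, as described in the Method section); the names attempt_task, R_a, and R_b are invented here.

import random

def attempt_task(true_value):
    """One task attempt under a given policy: success when a Uniform(0, 1) draw
    falls below the policy's true value. Success is rewarded with 1, failure with 0."""
    return 1 if random.random() < true_value else 0

R_a, R_b = 0.70, 0.75    # illustrative true values for the old and new policies

# Over many attempts, the mean reward approximates the policy's true value,
# which is why Monte Carlo replication matters for exploring the model.
rewards = [attempt_task(R_b) for _ in range(10000)]
print(sum(rewards) / len(rewards))   # close to 0.75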
Estimated values for a policy form a dynamic process by which the estimate of the value at any time t + 1 is a function of the value estimate at time t, the difference between the expected value and the reward on a given application of the policy, and a step-size parameter that defines the rate of learning for the agent. The essential framework for reinforcement learning algorithms is
NewEstimate <- OldEstimate + StepSize[Target - OldEstimate]
where Target is the reward at a given time step (Sutton & Barto, 2018). In k-armed bandits, those value estimates can be obtained through action-value methods, which use the experience of the agent to drive the estimation. A simple calculation of the value estimate is to average the received rewards up to that point in time, thus:
Q_t(a) = (sum of rewards when a taken prior to t) / (number of times a taken prior to t)
where Q_t(a) is the value function for Policy A. A more sophisticated way to track the value estimate is as a function of the nth reward:
Q_{t+1}(a) = Q_t(a) + (1/n)[R_t - Q_t(a)]     (Algorithm 1. Value Estimate Calculation)
where the expected value of Policy A at step t + 1 is a function of the estimate after n applications plus a weighted function of that prior estimate and the received reward R_t at that time. Estimating values this way defines the value as a dynamic process underlying the primary transfer decision process in this model (Dishop et al., in press). In addition, this equation defines the learning rate as the inverse of the number of steps taken, meaning learning will decrease over time, fitting with the power law of learning (Newell & Rosenbloom, 1981). The above equation provides a means of updating the agent's value estimates over time, but the agent also can be given an initial estimate of each policy. The initial values given to an agent can affect the behavioral decisions of that agent over time and can improve long-term results under certain conditions (Sutton & Barto, 2018). In the LTM, those initial estimates are defined by Q_1(a) and Q_1(b) for Policies A and B respectively.
Tracking the expected value of each policy is only part of the learning process. Whenever the agent encounters its defined problem, the agent must choose which policy it will apply. Typically, this occurs through action-value methods of selection, where the chosen policy is the one with the highest estimated value, Q_t(a) or Q_t(b). Let P_t represent the policy the agent chooses at a given time point. By choosing the highest-value policy, the agent is choosing the policy which it believes will offer the greatest reward at that time point. However, always choosing the highest-value policy does not allow the agent to effectively test other potential solutions. Instead, the agent can be allowed to explore policies it does not currently see as the most valuable in order to find other, potentially better, policies. This is the classic exploration-versus-exploitation choice seen in studies within the organizational literature (e.g., March, 1991). The rate of exploration can be defined by a variable E; this approach is referred to in reinforcement learning as an E-greedy method (Sutton & Barto, 2018). In the transfer case of choosing between two possible policies, the chosen policy P_t is defined as the greater of the two value functions Q_t(a) and Q_t(b) with some probability 1 - E. The process described so far is very rational on the part of the agent. However, not all choices by individuals are so clearly logical.
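A minimal sketch of this value-tracking and E-greedy choice process is shown below, again in Python for illustration only; the class and function names (PolicyValue, deliberate_choice) are invented here and are not part of the formal model's notation.

import random

class PolicyValue:
    """Running value estimate Q(p) for one policy, starting from an initial estimate Q_1(p)."""
    def __init__(self, initial_estimate):
        self.q = initial_estimate   # current value estimate
        self.n = 0                  # number of rewarded applications observed so far

    def update(self, reward):
        # Q_{t+1} = Q_t + (1/n)[R_t - Q_t]: the step size shrinks as experience grows,
        # so learning slows over time in line with the power law of learning.
        self.n += 1
        self.q += (reward - self.q) / self.n

def deliberate_choice(q_a, q_b, exploration_rate):
    """E-greedy selection: exploit the higher-valued policy with probability 1 - E,
    otherwise explore by choosing a policy at random."""
    if random.random() < exploration_rate:
        return random.choice(["a", "b"])
    return "a" if q_a.q >= q_b.q else "b"

# Example: both policies start with an initial estimate of .5.
q_a, q_b = PolicyValue(0.5), PolicyValue(0.5)
q_b.update(1)                                               # Policy B just produced a reward
print(deliberate_choice(q_a, q_b, exploration_rate=0.10))   # usually "b"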
The form of choice outlined thus far more closely aligns with the type 2 processing systems from dual processing; however, type 2 systems are not always engaged and are theorized to intervene, or not, in decisions already made by type 1 processes (e.g., Evans & Stanovich, 2013). Thus, we must expand the LTM to include an initial automatic decision and learning process to represent type 1 processes, and a mechanism to determine whether the type 2 processes will intervene in that decision. The type 1 process hypothesized here is based on the number of times a policy has been applied. This is the idea that repetition leads to automaticity and that the more times a stimulus and response are paired, the more likely they are to be activated together in the future. Let Z_t(a) be the probability of choosing Policy A over B. Z_t(a) is a function of the number of times that policy has been chosen out of the potential times it could have been chosen from A and B. In addition, the agent in a learning transfer context will likely have experience with Policy A prior to entering the learning event where Policy B is introduced. Thus, the agent should already have some value estimate of that policy based on their experiences and an associated number of times they have applied it. However, it is also possible that the agent receives some actual experience with their new Policy B prior to entering the transfer environment, such as in the learning event itself. To account for those applications, let L represent the number of practice attempts the agent has had with the new Policy B. The calculation of Z_t(a) is then:
Z_t(a) = N_t(a) / [N_t(a) + N_t(b) + L]     (Algorithm 2. Type 1 Process Equation)
where N_t(a) and N_t(b) are the number of times Policies A and B have been applied prior to time t. The default choice of the agent at time t is Policy A at the rate Z_t(a), and Policy B at the rate 1 - Z_t(a).
Once the type 1 process has chosen a policy, it is then up to a type 2 process to intervene. However, type 2 processes do not always do so because they are not always able. For example, the agent may not have the necessary resources, whether those resources are cognitive or exterior to the agent, such as time. It would be possible to theorize about the specific effects of various factors that may affect the likelihood of employing type 2 processes. However, for simplicity the present model will cover all such effects in a percentage chance that type 2 processes are implemented. The chance of engaging in type 2 processing at any time point will be defined as S_2 and ranges from 0 to 1 (Algorithm 3. Probability of Choosing Type 2 Processes). If the agent engages in type 2 processes, then the decision process outlined previously is utilized, which may or may not result in the same decision arrived at by type 1 processes, refining the policy choice of type 2 processes to be:
P_t = max[Q_t(a), Q_t(b)] with probability 1 - E
which represents that the policy P chosen at time t, given type 2 processing, is the policy with the maximum value of Policies A and B, with a likelihood dependent on the amount of exploration desired, E. If the agent does not apply type 2 processes, the type 1 decision is utilized. In either case the agent updates the relevant equations based on the outcome of their action and moves on to the next attempt.
All parameters and equations for the model can be found in Table 1 and Table 2 respectively. It is important to note that almost all aspects of this process could draw on more complicated conceptualizations from their respective theories; however, that is not the point of starting a modeling cycle in an area that has never been covered before.
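Pulling these pieces together, the sketch below shows one way a single transfer decision could be coded. It is an illustrative Python rendering of the process just described, not the NetLogo implementation reported later; all function and argument names (type1_choice, decision_step, s2, and so on) are invented for this sketch.

import random

def type1_choice(n_a, n_b, practice_b):
    """Type 1 (habitual) default: Policy A is chosen with probability
    Z(a) = N(a) / [N(a) + N(b) + L], its share of past applications, where pre-transfer
    practice attempts with Policy B (L) count toward the denominator."""
    total = n_a + n_b + practice_b
    z_a = n_a / total if total else 0.0
    return "a" if random.random() < z_a else "b"

def type2_choice(q_a, q_b, exploration_rate):
    """Type 2 (deliberate) choice: exploit the higher value estimate with probability
    1 - E, otherwise explore a policy at random."""
    if random.random() < exploration_rate:
        return random.choice(["a", "b"])
    return "a" if q_a >= q_b else "b"

def decision_step(n_a, n_b, practice_b, q_a, q_b, s2, exploration_rate):
    """One transfer decision: a habitual default is generated first, and type 2
    processing intervenes (replacing that default) with probability S2."""
    default = type1_choice(n_a, n_b, practice_b)
    if random.random() < s2:
        return type2_choice(q_a, q_b, exploration_rate)
    return default

# Example: 100 prior applications of Policy A, 10 practice attempts with Policy B,
# equal value estimates, a 50% chance of type 2 engagement, and 10% exploration.
print(decision_step(n_a=100, n_b=0, practice_b=10,
                    q_a=0.5, q_b=0.5, s2=0.5, exploration_rate=0.10))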
The present model should be viewed as a building block for future theoretical development.
Study 1: Method
The model described above was instantiated as an agent-based model using the simulation program NetLogo (Wilensky, 1999). Although NetLogo does not offer as much flexibility as other programs such as R, it is a platform specially designed for implementing agent-based simulations. Although only a single agent is studied in the present model, utilizing this platform allows for easy expansion in later iterations to examine multiple agents in networks, teams, organizations, and so on. The equations outlined above were used to determine the learning and behavior of the agent modeled over time. A snapshot of the modeling environment and the code for use in NetLogo are available in Appendix A, and a copy of the program itself is available from the author upon request.
Model outcome metrics
To analyze the potential of the model to account for the important training effects described above, two primary outcomes were chosen to track within the modeling environment. Much has been written about what aspects of training outcomes are important to measure to describe training success. Kirkpatrick's classic typology describes important outcomes at four levels: reactions, learning, behavior, and results. Much research in organizations is limited to reactions to training, despite reactions being probably the least informative level. Other emphasis has been placed on cognitive outcomes of training, such as learning, which has driven much research over the last couple of decades (Kraiger, Ford, & Salas, 1993; Ford, Kraiger, & Merritt, 2010). These two levels of outcomes have implications for the present models. Utility perceptions are a type of reaction to training, but learning outcomes take a background role in the LTM because the agent having successfully learned the new policy is an assumption made for simplicity.
To measure important outcomes in the modeling for this paper, a reemphasis must be placed on measuring behavior and outcomes. The shift in emphasis toward cognitive outcomes of training moved the field away from a focus on behavioral change (Kraiger & Ford, 2007), but those outcomes are focused on effects emerging from the training event itself. The study of the transfer of those learning outcomes to on-the-job behavior is an area of needed research (Ford et al., 2010), and the present model is intended to help describe the process of that transference. Behavior and performance outcomes of the agents in the models therefore become the key variables of interest. A behavioral measure was created as the percentage of time the target policy, Policy B, is implemented by the agent. Measuring behavioral choice outcomes in this way also aligns with definitions of learning which focus directly on behavioral change (e.g., Myers, 2004). In addition, performance of the agent was tracked over time and was defined as the percentage of times the agent successfully completes its abstract task. Additionally, each agent stored its performance after an initial burn-in period, which represents the pre-training phase and is a time when the agent can only apply its first policy. The agent then stored its performance at the end of the defined transfer period, both for its overall performance and its performance just within the transfer period. Performance in this model is thus equal to the percentage of time the agent successfully completes its task.
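To make these outcome definitions concrete, the sketch below strings the earlier pieces into one illustrative run: a burn-in period in which only Policy A exists, followed by a transfer period in which both policies compete. It is a compact Python stand-in for the NetLogo program in Appendix A, not a reproduction of it, and every name and default value shown (run_agent, r_a, s2, and so on) is invented here for illustration.

import random

def run_agent(r_a=0.70, r_b=0.75, burn_in=100, transfer=500,
              q0_a=0.5, q0_b=0.5, practice_b=0, s2=0.5, explore=0.10):
    """One illustrative run: returns pre-training performance, the behavioral transfer
    rate (share of transfer-period attempts using Policy B), and transfer-period
    performance (share of successful attempts), all as proportions."""
    q = {"a": q0_a, "b": q0_b}          # value estimates, starting at the initial estimates
    n = {"a": 0, "b": 0}                # times each policy has actually been applied
    true_value = {"a": r_a, "b": r_b}

    def attempt(policy):
        # Success is a uniform draw against the policy's true value; reward is 1 or 0.
        reward = 1 if random.random() < true_value[policy] else 0
        n[policy] += 1
        q[policy] += (reward - q[policy]) / n[policy]   # incremental value update
        return reward

    pre_success = sum(attempt("a") for _ in range(burn_in))   # burn-in: only Policy A exists

    chose_b = post_success = 0
    for _ in range(transfer):
        denom = n["a"] + n["b"] + practice_b
        z_a = n["a"] / denom if denom else 0.0               # type 1 habit strength for A
        choice = "a" if random.random() < z_a else "b"
        if random.random() < s2:                             # type 2 processing intervenes
            if random.random() < explore:
                choice = random.choice(["a", "b"])           # exploration
            else:
                choice = "a" if q["a"] >= q["b"] else "b"    # exploit the higher estimate
        chose_b += choice == "b"
        post_success += attempt(choice)

    return pre_success / burn_in, chose_b / transfer, post_success / transfer

print(run_agent())   # prints (pre-training performance, behavioral transfer rate, transfer performance)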
Further, saving performance both pre- and post-training allowed the model to be analyzed as a pre-post intervention design, providing greater insight into the causal effects of adding a defined intervention (the introduction of Policy B). Doing such a pre-post performance comparison also aligns with our adopted definition of transfer as the extent to which "the learning that results from a training experience transfers to the job and leads to meaningful changes in work performance" (Blume et al., 2010, p. 1066; emphasis added). This will be accomplished by calculating effect sizes (d) for conditions comparing pre-training and post-training performance, allowing both easier comparison to existing effect sizes in the research literature and the placement of results into a standardized metric to help correct for any idiosyncrasies that may make the interpretation of raw effects misleading.
Analysis
Analysis of computational models does not follow the typical procedure of empirical research. Instead of testing traditional statistical models, testing of the LTM followed common cycles of computational model exploration (e.g., Railsback & Grimm, 2012). Important steps include verification, showing generative sufficiency, and exploring sensitivity and robustness. Verification includes confirmation that the implemented model is consistent with the proposed theory (Banks, Carson, Nelson, & Nicol, 2010). This was accomplished via logical consistency checks by the author and testing of the mechanisms of the model to ensure the basic relationships expected occur when the model is executed. Generative sufficiency entails confirming that the model can recreate general effects known from real data. Achieving generative sufficiency does not confirm that the proposed model is the explanation of the process being studied, but it does confirm the model is a possible explanation of that process (Epstein, 1999). Finally, sensitivity and robustness entail an exploration of the model parameters to determine how sensitive the model is to changes in initial conditions and violations of assumptions (Railsback & Grimm, 2012), which achieves three goals. First, it is not clear at which levels of various parameters common effects seen in the literature may manifest; exploring the model allows the tuning of parameters to more accurately reflect reality. Second, model exploration allows the discovery of potential discontinuous effects of parameters, where the results of the model change rapidly as initial conditions for that parameter change. Third, it may reveal unexpected or interesting findings, which is not the goal of the model but can be useful for providing insight into real-world phenomena or guiding future research. All four of these steps were executed and will be outlined below.
Since computational models create simulated data, output statistics need to be interpretable without traditional significance tests, because such tests lose meaning when one can simulate as much data as desired and most primary effects are programmed in (e.g., Railsback & Grimm, 2012). Instead, we must use summary statistics and correlations to describe effects of interest. We can use these to calculate effect sizes of parameter changes on model outcomes and compare those to meta-analytic effects. Another key tool in the computational modeler's toolkit is the heat map, which can provide visualizations of parameter effects that are easily interpretable and can show transition points in model parameters that drastically impact model outcomes. The same basic approach was utilized for analyzing all models in this paper.
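As one concrete illustration of this approach, the sketch below computes a standardized pre-post difference (d) from replication-level performance scores in a single condition. It is written in Python for illustration, uses a pooled-standard-deviation form of d (the exact computation used for the dissertation's analyses is not shown in the text), and the data values are made up.

import statistics

def cohens_d(pre, post):
    """Standardized mean difference between post- and pre-training performance across
    replications, using a pooled standard deviation (one common variant of d)."""
    m_pre, m_post = statistics.mean(pre), statistics.mean(post)
    pooled_sd = ((statistics.stdev(pre) ** 2 + statistics.stdev(post) ** 2) / 2) ** 0.5
    return (m_post - m_pre) / pooled_sd

# Made-up replication-level performance proportions for one condition.
pre_perf = [0.68, 0.71, 0.69, 0.72, 0.70]
post_perf = [0.73, 0.75, 0.71, 0.74, 0.76]
print(round(cohens_d(pre_perf, post_perf), 2))   # standardized pre-post difference for this condition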
Study 1: Simulation and Results
In this section I outline simulations directed at the four main steps in model exploration: verification, generative sufficiency, sensitivity, and robustness checks.
Model verification
Prior to beginning any simulation, the model was subjected to a series of verification checks (Banks, Carson, Nelson, & Nicol, 2010), outlined here.
Logical Consistency
The model was executed in NetLogo as outlined in the theoretical development and research methods. The only alteration of the theory made for computational efficiency was to code the relationship between type 1 and type 2 processing slightly differently than the default-interventionist approach outlined in the theory. Instead of making a default decision and then choosing whether the agent will use its type 2 processes to intervene, the agent decides first whether it will use its type 2 processes. If not, it makes a habitual choice and implements it; if it does use its type 2 processes, it makes the more rational decision as if it were intervening in an existing, but now inconsequential, habitual reaction. Although the code is not strictly default-interventionist, its outcomes should be identical while avoiding the computational inefficiency of performing a default judgment when it would be overridden anyway.
Parameter Effects Check
Once implemented in NetLogo, a series of tests was run to ensure that when parameters were adjusted, corresponding and expected changes occurred within the model. The following outlines a series of tests showing that the adjustment of each parameter corresponds with the desired effects.
Simulation Length
The first test confirmed the desired lengths of the pretraining and post-training simulations. Parameters besides the pretraining and transfer lengths are of no consequence for these simulations and were held at constant levels. To test the length of the pretraining periods, one simulation each was run with a length of 250 and 500 time steps with no transfer time allowed. These returned the 250- and 500-time-step lengths expected. A similar test was then completed with transfer lengths of 250 and 500 time steps but no pretraining period. These again returned the expected lengths of 250 and 500.
Policy Value
To test the effect of the true value of the policies, the success rates of the policies were checked across a series of simulations. To check the value of Policy A, the simulations focused on the pretraining period because only Policy A is available to the agent there. Simulations were run for 500 time steps, with true policy values of .50 and .75. Success rates for these simulations were .50 and .76. Given that each was a single simulation run, this confirms the expected effect of the true value on the success rate of Policy A. A test of the value of Policy B is more complicated because it is only available in the transfer environment. To isolate the effect of Policy B, the value of Policy A was set to 0 and no pretraining time was allowed. In addition, no exploration was allowed and type 2 processing was always employed. This should force the agent to apply Policy B alone. Values of Policy B were tested at .50 and .75, with 1000 transfer attempts. True success rates in these conditions for a single run were .51 and .75, in line with expectations.
Policy Value Estimates
Two tests were completed to check the veracity of policy value estimates, corresponding to initial and final estimates.
Initial estimates should correspond to the set initial value estimate for the defined policy, such as .50 or .75. To check this, models were executed and the value of the policy estimate at the first time point was verified to be equal to the value set for the simulation. Additionally, at the end of the simulation we should expect value estimates to approximate the true value of the underlying policy, representing an accurate judgment on the part of the agent under ideal conditions. To assess this, models were run to isolate the effects of both Policy A and Policy B at levels of .50 and .75. For Policy A, only pretraining time was allowed, run for 500 steps (the maximum allowed in the simulation). The model was run 10 times at each level. For these 10 runs, results ranged from .472 to .546 with a mean of .51, and from .742 to .802 with a mean of .769, for the .50 and .75 levels respectively. For Policy B, the transfer environment was isolated and run for 1000 steps, the maximum allowed in this simulation. For these runs, the range for the .50 policy was from .487 to .518, with a mean of .50, and for the .75 policy the range was .734 to .761, with a mean of .75.
Exploration Rate
To check that type 2 processes are willing to explore at a defined rate, the simulation was set up with a value for Policy B of 0, a value for Policy A of 1, and a 100% chance of type 2 processes engaging. This should result in Policy A being chosen on nearly every task attempt, except at a rate approximating the defined exploration rate. This simulation was run 10 times with an exploration rate of 10%. These simulations ranged from .088 to .106 in rates of choosing Policy B, with a mean of .096. This is in line with the expected value of .10. Given the results observed in these checks, it appears the simulation is operating as expected.
Generative Sufficiency, Sensitivity and Robustness
Following model verification, a series of experiments was conducted to assess the model for generative sufficiency. This section outlines the attempts to determine whether the model could generally account for existing findings in the training and transfer literature. Due to the nature of the experimentation, the model was essentially simultaneously checked for sensitivity and robustness as parameters were tuned to better represent naturally observed phenomena. To accomplish this, parameters were manipulated initially via coarse sweeps of the available space for the parameter of interest, holding all other parameters constant, to determine the effects of the parameter and to ensure that the model code reliably changes the levels of parameters (which is in some ways a continuation of the verification process). As modeling proceeded, experimentation became iteratively more complex and focused on potentially interesting facets of the model in a way guided by the emerging findings of the modeling process. In addition, although the generally desired end results were known from meta-analyses, little if any guidance exists on how strong a given manipulation is from a mathematical standpoint, making it difficult to determine a priori the size of the manipulation to make in the experimental code. Therefore, initial exploration aimed to tune the model parameters to create reasonable transfer outcomes, for example obtaining ds on pre-post measures of performance of .3 to .5, rather than exceedingly large effects such as 2 or more.
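The sketch below illustrates how such a coarse, fully crossed sweep might be organized: loop over two parameters, run many replications at each combination, and store the condition means in a grid that can be drawn as a heat map. It is an illustrative Python harness, not the procedure actually used; the stand-in outcome function (with its arbitrary internal formula) would be replaced by calls to the real simulation, for example through NetLogo's BehaviorSpace tool, and all names here are invented.

import random
import statistics

def simulated_transfer_rate(policy_a_value, policy_b_value, replications=100):
    """Stand-in for the full agent simulation: returns a noisy, bounded outcome that
    depends on the two swept parameters. A real sweep would call the actual model here."""
    runs = []
    for _ in range(replications):
        raw = 0.25 + 0.5 * (policy_b_value - policy_a_value) + random.gauss(0, 0.05)
        runs.append(min(1.0, max(0.0, raw)))
    return statistics.mean(runs)

def coarse_sweep():
    """Fully crossed sweep of both policy values from 0 to 1 in .05 increments; each
    cell of the returned grid is a condition mean, which is what a heat map visualizes."""
    levels = [round(0.05 * i, 2) for i in range(21)]
    grid = [[simulated_transfer_rate(a, b) for b in levels] for a in levels]
    return levels, grid

levels, grid = coarse_sweep()
print(len(levels), "x", len(levels), "conditions; example cell:", round(grid[0][-1], 2))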
True policy values
The first set of models aimed to tune the model into a reasonable parameter space regarding the values of both Policy A and Policy B. Important considerations here are defining policy values that are representative of the types of tasks in which we may be interested in the real world, and finding the separation in policy values which will reproduce reasonable transfer effects. For example, if we were interested in improving baseball batting skills, the success rate of each policy should be very low, such as .25, to approximate the batting average of Major League Baseball players. On the other hand, success rates for performing well-defined tasks on an assembly line are likely .95 or higher. Most closed skills in regular organizations probably exist at this high end of the value continuum, but open skills may be much lower. Two slightly different ways of parameterizing the policy values were explored here.
In the first version, the true policy values of A and B were independently set. Based on the above discussion of the possible range of relevant values, the true values of both Policy A and Policy B were swept from 0 to 1 in .05 increments, fully crossed, with 500 replications each. Runs used a 250-step burn-in and transfer period, an exploration rate of 10%, system 2 activation of 50%, and initial policy value estimates of .5. To analyze the results, heat maps were generated of the effects on behavioral transfer rates and on pretraining to post-training changes in performance as measured in d. Results for behavioral transfer and performance change can be found in Figures 2 and 3 respectively. In examining these results, we can see that behavioral transfer rates range from about 5% to about 55%. These low numbers make sense given the effect of the habitual response and the time allowed for Policy B to override that previously habitual response. This low rate of transfer also aligns well with expectations given the low amount of transfer commonly cited in the research literature (Ford et al., 2011). Performance change also shows the generally expected pattern. We see a diagonal where equal policy values lead to no performance change, as expected. There are negative performance changes below that diagonal, representing training a policy that is less valuable than the existing policy, and positive values above the diagonal, representing improvements of the new policy over the old. In addition, improvements appear to be stronger than corresponding decrements, which makes sense because agents should abandon the new policy if they do not see it as an improvement. The sudden change in the magnitude of effects across this diagonal suggests a sensitive area of the model where values change suddenly and dramatically. Changes in d range from -3.28 to 11.69. Obviously, the upper and lower portions of this range are well outside of what we might expect in the training literature, indicating that some areas of policy values are essentially out of bounds regarding their ability to replicate reality. However, along the diagonal where values of Policy B are just barely higher than the values of Policy A, we see performance effects of about d = .30, indicating that when the value of Policy B is slightly greater than that of Policy A the model is able to reproduce the essential effect of training we expect from research experience.
Although this initial result is promising, the way these parameters are defined limits the ability to vary policy values along with other variables in the future while maintaining low enough dimensionality that the results may be interpreted. Thus, the decision was made to redefine the true value of Policy B in direct relation to the true value of Policy A. This was accomplished by the inclusion of a parameter indicating the change in the true value of the policies moving from Policy A to Policy B. So, for example, if Policy A was given a true value of .50 and the change in policy value was defined as .10, Policy B would have a true value of .60. In the first set of models we saw that what appears to matter in making the model a plausible representation of the real world is Policy B having a slightly greater value than Policy A. By reconfiguring the model to define Policy B in direct relation to Policy A, we can better home in on the difference between the two policies which best represents reality.
In the updated model, simulations were run sweeping the Policy A value from 0 to 1 in .05 increments, with the policy value change swept from -1 to 1 in .05 increments. Runs used a 250-step burn-in, an exploration rate of 10%, system 2 activation of 50%, and initial policy value estimates of .50, with 500 replications of each condition. Behavioral transfer and performance change results can be seen in Figures 4 and 5 respectively. Results show little behavioral transfer when the policy change is negative. This is expected and indicates agents are discarding the new policy when it is worse than their old one, except for a small amount of use due to the exploration factor. This is a pattern we would hope to observe in the real world, as we would not want employees using a worse behavior if they do not have to. Overall, transfer rates range from about 6% to 55%. Interestingly, some nonlinearity appears to be occurring: the highest transfer happens when policy values start low and change a lot (as would be expected), but when policies start low and only improve a little, transfer actually does not occur as much as when policies are already valuable and change upwards a little bit. Transfer rates of about 30% run in a line from a change of .40 when Policy A starts at 0 to a change of .10 when Policy A starts at .70. We see similar patterns in performance change. There are slight performance decrements when the new policy is worse than the old, as we would expect because a worse policy does get applied sometimes. We also see the expected, extremely high performance improvements when the existing policy is low and the new one is high. However, higher ds are seen with higher starting policies in many instances, as with the behavioral change above. The kinds of effect sizes we tend to see, or at least expect, for performance improvement occur along a similar diagonal to behavioral transfer, where less improvement is required when prior policy values are higher. A good example is that a Policy A value of .70 with an improvement of only .05 yields a d of .34. Given the convergence of behavioral transfer rates and performance improvement to reasonable ranges when Policy A is .70 and the policy change is .05, these values were selected for use in further modeling efforts.
Timing of interventions
With a defensible level of policy values with which to define the model, it was also important to explore the effects of pre-training and transfer times on the model to ensure proper time was allotted for each. History with Policy A represents the length of time the agent applied that policy before the introduction of Policy B. That history was modeled by a burn-in period during which the only policy available to the agent is Policy A. Exploring the timing of an intervention importantly accounts for the history adult learners bring to their learning events (Knowles, 1984) and begins to account for overcoming established automatic responses once the learner returns to their work environment. It was expected that longer periods of time in which the agent could only access Policy A would result in reduced transfer of Policy B, as A will be more likely to be activated by type 1 processes, and that this effect would hold longer in the face of repeated application of Policy B.
To explore the effects of training and transfer time, simulations were run sweeping burn-in and transfer time each from 25 to 500 time points in 25-step increments. Based on the levels identified as interesting and applicable in the simulations above, Policy A was set to a reward of .70 and the policy change to B at .05. The exploration rate was set to 10%, system 2 activation to 50%, and initial policy value estimates to .50, with 500 replications of each condition. At the condition level, pretraining (burn-in) time was correlated with behavioral transfer at r(200000) = -.48 (p < .001)¹ and with performance change at r(200000) = -.31 (p < .001), indicating less transfer when the agent had used its old policy for longer prior to training, as expected. On the other hand, transfer time was related at r(200000) = .80 (p < .001) to behavioral transfer and at r(200000) = .58 (p < .001) to performance change, indicating that the longer the agent had to attempt transfer, the more likely they were to do so. Condition-level behavioral transfer rates and performance change (d) were plotted in heat maps, which can be found in Figures 6 and 7. In these depictions, we see that earlier training improves transfer rates. There also appears to be a possible augmentation effect where the combination of early training and a long time to adopt the new behavior leads to much greater transfer rates. Interestingly, it is also apparent that pre-training time quickly overwhelms the effect of longer transfer time. From these results, a burn-in of 100 with a transfer period of 500 might be reasonable to use for future exploration of other parameters to produce transfer and performance improvement levels commensurate with real-world levels.
¹ Sample sizes, degrees of freedom, and significance have been reported for statistical analyses, but it should be reiterated that traditional interpretation of significance holds no meaning in the context of computational models. Sample sizes and associated degrees of freedom for statistical tests are arbitrary when one has control over the modeling environment, as more data can always be simulated. Therefore, readers should focus on reported effect sizes and interpret any associated significance conclusions with extreme caution. See Cumming (2014) for further discussion of the limits of null-hypothesis significance testing and the move toward the use of effect sizes to improve research in general.
These results also seem to suggest that although the present process may be a good approximation of a possible transfer process, the effect of habits might be too strong. This could be caused either by the habit process itself or by the low level of agent ability to engage in type 2 thinking. As such, that was explored next.
Type 2 Processing
Next, the effect of being able to engage in type 2 processing was examined independently of other variables. Here, a greater ability to engage in type 2 processes equates to a greater opportunity to use one's new skills, or a situation strength in which the trainee is free to make the transfer choice more independently. For these simulations, the Policy A value was held at .70, Policy B was .05 better, burn-in time was 100, transfer time was 500, the exploration rate was .10, and initial value estimates were .50. Type 2 likelihood was swept from 0 to 1 at .01 intervals, with 500 replications chosen for resolution. Because only one variable was being examined here, a slightly different approach was utilized to examine the results. First, a correlation between type 2 likelihood and behavioral transfer at the replication level revealed a relationship of r(50500) = .51 (p < .001). As the likelihood of engaging in type 2 processes has been argued here to be akin to the opportunity to use one's training, the comparable meta-analytic effect size was around .30 to .40 (Blume et al., 2010). This suggests the model was able to essentially replicate the expected pattern of results. For further analysis, instead of heat maps, a linear regression was utilized to examine both linear and curvilinear relationships between the likelihood of engaging in type 2 processing and behavioral transfer and performance change. Through this analysis it was found that behavioral transfer was predicted by type 2 likelihood, at the condition level, at a linear rate of .756 (β = 1.43, t = 58.253, p < .001) and a curvilinear rate of -.233 (β = -.46, t = -18.520, p < .001); the intercept was -.016 (t = -5.848, p < .001; F(2, 98) = 12949.35, p < .001, R² = .998). The d of performance change was predicted from type 2 likelihood at a linear rate of .973 (β = 1.28, t = 10.888, p < .001) and a curvilinear rate of -.249 (β = -.34, t = -2.882, p = .005), with an intercept of -.292 (t = -15.117, p < .001; F(2, 98) = 519.44, p < .001, R² = .96). Graphs of predicted and observed behavioral transfer and performance change can be found in Figures 8 and 9. From these you can see that as type 2 likelihood improves, so do behavioral transfer and performance change. Interestingly, performance change displays a negative effect at low levels of type 2 likelihood but turns positive once likelihood is above about 33%. In addition, a likelihood of .80 seems important, as there is a spike in performance improvement and the transfer rate improves to near .40. Due to this effect, further models used likelihoods of .80 unless stated otherwise.
Practice and Overlearning
We know that practice with a new skill can improve transfer outcomes, especially when that skill is practiced to the point of overlearning. An estimate of the comparable meta-analytic effect of overlearning is .298 (Driskell et al., 1992). Although the Driskell et al. meta-analysis uses retention as an outcome, retention may be approximated by whether the agent is still applying the new policy at the end of the simulation run. The effect of overlearning in the present model was tested by manipulating the level of L from 0 to 200 in steps of 25.
The high end of 200 was chosen because it represents up to twice the number of pretraining task attempts. For these simulations, Policy A was set to .70, the change in value to .05, system 2 activation to .80, pretraining to 100 time steps, post-training to 500 time steps, and exploration to .10, with 500 replications of each condition. When examined at the replication level, that is, for individual agents, the correlation between practice attempts and behavioral transfer is r(5500) = .118 (p < .001), and the correlation with post-training performance (performance after the training event only) is r(5500) = .074 (p < .001). These relationships are in the expected direction but substantially lower than the comparable meta-analytic effect. For further analysis, condition-level results were calculated for behavioral transfer and performance change; these results can be found in Table 3. There, you can see that there is a clear benefit to practice, as we would expect, though it is difficult to tell the strength of the effect. To remedy this, the data were reanalyzed as a series of experiments comparing a control condition with no practice attempts to ever-increasing amounts of practice, with differences from the control condition expressed as d for both behavioral transfer and performance change in Table 4. Through these analyses we see that performance improvement remains relatively low, even in stronger conditions, while behavioral transfer improves quite substantially as practice increases. Despite this, it is evident that although the general positive effect of practice is obtained, it may not be easily tuned to better approximate typical research findings.
Utility reactions
Utility reactions are trainees' evaluations regarding the usefulness of their learning experience (e.g., Ruona et al., 2002) and are strong predictors of transfer (Blume et al., 2010). Utility reactions can be equated to the initial value estimates of a learning agent in reinforcement models. For example, Sutton and Barto (2018) describe how optimistic initial value estimates encourage early exploration by the agent. In the same way, a transfer agent in the LTM that has a higher initial expectation of the value of their new policy should be more likely to transfer that policy because they are willing to explore its potential. To test this effect, the model was explored by varying the initial policy value estimate for Policy B from 0 to 1 in .05 steps. Policy A was set to .70, the change in value to .05, type 2 activation to .80, pretraining to 100 time points, post-training to 500, and exploration to .10, with 500 replications per condition. At the replication level, results reveal almost no relationship between the initial value estimate and our outcomes of interest. Specifically, the relationship between the initial value estimate and behavioral transfer was only r(10500) = .02 (p = .025), and only r(10500) = .01 (p = .254) with post-training performance. These results are not in line with what was hoped for regarding existing effects of utility reactions. One possibility is that there was too much noise at the individual level regarding outcomes, so results were also examined at the condition level. There, the relationship between initial value estimates and behavioral transfer was r(21) = .476 (p = .029) and was r(21) = .570 (p = .007) for performance change. The condition-level results for this exploration can also be found in Table 5.
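The replication-level versus condition-level contrast described here is easy to illustrate: correlate the swept parameter with the outcome across individual agents, then again after averaging agents within each condition. The sketch below does this in Python with made-up data and invented helper names (pearson_r, condition_level); it is illustrative only and is not the analysis code used for the dissertation.

import statistics

def pearson_r(x, y):
    """Plain Pearson correlation (no p-values, which carry little meaning for simulated data)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

def condition_level(conditions, values):
    """Average a replication-level outcome within each condition (e.g., each level of
    the initial value estimate for Policy B)."""
    sums, counts = {}, {}
    for c, v in zip(conditions, values):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    levels = sorted(sums)
    return levels, [sums[c] / counts[c] for c in levels]

# Hypothetical replication-level data: the swept parameter and the observed outcome.
initial_estimate = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
transfer_rate = [0.21, 0.35, 0.30, 0.26, 0.33, 0.40]

print(pearson_r(initial_estimate, transfer_rate))     # replication (individual agent) level
levels, means = condition_level(initial_estimate, transfer_rate)
print(pearson_r(levels, means))                       # condition level, after averaging out noise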
Transfer trajectories
A subset of models was run to examine the development of transfer over time, to explain the various trajectories described by Baldwin and Ford (1988). To examine these trajectories, models were run using the baseline parameters of a Policy A value of .70, a change in policy value of .05, type 2 activation of .80, 100 pre-training time points, and 500 post-training time points. The goal here is only to show that the transfer trajectories described by Baldwin and Ford (1988) are possible within this model. Thus, the model was run several times, examining the shape of the behavioral transfer rates within the modeling environment. Some examples of transfer trajectories from the model can be seen in Figures 10A-D. These examples show a variety of transfer trajectories, such as (A) initially high levels of transfer later tapering off; (B) initial failure to transfer with later increased transfer; (C) immediate and consistent transfer; and (D) a general failure to transfer.
Implementation Intentions
As discussed, implementation intentions are used to establish an automatic link between situation and response to improve the automaticity of that response (e.g., Gollwitzer, 1999). Although they also impact the automaticity of applying Policy B, implementation intentions are not the same as practice or overlearning, which is already included in the model. To account for the improved automaticity brought by implementation intentions, let us instead define a variable I as the percentage increase in the chance of applying Policy B when type 1 processes are enacted. This changes our calculation of Z_t(a) to be:
Z_t(a) = N_t(a) / [N_t(a) + N_t(b) + L] - I     (Algorithm 4. Type 1 Process with Implementation Intentions)
We can then manipulate the level of I to explore the effects of implementation intentions. This tweak was coded into the model for exploration. Due to the underlying math behind the simulation, when the agent has no history of engaging in Policy A, the likelihood of doing so when only type 1 processes are available should be equal to 0 minus the defined level of implementation intentions. This was verified with a level of implementation intentions of .10, which returned a simulated critical value for automatically applying Policy A of -.10, as expected. Implementation intentions were explored from 0 to .50 in .05 increments, with 500 replications each. The likelihood of type 2 processing was set to .80, the Policy A value was held at .70, Policy B was .05 better, burn-in was 100, transfer was 500, the exploration rate was .10, and initial value estimates were .50. From these runs, the replication-level correlation between implementation intentions and post-training performance was r(5500) = .111 (p < .001) and was r(5500) = .193 (p < .001) with behavioral transfer. Condition-level results for this experiment can be found in Table 6, which shows a steady improvement in behavioral transfer as implementation intentions increase. However, the effect on performance improvement is much less consistent.
Exploration rates
It is typical in both reinforcement learning problems (Sutton & Barto, 2018) and in organizational research (e.g., March, 1991) to have an exploration parameter in the model. Within this model, exploration represents the degree to which an agent is willing to explore behavioral policies that they do not currently see as their most valuable. In the real world, such exploration would be akin to an employee searching for a better way to do their job than their current dominant approach.
Typically, there is a trade-off between exploration and exploitation for overall performance, where some degree of exploration is beneficial but too much can hinder performance (e.g., March, 1991). One possible implication of this model is the identification of the degree to which trainees should be willing to explore new task approaches in order to maximize their performance. To explore this possibility, while holding all other parameters constant, the exploration parameter was swept in .01 increments from 0 to 1.0. In examining this simulation, we find a negative overall relationship between exploration and behavioral transfer (r(50500) = -.368, p < .001) and post-training performance (r(50500) = -.168, p < .001). Similar results were seen at the condition level with behavioral transfer (r(101) = -.600, p < .001) and performance change (r(101) = -.514, p < .001). Such relationships are initially surprising, as it was expected that a willingness to explore would allow the agent to find more optimal solutions. To understand this relationship better, a regression was run examining both the linear and curvilinear effects of exploration on behavioral transfer and performance change. In doing so, we find exploration to have a linear relationship with behavioral transfer of .882 (β = 2.08, t = 13.333, p < .001) and a curvilinear relationship of -1.136 (β = -2.77, t = -17.754, p < .001), with an intercept of .322 (t = 22.492, p < .001; F(2, 98) = 273.85, p < .001, R² = .92). With performance change we see a linear relationship of 1.335 (β = 2.16, t = 10.757, p < .001) and a curvilinear relationship of -1.653 (β = -2.76, t = -13.767, p < .001), with an intercept of .145. In addition, predicted and observed values of behavioral transfer and performance change may be found in Figures 11 and 12. These results show that the effects of exploration peak at some moderate level, and further exploration proves detrimental for transfer outcomes.
Exploratory experimentation
Finally, one strength of building a computational model of a proposed theory lies in the ability to execute virtual experiments which can guide future real-world data collections. This allows us to test novel moderations or interventions which would be difficult to justify spending the resources to test in real data collections without some prior empirical guidance. In addition, it could lead to the discovery of novel interactions which can lead to targeted data collections to help further support or refute the veracity of the proposed theory. Given the positive relationships found in the present model between implementation intentions, type 2 likelihood, and the studied outcomes, a virtual experiment was designed to test the mutual effects of implementation intentions and type 2 likelihood on those outcomes. Given their independent positive effects on behavioral transfer and performance change, it was expected the two would have an augmenting effect where high levels of both would result in the highest outcome levels. To explore this possibility, a virtual experiment was designed where the parameters for both implementation intentions and type 2 likelihood were swept from 0 to 1.0 in .05 increments, fully crossed with each other.
The other parameters were held constant at the levels settled on above: 100 pre-training burn-in time periods, 500 post-training time points, a Policy A value of .70, a change in value of .05, an exploration rate of .10, and initial policy value estimates of .50. To explore these effects, both a more traditional multiple regression approach to analyzing interactions and heat maps were employed. Predictors were mean-centered prior to estimating the regression, and an interaction term was created from the product of the two predictors. In predicting behavioral transfer (F(3, 2020496) = 57700.917, p < .001, R² = .663; b0 = .646, t = 1457.75, p < .001), it was found that type 2 likelihood actually had a negative main effect (b1 = -.217, β1 = -.236, t = -148.16, p < .001), but intentions had the expected positive main effect (b2 = .496, β2 = .539, t = 338.45, p < .001), along with a negative interaction effect (b3 = -.925, β3 = -.305, t = -191.32, p < .001). A similar pattern of results was found in predicting post-training performance (b0 = .732, t = 14774.36, p < .001; b1 = -.011, β1 = -.133, t = -67.09, p < .001; b2 = .025, β2 = .302, t = 152.50, p < .001; b3 = -.046, β3 = -.169, t = -85.29, p < .001). These interactions have been graphed in Figures 13 and 14 respectively. From these visualizations, we can see that the expected augmentation effect does not emerge. Instead, we see that the best transfer and performance occur when intentions are high but type 2 processing is low. To better understand this effect, heat maps were created examining the condition-level results for behavioral transfer, post-training performance, and performance change, which can be seen in Figures 15-17. These heat maps confirm that the best outcomes occur with high intentions but low type 2 likelihood. They also show the interaction effect is more nuanced than suggested by the traditional analyses, in that the worst outcomes only occur when both implementation intentions and type 2 likelihood are low, as we originally expected. However, the fact that both the worst and best outcomes occur when type 2 likelihood is low, combined with some non-linearity in the change in effects across implementation intentions as type 2 likelihood increases, obscures the benefits of type 2 likelihood in this experiment.
Study 1: Discussion
The primary goals of this paper are to: 1) build a process-oriented theory of training transfer, 2) further integrate disparate related theories, 3) incorporate dual process cognition and reinforcement learning more fully into the organizational sciences, and 4) provide a computational model for virtual experimentation which may provide novel insights for both theory and practice. Over the course of several rounds of virtual experimentation, progress has been made toward all of these goals. Let us discuss a few of the more important theoretical and practical implications uncovered thus far.
Theoretical Implications
The primary goal of exploring a computational version of the LTM was to show that the theory can reproduce common findings in the broader research literature. This is the process of showing generative sufficiency (Epstein, 1999). In the explorations discussed here, it appears that the model can reproduce general patterns of findings in the literature for several important effects, especially regarding the direction of those effects, if not their precise magnitude.
However, not all expected effects were observed, indicating that the model, although promising, is not yet complete. Here, I will review the model's standing on some of those effects. First, the explorations here show that the proposed theory can reproduce the generally expected effects of training on behavioral transfer and performance outcomes we see in the literature. For example, behavioral transfer rates fall in the 10-50% range, which covers typical estimates of transfer in organizations (e.g., Ford et al., 2011). However, plausible effects for observed training outcomes only occur in the model within relatively narrow ranges, especially regarding the parameters governing the true value of the behavioral policies. This limitation could exist for at least two reasons. First, it is possible the model breaks down outside of this narrow band of policy values and does not necessarily operate in a way clearly mappable onto real-world phenomena outside of this band. Such a limitation would not itself invalidate the theory; it would merely place limitations on its generalizability, as occurs with any model. Second, it could be an indication of the narrow range of situations we tend to study in the research literature, which is likely to be at least part of the explanation. In studying training interventions in organizations, we typically enter an organization to deliver a training program to employees who have some degree of experience on the job and are already successful to a greater or lesser extent. The intervention delivered is likely to be a slight improvement on however they were trained, or however they discovered, to do the job prior to our arrival, despite any organizational claims of the great improvement individuals are likely to see. This naturally creates only slight differences in policy values in the terms of the presented model, and therefore it makes sense that such small differences are where the model best matches existing data. Rarely, if ever, in research would we encounter a situation where we are training individuals who are completely incompetent at a task, and collecting the necessary data, while providing them with the skills to be almost perfectly successful on that task. This situation obviously occurs to some extent when new employees are trained from scratch, but this is not the focus of the kinds of individual studies which are generally conducted. If we were to compare completely novice performance to later post-training performance, it is more likely that we would see the kind of extreme effects demonstrated at the edges of the present model. As such, the presented model may in some regions be more broadly applicable to studies of the development of expertise than to training transfer alone (e.g., Benner, 1982).
In addition, implementation intentions (Gollwitzer, 1999) appear to work well in comparison to the limited body of research on their use in organizational training interventions. The observed effects in the present model were in the expected direction, and plausibly scaled effects were found for both behavioral transfer and performance. Unfortunately, there is not yet a meta-analytic estimate of this effect known to the author of this paper, but typical training results for implementation intentions appear to fall in the medium to large effect range (e.g., Friedman & Ronen, 2015), much as observed here. Thus, the present model appears to account for the general effect of intentions.
Unfortunately, the effect of practice in the present model creates effects in the desired direction but does not really work as would be expected in real training situations. Namely, although we know practice and overlearning opportunities are a key driver of training success, the present model only creates substantial effects when the level of practice approaches and subsequently exceeds the level of experience the agent previously had with the task, and those effects do not come close to the recorded meta-analytic effect (Driskell et al., 1992). Such levels of practice in a real training environment are obviously impractical, as I have argued that the degree of prior experience an individual has with the task is a major driver of training outcomes and most individuals will enter training with large amounts of experience. In such situations, small amounts of practice should have large effects in order to better match research findings. Future iterations of the model should examine how to better account for the practice effect. One idea would be to count practice attempts as more impactful than regular attempts; if we assume training-based practice attempts count for more than regular attempts, this could fit with the idea of deliberate and focused practice being a key to skill development (e.g., Ericsson, Krampe, & Tesch-Romer, 1992).
Utility reactions in this model are an interesting case. The comparable meta-analytic effect targeted was the .46 corrected relationship between utility reactions and transfer described by Blume et al. (2010). When analyzed at the replication level, which in this case represents individual agents, there was essentially no relationship between the initial value estimate of Policy B, the stand-in here for utility reactions, and our outcomes. This initially seemed to indicate that the model did not work regarding utility reactions. However, when the data were analyzed at the condition level, the correlation between the initial value estimate of Policy B and transfer was .476, almost perfectly matching the meta-analytic estimate. In no other experiment run here was there such a great disparity in observed relationships at the individual versus the condition level. It might be the case that in this model the effect of utility reactions gets drowned out by random noise when examining individuals. This explanation makes sense when investigating a series of individual value estimate trajectories, where it becomes obvious that the initial estimate for B quickly becomes overwhelmed by the weight of experience and ceases to have drastic effects. However, once we study hundreds of individuals, that noise averages out and the effect of the initial estimate becomes more obvious. Thus, the effect of initial estimates for Policy B matches the literature better than any other parameter examined in this study, but it does not do so in the way initially expected and may need further examination.
Along similar lines, a set of models was run looking only at the behavioral transfer trajectories of single agents in individual runs of the model. Through these, as displayed in Figures 10A-D, even within the same base parameters of the model, agents can follow several types of trajectories in their transfer over time. These trajectories display many of the types outlined for maintenance by Baldwin and Ford (1988).
Thus, even the simplest version of the LTM appears capable of generating a classic effect from the transfer literature, even without substantial empirical guidance given that such trajectories are rarely studied in practice.

As a demonstration of the potential for the present model to guide future research, the interaction between implementation intentions and type 2 processing on our chosen outcomes was explored. In this experiment, it was expected that the two would have an augmenting effect where high levels of each would result in the best outcomes. However, this was not the case. Instead, we saw the best outcomes when implementation intentions were high but type 2 likelihood was low. One reason for this might be that the effect of automaticity in the model is driving the interaction, since both variables affect the automatic process either by forcing the agent to engage in that process or by directly altering it. Thus, when type 2 likelihood is low, implementation intentions can have a more direct effect on outcomes because they have a chance to work, whereas their impact becomes diluted when agents can more often engage in type 2 processes. This is the kind of initially counter-intuitive finding which can be brought to light by computational models. Future research should now test this effect in either laboratory or real-world situations. If the same interaction effect predicted by the model is found, then more support will be lent to the theory proposed in this paper; if the opposite is found, then the current theory would be falsified.

Practical Implications

A primary goal for the LTM and the associated computational models explored here is the ability to provide useful insight for real-world application. The first practical takeaway is that for jobs at any level of current performance, even small improvements, as long as the trainees are able to discern that the new training is an improvement on whatever they currently do, can lead to fairly substantial gains in performance. In addition, it does not necessarily take incredibly large amounts of behavioral transfer to result in substantial performance gains. There are many conditions in the simulations presented here where behavioral transfer is 50% or less, but performance improvements display effect sizes we would consider large in the traditional research literature. Thus, while it is true that a substantial training-transfer gap exists, we should take heart in the ability of even moderate transfer rates to have substantial effects on important performance outcomes.

One interesting finding in this model is the apparent strong effect of pre-training time on the inability of agents to successfully change their behaviors and performance, as we see in Figures 6 and 7. This finding aligns with viewing training from a nonlinear dynamics perspective (Olenick, Blume, & Ford, in press), which in part suggests that training effects are governed by attractors which develop with experience over time, and that stronger interventions would be required to effect permanent change on the job for employees who had been doing the job a certain way for longer prior to the intervention. In the present model, that pre-training time allows for the development of such an attractor, which the relatively mild intervention studied here (seen in the policy change being set at .05) is unable to overcome.
Such a finding reemphasizes the need to consider the timing of our organizational training interventions, as delay in such training is likely to lead to sub-optimal outcomes.

The modeling of type 2 likelihood for effects on training outcomes may be of particular importance. As discussed in the introduction to this paper, the general failure of training interventions to result in expected outcomes is of great concern to organizations. The LTM suggests that one reason for this failure may be not only a lack of opportunities to use the training, but also a lack of opportunity for the trainee to make the kind of effortful decisions that are more likely to lead to them applying their training instead of reverting to their old practices. In fact, there appears to be a critical level below which positive effects are essentially impossible, and this critical threshold (at 33% likelihood of type 2 thinking in this model) must be passed in the transfer environment for positive transfer to occur and result in performance improvements. This could be especially important for environments where such time for thinking is not necessarily always available, such as fast-moving assembly lines. One implication of this model, then, is that organizations should not only make sure trainees can use their training, but also ensure they have the opportunity to think about the tasks on which they have been trained.

Another interesting implication is the degree to which it is useful to encourage exploration in the transfer environment. The results show that observed rates of behavioral transfer peak when the exploration rate is about 25%, and performance change follows a similar, but noisier, pattern. Such a curvilinear relationship in general is not surprising, as we would expect exploration rates above 50% to be detrimental because agents are then purposely not exploiting their better-perceived policy. However, it is mildly surprising that the optimal exploration rate is so much lower than 50%, suggesting that it is better for an agent to err on the side of exploiting their currently perceived better policy than to explore to a greater degree. This finding potentially informs the implementation of existing tools, such as the popular Error Management Training, where learners are encouraged to make errors as they explore a new KSAO (e.g., Keith & Frese, 2008). Such an approach to training helps improve outcomes as trainees learn from their mistakes and push through initial struggles with a new skill. According to the present model, this error-based approach extended to the entire post-training period would likely be beneficial for outcomes as well, but only to a point. Therefore, we should encourage trainees to continue trying a newly trained task approach, but only to a moderate degree, because we do want them to settle on a behavioral approach for the long term, and we want them to discard approaches which do not actually improve performance outcomes.

Finally, the virtual experiment exploring the mutual effects of implementation intentions and type 2 likelihood can provide some guidance on how best to include implementation intentions in our training designs. The results do suggest that implementation intentions are generally beneficial regardless of the environment (assuming here that the intentions formed are sufficiently strong), but that the required strength to have a substantial effect, and the ability of implementation intentions to improve our outcomes, change based on that environment.
When jobs are such that the use of type 2 processes is highly likely, the inclusion of implementation intentions is unlikely to have a substantial impact on our desired outcomes, though they would still be useful. However, if we are conducting training in an environment where type 2 processing is especially unlikely, such as a fast-paced assembly line or similar environment, implementation intentions are likely to be highly beneficial to include in our training programs. However, we must work to ensure these intentions are as strong as possible, as weak intentions are also unlikely to have a great effect on outcomes.

Conclusion

The above represents the first iteration of modeling in the building of the LTM, which has the goal of becoming a unifying theory to explicate the moment-to-moment process underlying training transfer. Although not perfect, the general patterns of results appear to align well with existing findings. We know the model is wrong, but the degree to which it is meaningfully wrong (Box, 1976) could be contended to be rather small for the time being, as exactly replicating existing meta-analytic effects is not completely necessary. Future iterations of this portion of the model should attempt to refine the operation of parameters and the math governing their effects to better match their real-world counterparts, such as the effects of practice attempts, but that is a task for another time. For now, I argue this is a reasonable first iteration of the LTM with apparent implications for both theory and practice. With that, we know that there are substantial ways in which the model as it stands is meaningfully incorrect, such as not accounting for social learning mechanisms. To rectify this shortcoming, another iteration of theorizing and modeling was undertaken.

Study 2A: Adding Social Learning to the LTM

The first iteration of the LTM appears to have done well in describing the transfer decision and learning process of a single agent/learner, but people in the real world do not learn in isolation. Instead, they also learn from the models around them. Thus, the model was iterated to include a social learning (Bandura, 1977) process allowing agents to learn from other agents in their environment.

Social Learning Theory

Bandura (1977) introduced Social Learning Theory (SLT) partly as a reaction to the then-dominant behavioral approaches to learning exemplified by early reinforcement learning. Bandura (1977) posits that individuals not only learn from their own experiences, but that they also learn from others. In fact, Bandura argued that most learning occurs through observing others in action, a process called modeling. Once the learner observes a model complete an action, they can form an idea of how the new behavior is to be performed, and later use that as a guide to their own actions. Learning through observing others is more efficient than learning only through individual experience, as less trial and error is required to learn a given behavior. SLT also emphasizes reciprocal determinism between cognitive, behavioral, and environmental influences. At the risk of oversimplifying these influences, the cognitive processes of the individual affect their behaviors, which affect their environment. The individual receives feedback from the environment based on the effects of their behavior, which leads to changes in cognition and behavior in the future.
The essential proposed change in the LTM once we consider social learning is that there is an effect of other individuals on the learning process of our target learner. The LTM accounts for this effect by considering multiple learners engaging in the transfer process simultaneously. Obviously, this does not represent all potential social learning influences on our target learner. Instead, the current approach best represents an idealized version of a work team or community of practice whose members are all exposed to the same learning intervention and then must attempt to transfer it back to their work environment. Though this conceptualization is admittedly simple, it aligns with arguments that involvement in communities of practice can enhance training transfer through the sharing of information across the community network (e.g., Tentin, 2001). The new conceptual model, seen in Figure 18, incorporates any number of learners in addition to a target learner. Every learner in the model is assumed to have access to the same two policies, and to proceed individually through the basic decision and learning process described above. However, the updating of each learner's value estimates is no longer based solely on their own experience but includes feedback from the experiences of all other agents. That is, it is a mechanism whereby other agents in the environment have modeled for the target learner the behaviors represented by the two policies, and the agent is then informed of their effectiveness through that observation. In this way, the perceived value of each policy becomes a type of pooled estimate from all learners. This pooling procedure is not assumed to take all experiences of learners equally. Instead, the pooling for any individual is such that they weight their own experiences differently than those of any of their co-learners, to an extent we can control via a parameter in the model. Once policy values are updated for each learner, the decision and learning process iterates. It is expected that additional learners in the model will improve transfer and performance of target learners because they reduce uncertainty in the value estimates through the more rapid reduction in sampling error created by more learners gaining experience. The faster accrual of experience as a group should allow for more quickly discarding new policies when they are poor, and a decreased likelihood of incorrectly discarding a policy when it is good based on initial random error that could underestimate policy value.

The Formal Transfer Model with Social Learning

To expand the formal version of the LTM to include social learning, three essential changes must be made to the model: additional learning agents, a way to pool experiences of the agents, and a way to weight the importance of group experiences against those of the target agent. The first change is simply conceptual: instead of assuming we are only interested in one agent engaged in the transfer process, multiple agents engage in this process simultaneously. To build on the previous reinforcement learning approach from computer science and account for all agent experiences, it could be possible to draw on algorithms designed for multiple agents (Sutton & Barto, 2018). However, those algorithms are designed for multiple agents attempting to solve a single problem, giving each agent a chance to explore more possible solutions. Such approaches are not necessarily the best fit for a model of transfer where multiple learners each work on their own tasks, but only have limited solutions which they could apply.
Future extensions of the present model could explore other options along these lines, but they are not the most parsimonious potential approach, and parsimony is a primary goal of theorizing (Box, 1976). A simpler approach is to pool the experiences of multiple learning agents to affect the value estimates of each individual agent and thereby affect application decisions. The easiest way to pool the experiences of other agents is simply to average their value estimates. The average value estimate of the other agents, for the jth agent in the model, can then be defined as

\bar{V}_{jA} = \frac{1}{N} \sum_{i=1}^{N} V_{iA},

where N is the number of other agents in the model, and V_{iA} is the value estimate of the ith other agent for Policy A. Calculating at the value estimate level avoids an assumption that the target agent knows the outcomes of the individual attempts of the other agents, and only assumes they can observe the other agents' overall evaluations of each policy. However, a simple averaging of the value estimates of the other agents does assume an equal weighting of the opinions of all other agents, so it does not account for network effects and varying strengths of ties to those agents. Exploring the effects of networks will be an interesting avenue for future work.

Regardless of the assumptions, the value estimates of the group must be combined with the estimate of the target agent in some way. The LTM proposes a weighting approach that can vary the degree to which the target agent weights their own value estimate over that of the others in their group. This approach results in an ability to vary the degree of connectedness between the target agent and the rest of the group, making the target agent and their group a type of loosely coupled system (e.g., Weick, 1976). Let this level of connectedness be defined by C. The variable C represents a weighting factor such that when levels of connectedness are high the value estimate of the group will be weighted more heavily than that of the individual. Thus, we can define the weighted value estimate as

V'_{jA} = (1 - C)\, V_{jA} + C\, \bar{V}_{jA}

(Algorithm 5, Other Agent Value Estimation; Algorithm 6, Weighted Value Estimate). However, since \bar{V}_{jA} is only calculated when there are multiple agents in the model, V'_{jA} = V_{jA} when only a target agent exists. Variables and equations introduced in this section can be found in Tables 7 and 8, respectively.

Study 2A: Method, Simulation and Results

As with the first model, the extension of the LTM was instantiated in a computational model by expanding on the model from Study 1 in NetLogo. A visual of the modeling environment and associated code can be found in Appendix B. Otherwise, the methods outlined for Study 1 apply to this model as well.

Virtual Experimentation

A primary goal of model exploration at this stage was to ascertain the effects of having multiple agents learning simultaneously, and the degree of influence of those agents on each other in determining transfer outcomes. To explore these effects, two simple verification checks were made, then three experiments were run to simultaneously check for generative sufficiency, sensitivity, and robustness.

Model verification

The two primary changes in this model are the addition of trainees to the modeling environment, and the mechanism for combining effects of experience from multiple agents for use by each individual agent. The check to ensure multiple agents are populated into the environment is simply visual in NetLogo, and it was affirmed that the proper number of agents were generated as specified. To check the pooling procedure, levels of connectedness were set at 0 and 1 and the model run for 500 time steps with 20 agents.
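To make the pooling and weighting procedure concrete before turning to those checks, the minimal Python sketch below implements the two equations above. The dissertation's models are written in NetLogo; the function names and example values here are illustrative assumptions rather than the author's code.

```python
# Minimal sketch of the pooling and connectedness-weighting procedure.
# Illustrative only; names and sample values are assumptions.

def pooled_other_estimate(other_estimates):
    """Average value estimate held by the N other agents for a given policy."""
    return sum(other_estimates) / len(other_estimates)

def weighted_estimate(own_estimate, other_estimates, connectedness):
    """Combine an agent's own estimate with the group's pooled estimate.

    connectedness (C) weights the group estimate; when no other agents
    exist, the agent simply keeps its own estimate.
    """
    if not other_estimates:
        return own_estimate
    group = pooled_other_estimate(other_estimates)
    return (1.0 - connectedness) * own_estimate + connectedness * group

# Boundary checks analogous to the verification runs described below:
others = [0.70, 0.75, 0.74]                   # other agents' estimates for Policy B
print(weighted_estimate(0.50, others, 1.0))   # C = 1 -> group mean (~.73)
print(weighted_estimate(0.70, others, 0.0))   # C = 0 -> agent's own estimate (.70)
```

Setting connectedness to its two extremes in this sketch reproduces the boundary conditions used for verification, as reported next.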
When connectedness is 1, the pooled estimate for each agent should be equal to the pooled estimate of all other agents in the model. For example, for agent 1, when all other agents have an average estimate for Policy B of .73, a fully connected model should have a pooled estimate of .73 for agent 1. This is indeed the case. On the other hand, when connectedness is 0, the pooled estimate for Policy B for agent 1 should simply equal agent 1's own value estimate for that policy. For example, if that agent estimates the value is .70, the pooled estimate should also be .70. This is also indeed the case in testing the model. Therefore, the model appears to be operating as planned.

Number of Trainees

To initially understand the effect of the number of learners/trainees in the transfer environment, a series of simulations was completed manipulating the number of trainees from 1 to 20, with 500 replications each. Other variables were held at the levels decided upon in the first model (type 2 likelihood at .80, initial policy estimates at .50, burn-in time 100, transfer time 500, no practice, no implementation intentions, true Policy A reward .70, change in policy value .05). Additionally, the new connectedness variable was set at .50. It was expected that more agents in the model would improve transfer outcomes by improving the value estimates of each agent through a more rapid increase in sample stability. In examining these results, it was found that there was almost no relationship at the replication level between number of trainees and either behavioral transfer (r(10000) = .007, p = .484) or post-training performance (r(10000) = .005, p = .617). At the condition level there was a slight positive effect of the number of trainees on both behavior (r(20) = .14, p = .556) and performance (r(20) = .13, p = .585). This condition-level effect can be further examined by looking at the condition-level results for behavioral transfer and pre-post performance change effect size (Cohen's d) in Table 9. Despite the small positive correlation between the number of trainees and behavioral transfer, examination of the descriptive statistics in Table 9 makes it obvious this effect is of little consequence, with transfer only increasing from 43% to 44% as the number of trainees increases from 1 to 20. On the other hand, there is a substantial impact on the observed d for pre-post performance change: as the number of trainees increases from 1 to 20, the observed effect size increases from .28 to 1.43.
Connectedness

A second set of simulations was run to explore the potential effects of the connectedness parameter. The expected effects of manipulating the degree of connection between the individual and group were less clear than for the number of trainees. Assuming having other trainees in the model is beneficial to the agent, it would be expected that more connectedness would also be beneficial, as the potential detrimental effects of sampling error leading an individual agent down a sub-optimal path should be diluted the more they take into account the experiences of other agents. To test this, the connectedness parameter was swept from 0 to 1.0 in .05 increments, with 10 agents simulated, holding all other parameters constant at the same levels as in the above simulation. Unfortunately, results for this simulation were even less impressive than those for the number of trainees. Relationships between connectedness and both behavioral transfer and post-training performance were nonexistent at the replication level (r(10500) = .009, p = .356, and r(10500) = .001, p = .918, respectively), and were mixed at the condition level (r(10500) = .217, p = .345 with behavior; r(10500) = -.031, p = .894 with post-training performance; and r(10500) = -.078, p = .737 with pre-post performance change). Table 10 displays the condition-level outcomes for behavioral transfer and pre-post performance change effect size. Examination of these data confirms no effect of note, as behavioral transfer remained ~44% regardless of condition, and pre-post performance change was around d = 1.00 with some random error.

Interaction between Trainees and Connectedness

Given the results of the above simulations, it was not expected that an interaction effect would be enlightening. However, such a model was proposed for this project, and given the intricacies of computational models such as these, it is possible traditional analyses obscure meaningful relationships. Thus, the potential interactive effect of trainees and connectedness on transfer outcomes was still explored. It was predicted that the positive effect of having multiple agents in the model would increase as the degree of connectedness increased. This effect was expected because the agent should benefit from taking advantage of the extra experiences of their colleagues through greater weighting of those experiences and their combined increased sampling rate. To test for this, the number of trainees was swept from 1 to 20, and connectedness from 0 to 1.0 in .05 steps, fully crossed, while holding all other variables constant at the levels chosen from Model 1. Moderated multiple regression was used to examine the effects of the number of trainees and connectedness on behavioral transfer and post-training success. In alignment with the previous results from this model, there are no discernable main or interaction effects from this experiment. In predicting behavioral transfer, neither the number of trainees (b0 = .436; b1 = -.00004) nor connectedness (b2 = .001), nor their interaction (b3 = -.00006), demonstrated substantial effects. Similar results were found in predicting post-training performance (b0 = .722; b1 = -.000007; b2 = .00004; b3 = .000007). In accordance with other analyses in this paper, heatmaps were also generated to examine potential effects missed by more standard analyses. These reconfirmed no substantial effects, with differences between conditions largely attributable to noise. An example of this can be seen in Figure 19.

Study 2A: Discussion and Conclusion

The goal of this iteration of the LTM was to account for general effects of social learning in a training transfer environment in a parsimonious manner. The initial modeling discussed here suggests this attempt was a general failure, as the expected effects of the primary variables failed to emerge. Specifically, it was expected that the number of trainees would improve transfer outcomes because the greater numbers would essentially smooth out the sampling errors that could lead single agents to suboptimal transfer. This prediction does not appear to have quite been the case.
At best, there appears to be only a slight improvement in behavioral transfer and post-training performance as the number of agents increases, and nothing like the strong effects expected based on SLT (Bandura, 1977) or existing meta-analytic social effects in training transfer (Blume et al., 2010). A misleading exception to this failure lies in the observed effect sizes comparing pre- and post-training performance, which range from d = .28 to 1.43. However, given the lack of improvement in behavioral transfer and the slight relationships between the number of trainees and the outcomes, this increase in d appears to be an artifact of its calculation: d is in part computed using the pooled standard deviation of the two groups being compared (i.e., d = (Mpost - Mpre) / SDpooled). When the number of agents increases, the observed standard deviations of performance within those groups decrease as sampling error becomes less problematic. Then, when the effects are compared, the pooled standard deviation utilized is smaller when there are more agents, making the effect size appear artificially large. Thus, there is an effect of more agents in the model due to their effect on sampling error, as was predicted, but the effect is not the one which was expected. Further, the results for connectedness effects were even more disappointing. It was hoped that connectedness would be a simple way to recreate the social support which has a meta-analytic effect of .21 on transfer (Blume et al., 2010), but this appears to clearly not be the case. More sophisticated forms of social learning will need to be explored to see if they can account for such social effects. Given the clear failure of this integration of social learning into the LTM, we are forced to explore other options. It is to the exploration of these other options we shall now turn.

Study 2B and 2C: Rethinking Social Learning Model

Unfortunately, the initial attempt to include a social learning mechanism in the LTM failed. In order to assess other potential options for social learning in the present context, a search was conducted for existing models of social learning mechanisms. Multiple potential mechanisms have previously been modeled to answer various research questions, such as the use of genetic algorithms (e.g., Yeh & Chen, 2001), imitation (Richerson & Boyd, 2005), and emulation (Lopes, Melo, Kenward, & Santos-Victor, 2009), some of which have been applied to organizational research, such as coordination within teams (e.g., Singh, Dong, & Gero, 2013). Although all these approaches, and likely others, could prove fruitful, the present modeling will focus on imitation. The choice to focus on imitation lies in its use in studying the mutual development of culture and genetics in human populations. Richerson and Boyd (2005) described the process by which human culture and genetics mutually reinforced each other over thousands of years to produce the societies, and the actual humans within them, that we know today. A primary mechanism within models of this relationship is social learning, as the actions of groups over time are largely dictated by the pressures exerted upon the individuals within those groups by the other people around them. Over time, these pressures lead to the success and spread of certain cultural artifacts and the elimination of others, and the ability to acquire novel behaviors via social learning is a prerequisite for cumulative change.
We can see the outcomes of hundreds of generations of such pressures in the emergence of complex cultures representing the sum of the socially selected actions, beliefs, values, and so on that were adaptive for success in the social groups within which they emerged. I argue this view is particularly relevant to examining training in organizations. As has been discussed above, social effects are among the most important factors in the success or failure of training and subsequent transfer. These social effects come in several guises, such as manager and coworker support, climate for transfer, and organizational culture (e.g., Blume et al., 2010). Much as Richerson and Boyd (2005) view social learning as a mechanism through which culture is developed and reinforced, we can view the social effects within training and transfer as a form of culture that both affects and is reinforced by the actions of the individuals within the organization. In our first attempt here at a social learning mechanism we can already see this form of mutual causality. Namely, agents within the model have experiences with their task, and their experiences combine to form a collective view of the task which is an emergent property of the simulated work group. This emergent view then acts as the cultural context within which the agents act, and this context impacts the decisions the agents make. These mutually causal, simultaneous top-down and bottom-up (Kozlowski & Klein, 2000) effects then dynamically play out over time. Thus, the underlying causal relationship modeled in the prior iteration of the LTM appears to be in line with other models of culture in the scientific literature, but the actual social learning mechanism was too simplistic to have the expected effects. To rectify this shortcoming, two models were built and explored to study the potential effects of imitation.

In Richerson and Boyd (2005), imitation occurs when organisms copy others in their environment as a way to navigate that environment. As in SLT (Bandura, 1977), imitation allows organisms, in this case humans, to learn new behaviors and the consequences of those behaviors through observation. This observation improves the rate and outcomes of individual learning, all else being equal. Two predominant forms of imitation are pertinent here. The first form is imitation of the successful, where individuals will tend to do the actions they see successful individuals around them doing on the same tasks. Such a mechanism can be seen throughout our modern world. For example, many people play and watch major sports with a dream of someday being as good as the professionals they see on television. To improve at their own games, a common approach is for individuals to attempt to emulate the athletes they see succeeding at the same game. Simple searches for tips on golf, for example, return hits on how to drive the ball like Rory McIlroy, or putt like Phil Mickelson. Richerson and Boyd's (2005) second form of social learning occurs through a frequency bias, where learners tend to do the things that the majority of their peers are doing. That is, when in a group, people will tend to do the things that the individuals around them are doing, using a simple majority heuristic. This approach is especially adaptive to learners in unfamiliar situations in that they can take cues from those around them on how to navigate the novel environment. Richerson and Boyd (2005) argue this bias helps learners to survive in their environment.
In either case, Richerson and Boyd (2005) argue these social learning mechanisms are fast and frugal forms of learning which offload the burden of learning much information through direct experience. In addition, these learning mechanisms are biases in favor of following the successful or following the lead of one's social group. Thus, their framing of the benefit of these mechanisms aligns with dual-process views of social cognition (e.g., Kahneman, 2011). However, for clarity we need a way to readily distinguish between the two types of social learning Richerson and Boyd (2005) describe. In their description, conformity is more a form of coercion than it is vicarious learning in the sense of SLT (Bandura, 1977). Therefore, in our nomenclature it does not seem quite correct to label it directly as a form of imitation. On the other hand, the imitation of the successful does fit well with SLT. Thus, it seems proper to let imitation mean specifically imitation of the successful, while relabeling the coercive form as conformity. Therefore, the rest of this paper will use imitation and conformity to refer to these types instead of imitation alone. To examine the potential effects of these mechanisms on the LTM, two independent iterations of the theory and associated computational model were made. The first, referred to as Model 2B, focused on imitation, while Model 2C focused on conformity.

Model 2B Overview

This iteration of the LTM explores the effects of social learning through a tendency for learners to imitate other successful learners in their environment. For this mechanism, it is proposed that learners observe others in their environment, judge their performance and their behavior, and have some likelihood of following the same behavior that high-performing other agents are enacting. To include this mechanism in the formal version of the LTM and its associated computational model, we need to make two adjustments to Model 2A. The first change lies in the observational and pooling procedure originally proposed. Instead of pooling the value estimates of all other learners in their environment, learners must instead track the actual performance of their fellow learners. From that set of learners, they must then judge which one exhibits the highest performance in order to make a judgement about how to imitate that high performer. This approach does assume that learners have a perfect ability to judge the performance of others, an assumption that was not present in Model 2A, but this assumption can be relaxed and explored in future modeling endeavors. Once performance and behavioral judgements have been made, a mechanism needs to be created for those observations to affect the behavioral choices of the observing agents. Within the dual processing framework, the decision to imitate other agents is proposed to fit more cleanly into type 2 processes due to the level of cognitive effort required to make accurate judgements of others' performance. It is possible that the mechanism could instead be placed in type 1 processes, which would fit with future explorations where the assumption of perfect observation is relaxed, but for now we will assume the decision to imitate the successful is a conscious and effortful one rather than a more automatic one. Thus, when individuals engage in their type 2 processes, they first must decide whether to imitate someone else or not. Within the computational version of this model, the likelihood of imitating another agent is controlled with a parameter labeled imitate.
If a learner does choose to imitate someone else, they must scan their environment for other learners and judge their performance to identify the one with the highest level of performance. Once identified, they must then observe the behavioral choices that learner is making and then apply the same behavior. In the computational model, agents carry out this observation process and choose to apply the behavioral policy enacted by the most successful other agent in their environment on the previous task attempt. Therefore, the behavior of any agent i in the model at time t + 1 is the behavior of the highest performing other agent in the model at time t, when agent i chooses to imitate during task attempt t + 1.

Model 2C Overview

Model 2C represents a modification of the theory and model in 2B to change the social learning mechanism from imitation to conformity. Thus, instead of a tendency to do the same behaviors as successful learners around them, under this model learners tend to do the behaviors that the majority of the learners around them are doing. In this case, individuals do not need to track the performance of others around them, only their behavioral choices. From these observations the learner can use a simple voting procedure to determine which behavioral choice is the most common among their group. In the computational version of this proposal, when each agent decides to conform to their group, their behavior on the task at time t + 1 is equivalent to the behavior displayed by the majority of the other agents in the environment at time t. The tendency to conform to the group on any given task attempt is controlled via a conform parameter.

Study 2B: Method, Simulation and Results

The described theoretical additions to the LTM regarding the use of imitation as a social learning mechanism were instantiated in a computational model, expanding on the base model explored in Study 1. A screenshot of the modeling environment and a copy of the associated simulation code in NetLogo can be found in Appendix C. The new mechanisms in this model were verified by examining the tracking mechanisms of agents to ensure they were correctly identifying the performance and behavior of the top performing other agents in their environment, and that they would follow the indicated behavior under conditions where they always engage in type 2 processes and always imitate the best performers. Once this was completed, a small experiment was run to study the effects the number of trainees in the model and the level of imitation have on our outcomes of behavioral transfer and task performance. It was expected that both the number of trainees and the level of imitation would improve transfer outcomes, as agents would benefit from the increased sampling rate of the agents around them, making it more likely that at least one agent discovers that Policy B is indeed the better policy and that this finding would propagate through the rest of the agents.

Trainees Versus Imitation Experiment

To study the effects of the number of trainees and level of imitation in this version of the LTM, a simulation was conducted crossing the number of trainees, swept from 1 to 20, with the level of imitation swept from 0 to 1 in .05 increments.
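Before turning to the results of this experiment, the brief sketch below illustrates the imitation (Model 2B) and conformity (Model 2C) decision rules described above. It is a minimal Python sketch; the actual models were written in NetLogo, and the data structures and tie-breaking choice here are illustrative assumptions.

```python
import random

# Illustrative behavior-selection rules for Models 2B and 2C.
# Assumptions: each observed agent exposes the policy it used and the
# performance it achieved on the previous task attempt (time t).

def imitate_best(other_agents):
    """Model 2B: copy the policy used at time t by the highest performer."""
    best = max(other_agents, key=lambda a: a["performance"])
    return best["policy"]

def conform_to_majority(other_agents):
    """Model 2C: copy the policy used at time t by the majority of others."""
    votes_b = sum(1 for a in other_agents if a["policy"] == "B")
    votes_a = len(other_agents) - votes_b
    if votes_a == votes_b:                  # ties broken at random (assumption)
        return random.choice(["A", "B"])
    return "B" if votes_b > votes_a else "A"

# Example: three co-workers observed on the previous task attempt.
others = [
    {"performance": 0.62, "policy": "A"},
    {"performance": 0.78, "policy": "B"},
    {"performance": 0.66, "policy": "A"},
]
print(imitate_best(others))         # -> "B" (highest performer used Policy B)
print(conform_to_majority(others))  # -> "A" (most co-workers used Policy A)
```

Note that the two rules can point in opposite directions for the same group, which foreshadows the diverging results reported for the two models below.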
Other variables were held at the levels chosen in Model 1: 100 pre-training time steps, 500 transfer time steps, type 2 likelihood set at .80, initial policy estimates of .50, true value of Policy A of .70 with a change in value of .05, and 500 replications of each condition. In examining the results of this simulation, we see the directions of relationships expected both within this model and more broadly as guided by social effects in the training literature. Namely, at the replication level, the number of trainees in the model was positively related to both behavioral transfer (r(210000) = .103, p < .001) and post-training performance (r(210000) = .156, p < .001). In addition, the level of imitation was also positively related to both behavioral transfer (r(210000) = .495, p < .001) and post-training performance (r(210000) = .730, p < .001). Then, a multiple regression analysis was performed to test the joint effects of the number of trainees and imitation level on the transfer outcomes. In predicting behavioral transfer, both the number of trainees (F(3, 209996) = 89557.07, p < .001, R² = .75; b0 = .728, t = 3096.22, p < .001; b1 = .004, β = .156, t = 107.61, p < .001) and imitation rates (b2 = .392, β = .730, t = 505.13, p < .001) show positive relationships, and a positive interaction (b3 = .006, β = .064, t = 44.04, p < .001). In predicting post-training performance, both the number of trainees (F(3, 209996) = 24351.15, p < .001, R² = .51; b0 = .736, t = 32701.20, p < .001; b1 = .0002, β = .106, t = 54.78, p < .001) and imitation rates (b2 = .020, β = .495, t = 263.55, p < .001) show positive relationships, and a positive interaction (b3 = .0003, β = .046, t = 24.36, p < .001). Finally, predicting pre-post performance change effect size (Cohen's d) across conditions, both the number of trainees (F(3, 416) = 1816.19, p < .001, R² = .96; b0 = 2.346, t = 175.72, p < .001; b1 = .137, β = .770, t = 59.00, p < .001) and imitation rates (b2 = 1.849, β = .548, t = 41.93, p < .001) show positive relationships, and a positive interaction (b3 = .111, β = .189, t = 14.47, p < .001). These interaction effects, displaying an augmenting effect between the number of trainees and imitation rates, were graphed in Figures 20-22. In addition, to better understand the nuances of these effects, heat maps were generated and can be found in Figures 23-25. In examining these heat maps, we see almost no actual effect of the number of trainees beyond that gained by adding even one agent to the model. On the other hand, we see a steady improvement in behavioral transfer and post-training performance as imitation rates increase. In the calculation of pre-post performance effects, we see the highest effect sizes when trainees and imitation are high, but this is again likely inflated by the pooled standard deviation artifact described for Model 2A.

Study 2C: Method, Simulation and Results

The theoretical additions to the LTM described above regarding the use of conformity as a social learning mechanism were instantiated in a computational model, expanding on the base model introduced in Study 1. A screenshot of the modeling environment and a copy of the associated simulation code in NetLogo can be found in Appendix D.
The new mechanisms in this model were verified by examining the tracking mechanisms of agents to ensure they were correctly identifying the behaviors of their fellow agents, and that they would follow the indicated behavior of the majority under conditions where they always engage in type 2 processes and always conform. Once this was completed, an experiment was run to study the effects the number of trainees in the model and the level of conformity have on our outcomes of behavioral transfer and task performance. It was expected that both the number of trainees and the level of conformity would again improve transfer outcomes, as agents would benefit from the increased sampling rate of the agents around them, making it more likely that at least one agent discovers that Policy B is indeed the better policy and that this finding would spread through the rest of the agents. Thus, as with the imitation model, it was expected that we would see positive relationships between the number of trainees, the level of conformity, and the transfer outcomes.

Trainees Versus Conformity Experiment

To study the effects of the number of trainees and level of conformity in this version of the LTM, a simulation was conducted crossing the number of trainees, swept from 1 to 20, with the level of conformity swept from 0 to 1 in .05 increments. Other variables were held at the levels chosen in Model 1: 100 pre-training time steps, 500 transfer time steps, type 2 likelihood set at .80, initial policy estimates of .50, true value of Policy A of .70 with a change in value of .05, and 500 replications of each condition. In examining the results of this simulation, we see the opposite of the general relationships expected within this model. At the replication level, the number of trainees in the model was negatively related to both behavioral transfer (r(210000) = -.203, p < .001) and post-training performance (r(210000) = -.144, p < .001). In addition, the level of conformity was also negatively related to both behavioral transfer (r(210000) = -.742, p < .001) and post-training performance (r(210000) = -.528, p < .001). A multiple regression analysis was performed to test the joint effects of the number of trainees and conformity level on the transfer outcomes. In predicting behavioral transfer, both the number of trainees (F(3, 299996) = 103802.16, p < .001, R² = .77; b0 = .232, t = 907.59, p < .001; b1 = -.006, β = -.203, t = -146.30, p < .001) and conformity rates (b2 = -.452, β = -.742, t = -536.13, p < .001) show negative relationships, and a negative interaction (b3 = -.007, β = -.070, t = -50.73, p < .001). In predicting post-training performance, both the number of trainees (F(3, 299996) = 30371.68, p < .001, R² = .55; b0 = .712, t = 30279.36, p < .001; b1 = -.0003, β = -.144, t = -78.84, p < .001) and conformity rates (b2 = -.023, β = -.528, t = -289.96, p < .001) show negative relationships, and a negative interaction (b3 = -.0004, β = -.052, t = -28.63, p < .001). Finally, predicting pre-post performance change effect size (Cohen's d) across conditions, both the number of trainees (F(3, 416) = 3150.12, p < .001, R² = .55; b0 = .058, t = 8.78, p < .001; b1 = -.012, β = -.104, t = -10.28, p < .001) and conformity rates (b2 = -1.979, β = -.918, t = -91.21, p < .001) show negative relationships, and a negative interaction (b3 = -.121, β = -.323, t = -32.03, p < .001).
These interaction effects, displaying a depressive effect between the number of trainees and conformity rates, were graphed in Figures 26-28. In addition, to better understand the nuances of these effects, heat maps were generated and can be found in Figures 29-31. These heat maps show that the best outcomes indeed occur when the number of trainees and rates of conformity are low. In addition, they also show a clear sensitive area in the model and some non-linear effects. Specifically, outcomes suddenly improve as conformity rates drop below about .45. However, the level at which this change occurs depends on the number of trainees in the model, such that the transition occurs at higher levels of conformity when there are fewer trainees in the model. An interesting pattern also emerges when comparing an odd number of trainees in the model versus an even number, such that the transition point to better outcomes occurs at a higher level of conformity for even numbers of trainees than for the odd numbers around them. This is likely a statistical artifact of the voting process in the model rather than something of great significance.

Study 2B and 2C: Discussion and Conclusion

The models introduced in Studies 2B and 2C were meant to explore other potential mechanisms to account for social effects in training transfer studies, which the initially proposed model failed to do. From the initial experimenting outlined here, it appears that either mechanism has the ability to provide interesting insights into the transfer process, at least beyond those obtained in the original theory. Here, let us briefly discuss the implications of these models for both theory and practice.

Implications for Theory

The primary effects on transfer from a social standpoint lie in the environmental factors of perceived support and climate for transfer. Corrected meta-analytic estimates for the effects of these on transfer are .21 and .27, respectively (Blume et al., 2010). In Model 2B we studied the potential effects of imitation on transfer, which is the tendency to engage in the behaviors other successful learners are engaged in. In Model 2C we viewed the social learning mechanism from the standpoint of conforming to the behavioral tendencies of the majority of the other learners around a target learner. These mechanisms could be argued to fit conceptually with what is occurring in producing the effects we see for support and climate. Namely, these two effects are based on learners' perceptions of the actions of those around them in providing the necessary social and physical conditions in which they can transfer their training. In both present models, as argued above, we have created an emergent environment from the actual behaviors of agents, which in turn produces a social context in which the agents must act. When the social environment created by the agents is such that it promotes the transfer of the trained policy, this is akin to the agents perceiving an environment supportive of their transfer attempts. In addition, each of the two models might fit conceptually a little better with a specific effect. For example, Model 2B relies on target agents seeing other successful agents apply a behavior that the target agent can then mimic. This is more of a one-on-one interaction where the successful agent essentially either serves as a role model for that behavior or does not.
This conceptually fits with recommendations for managers to promote transfer by modeling desired behaviors to their employees (e.g., Lancaster, Di Milia, & Cameron, 2013). Although not labeled as managers in the model here, managers could be seen as successful employees whom their followers are likely to view as role models, as occurs in the mechanism outlined in Model 2B. To that end, examination of Model 2B shows an ability to create relationships in the desired direction, but the effects of the actual imitation mechanism are of a substantially larger magnitude than observed in meta-analyses of support effects on transfer (Blume et al., 2010). One reason the effect sizes observed in the present model are so much larger than the target effect sizes may be the agents' current ability to perfectly view the performance and behaviors of their models. Adding noise to their observations to reflect imperfect observation in real life may correct this deficiency and bring findings more in line with research. For now, this model appears to be a great step forward over the previous iteration, one that fits better with existing research both within our field and with scientific efforts around social learning in general.

The case surrounding Model 2C is considerably more complicated. Conceptually, Model 2C fits more cleanly with effects of transfer climate because the underlying mechanism in this model, conformity, is no longer a one-on-one modeling case; rather, it is truly a group-level consideration. That is, here the agents form a social environment regarding their collective use of the behaviors available to them. That social tendency to use, or not use, either behavior available to them could be considered the climate for the use of that behavior. Therefore, if the group tends to use the trained Policy B, the climate would be one that is positive for transfer. If the group tends to use the pre-existing Policy A, there would be a negative climate for transfer. Initially it was expected that conformity would be positively related to transfer, as the group would be more likely to collectively realize the trained policy is beneficial for their use and therefore create that positive climate. However, that is not what we find in simulation. Instead, we find that conformity has a substantial negative effect on transfer, although the absolute magnitudes of those effects are closer to the meta-analytic effects of climate than those observed for support in Model 2B (Blume et al., 2010). This finding was initially surprising. However, in retrospect, because of the dynamic nature of the mechanism, the negative relationship perhaps should be expected. Specifically, the behaviors of agents at any time t + 1 are a function of the behaviors of the group at time t. This in effect creates a heavy bias, especially when conformity is high, towards the continued use of Policy A, because when the agent begins the transfer period the behavior of all agents at the previous time point was Policy A. To overcome this, it takes agents independently choosing to apply Policy B and slowly changing the balance of the group until a majority are applying Policy B. This process is harder and takes longer when there are more agents in the environment, which also accounts for the now negative effect of the number of trainees in the model.
Therefore, I argue that initially low pressure to conform is actually akin to a climate allowing transfer, because the agents are free to explore the benefits of their training rather than being pressured to avoid doing so, and an inverse directional relationship relative to how we operationalize this effect in the literature should therefore be expected. Within actual work groups there can be substantial pressure placed on workers not to comply with organizational interventions such as training, thereby creating a negative climate for transfer, whereas the absence of such pressure can be interpreted as a positive climate for transfer. Over time, as more individuals pick up the new behavior, that pressure could reverse and lead to improved transfer. This essential switch in pressure could be a reason why we see the sudden improvement in transfer outcomes when conformity drops below .45 in the present model. Early in transfer attempts the agents need that freedom to do their own exploring, but as transfer goes on there is some benefit to social pressure helping bring late adopters over to begin transferring at a greater rate. Thus, it appears that with some reconceptualization of what the parameters represent, this model also provides a potential window into existing effects in the transfer literature, though more work will be required to explore those effects and verify the mechanisms are correct. For the time being, given the closer absolute magnitude of the effects in this model and the more nuanced relationships observed between this model's parameters and the outcomes of interest, this model may provide more potential insights than Model 2B for future work. It is for this reason that this model was chosen to provide the basis for the next round of modeling outlined in Study 3, although it is acknowledged much work remains to verify this model for long-term use.

Future modeling of social learning

As mentioned in the introduction to these two models, there are other versions of social learning which could be interesting to explore in future efforts to model the employee learning and transfer process. For example, a genetic algorithm approach (e.g., Yeh & Chen, 2001) could provide interesting insights for skill development if we were interested in how employees might generate their own novel solutions to work tasks and then propagate their discoveries to their work groups and the organization at large. Furthermore, existing models sometimes model tradeoffs among several dimensions of preference at the same time. Along these lines, Lopes et al. (2009) modeled tradeoffs between making decisions based on individual preferences versus making those choices based on social pressures from either imitation or emulation. In their modeling, momentary choices are based on a tripartite tradeoff between these three pressures, where increased weight on any one type lowers the weight of the other two. In the present modeling we focused on only a single type of social learning, imitation or conformity, and a tradeoff with choices based on personal experience. Future iterations should explore the potential simultaneous effects of these pressures. In addition, models where imitation and conformity are placed in type 1 processes instead of type 2 processes should be explored.
The argument for why they were placed in type 2 processes for this model was laid out above, and it seems likely that placing these mechanisms in type 1 processes would only exacerbate the overly strong relationships they displayed with transfer and performance compared to meta-analytic estimates (Blume et al., 2010), although such a placement would arguably fit Richerson and Boyd's (2005) description of these mechanisms as fast and frugal, though it is not clear whether they are discussing cognitive load rather than general effectiveness. This expectation is due to such a placement furthering their ability to override type 2 processes before any further conscious judgement can be made. On the other hand, the way they were included in the model may de facto place them somewhere between the clean separation of type 1 and type 2 processes otherwise followed in this paper, as they are only enacted when type 2 processes are called upon, but they do not make the same level of logical judgement as the other type 2 processes modeled and instead override those processes to automatically imitate or conform. Thus, the decision to imitate or conform is treated as conscious, but its execution is more automatic in nature. The middle ground occupied in the present model by the imitation and conformity mechanisms does potentially fit with the view of cognitive systems as lying on a continuum from conscious and effortful to automatic (Evans & Stanovich, 2013). Regardless, it might be more fruitful to relax the assumption of perfect observation in the model, which would add noise to the imitation and conformity decisions and should therefore lower the observed relationships to more closely match existing meta-analytic effects. These are future explorations which should be undertaken to refine the present model.

Other modeling possibilities

It is also possible that the models here could be combined with other theories to study group effects on training transfer. One intriguing example would be to incorporate Diffusion of Innovation Theory (e.g., Rogers, 2003) to study the propagation of transfer through a work group. As discussed regarding the reasons for the unexpected negative relationship between conformity and outcomes in Model 2C, a key to overcoming the momentum of the group is for individual agents to adopt the target behavior and slowly bring other agents on board. Eventually, a tipping point is reached where it becomes acceptable for the group to use the trained behavior as a critical mass will do so; then, over time, straggling agents will be pressured to adopt the new behavior instead of clinging to the old one. Conceptually this fits quite well with the diffusion of innovation, where research suggests new innovations are adopted in stages across populations. Initially, new innovations are adopted by only a few individuals, called innovators. Some individuals will follow these first adopters eagerly, representing early adopters but still making up a minority of the population. Once this group has opened the door, more and more individuals pick up the innovation in rapid succession as a critical mass is reached, and soon most of the population uses the innovation. Eventually, only a few holdouts remain, representing the laggards in the population (see Rogers, 2003, for a broader overview). The conformity model here may inadvertently speak to this process in a transfer environment where the innovation is the newly introduced behavioral possibilities.
Expanding on the proposed mechanisms and explor ing others from the Diffusion of Innovation literature could provide new and useful insights into the social pressures governing transfer rates in organizations. In addition, it would be interesting to pair this approach with network effects to understand how training adaptation might propagate across a network of employees. Social networks have gained much traction in organizational psychology of late (e.g., Soltis, Brass, & Lepak, 201 8 ), but networks have long been of interest in related fields. One of t he benefits of using NetLogo as the base for the present simulation is the existence of easily accessible network models which could be integrated with the present theory. For example, NetLogo comes with a model to study diffusion of information across dir ected networks (Stonedahl & Wilensky, 2008). Further, related theories could be drawn on to more broadly study the long run development of employees as suggested above, but in the ecolo gical context of their work group as has previously been suggested with Ecological Systems Theory (Bronfrenbrenner, 1977, 1979) in understanding child development (Neal & Neal, 2013). By drawing on such existing modeling efforts we could greatly enrich our understanding of the social effects on the transfer of training. Such work would also provide potential extensions of recent research on vicarious learning mechanisms within teams (Myers, in press). Implications for Practice The combined effects of the above discussions and modeling here have some implications for practice as well. When it comes to promoting transfer , it is especially important to promote a climate wherein learners are free to explore their training and not succumb to group pressure to r evert to pre - training behaviors. This is important to have esta blished at the very end 104 of training so that the dynamic effect of observing continued use of old behaviors and the pressure that use can create is broken to allow greater time for exploration. Once that climate is established, practitioners should encourag e learners to observe the most effective individuals in their work groups and follow their lead. Through this sequence, one may be able to unlock the benefits of both types of social learning e xplored in the present models. Conclusion It seems apparent th at the models explored in studies 2B and 2C are substantial improvements upon the LTM over both the baseline model, and the first attempt to include a social learning mechanism. These models ha ve a stronger basis in the broader scientific literature and ap pear to conceptually fit with ways we think about groups in the organizational sciences. However, much work will remain to establish which of these mechanisms, or which mix of them, best accoun ts for transfer effects. For the time being, they appear to off er potential theoretical and practical insights and more work along these lines is encouraged. 105 Study 3A: Adding Self - Regulation to the Transfer Process Model Unfortunately, the originally prop osed social learning mechanism for the LTM did not operate as e xpected. However, it appear s that the alternate models exploring conformity and imitation show greater promise for examining the effects of social groups in the transfer process. It was also ar gued that the conformity model fit better conceptually with the social environment effect we study through transfer climate, despite having the opposite direction effect initially expected. 
Therefore, the conformity model was used as a basis for a third iteration of the LTM which integrates a perspective that no within-person process model could, or at least should, avoid addressing: self-regulation (e.g., Vancouver, 2008).
Self-Regulation
Self-regulation is the dominant theory of motivation in organizational psychology (Vancouver & Day, 2005) and describes how individuals guide their actions towards goals over time (Karoly, 1993).
Hierarchical goal pursuit
Goals are internally represented desired states of being held by individuals (Lord, Diefendorff, Schmidt, & Hall, 2010) and are the central construct in self-regulatory systems. Goals exist in a hierarchy, such that individuals possess both short- and long-term goals, with short-term goals nested underneath longer-term goals. As lower-level goals are completed, individuals move closer to attaining higher-level goals. This hierarchical system is the basis of theories of self-regulation, and of the application of self-regulation to organizational phenomena such as training transfer (e.g., Carver & Scheier, 1998; Powers, 1973; Blume et al., 2019). At each level of the goal hierarchy a self-regulatory system monitors goal progress and adjusts system outputs to maintain the desired goal level (Vancouver & Day, 2005).
Self-regulatory negative feedback systems
Self-regulatory systems are based around negative feedback loops which attempt to minimize discrepancies between the goal of the system and perceived progress towards it (Vancouver & Day, 2005). Two major versions of self-regulation exist: Social Cognitive Theory (SCT; Bandura, 1991; Schunk & Usher, 2012), which grew out of Social Learning Theory (SLT), and Control Theory (CT; Powers, 1973; Carver & Scheier, 1998). The Social Cognitive and Control versions of self-regulation do not differ all that substantially and make diverging predictions only in specific circumstances (Vancouver, Gullekson, Morse, & Warren, 2014). However, CT has the distinct advantage of relying on more formal forms of logic than SCT, which is reliant on the narrative approach to theorizing, whereas CT is more computational in nature due to its historical roots. As discussed previously, narrative theory is useful for conveying ideas but suffers in making formal predictions (Vancouver, 2012; Adner, Polos, Ryall, & Sorenson, 2009). Thus, because this paper is building a formal model of transfer and CT is a more formal version of self-regulation, CT will provide the basis of the present theorizing. Many specific versions of CT exist (e.g., Powers, 1973; Campion & Lord, 1982; Carver & Scheier, 1998; Vancouver, 2008), but all have crucial elements in common. Broadly, they all consider the nested goal hierarchy previously mentioned, but more specifically the basic negative feedback system of regulation relies on just a few key ideas. Lord and Hanges (1987) argue that the negative feedback systems of CT all have five elements: 1) some standard (goal) which the system seeks to maintain, 2) a sensor which monitors the state of the environment, 3) a comparator which compares the standard to the sensed environment, 4) a decision mechanism to decide whether something should be done to reduce any perceived discrepancy, and 5) an effector mechanism which produces some behavior meant to reduce the perceived discrepancy. As goal striving unfolds over time, the regulatory system monitors progress towards that goal and works to lower discrepancies through multiple mechanisms.
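To make the five elements concrete, one cycle of such a negative feedback loop is sketched below. This is an illustrative Python sketch rather than the LTM's NetLogo implementation; the function name, the binary decision rule, and the fixed effort increment are assumptions made purely for illustration.

```python
def negative_feedback_step(standard, perceived_state, effort, step_size=0.1):
    """One cycle of a basic negative feedback loop (illustrative only).

    standard        -- the goal the system seeks to maintain (element 1)
    perceived_state -- what the sensor reports about the environment (element 2)
    """
    discrepancy = standard - perceived_state      # comparator (element 3)
    should_act = discrepancy > 0                  # decision mechanism (element 4)
    if should_act:
        effort = min(1.0, effort + step_size)     # effector raises output (element 5)
    return effort, discrepancy

# Example: a goal of .80 against perceived progress of .55 triggers more effort.
new_effort, gap = negative_feedback_step(standard=0.80, perceived_state=0.55, effort=0.30)
print(new_effort, gap)  # prints the raised effort and the remaining discrepancy
```

Repeating this cycle until the discrepancy disappears is the core dynamic the remainder of this section builds on.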
The two primary pathways to reducing discrepancies are 1) changing output behaviors that affect the perceived environment, such as increasing effort, and 2) changing the set-point of the goal in question (Campion & Lord, 1982). Within the CT literature it has been a point of contention which option is most likely to occur, with some theorists arguing that goals are more easily adjusted than behavior.
Self-Efficacy
Arising from the Social Learning and Social Cognitive perspective of Bandura (1977, 1991), self-efficacy represents the other central variable in self-regulation. Self-efficacy is the belief individuals hold regarding their ability to execute desired behaviors in the pursuit of some outcome (Bandura, 1977), and is the primary mechanism through which individuals exert agency (Bandura, 1977, 1989, 1991). That exertion of agency is represented in which environments people choose to enter (Bandura, 1989) and in choosing the tasks with which they will engage (Bandura & Cervone, 1987). Control theorists generally agree that efficacy is an important construct (Vancouver, 2012), but they dispute its nature, which complicates how efficacy will be instantiated in the present modeling. Much research on self-efficacy has followed the predictions of Bandura (1977, 1989, 1991), who describes efficacy as having a (nearly) uniform positive effect on performance. Through its influence on task and environmental choices, efficacy has a .38 meta-analytic relationship with performance between individuals (Stajkovic & Luthans, 1998). Theory and research on the within-person nature of efficacy is less uniform in its findings. In explicating their theories, multiple observers, including Bandura (1977), have discussed how extremely low levels of efficacy will lead to complete task disengagement as a cognitive defense mechanism (e.g., Lindsley, Brass, & Thomas, 1995). Additionally, Vancouver and colleagues have argued that the relationship between efficacy and performance is not always positive (Vancouver et al., 2014) and, in a series of experiments, have suggested that the relationship between efficacy and task engagement is discontinuous in nature. Specifically, at very low levels of efficacy individuals are unlikely to engage in a task at all, but as their efficacy increases they will suddenly choose to engage in the task and will need to put forth maximum effort in order to succeed. Then, as efficacy continues to increase, the individual will reduce effort because they do not feel full effort is required to ensure success, conserving those resources for when they are more necessary (Vancouver, Moore, & Yoder, 2008). The location of that discontinuity can be moderated by the nature of the task in question as well, such as by manipulating the value attached to said task (Sun, Vancouver, & Weinhardt, 2014).
The LTM with Self-Regulation
In some ways, motivational aspects of goal pursuit are already a part of the LTM. Sutton and Barto (2018) explain that internal state components of learning agents correspond to animal motivational states. In addition, learning agents have a built-in drive to ascend the gradient of their value function, that is, to select actions expected to lead to the most highly valued states (Sutton & Barto, 2018, p. 361). In a normal reinforcement learning problem, the maximum attainable goal is defined by the environment and the ways in which the programmer encodes rewards.
In a basic reinforcement problem, the agent will continuously attempt to improve its value states because it does not know what the maximum is. Humans, on the other hand, can decide that they have reached a goal and stop pursuing higher levels of attainment. This ability to decide that an acceptable level of performance has been reached, and to voluntarily forgo further improvement, can be achieved through self-regulatory systems. There are approaches to modeling goal-directed behaviors in learning agents seeking to navigate an environment in a typical reinforcement learning problem (Sutton & Barto, 2018), but those approaches are beyond what is required for the present purposes and the framework of a simple 2-armed bandit problem. How, then, can we account for specific goal-directed behavior within the LTM? First, we must define a performance goal for the agent, T. To understand the relationship between performance and the performance goal, we must also define and track performance. Performance in this model will be represented by the variable Y and calculated as the average performance across all task attempts, regardless of the policy applied. In the present model, where any successful attempt is rewarded with 1 point, the average performance expressed as the percentage of successful attempts is equivalent to the average reward received on task attempts. Thus, Y may be calculated as
Y_{t+1} = \frac{1}{t} \sum_{i=1}^{t} R_i
where Y at time t + 1 is the average of all rewards R received by the agent through time t.
Algorithm 7. Agent Performance
A comparison must then be made between the level of performance and the stated goal. This can be done through a simple difference variable we shall define as D, calculated as
D_{t+1} = T - Y_{t+1}
Algorithm 8. Goal Discrepancy
The learner must then decide whether they are short of their performance goal or not. This decision is defined by a variable J, equal to 0 if the goal is met and 1 if it is not. That decision then feeds into an effector mechanism: if performance is short of the goal, the agent chooses to change their behavior to reach it. In other computational models of the self-regulatory system, the effector mechanism is defined in terms of the actions taken to close the perceived gap between goals and perceptions, where the acts taken are relevant to the goals being monitored. For example, if an overarching goal is to write a paper, the acts could be a series of steps such as doing research, outlining, etc., that are all governed by their own regulatory systems. Given the undefined nature of the tasks modeled in this paper, it makes most sense to define the actions an effector mechanism may take in terms of the actions currently available to the learner/agents. Thus, the LTM hypothesizes that agents are more likely to explore their policy options if they are currently short of their performance goals. To account for this effect, let F represent the degree to which they are more likely to explore on a given task attempt. The variable F then modulates the exploration parameter E as a function of whether the agent is reaching their goal or not, resulting in the calculation of E as
E_{t+1} = E_{base} + (J_{t+1} \cdot F)
where E_base is the baseline exploration rate.
Algorithm 9. Effector Mechanism 1
The newly added variables and equations for all of Study 3 can be found in Tables 11 and 12.
Study 3A: Method, Simulation, and Results
The outlined model for including self-regulatory mechanisms was instantiated in a new model in NetLogo. A screen capture of the modeling environment and code for this model can be found in Appendix E.
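The pieces just defined (Algorithms 7 through 9) can be consolidated into a short sketch. The Python below is an illustration of my reading of those definitions, not the NetLogo code in Appendix E; in particular, the additive update E = E_base + J * F is an assumption implied by the parameter sweep described in the next section, and the class and variable names are hypothetical.

```python
class GoalRegulatedExplorer:
    """Sketch of the Study 3A self-regulatory additions (Algorithms 7-9).

    Assumes the additive reading E = E_base + J * F; the dissertation's
    NetLogo code (Appendix E) is the authoritative implementation.
    """

    def __init__(self, goal_T, base_exploration=0.10, boost_F=0.10):
        self.goal_T = goal_T                  # performance goal, T
        self.base_exploration = base_exploration
        self.boost_F = boost_F                # extra exploration when short of the goal, F
        self.total_reward = 0.0
        self.attempts = 0

    def record_attempt(self, reward):
        """Update average performance Y after a task attempt (Algorithm 7)."""
        self.total_reward += reward
        self.attempts += 1

    def performance_Y(self):
        return self.total_reward / self.attempts if self.attempts else 0.0

    def exploration_E(self):
        """Compare Y to T and adjust exploration (Algorithms 8 and 9)."""
        D = self.goal_T - self.performance_Y()   # goal discrepancy
        J = 1 if D > 0 else 0                    # 1 = short of the goal
        E = self.base_exploration + J * self.boost_F
        return min(max(E, 0.0), 1.0)             # keep the rate a valid probability

# Example: an agent short of a .80 goal explores at .20 instead of .10.
agent = GoalRegulatedExplorer(goal_T=0.80)
agent.record_attempt(1); agent.record_attempt(0)
print(agent.performance_Y(), agent.exploration_E())  # 0.5 0.2
```

Sweeping goal_T and boost_F across a grid of values would reproduce, in miniature, the virtual experiment described next.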
Virtual Experimentation
Although some key findings connected broadly to the self-regulatory effects underlying transfer findings were already explored in the experiments above, two key effects require illumination here: the effects of goal setting and of efficacy. Following the nature of self-regulatory systems, it stands to reason that the higher the goal set in a system, the higher we would expect the performance outputs of that system to be. A higher set goal creates a larger discrepancy between that goal and perceived reality. When individuals sense this discrepancy, they act to reduce it (Carver & Scheier, 1998). This basic finding, that higher goals lead to higher performance, is the essence of goal theory as outlined by Locke and his colleagues (Locke, 1968, 1975; Locke & Latham, 1990). Although it is often suggested that goals be specific and challenging yet attainable, even especially difficult goals can enhance performance if feedback regarding that performance is provided (Campion & Lord, 1982). Within the training literature, goal choice plays a key motivational role in the post-learning transfer phase (e.g., Beier & Kanfer, 2010). Post-training goals show a small meta-analytic relationship with transfer of .08 (Blume et al., 2010). To test for the effect of goal setting in the LTM, T was systematically manipulated. In addition, any discrepancy between observed performance and the set goal affects behavior in the present model through the level of exploration the agent is willing to engage in. There was no a priori expectation of which levels of change in the baseline exploration rate would produce the expected relationships between goals and outcomes; therefore the level of exploration change when short of the goal, F, was simultaneously explored. To explore the joint effects of goal level and change in exploration rate, a simulation was executed that crossed goal level, swept from 0 to 1.0 in .05 increments, against change in exploration from -.10 to 1.0 in .05 increments. F began at -.10 because the baseline exploration rate was held at .10, as established in prior simulations, and it was decided to cover the full range of final exploration rates. All other variables were held constant: likelihood of type 2 processing at .80, value of Policy A at .70, change in value at .05, initial policy estimates at .50, 100 pre-training time steps, and 500 transfer time steps. Initial results of this simulation were surprising, as the relationships between goal level and the outcomes of behavioral transfer (r(241500) = -.126, p < .001) and post-training performance (r(241500) = -.065, p < .001) were negative at the replication level and at the condition level (r(483) = -.335, p < .001, and r(483) = -.326, p < .001, respectively). Similarly, relationships between exploration rate change and behavioral transfer (r(241500) = -.097, p < .001) and post-training performance (r(241500) = -.050, p < .001) were also negative at the replication and condition levels (r(483) = -.249, p < .001, and r(483) = -.257, p < .001). Additionally, at the condition level, both goal level (r(483) = -.259, p < .001) and exploration rate change (r(483) = -.234, p < .001) were negatively related to the effect size of pre-post performance change. Given the surprising nature of these relationships, further analyses were completed in an effort to better understand them.
Moderated mult iple regression analyses showed the combined effects of goal level ( F (3, 241496) = 3503.46, p < .0 01, R 2 = .20; b 0 = .411 , t = 783.4 8, p < .001 ; b 1 = - .110, 1 = - .126 , t = - 63.48, p < .001 ) and exploration rate change ( b 2 = - .077, 2 = - .097 , t = - 48.69, p < .001 ) had negative main effects on behavioral transfer, and a negative interaction ( b 3 = - .335, 3 113 = - .128 , t = - 64.11, p < .001 ). Similarly, in predicti ng post training performance, goal level ( F (3, 241496) = 61.56 , p < .001, R 2 = . 53; b 0 = .721 , t = 3725.10, p < .001 ; b 1 = - .005, 1 = - .065 , t = - 8.40, p < .001 ) and exploration rate change ( b 2 = - .004, 2 = - .050 , t = - 6.41, p < .001 ) had negative main e ffects, and a negative interaction ( b 3 = - .016, 3 = - .067 , t = - 8.55, p < .001 ). Finally, in predicting pre - post performance change, goal level ( F (3, 479) = 43.54, p < .001, R 2 = .46; b 0 = .289 , t = 46.86, p < .001 ; b 1 = - .130, 1 = - .259 , t = - 6.40, p < .001 ) and explorati on rate change ( b 2 = - .107, 2 = - .234 , t = - 5.78, p < .001 ) had negative main effects, and a negative interaction ( b 3 = - .460, 3 = - .304 , t = - 7.50, p < .001 ). These interaction effects, showing the mutually de pressive effects of these variables on our outcomes, have been graphed in Figures 33 - 35. For further analysis, heat maps of these results were generated and can be found in Figures 36 - 3 8. These heat maps are especially informative as they show that tradit ional statistics are unab le to fully describe the underlying data pattern . In these simulations, the heat maps suggest that goal level had no effect when it was below .70, that is when t he goal was below the true value of Policy A . However, when the goal l evel was above .70, the e ffects of goals on the outcomes depended on the rate of exploration change. When the exploration change was very low, or very high, outcomes were worse than when exploration changes were low to moderate in magnitude. Thus, when sho rt of their goal, it was beneficial for agents to explore to some degree, but not too much. The extreme negative effects when exploration changes were very low or very high may be maskin g the expected positive effect of goals. To test this, correlations be tween goal level and outc omes only across replications where the change in exploration rate was .10 were calculated. These relationships were indeed positive ( r ( 21) = .105 , p = .651, for behavior, r (21) = 114 .061 , p = .793, for post training performance), whi ch at least matches the e xpected direction of the effect of goal set point. To explore the possibility that this model works within certain parameter ranges, a second simulation was run holding the change in exploration rate to .10 while sweeping goal leve l from 0 to 1.0 in .01 in crements, 500 replications each. These results reveal a similar r (21) = . 095 , p = .682, relationship between goals and behavioral transfer, and an r (21) = .05 , p = .830, relatio nship between goals and post training performance . 115 St udy 3A: Discussion The r esults of the initial exploration of the effects of goal level on the LTM did not match expectations. Instead of the small positive relationships expected between goals and transfer outcomes (Blume et al., 2010) the overall observe d relationship was n egati ve. 
However, it appears this negative effect is driven by especially bad outcomes when the change in exploration rate is either very low or very high. That very high changes in exploration are detrimental does fit with previous findings in this paper, in that especially high levels of exploration do not allow the agent to exploit the policies they do happen to find more productive. On the other hand, especially low, and in this simulation negative, changes in the exploration rate represent a degree of disengagement from trying to transfer, and therefore would not result in positive transfer outcomes because the agent has in effect stopped trying to do so. When investigated further, it may be that the effects of goals in the model only work as expected within certain ranges of the change in exploration rate. This was shown to be plausible in that the relationship between goals and behavioral transfer approximates the meta-analytic effect (Blume et al., 2010) when only examining the effect where the change in exploration rate is .10. Given these findings, it was concluded overall that this is a plausible model provided parameters are held within certain ranges. However, prior to exploring the model further it was decided to investigate other potential mechanisms of action to ascertain which may be more broadly applicable to the transfer environment.
Study 3B: Tweaking Goal Seeking
Prior to fully accepting Model 3A it was decided that two other potential implementations of the self-regulatory system should be explored for the LTM. Model 3A relied on an effector mechanism that blindly makes the same adjustment to behavior regardless of the degree to which one is short of the desired goal. However, this does not necessarily align completely with reality. One other potential interpretation is that when individuals are further from their desired goal, they will take more drastic actions to close that gap. Leaving aside, for the time being, the issue of disengagement in the face of extreme deficits between goals and current states, such an increase in motivation generally fits with the CT view of self-regulation, in which, as an individual approaches their goal, motivation would only be maintained through an increase of their goal level in order to maintain a deficit, or motivation would be redirected towards the completion of other goals (Carver & Scheier, 1998). A second potential effector mechanism would be one which raises exploration as an individual nears their goal. Thus, when an individual is close to their goal but not quite there, they may work harder to find a way to push their current state to finally come in line with their desired one instead of backing off. This would imply an inverse relationship between goal distance and exploration. This view fits with other theories such as Temporal Motivation Theory (Steel & König, 2006), which states that the expected value of engaging in a task increases as the temporal distance to that task decreases, raising the motivation of the individual to engage in that task. Additionally, some research has shown that levels of motivation increase as subjective judgement of how close one is to their goals increases, and that close goals specifically increase focus on the process of meeting that goal (e.g., Peetz, Wilson, & Strahan, 2009). Such a relationship would fit with an effector mechanism that increases exploration to a greater extent when goals are close but unreached than when those goals are further away.
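Before these alternatives are formalized in Models 3B-1 and 3B-2 below, the contrast between the original effector mechanism and the two variants just motivated can be sketched as follows. This is a hedged Python illustration: the function names are hypothetical, and the bounded inverse form 1 / (D + 2) is an assumed stand-in chosen only to respect the .5 cap described in Model 3B-2, not the dissertation's exact function.

```python
def effector_static(D, F=0.10):
    """Model 3A: a fixed exploration boost whenever the agent is short of the goal."""
    return F if D > 0 else 0.0

def effector_proportional(D):
    """Model 3B-1 style: the boost grows directly with the goal discrepancy D."""
    return max(D, 0.0)

def effector_inverse_bounded(D):
    """Model 3B-2 style: the boost is largest just short of the goal and tapers off.

    The 1 / (D + 2) form is an illustrative assumption; it merely caps the boost
    near .5 as D approaches zero, matching the bound described below.
    """
    return 1.0 / (D + 2.0) if D > 0 else 0.0

# Compare the three mechanisms at a small and a large discrepancy.
for D in (0.05, 0.60):
    print(D, effector_static(D), effector_proportional(D), round(effector_inverse_bounded(D), 3))
```

The proportional variant pushes exploration hardest when the agent is far from its goal, while the bounded inverse variant pushes hardest when the agent is just short of it; these are the two behaviors the next two models test.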
Given these two other possibilities for effects of goal deficits, two alternative effector mechanisms were explored for the LTM.
Model 3B-1
The first alternate effector mechanism proposes a direct link between the perceived discrepancy between the goal and the current state and the degree to which agents are willing to explore their behavioral options. Specifically, agents' desire to explore increases to the extent that they see themselves as short of their goal. Mathematically, this makes the variable F outlined in Model 3A a dynamic variable instead of a static one. Now, F will be calculated as
F_{t+1} = D_{t+1}
stating that F at time t + 1 is equal to the observed difference, D, between the agent's goal and their observed state at time t.
Algorithm 10. Effector Mechanism 2
Model 3B-2
The second alternate effector mechanism proposes an inverse relationship such that exploration will be greatest when one is just short of the set goal, and that rate will taper off as the distance to the goal increases. The simplest way to create such a relationship is to adjust F to be the inverse of the discrepancy,
F_{t+1} = \frac{1}{D_{t+1}}
However, this will create extremely large relative values of F, making exploration essentially 1 every time an individual agent is short of their goal. As seen in the simulations for Model 3A, this is not ideal or realistic.
Algorithm 11. Effector Mechanism 3
To place an upper limit on that change in exploration, F was instead calculated as a bounded inverse function of D, which limits exploration rates to .5 plus the baseline exploration rate (which we typically set at .10) when individual agents are just short of their goals.
Algorithm 12. Effector Mechanism 4
Study 3B: Methods, Simulation, and Results
The two mechanisms outlined above were instantiated in two mirrored computational models in NetLogo, where the only difference is the calculation of F. A snapshot of the modeling environment and copies of the simulation code for each can be found in Appendix F.
Model 3B-1
To explore the effect of treating F as a direct positive function of the difference between the goal and the perceived state, an initial simulation swept the performance goal variable from 0 to 1.0 in .01 steps. Other variables were held constant: type 2 likelihood at .80, Policy A value at .70, change in value at .05, baseline exploration rate at .10, initial policy estimates at .50, 1 trainee, 100 pre-training time points and 500 post-training time points. Initial results suggest correlations between goals and behavioral transfer (r(101) = .701, p < .001) and post-training performance (r(101) = .392, p < .001) are positive, as would be expected. To better understand the nature of the effect, graphic depictions were created of the mean observed behavior, post-training performance, and pre-post performance improvement in Figures 39-41. These results show the relationship between goals and these outcomes is not uniform. Instead, goals have no real effect when they are well below the set value of Policy A. Then, as goals approach and pass the value of Policy A, outcomes rapidly improve until leveling out once goals reach a level just higher than the value of Policy A. For pre-post performance change specifically, positive outcomes begin to occur around a goal level of .60. Given the observed positive relationship between goals and outcomes in this model, and the potentially interesting effects of the semi-discontinuous nature of those effects, a small experiment was run to explore these effects further.
Specifically, goal level (varied from 0 to 1.0 in .05 increments) was crossed with changes in value from Policy A to Policy B (varied from - 120 1.0 to 1.0 in .05 increments) to study the effect of changing goals again st changing behavioral options. In this simulation, it was found that goals again h ad a positive effect on b oth behavioral transfer ( r (430500) = .638 , p < .001 ) and post training performance ( r (430500) = .032 , p < .001 ) , as well as pre - post performance change ( r (861) = .094 , p = .006 ) . In addition, value cha nge also had positive effects on behavior ( r (430500) = .336 , p < .001 ) , post training performance ( r (430500) = .589 , p < .001 ), and pre - post performance change ( r (861) = .611 , p < .001 ) as would be expected. A moderated multiple regression analysis was then completed. In predicting beh avioral transfer it w as f ound that goal level ( F (3, 430496) = 181108.91, p < .001, R 2 = .75; b 0 = .224 , t = 688.04, p < .001 ; b 1 = .183, 1 = .336 , t = 331.21, p < .001 ), and value change ( b 2 = .678, 2 = .638 , t = 629.60, p < .001 ) had positive main effects, and a positive interaction ( b 3 = .351, 3 = .196 , t = 192.95, p < .001 ). Similarly, in predicting post training performance it was fou nd that goal level ( F (3, 430496) = 301747.87, p < .001, R 2 = .82; b 0 = .720 , t = 6484.78, p < .001 ; b 1 = .128, 1 = .589 , t = 681.17, p < .001 ), and value change ( b 2 = .013, 2 = .032 , t = 36.73, p < .001 ) had positive main effects, although the effect of value change was very small after controlling for the effect of goal level, and a positive interaction ( b 3 = .411, 3 = .574 , t = 663.25, p < .001 ). Fina lly, pre - post performance change displayed similar positive relationships with goals ( F (3, 857 ) = 9 08.08 , p < .001, R 2 = .87; b 0 = .331 , t = 6.32, p < .001 ; b 1 = 3.236, 1 = .611 , t = 36.57, p < .001 ), and value change ( b 2 = .972, 2 = .094 , t = 5.62, p < .001 ), and a positive interaction ( b 3 = 10.761, 3 = .615 , t = 36.82, p < .001 ). These positive interacti ons a re depicted in Figures 42 - 44. For further investigation, heat maps of these effects are depicted in Figure 45 - 47. As with the simulation of goals alone for this model, goal level had basically no effect on either behavioral transfer, performance, or p erfor mance improvement when goals were below about .60. When goals reach .60, there is a sudden and rapid change in the pattern of results where the 121 best outcomes occur when goals are slightly above the baseline value of Policy A , and the change in value o f the policies is moderately positive. If goals become too high, outcomes become worse as the agent begins to search for a better option than those available instead of exploiting the available options. In addition, outcomes are only especially bad when go als a re high , and the new policy is substantially worse than the existing policy. Model 3B - 2 To explore the effect of treating F goal and perceived state, as with Model 3B - 1, an initial simulation swe pt the performance goal variable from 0 to 1.0 in .01 steps. Other variables were held constant: type 2 likelihood at .80, Policy A value at .70, change in value at .05, baseline exploration rate at .10, initial policy estimates at .5 0 , 1 trainee, 100 pre - training time points and 500 post - training time points. Initial results suggest correlations between goals and behavioral transfer ( r (50500) = . 802 , p < .001 ) post training performance ( r (50500) = . 
307 , p < .001 ) a re positive, as previously observed . Visua ls were created of the mean observed behavior, post training performance, and pre - post performance improvement in Figures 48 - 50 . T hese results also show the relationship between goals and these outcomes is not uniform but in a different way than with Model 3B - 1 . Here, goals still do not have a noticeable effect at extremely low levels, but they begin to impact outcomes at a lower lev el than in Model 3B - 1. Additionally, their effect on behavior and performance does not come so suddenly and drastically. Inste ad, as goals increase behavioral transfer and post training performance gradually increase until leveling out around .50 and .73, respectively. For pre - post performance change we only begin to observe positive effects when goals reach at least .40. 122 Having found a positive relationship between goals and outcomes in this mode l, the same experiment run for Model 3B - 1 was executed for this model as well . I t was found that goals again had a positive effect on both behavioral transfer ( r (430500) = . 123 , p < .001 ) and post training performance ( r (430500) = . 315 , p < .001 ), as well as pre - post per formance change ( r (861) = .094 , p = .006 ). In addition, value change also had post training performance ( r (430500) = . 827 , p < .001 ), and pre - post performance change ( r (43 0500) = .611 , p < .001 ) as would be expected , but a negative effect on behavioral transfer positive effects on behav ior ( r (861) = - .525 , p < .001 ) . A moderated multiple regression analysis was then completed. In predicting behavioral transfer it was found that goal level ( F (3, 430496) = 218770.33, p < .001, R 2 = .78; b 0 = . 347 , t = 1372.82, p < .001 ; b 1 = . 107 , 1 = . 123 , t = - 547.17, p < .001 ) had a positive main effect , but value change ( b 2 = - . 233 , 2 = - .525 , t = 127.98, p < .001 ) had a negative main ef fect, and a positive interaction ( b 3 = . 822 , 3 = . 560 , t = 583.56, p < .001 ). In predicting post training performance it was found that goal level ( F (3, 430496) = 567084 . 97 , p < .001, R 2 = .89; b 0 = . 607 , t = 4698.99, p < .001 ; b 1 = . 196 , 1 = . 315 , t = 12 08.04, p < .001 ), and value change ( b 2 = . 264 , 2 = . 827 , t = 459.89, p < .001 ) had positive main effects , and a negative interaction ( b 3 = - .126 , 3 = - .119 , t = - 174.33, p < .001 ). Finally, pre - post performance chang e displayed positive relationships with goals ( F (3, 857 ) = 844.33 , p < .001, R 2 = . 86 ; b 0 = - 2.523 , t = - 29.02, p < .001 ; b 1 = 4.081 , 1 = . 244 , t = 48.20, p < .001 ), and value change ( b 2 = 7.083 , 2 = . 828 , t = 14.21, p < .001 ), and a negative interaction ( b 3 = - 1.397 , 3 = - .049 , t = - 2.88, p = .004 ). These interactions are depicted in Figures 51 - 53 . For further investigation, heat maps of these effects are depicted in Figure 54 - 56 . Interestingly, these analyses show that the best performance outcomes occur when value changes and goals are both high, which we would expect. However, 123 a greater degree of beha vioral transfer occurs when goals and value changes are low, counter to expectations. 124 Study 3B: Discussion Models 3B - 1 and 3B - 2 were meant to explore other potential effector mechanisms with in the self - regulatory processes of the LTM which are consistent with existing theory and research findings . 
This was undertaken after finding that the originally proposed mechanism explored in Model 3A may only approximate meta-analytic estimates in the transfer literature under a limited range of parameters. The present models changed the value of F from an a priori set effect to one dependent on the perceived difference between one's goal and one's current state. Initial simulation results for these models are mixed. First, Model 3B-1 did display the expected overall positive relationships between goal level and the transfer outcomes of behavior and performance. This represents an improvement over the overall model of 3A, where initial results suggested overall negative effects of goals instead of positive ones. However, the magnitude of the goal effects in Model 3B-1 is much larger than the suggested .08 in transfer research (Blume et al., 2010). The same can be said of Model 3B-2, where relationships were in the expected directions but of abnormally large magnitude. Even so, there are some potentially intriguing results. For example, the finding that behavioral transfer rates actually reverse at very high goal levels in this model may be a sign of agents finding that a roughly 50-percent exploration rate is optimal given two behavioral choices as they desperately search for an option which may complete their goal. The combined lack of transfer at low goal levels and this upper limit on transfer may provide an explanation for the low transfer rates commonly cited in the literature (Ford et al., 2010). For workers with low goals for their personal performance, all these models (3A, 3B-1, and 3B-2) suggest we will not see high degrees of behavioral transfer, although the exact amount differs by model. Further, when goals are very much higher than the performance achievable through available means, transfer does not occur to an extreme extent because the agent does not simply settle and acquiesce to use the best available policy, but keeps searching for an option which will fulfill their goals. For real-world employees, the same calculus could be in play: among workers with high goals, a failure to directly transfer received training may not reflect a failure to recognize the improvement of the training over whatever their old approach is, but may instead represent a recognition that the training is not good enough and a result of their personal pursuit of other, not necessarily organizationally directed, options to achieve their goals. Such an insight could provide guidance to future research projects. Further, this set of models may provide practical guidance on the post-training setting of goals. Following goal theory (Locke & Latham, 1990), goals for transfer are set following training to, ideally, be specific, challenging, and attainable. The relationship between those goals and actual transfer could be said to be disappointing given the weak meta-analytic relationship between goals and transfer (Blume et al., 2010). The findings here suggest that we may need to focus more on those post-training goals being attainable, to keep them in the range where they can have a substantial effect on later transfer. Further, on the research side, when we study those goals, we may need to change the way we analyze their effects.
We traditionally rely on ordinary least squares regression and correlational approaches to study these effects, but it has been suggested that more advanced analytic techniques could improve our understanding of transfer (Olenick et al., in press), and the effect of goals on that transfer is a good example. In selection research it has recently been shown that taking a fit approach and associated polynomial regression techniques can greatly improve the ability of interests to predict work performance (Nye, Prasad, Bradburn, & Elizondo, 2018). Similarly, given the interplay between goal levels and the value of the received training, which indicates that different goals may work better with different levels of value for the trained behavior, we could study the congruence between set goals and the value of the training. In this way we may better estimate the value of goal setting within the training and transfer field. Despite the potential insights gained from these models in total, the results of the simulations run for this paper in Studies 3A and 3B suggest the best model for use in transfer may be that proposed in Model 3A. This conclusion results from the very close match between the simulated effects of goal level and meta-analytic estimates for Model 3A, provided the model is kept within certain ranges of parameters, whereas the effects observed in Models 3B-1 and 3B-2 are larger than the meta-analytic effect of .08 (Blume et al., 2010). Further, the mechanism tested in Model 3A is more parsimonious than those tested in 3B-1 and 3B-2, and it provides a degree of control for further simulation. The approach Model 3A takes is more akin to treating not just goals, but the effects those goals have on decisions, as an individual difference within the transfer environment, adding to existing studies of individual differences such as personality, goal orientations, need for cognition, and implicit theories of learning (e.g., Jaeggi, Buschkuehl, Shah, & Jonides, 2014). Given the results showing Model 3A closely replicates the effects of goals on transfer, provided F is set to plausible ranges, and the potential it implies for future research, Model 3A was retained for use in the full LTM. However, it is acknowledged that future work will be required to explore this and other effector mechanisms, especially in regard to applying the model to any particular task of interest.
Study 3C: Engagement Thresholds
Having established that the originally proposed self-regulatory system in Model 3A generally outperforms the two other alternatives, shown in Models 3B-1 and 3B-2, in recreating regulatory effects in transfer research, another set of self-regulation findings and implications was explored. Thus far in exploring self-regulation in training transfer we have focused on the effects of goals. Now, we must explore the effect of self-efficacy, which has long been a central variable in self-regulatory models. Self-efficacy, the belief individuals hold regarding their ability to execute desired behaviors (Bandura, 1977), has been argued to be the central motivational variable by which individuals exert agency over their environments (Bandura, 1989). Decades of research have established a clear general pattern of higher efficacy relating to higher task performance (Stajkovic & Luthans, 1998), and this effect has been meta-analytically established in training transfer, where post-training efficacy has a corrected relationship with transfer of .22 (Blume et al., 2010).
However, in the last decade, some minor but important disagreements have arisen over the nature of self-efficacy. Importantly, it has been argued that when studied in a causal manner, self-efficacy is a product of performance, and not necessarily the other way around. In this case, Sitzmann and Yeo (2013) found that within individuals, performance predicted self-efficacy at β = .30 when controlling for linear trajectories, but self-efficacy only predicted performance at β = .06 under the same conditions. Thus, it would be beneficial for the present model to display a general positive relationship between efficacy and transfer, and performance, but also to replicate the differences in the causal strength of the efficacy-performance relationship. Further work by Vancouver and colleagues has challenged the traditional view of self-efficacy having a monotonically positive effect on important outcomes such as performance and task engagement. Over various studies, they have found that self-efficacy can have negative effects in learning tasks under some conditions (Vancouver, Gullekson et al., 2014), and that the relationship between efficacy and task engagement is actually discontinuous in nature (Vancouver et al., 2008; Sun et al., 2014). This discontinuous relationship suggests that at very low levels of efficacy for a task, individuals will refrain from engaging in that task and instead conserve their resources for tasks they are more confident in. As efficacy levels for a task increase, eventually a threshold is passed where suddenly those individuals will choose to engage in the task and will outlay substantial resources in order to improve their odds of success. In the transfer environment it is possible that learners would not even attempt to transfer their learning if they do not believe they can succeed at the application of that learning, which would drastically reduce transfer rates and provide another potential explanation for the common belief that transfer rates are disappointingly low. To my knowledge, no studies have examined the effect of efficacy on training transfer from a discontinuous perspective. Therefore, the present model will explore the potential effects of a discontinuous model of efficacy on transfer to guide future research.
Discontinuous Self-Efficacy in the LTM
As currently conceived, the LTM and its computational equivalent do not directly incorporate a variable labeled efficacy. However, since efficacy is a perception of the individual regarding their ability to complete a task (Bandura, 1977), and efficacy is the product of past performance (Sitzmann & Yeo, 2013), the equivalent of an efficacy evaluation is already present within the LTM. The underlying value of each policy which the learner may apply to their encountered situation is the percentage probability of that policy succeeding in that situation. The agents estimate that value as they apply the policy and receive feedback that informs the estimate. Thus, their estimate of a policy's value is the theoretical equivalent of efficacy, because it is their estimate of the likelihood of succeeding at applying the policy. Therefore, nothing needs to be directly changed in the existing model to incorporate efficacy as a construct. However, as discussed, the relationship between efficacy and task engagement is not actually linear (Vancouver et al., 2008; Sun et al., 2014). As it exists in the present model, the likelihood of using any policy available to the learner is a positively linear function of the estimated value of that policy.
That is, even though the exact choices made by an individual on a given task attempt is dependent on several dynamic variables, the underlying relationship is that as the estimated value of a policy inc reases the likelihood o f using that policy will increase. If the relationship between efficacy and engagement is non - linear, then the existing underlying relationship is incorrect. To remedy this, a single variable needs to be added to our overall model. W e will call this variab le the engagement threshold, labeled V, and will represent the value estimate below which the learner will not choose to implement that p olicy a nd will instead opt for the other p olicy a vailable to them. As tasks are encountered and policy decisions are ma de by the agents, each learning agent in the model independently compares their value estimate of that policy to the cutoff level defined by V . If that policy has a lower value estimate than that threshold level the agent will choose the other policy, but only if the other policy option lies above the threshold, otherwise the original policy choice will be implemented. 130 Study 3C: Methods, Simulation, and Results The addition of an engagement threshold and necessary code to ensure a gents only applied beha viors above that threshold when possible was made to the expanding computational model of the LTM. A screen capture of the modeling environment and associated code can be found in Appendix G. Causal Effects of Self - Efficacy on Transf er and Performance Sin ce efficacy is a value in this model which develops of its own accord , the effects of efficacy were not explored via direct manipulation. Instead a model was executed which track ed the level of the estimated value of Policy B over time to attain a measure for that policy. Given the mechanics of the model as presented in this paper, it was expected that the value estimate will be related to outcomes of interest in the way efficacy is found to be in the ps ychological literature. Specif ically, the estimated value of a policy will be positively related to the likelihood that the learner will choose that policy on a given task attempt and thus transfer it to the task from their theoretical learning environment , and the perceived value of t he policy will be positively related to task performance (Blume et al., 2010; Sitzmann & Yeo, 2013; Stajkovic & Luthans, 1998). To study the dynamic relationships between efficacy, performance, and transfer , 1000 replications of a model with one agent wer e run for the established 500 time point transfer length. At each time point, the value estimate of Policy B , whether or not Policy B was applied at that time point, and the reward (representing task success or failure) for th at time point were saved. All other variables were held constant as before, with type 2 likelihood at .80, value of Policy A at .70, change in policy value of .05, exploration rate of .10, and starting policy value estimates of .50. 131 To analyze this data, correlations were computed bet ween the saved variables, adjusting the data set to account for causal ordering. Only time points within the transfer period were analyzed to remove any biasing effects of the data regarding Policy B during the pretraining pha se when those values were not affecting the behavior of the agent. Additionally, only the value of Policy B is of interest as it is the target of transfer and therefore the subject of the efficacy measurements typically taken at the end of training. 
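The engagement threshold rule described above can be expressed as a short decision function. The Python sketch below is illustrative rather than the NetLogo code in Appendix G; the dictionary-based representation of the two policy value estimates and the function name are assumptions of convenience.

```python
def choose_policy(value_estimates, intended, threshold_V):
    """Apply the engagement threshold V to a tentative policy choice.

    value_estimates -- current value estimates, e.g. {"A": 0.72, "B": 0.55}
    intended        -- the policy the agent would otherwise have chosen
    If the intended policy's estimate falls below V, the agent switches to the
    alternative only when that alternative clears the threshold; otherwise the
    original choice stands, as described in the text.
    """
    if value_estimates[intended] >= threshold_V:
        return intended
    other = "B" if intended == "A" else "A"
    return other if value_estimates[other] >= threshold_V else intended

# Example: with V = .60, a low estimate for Policy B sends the agent back to Policy A.
print(choose_policy({"A": 0.72, "B": 0.55}, intended="B", threshold_V=0.60))  # prints A
```

This gate sits on top of the ordinary policy-selection process, so efficacy (the value estimate) only affects behavior once it clears the threshold.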
First, the estimated value of Policy B at each time point is causally related to performance at that time point. The relationship between these two variables was found to be r(500000) = .065, p < .001. This relationship nearly perfectly replicates the relationship found by Sitzmann and Yeo (2013) for the same effect. Next, the value of Policy B was related to the tendency to choose Policy B at that time point, representing behavioral transfer. This relationship was found to be r(500000) = .369, p < .001, which is in the correct direction for transfer as found by Blume et al. (2010). Finally, a lag variable was required to test the effect of performance, aligning performance on one task attempt with the value estimate of Policy B on the next attempt. When Policy B is chosen at time t, the resulting performance at that time point should have a causal relationship with the value estimate of Policy B at time t + 1. To isolate these effects, only time points where Policy B was applied in the transfer environment were analyzed. Among these time points, the relationship between performance and the value estimate of Policy B was r(207936) = .048, p < .001, which is in the expected direction according to meta-analytic estimates but substantially smaller in magnitude (Sitzmann & Yeo, 2013). It is possible that the length of the transfer run obfuscates the relationship between these two variables, as the value estimate of Policy B stabilizes over time and therefore would not be greatly affected by a single performance. Most research studies are unable to examine any length of time close to 500 data points long; instead the dynamic relationship between efficacy and performance is based on a much shorter time period. To test whether a shorter time period would better approximate the expected relationship, the correlation between performance and the value estimate of Policy B on the next time point was estimated for both the first 100 transfer attempts and the first 25 transfer attempts. In the first 100 transfer attempts the relationship was r(41587) = .064, p < .001, and it was r(5199) = .056, p < .001, in the first 25. These relationships suggest it is not merely the time period examined which accounts for the difference between the meta-analytic relationships and those generated by the present model.
Effects of Engagement Threshold
Unlike with the general effects of our efficacy stand-in, the policy value estimate, an experiment was completed to explore the effects of the discontinuous model of efficacy as it applies to transfer. To test the effect of the engagement threshold variable, V, a simulation swept the parameter from 0 to 1.0 in .01 increments. It was expected that transfer would diminish as the threshold level increases. The logic of that relationship is that not only will it be more likely overall for the value estimate of the target policy to fall below the threshold, but that effect is exacerbated by the instability of small samples, where even policies with high true values will often have lower estimates of that value in the initial stages of transfer purely because of sampling error. This incorrect early judgement would sometimes result in the abandonment of a policy before its true value is revealed to the learner. Such an effect would seem logically consistent with experience, given that some learners will not apply their new KSAO because they feel it is too difficult.
As such, this also represents an initial relaxation of the assumption that learners enter the transfer environment with the ability to successfully apply their new KSAO. 133 In running the full parameter sweep of the engagement threshold, all other parameters were held constant at our established levels: type 2 likelihood was set to .80, .10 exploration rate, true value of Policy A .70, a .05 change in value to Policy B , 100 pre - training time poi nts, 500 transfer time points, 500 replications each, with one agent in each model. However, unlike previously, the initial estimate for the value of Policy B was set to 1.0 instead of .50. This change was made to refrain from artificially limiting initial transfer attempts by a parameter which in this simulation was not our focus, instea d allowing any reluctance from the agent in applying Policy B to arise from its own experience. Initial examination of the results of this experiment confirmed expectation s outlined above. The relationship between threshold level and behavioral transfer ( r (50500) = - .365 , p < .001 ), and post training p erformance ( r (50500) = - .214 , p < .001 ) were both negative. To further understand the relationship between the engagement th reshold and transfer outcomes, mean results for each condition for behavioral transfer, post training performance, and the effect size of pre - post training performance change have been plotted in Figures 57 - 59. In addition, best fitting trend lines with a quadratic term were plotted to better visually illustrate the general pattern. The pattern of all these findings indicate that transfer outcomes are relatively high when the engagement thresho ld is low. However, when the threshold reaches about .50, transf er outcomes begin to deteriorate rapidly as they transition to a lower set point starting around .80 where those outcomes display essentially no transfer. In addition, performance change as ex pressed in d becomes negative when threshold levels exce ed about .60, which is well below the .75 true value of Policy B . 134 Study 3C: Discussion The present study explored the effects of self - efficacy within the LTM. Specifically, it suggested that the value perceptions for the behavioral policy representing th e targeted transfer behavior would display relationships with outcome variables that have been observed in the literature. Further, it explored the effects of the discontinuous model of self - e fficacy (e.g., Vancouver et al., 2008) on transfer. Here I shall discuss the implications of these simulations for both theory and practice. Theoretical and Research Implications Overall, the effects of the value estimate for Policy B in the present model continue to be mixed. As you will recall, it was argued in a pr evious simulation that the effect of the initial value estimate for Policy B should approximate the effect of utilit y reactions we observe in the transfer literature (Blume et al., 2010). However, the expected relationship did not emerge at the replication level, but did to some degree at the condition level, leaving the support for the expected effect as plausible but needing some future refinement. Similarly, the results for the effect of and on Policy B value estimates were mixed in the present study. On the one hand, all the relationships between Policy B value estimates, transfer, and performance were in the expecte d direction. 
In addition, the magnitude of the causal effect of the policy estimate and performance was essentially identical to that observ ed in the research literature (Blume et al., 2010). Thus, it could be argued that the general pattern of results fro m this model fits that which was expected, and generative sufficiency has been achieved . In addition , the general effects of the inclusion the present model fit expectations . Overall, the effect of having a thr eshold for when to apply a given policy was such that high thresholds decreased behavioral transfer and performance 135 outcomes. Unfortunately, there are no known studies to which the effect observed here can be directly compared, although the observed effect fits with general expectations from the work by Vancouver and colleagues on the nuanced effects of self - efficacy. However, we cannot direct ly compare the effect sizes observed here to theirs to enhance the claim of generative sufficiency as the tasks used in their work are not transfer related, nor do they collect data in a comparable way. For example, Vancouver et al. (2008) use a task calle d the Hurricane Game where participants must click on squares of various sizes, representing different levels of eff icacy for doing so, as they randomly jump around a computer screen. There is no real learning component to this task, and they do not collec t and report data on the behavioral strategies employed by their participants to compare how those strategies to each other. Therefore, future work is needed to apply the prese nt simulation to more applicable learning and transfer related tasks which are d esigned to study the discontinuous nature of self - efficacy. Along with applying the present theory to more directly comparable data, the nature of the discontinuous effect of self - efficacy in the present model needs to be further tuned and explored. As im plemented in this version of the LTM, effort is assumed to be constant across all levels of self - efficacy if the agent has decided to engage in the targeted behavior. That is, the agents either fully engage with the behavior or they do not. The discontinuo us model of self - efficacy (Vancouver et al., 2008) suggests that this is not quite the case. The discontinuous model does suggest that individuals completely disengage from tas ks which are below that above that threshold there is a negative relationship between efficacy and effort. In their studies on this phenomenon (Vancouver et al., 2008; Sun et al., 2014), Vancouver an d colleagues use time allocation as a measure of effort applied to the task, but in the LTM it is currently assumed all resources are applied as long as the 136 threshold is met. Future iterations of the present model should examine the effects of resource all ocation to relax the assumption that individuals always fully engage or do not a nd explore the impact of a tapering off resource allocation by agents at high levels of efficacy. It could be the case that very high levels of efficacy are then detrimental to transfer while the highest levels of transfer occur when efficacy is just high enough to get a learner to engage. Such a finding would provide a potential explanation for the surprisingly low relationship found in the literature between efficacy and trans fer (Blume et al, 2010) as the negative effect of high levels of efficacy would mask its overall benefits. One surprising outcome of the discontinuous effect of the threshold model explored here is worth some discussion. 
Specifically, although expected to a lesser degree than was observed, it is surprising to see the threshold have negative effects on transfer at levels so far below the true value of Policy B. The reason for this likely has to do with sampling error by the agents. In the early stages of transfer, the value estimate for Policy B can fluctuate quite wildly as the agent does not have much experience with that policy. On the other hand, even in early transfer attempts the same agent has at least 100 experiences with Policy A and therefore already has a relatively stable and accurate estimate of the value of Policy A. This results in a situation where, in early transfer attempts, the agent will have a good idea of the true value of Policy A, and therefore of their theoretical efficacy for that behavior (as it has been argued that the value of the policy and efficacy are equivalent in this model), and of whether that true value is above the threshold at which they are willing to use that behavior. Simultaneously, they are unsure of the true value, and therefore of their efficacy, for Policy B, and just a couple of poor experiences with Policy B can easily lead to their value estimate falling below the threshold and to them discarding the policy before ever truly giving it a fair chance. It is worth noting that this discarding of Policy B based on these experiences again fits with general predictions of recent narrative theorizing around the transfer process (Blume et al., 2019). The negative effect of the threshold then occurs at a lower level than the true value of Policy B because even relatively low thresholds will sometimes lead the agent to erroneously discard Policy B based on few experiences. Combined with this effect, sometimes the value of Policy A will be overestimated based on pre-training experience, making it even less likely the agent will decide to transfer Policy B. Then, in the transfer environment, that agent discards Policy B only to potentially learn over time that Policy A is not as valuable as it believed, and the overperformance of Policy A observed in the pre-training environment will tend to even out over the course of the extra time simulated in the transfer environment. This overestimation-then-correction likely explains the observed negative effects seen in the pre-post training performance comparisons here. Despite the initially surprising nature of this effect, it would again help explain the general belief in low levels of training transfer if individuals are giving up on that transfer in part due to a misreading of the benefits of their training compared to their personal willingness to employ that training.

Overall, given these results, the present model is potentially viable for studying the basic patterns of relationships we might expect in the transfer environment. Future research should fine-tune the way in which the policy value estimates operate to better match real-world observations. Alternatively, the model will require further exploration to understand under which parameter combinations the expected relationships may be reproduced. For example, it could be that when the difference in policy values from Policy A to B is even smaller than .05, the relationship between the value estimate and transfer may similarly decrease as the agent would erroneously choose to apply Policy A more often. However, this would also decrease the overall rate of transfer and potentially move the model out of acceptable ranges in other ways.
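To make the sampling-error argument above concrete, the following is a minimal sketch, written in Python rather than the NetLogo used for the actual model, of how an engagement threshold interacting with incremental value updating can produce early, erroneous discarding of Policy B. The parameter values mirror those reported above (Policy A at .70, Policy B at .75, 100 pre-training and 500 transfer time points, a .10 exploration rate), but the function names, the fallback to Policy A when nothing clears the threshold, and the omission of the type 2 likelihood and social mechanisms are simplifying assumptions for illustration only, not the dissertation's implementation.

    import random

    # Illustrative sketch (not the model's NetLogo code) of a threshold-gated
    # two-policy learner. True success probabilities follow the reported setup.
    TRUE_VALUE = {"A": 0.70, "B": 0.75}

    def run_transfer(threshold, pre_training=100, transfer=500, epsilon=0.10):
        estimates = {"A": 0.50, "B": 1.0}   # optimistic initial estimate for Policy B
        counts = {"A": 0, "B": 0}

        def update(policy, reward):
            # Sample-average update: estimates converge on the true value, but are
            # volatile while counts are small (the sampling-error issue discussed above).
            counts[policy] += 1
            estimates[policy] += (reward - estimates[policy]) / counts[policy]

        def act(options):
            # Engagement threshold: only policies whose perceived value clears the
            # threshold are considered; epsilon-greedy choice among the survivors.
            viable = [p for p in options if estimates[p] >= threshold] or ["A"]
            if random.random() < epsilon:
                return random.choice(viable)
            return max(viable, key=lambda p: estimates[p])

        for _ in range(pre_training):               # only Policy A exists before training
            update("A", float(random.random() < TRUE_VALUE["A"]))

        b_choices = 0
        for _ in range(transfer):                   # both policies available after training
            policy = act(["A", "B"])
            b_choices += (policy == "B")
            update(policy, float(random.random() < TRUE_VALUE[policy]))
        return b_choices / transfer                 # behavioral transfer rate

    # Example usage: sweep a few threshold levels.
    for th in (0.3, 0.6, 0.8):
        rate = sum(run_transfer(th) for _ in range(200)) / 200
        print("threshold", th, "mean transfer rate", round(rate, 2))

Sweeping the threshold in a sketch of this kind should reproduce the qualitative pattern discussed above: transfer stays high at low thresholds and collapses well before the threshold reaches Policy B's true value, because the volatile early estimates for Policy B dip below the bar far more often than Policy A's stable estimate does.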
Practical Implications

The interesting finding that negative transfer outcomes begin at threshold levels well below the true value of a trained behavioral policy has significant implications for how we approach transfer in real organizations. In the present simulations we see that it is possible for learners to abandon a newly trained behavior even when their engagement threshold is well below the level that should theoretically be required for them to do so. Therefore, in our training interventions we should take extra care to ensure trainees are willing to try their new training back on the job multiple times before judging whether to retain or discard it for future use. This could include measures taken within the training program itself, such as providing examples of the training working to provide evidence that it should be useful, or during the transfer phase, such as check-ins on trainees' progress and supervisor support early in the transfer process before the learner has a chance to discard the training as not being useful.

Conclusion

The model explored in this study represents the final iteration of the LTM for the present paper. Given the pattern of observed results, it appears the model can be defensibly applied to the study of training transfer, as it is able to largely reproduce expected patterns of relations and results. However, more work will need to be done in the future to fine-tune aspects of the model to better fit existing data. For now, the model appears to be a useful first step towards accounting for transfer effects with a dynamic process theory, and it could provide potentially novel and useful insights for future research and practice.

Study 4: Exploring the Full LTM Model

Over the course of the present paper, we have explored several iterations of a process-oriented theory of learning transfer called the Learning Transfer Model. This evolving theory was instantiated in a series of computational models and explored to establish generative sufficiency for existing research findings in the transfer literature. Based on the simulations presented here, it appears that this process has largely, though not completely, been successful. However, the work is not yet done. One strength of computational models is the ability to run novel experiments in a low-risk environment to provide insights for theory and practice that would not normally be feasible, if not completely impossible, in a traditional research environment. Therefore, the final study of this paper takes advantage of the developed modeling platform to demonstrate some of the types of experiments that can be conducted in this environment and discusses some of the implications of those findings. The experiments executed here were chosen a priori for the apparent potential novelty of effects that we do not typically study in the transfer literature, as well as for their ability to demonstrate effects we may not be able to easily study in real-world environments without prior guidance.

Experiment 4A: Engagement Thresholds, Value Changes, and Implementation Intentions

The first exploratory experiment pitted level of engagement threshold, value changes, and implementation intentions against each other. We saw in Study 3C that engagement thresholds have an overall negative effect on transfer outcomes, with a rapid change in outcomes as those thresholds approach the values of the available behavioral policies. One possible implication of this finding is that thresholds for trainees need to be surprisingly low to ensure positive transfer outcomes given the trained KSAO.
On the other hand, it might suggest that individuals with especially high thresholds for engagement would require especially valuable new KSAOs from a training event to ensure successful transfer outcomes. In part, the present experiment explores the tradeoffs between these two factors in order to guide decisions regarding training for individuals based on their likely willingness to engage with the given task using their trained KSAO and the theoretical performance value of that KSAO. However, one way to overcome the reluctance to engage the task with the trained KSAO may be to pair that training with implementation intentions to make the response more automatic (Gollwitzer & Sheeran, 2006). The positive effects of implementation intentions were demonstrated in Study 1. It was expected that implementation intentions would reduce the negative effects of thresholds on transfer. It was further expected that implementation intentions would have a larger effect on transfer outcomes when engagement thresholds are high but the value of improvement for the new policy is low. This was expected because when the value of the new policy is already high it should more often be able to overcome the threshold without the need for the extra intervention of implementation intentions.

Methods

To explore these effects, a three-way experiment was designed using the computational version of the LTM settled upon in Study 3C. To limit the number of runs required, the ranges of parameters simulated were limited to ranges where effects were most salient in previous simulations. To this end, engagement threshold was limited to the range of .50 to 1.0, swept in .05 increments; implementation intentions were swept in .05 increments from 0 to .25; and value change to Policy B was limited to -.10 to .30, in .05 increments. Other variables were held constant as before, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, and 500 transfer time points, but initial policy value estimates were again set to 1.0 to ensure no artificial limiting of transfer due to the threshold variable, and one agent was simulated in each run. 500 replications were created for each condition.

Results

Initial analyses suggest the effects of all three variables explored here are in the expected direction across replications on our outcomes of interest. Implementation intentions had small but positive relationships with behavioral transfer (r(297000) = .026, p < .001) and post-training performance (r(297000) = .020, p < .001), while changes in policy value had substantial positive relationships with both behavioral transfer (r(297000) = .649, p < .001) and post-training performance (r(297000) = .721, p < .001). On the other hand, engagement thresholds were negatively related to both behavioral transfer (r(297000) = -.283, p < .001) and post-training performance (r(297000) = -.168, p < .001). Given the nature of the present experiment, it is not advisable to interpret the strength of these correlations, as the targeted conditions could be either enhancing or truncating them, but it is notable that they are in the expected directions. Next, moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and the effect size for pre-post performance change from the three-way interaction of implementation intentions, engagement thresholds, and value change.
The resulting parameters for these models can be found in Table 13, and graphs of the interactions in Figures 60-62. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 63-65. These analyses reveal that when the value of a policy is low, transfer is generally poor unless the threshold for engagement is low and implementation intentions are high. Such a pattern is acceptable, though, because when the value is low we generally do not actually want transfer to occur, unless there is a non-performance reason to do so, as it will reduce performance. When the new policy has a high value, the effect of threshold level dominates the rate of transfer such that low threshold levels are very beneficial and high levels are very detrimental. Beyond the effect of thresholds, strong implementation intentions only have a noticeable effect when thresholds are already low. Patterns of results for both post-training performance and pre-post training performance change are similar.

Discussion

The results for this experiment were somewhat surprising, especially when it came to the effect of implementation intentions. It was expected that implementation intentions would have a stronger effect when policy values were low but threshold levels were high, essentially acting as a way to overcome the detrimental effects of high engagement thresholds. This was not the case. Instead, implementation intentions showed their strongest effects when engagement thresholds were already low, suggesting implementation intentions did not act as a way to overcome high thresholds so much as a way to boost transfer among agents already willing to engage. This is the type of surprising finding that a model such as this can put forth to guide future research, and it opens the model to falsification. If this unexpected finding holds up to further scrutiny, it would suggest that in designing training events one should first focus on encouraging trainees to lower their engagement threshold before worrying about the use of implementation intentions. We know implementation intentions are generally effective additions to training events (Friedman & Ronen, 2015), but their use may be for naught if our learners are unwilling to engage with the trained KSAO anyway.

Experiment 4B: Number of Trainees, Conformity, and Goal Levels

A primary strength of the modeling platform built in this paper is the ability to explore social effects on transfer outcomes without requiring the hundreds, or even thousands, of individuals that would be needed just to explore these ideas using real-world data. This allows us to look for potential effects of interest from the theory and use that modeling to guide future targeted data collections, utilizing our limited resources more judiciously. To this end, the rest of the exploratory simulations discussed here focus on the social effects of the conformity mechanism established in Study 2C. The simulations in Study 2C showed that high levels of conformity were extremely detrimental to transfer outcomes, especially after the number of agents reached about 3 or 4. One possible way to overcome the pressures of the group to conform is for individuals to have higher goals that will lead them to explore behavioral possibilities more, even in the face of that pressure. To test this possibility, the initial simulation from Study 2C crossing number of trainees with level of conformity was extended to include an effect of goals, which was introduced in Study 3A.
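For readers less familiar with the conformist-transmission idea borrowed from Richerson and Boyd (2005), the following minimal sketch illustrates one simple way such a mechanism can be expressed: with probability equal to the conformity parameter, an agent abandons its own policy choice in favor of the majority behavior of its work group. The function name and this exact probabilistic form are illustrative assumptions, not the NetLogo rule actually used in the model.

    import random
    from collections import Counter

    def apply_conformity(own_choice, group_choices, conformity):
        """With probability equal to the conformity parameter, the agent copies the
        majority behavior of its work group; otherwise it keeps the choice produced
        by its own learning. A simplified reading of conformist transmission, not
        the dissertation's implemented rule."""
        if not group_choices or random.random() >= conformity:
            return own_choice
        majority, _ = Counter(group_choices).most_common(1)[0]
        return majority

    # Example: an agent leaning toward Policy B in a group still using Policy A.
    group = ["A", "A", "B"]
    for c in (0.0, 0.45, 0.9):
        reverted = sum(apply_conformity("B", group, c) == "A" for _ in range(10000))
        print("conformity", c, "reverted to Policy A on", reverted / 10000, "of choices")

Under a rule of this kind, the depressing effect of conformity described above follows directly: when the group majority is still performing the old behavior, high conformity repeatedly overrides an individual agent's inclination to try the new policy before its value can be learned.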
It was expected that conformity would still have a negative effect, especially as the number of trainees increased, but that this negative effect would be tempered by increased goals.

Methods

The final model from Study 3C was again used to conduct this exploration. Trainees were swept from 1 to 20 in increments of 1, conformity from 0 to 1.0 in .05 increments, and goals from 0 to 1.0 in .05 increments. Other variables were held constant as before, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, 500 transfer time points, and initial policy value estimates of .50. 500 replications were completed for each condition.

Results

Initial results largely produce the expected relationships between the variables of interest here and behavioral transfer and post-training performance across all replications. The number of trainees in the model was negatively related to both transfer (r(4410000) = -.199, p < .001) and post-training performance (r(4410000) = -.145, p < .001). The same was found for the relationships between conformity and transfer (r(4410000) = -.766, p < .001) and post-training performance (r(4410000) = -.556, p < .001). However, goal levels were positively related to both transfer (r(4410000) = .093, p < .001) and post-training performance (r(4410000) = .068, p < .001). To further understand these simulated effects, moderated multiple regression analyses were completed testing the three-way interaction of trainees, conformity, and goals on behavioral transfer and post-training performance. Parameter estimates for these models can be found in Table 14. Graphic depictions of these interactions can be found in Figures 66 and 67, and heat maps of these results at the condition level are depicted in Figures 68 and 69. Due to the misleading results with changing numbers of trainees observed in previous simulations, effect sizes for pre-post performance change were not computed for this experiment. The general effect of conformity in this experiment is identical to that observed in Study 2C, where conformity levels above about .45 largely eliminate the transfer of the new policy. However, we do see that goals have an effect where they essentially push this boundary slightly higher, such that it now occurs around .50 conformity. We also see an example of a potentially misleading result when relying on only traditional methods to examine these results, where the regression model and simple slopes analysis suggest an effect of the number of trainees such that fewer trainees are very detrimental when goals are low, but more trainees are detrimental when goals are high. When we examine the heat maps of the results instead, we see that the effect of the number of trainees across levels of goals is largely the same, and this apparent interaction should not be overinterpreted.

Discussion

As in Study 2C, conformity severely depressed transfer outcomes once the degree of conformity reached about .45. The likely reason for this is that the default behavior is to not transfer, so the pressure to follow along at the next time step will tend to keep transfer low. Once conformity is low enough to allow exploration, the agents are much more likely to explore and discover the benefits of their training and therefore begin to transfer. What we see that is new here is a tempering effect of high goals on the depressive effect of conformity.
Specifically, it appears that high goals shift the sensitive area between failure to transfer and where transfer begins improving from a conformity level of about .45 to about .50. This is a small but potentially very important effect, suggesting that good goal setting may help push some individuals who would otherwise be on the fence regarding successfully transferring their training back to their work environment towards overcoming the pressures of the social world around them and doing so.

Experiment 4C: Value Change, Conformity, and Goal Levels

Another way to potentially overcome the negative effects of conformity on transfer outcomes would be to improve the performative value of the newly trained KSAO represented by Policy B. Doing so should provide extra incentive initially for individuals to break from their work groups and begin using their newly trained KSAO. Then, once transfer has begun, the pressure to conform should benefit high-valued KSAOs by spreading that tendency quickly through the group and improving overall outcomes. Similarly, especially low-valued KSAOs should quickly be discarded by the group in favor of keeping the old KSAO in place. Therefore, it is expected that outcomes will be made more extreme, positively and negatively, by different levels of value change. It is also expected that the positive effects seen when values are high will be further enhanced when goals are moderately high, due to the increased exploration undertaken by agents, but not when goals are so high that individuals are unwilling to exploit the better policy once it is found.

Methods

The final model from Study 3C was again used as the base model to conduct this exploration. Goals were swept from 0 to 1.0 in .05 increments, conformity from 0 to 1.0 in .05 increments, and value change across three levels at -.10, .05, and .20. These conditions for value change provide equidistant conditions of one negative behavior we should want the agents to discard, one representing the typical change we have discussed throughout this paper, and one especially beneficial training event. Other variables were held constant, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, 500 transfer time points, and initial policy value estimates of .50. 500 replications were completed for each condition. However, given the results from the exploration in 4B, and previous simulations of the number of trainees in the model in Study 2, it was decided to choose a constant number of agents for the simulated work group. Based on those results, it was decided to simulate groups of 3, as it appears that results largely stabilize once this number is reached. Limiting the simulation to 3 agents also has the benefit of being large enough to traditionally be considered a team (Tannenbaum, Mathieu, Salas, & Cohen, 2012) while going beyond the study of dyadic relationships. In addition, limiting the teams to 3 instead of a larger number would reduce the burden on participant recruitment for any future attempts to apply the results of the present simulations to empirical investigations.

Results

Findings suggest the effects of all three variables explored here are generally in the expected direction across replications on our outcomes of interest. Value change was positively related to behavioral transfer (r(661500) = .533, p < .001) and post-training performance (r(661500) = .711, p < .001).
Conformity had negative relationships with both behavioral transfer (r(661500) = -.670, p < .001) and post-training performance (r(661500) = -.369, p < .001). However, goal level showed a positive relationship with behavioral transfer (r(661500) = .044, p < .001) but a negative one with post-training performance (r(661500) = -.016, p < .001). Given the small size of this negative relationship and the possibility of negative interactions with the other variables here, this finding should not outweigh the other effects of goals observed in this paper. Moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and pre-post performance change from the three-way interaction of conformity, goal level, and value change. The resulting parameters for these models can be found in Table 15, and graphs of the interactions in Figures 70-72. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 73-75. As before, high levels of conformity have substantial negative effects on transfer outcomes. It also does not appear in the regression analysis that high policy values can overcome those negative effects of conformity, but we do potentially gain some nuance on the effects of goals and see that they have a slight effect only when both conformity and value changes are low. In examining the heat maps, we gain a greater understanding of the effects, especially an effect such that when value change is especially low, behavioral transfer is best when goals are high, but when values are high the best transfer occurs when goals are lower. We see essentially the opposite pattern for performance, in that performance is worst at high goal levels when value change is low, and best when values are high with low conformity and moderate goal levels. In the heat maps, it does appear that high values shift the discontinuity for conformity slightly, provided goals are not extremely high, such that positive outcomes occur at slightly higher levels of conformity.

Discussion

These results do not show the expected ability of value change to overcome the negative consequences of social pressure in the model. There are slight positive effects of having highly valued new KSAOs in overcoming the detrimental effects of conformity, but these are similarly weak as those seen for goals overall in the previous experiments. Further, the effects of goals in the model become clearer, as agents again explore sub-optimally under many conditions; a positive effect of conformity, if there is one, is that agents do not improperly explore undesirable policies if their social group does not allow them to do so. Along the same lines, the transfer that does occur when values are low tends to be maladaptive, as agents make the mistake of continuing to apply their training when they should not, largely as a function of high goals and the freedom to do such exploration. An interesting implication here is for training which an organization knows will reduce performance but may have other necessities, such as legal compliance. In such cases it is apparent that the organization will need to work to overcome substantial individual and group processes to make the new training successfully transfer back to the work environment. It is in such cases where physical tools, such as checklists or software, to assist with compliance seem likely to be of extra value.
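Before turning to the final experiment, it is worth noting that Experiments 4A through 4C, and Experiment 4D below, all share the same basic design: a full factorial sweep of the manipulated parameters with 500 independent replications per condition, analyzed at both the replication and condition level. A minimal sketch of how such a sweep might be organized is given below; run_replication is a stand-in for a full run of the LTM and, like the other names and increments shown, is a hypothetical placeholder rather than the dissertation's actual code.

    from itertools import product
    import random

    def frange(start, stop, step):
        """Inclusive range of floats, e.g., 0 to 1.0 in .05 increments."""
        values, v = [], start
        while v <= stop + 1e-9:
            values.append(round(v, 2))
            v += step
        return values

    def run_replication(goals, conformity, value_change):
        """Placeholder for one run of the LTM; returns (transfer, performance)."""
        return random.random(), random.random()

    def sweep(reps=500):
        records = []
        conditions = product(frange(0, 1.0, 0.05),     # goals
                             frange(0, 1.0, 0.05),     # conformity
                             (-0.10, 0.05, 0.20))      # value change (Experiment 4C levels)
        for goals, conformity, value_change in conditions:
            for _ in range(reps):
                transfer, performance = run_replication(goals, conformity, value_change)
                records.append((goals, conformity, value_change, transfer, performance))
        return records   # 21 x 21 x 3 x reps rows, analyzed at replication or condition level

    # Example usage: a tiny demonstration sweep.
    rows = sweep(reps=2)
    print(len(rows), "replications generated")

Organizing the output this way is what makes both the replication-level correlations and the condition-level heat maps reported in these experiments straightforward to compute from a single record set.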
Experiment 4D: Type 2 Likelihood, Conformity, and Goal Levels

A final exploratory simulation examined the effect of the ability of individuals to engage in type 2 cognitive processes on observed transfer outcomes across conformity and goal levels. In this experiment, no direct predictions were made a priori, as it is unclear what the effect of changing levels of type 2 likelihood might be in this complex simulation. One might think that allowing individuals to engage in deeper cognitive processing would better allow them to think about the benefits of their newly trained KSAOs, but it would also allow them to think more about the potential consequences of not conforming to their social group. This counteractive effect could wash out any gains from improving cognitive processing. At the same time, lower type 2 processing would lead to initial difficulties in transfer as trainees habitually apply their old KSAOs to the presented task, but would provide potential benefits in countering the effects of their social groups if they are able to establish their newly trained KSAO as their habitual response. These contradictory possibilities were explored in this experiment.

Methods

For a final time, the model coming from Study 3C was used to explore the joint effects of type 2 likelihood, conformity, and goal levels. For this experiment, conformity, goals, and type 2 likelihood were each swept from 0 to 1.0 in .05 increments. Other variables were held constant, with the true value of Policy A being .70, 100 pre-training time points, 500 transfer time points, initial policy value estimates of .50, and 3 trainees per simulation. 500 replications were completed for each condition.

Results

Initial analyses suggest all three variables explored here have effects in the expected direction across replications on our outcomes of interest. Goal levels again had small but positive relationships with behavioral transfer (r(4630500) = .068, p < .001) and post-training performance (r(4630500) = .038, p < .001), while type 2 likelihood had positive relationships with both behavioral transfer (r(4630500) = .542, p < .001) and post-training performance (r(4630500) = .302, p < .001). Conformity again showed negative relationships with both behavioral transfer (r(4630500) = -.593, p < .001) and post-training performance (r(4630500) = -.330, p < .001). Moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and pre-post training performance change from the three-way interaction of type 2 likelihood, conformity, and goals. The resulting parameters for these models can be found in Table 16, and graphs of the interactions in Figures 76-78. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 79-81. The moderation results initially suggest a typical moderation effect where we see the best transfer outcomes when conformity is low and type 2 likelihood is high, largely regardless of goal level, and all other combinations result in poor outcomes. Our heat maps generally confirm this effect with little else to add, with the exception that very high levels of type 2 likelihood are the only levels which substantially overcome the effects of conformity.
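Because this experiment turns on the type 2 likelihood parameter, a minimal sketch of how a dual-process choice gate of this kind can be expressed may help make the manipulated variable concrete. The specific type 1 default used here (simply repeating the most practiced, habitual policy), the epsilon-greedy deliberation rule, and the function names are illustrative assumptions rather than the model's actual decision rule.

    import random

    def choose_policy(estimates, counts, threshold, type2_likelihood, epsilon=0.10):
        """Dual-process sketch: with probability equal to type2_likelihood the agent
        deliberates (type 2), weighing value estimates against its engagement
        threshold; otherwise a fast type 1 response simply repeats the most
        practiced (habitual) policy. Illustrative only."""
        if random.random() >= type2_likelihood:
            # Type 1: habitual response -- whichever policy has the most experience.
            return max(counts, key=counts.get)
        # Type 2: deliberate over policies whose perceived value clears the threshold.
        viable = [p for p, v in estimates.items() if v >= threshold]
        if not viable:
            return max(counts, key=counts.get)          # nothing clears the bar
        if random.random() < epsilon:
            return random.choice(viable)                 # occasional exploration
        return max(viable, key=lambda p: estimates[p])   # exploit the best estimate

    # Example: a learner with far more experience using Policy A than Policy B.
    estimates = {"A": 0.70, "B": 0.78}
    counts = {"A": 100, "B": 3}
    picks = [choose_policy(estimates, counts, threshold=0.5, type2_likelihood=0.2)
             for _ in range(10000)]
    print("Policy B chosen on", picks.count("B") / 10000, "of trials")

A gate of this kind makes the result above intuitive: when type 2 likelihood is low, the habitual (old) policy dominates regardless of what the learner believes about the new one, so only high type 2 likelihood gives the value estimates, and therefore the training, a chance to assert themselves against both habit and conformity.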
Discussion

It was unclear what to expect a priori for the present simulation, and it was found that potential beneficial effects of goals and type 2 likelihood were essentially wiped out at all levels of conformity, with the only exception being the ability of high type 2 likelihood to lead to positive outcomes. Importantly, the effect of goals in overcoming the effects of conformity was almost non-existent once controlling for the effect of type 2 likelihood. Interestingly, type 2 likelihood appears to do a better job than any other intervention tested here in overcoming the negative effects of conformity, but type 2 likelihood must be high. This effect suggests that in designing training interventions, attending to environmental characteristics will be of great concern, particularly ensuring that trainees return to an environment that allows them to engage in the kind of cognitive processes and exploration required to lead them to discover their training is beneficial to completing the relevant task.

Overall Discussion

The four experiments described here were meant to be demonstrations of the potential of the modeling platform developed throughout this paper to provide novel insights and guidance for future research and practice in organizational training and transfer. One of the primary strengths of computationally modeling theories such as the LTM lies in the ability to conduct such explorations in a low-cost and risk-free environment prior to committing the resources necessary to do similar explorations in empirical data collections. In these experiments, results suggested that the power of social learning, as seen in the mechanism of conformity, exerts a powerful depressing effect on transfer outcomes. Unfortunately, overcoming this effect is not necessarily easy, though goals and the ability to engage in type 2 cognitive processes show some promise. These results can be used to guide future data collections to continue testing the present model, and potentially for guidance in designing and supporting effective organizational training events.

Overall Discussion

Training represents one of the classic areas of inquiry and practice in organizational psychology, with over 100 years of research to show for it (Bell et al., 2017). In that time, we have developed a substantial body of knowledge which has allowed us to continuously improve the way we deliver training interventions in organizations and thereby improve training outcomes (Bell et al., 2017; Salas et al., 2012). Unfortunately, this base of knowledge focuses largely on the training event itself and generally treats the transfer of that training as a cross-sectional outcome (Foxon, 1997). This typical approach necessarily limits our knowledge because we are not generally studying transfer as a process that itself unfolds over time. The failure to study transfer as a process is unfortunate, as we have acknowledged it to be a longitudinal phenomenon for at least 30 years (Baldwin & Ford, 1988). However, in practice, few studies measure transfer longitudinally, and even fewer unpack the dynamic processes driving that transfer, with few notable exceptions (e.g., Dierdorff & Surface, 2008; Huang et al., 2015; Huang et al., 2017). Recently, a group of researchers, including the present author, has begun more substantially to attempt to unpack the processes underlying training transfer. Most prominently, Blume et al.
(2019) described training transfer as a self-regulatory-driven process, labeled the Dynamic Transfer Model (DTM), where trainees iteratively attempt to transfer their learning to their work environment and subsequently keep or discard their newly acquired KSAOs based upon the feedback they received. The primary drawbacks to their model lie in its narrative nature and its failure to unpack the cognitive and learning mechanisms underlying the proposed feedback process. Surface and Olenick (forthcoming) are attempting to push the DTM to a lower level of abstraction and begin theorizing about how the transfer process may be driven by the interpretation of environmental cues and subsequent execution of available behavioral scripts, based largely in the same Dual Processing framework used in the present paper. However, their advancement still relies on narrative theorizing. Then, Olenick et al. (in press) began to push transfer research towards using more mathematical bases by applying non-linear dynamics to discuss training and transfer as a process of discontinuous shifts where old patterns of behavior, represented by attractors in a mathematical sense, must be broken free from and new patterns formed. Their lens demonstrates how transfer trajectories can be modeled as dynamic processes that unfold over time as governed by mathematical attractors, which provides a more formal framework from which to build future research.

The Learning Transfer Model presented in this paper represents a culmination, of sorts, of these efforts. The LTM takes the step of fully formalizing the learning and decision mechanisms I propose underlie the process of learning/training transfer in organizations. In doing so, the LTM integrates theories from across psychology, using Dual Process Cognition (e.g., Kahneman, 2011) as a broad framework, along with self-regulation (e.g., Carver & Scheier, 1998) and Social Learning Theory (Bandura, 1977), with theories from outside of psychology, such as computational reinforcement learning (Sutton & Barto, 2018). Further, computational approaches to social learning were borrowed from studies of gene-culture coevolution (Richerson & Boyd, 2005) to discuss the effects of social learning on transfer through the lens of the simultaneous emergence of the social transfer environment from the behavior of the individuals within it.

The final model, demonstrated via experiments in Study 4, broadly suggests that learners return to their work environment and must apply some new KSAO to their work instead of some existing KSAO they were already using. When encountering the applicable task, the learner initially decides quickly and automatically, via type 1 cognitive processes, which KSAO to apply based on previous experience. In some cases, the individual will have the opportunity to engage in deeper levels of cognitive processing and make a more conscious and informed decision regarding which available KSAO they should apply; these decisions are governed by type 2 cognitive processes. Once an approach is chosen, the learner applies that choice to their task and receives feedback regarding the successfulness of their attempt. That feedback allows them to learn over time which of their available KSAOs best allows them to perform the task to their desired level. If the new KSAO is perceived to be better than their previous KSAOs, regardless of whether it actually is better or not, the learner will transfer that new KSAO over the long term.
Complicating matters, individuals do not always actually attempt tasks, because when they lack confidence in their ability to succeed they may decide not to even attempt to transfer their learning. Further, these learning and decision processes do not take place in a vacuum, as learners are often embedded in work groups. The environment for transfer is then a simultaneously emergent phenomenon governed by the individual experiences of all the learners in their transfer attempts, which in turn acts as a causal climate around them through either conforming or imitating mechanisms. As these decisions and learning events play out over time, an individual may follow any one of a nearly infinite set of transfer trajectories that, in the end, result in what we traditionally observe as successful transfer or not.

This overall theory was formalized and instantiated into a computational model in NetLogo, building from existing mathematical frameworks such as computational reinforcement learning. A series of simulations then explored the models and developed them in an iterative fashion. The goal for this iterative process was to explore each model and, following established modeling steps, check the models for verification, generative sufficiency, robustness, and sensitivity (Railsback & Grimm, 2012). In addition, this process importantly opened the theory to an initial round of falsification (Popper, 1959). Overall, this process suggested the LTM, as originally proposed, was in many respects successful in its initial attempts to account for broad patterns of findings within the transfer literature, but not completely so. For example, the LTM was able to reproduce a range of behavioral transfer rates typically discussed in the literature (e.g., Ford et al., 2011), and general effect sizes for performance improvement we may expect in real-world situations. However, it was also found that these findings held only for some areas of the potential parameter space covered by the model, which were used in later simulations for further exploration. Such findings do not invalidate the present theory any more than do traditional tests of narrative theories in organizational psychology to establish boundary conditions (e.g., Grant, 2008; Hollenbeck, Colquitt, Ilgen, LePine, & Hedlund, 1998; Yammarino & Dubinsky, 1994). Instead, it appears that the LTM is a plausible process explanation for general transfer findings provided the model is within certain parameters. Outside of those parameters the model may not apply to the phenomena of interest, for at least two reasons. First, it may be that the theory itself breaks down outside of the established parameter ranges which produce the kinds of relationships and results we are used to seeing in the research literature. If this is the case, the model would be falsified for those conditions and would need to be further refined to operate under them if deemed necessary, much as we would iterate a narrative theory. Second, as argued previously, it could be that it is not the theory that breaks down, but rather the limited range of conditions in which we tend to do our research. The model may be able to simulate conditions outside the bounds of reality, and therefore would not need to be applicable to them, and the breakdown in these ranges is therefore not a shortcoming.
However, one of the strengths of formal theorizing and computational modeling is the greater ability to falsify and iterate theories than is achieved through traditional narrative theory building. This strength is clearly shown in Study 2, where the initially proposed pooling mechanism was incapable of replicating the expected social effects observed in the transfer literature. This model, being overly parsimonious and subsequently falsified via virtual experimentation, was able to be iterated by testing two alternate models of social learning borrowed from modeling of cultural effects on populations (Richerson & Boyd, 2005), which utilized mechanisms of imitation and conformity. Unlike the originally proposed mechanism in the LTM, both mechanisms appeared to provide plausible results and novel insights into the nature of social effects in the transfer environment. Following some exploration, it was argued that, with some reconsideration of how we operationalize culture and climate for transfer, the conformity model may fit current findings in the research literature better, and it was retained for further exploration. Over the course of the iterative theorizing and model-building approach outlined throughout this paper, a final version of the LTM was accepted, for now, and more fully explored in Study 4. Through this process, it is argued that the present paper has accomplished its primary goals of 1) providing a formal, process-oriented theory of training transfer, 2) integrating multiple disparate theories to explain that process, 3) bringing outside theories, such as computational reinforcement learning and dual process cognition, more into the organizational psychology literature, and 4) building a modeling platform that allows for the thorough exploration of the proposed theory for both theoretical and practical implications. It is to these implications we now turn.

Theoretical Implications and Future Research Directions

It has long been observed that there is nothing quite so practical as a good theory. In that spirit, the present paper sought to further our understanding of one of the most practically impactful research areas in all of organizational psychology, training and transfer, by introducing a mechanistic process theory of transfer. To support the veracity of this theory, the Learning Transfer Model, a computational model was generated and explored to account for existing general findings in the research literature, a process referred to as establishing generative sufficiency. As discussed throughout this paper, these simulations suggest that the LTM can reproduce the general patterns of many research findings in this space. Therefore, it is argued that the LTM, as currently specified, generally provides a plausible process explanation for training transfer. The ability of the present model to broadly account for many findings in the transfer literature is a critical first step in building a unifying theory for this area of our science and continuing to improve our scientific rigor (Muthukrishna & Henrich, 2019).

The general success of the LTM displayed in this paper has a couple of interesting implications for how we think about training and transfer in our literature. First, Blume et al. (in press) recently suggested the need for more work on transfer as an individualized process where trajectories between individuals are likely to be highly idiosyncratic. Modeling the LTM reaffirms this case, as it was evident that individual trajectories of agents can vary substantially.
On e fur ther implication of the LTM in terms of that individualization process is the importance of viewing transfer from a perspective of need fulfillment. Throughout this paper we have seen agents are only likely to transfer their training when that trainin g rep resents an improvement over the ir old behaviors, the training allows them to meet their personal goals, and they are allowed the ability to ascertain that benefit. Thus, if an individual is unable to discern how or whether 158 their training meets their o wn ne eds then transfer is unlikely. Future work should continue to unpack this individualized nature of training transfer. Further, the development of the LTM in this paper should encourage other researchers to look more closely at other fields as they be gin t o develop formal models of thei r own processes of interest. As a field, organizational psychology has not been on the forefront of the development of formal models and many other fields, from computer science, to biology, to economics, have been using math ematical tools to model their p rocesses for decades. We could likely draw on their already existing models and associated mathematical approaches to inform much of our own work on the organizational processes in which we are interested. Being willing to us e their work will keep us from reinventing the wheel when it comes to discovering many of the same essential processes. Similarly, through integrating models from across the sciences we can likely help place a break on continued construct and theoreti cal p roliferation where many researc hers from many different fields all study the same essential phenomenon but develop their own theories and constructs to explain and describe those phenomena. The historically siloed approach to science has likely slowed our knowledge accumulation and led to sprawling and confused literatures passing each other like ships in the night as each independently seek to solve similar problems. The ability of the LTM to provide a process capable of largely reproducing typical tr ansfe r findings in a relatively pars imonious model by integrating knowledge from across several disparate fields should provide further impetus for interdisciplinary work in the future. However, i t is not contended that the present paper has established t he LT M as the correct model of train ing transfer, only that it is a plausible explanation, or at least a plausible step in establishing such a theory. Perfection was never the goal of the present theorizing, and the 159 LTM cannot be evaluated against such a s tanda rd. As Box (1976) contends, all theories are wrong, the goal is to remain parsimonious while providing an explanation for the phenomena at hand. The LTM, although integrating multiple disparate theories, only has a few actual mechanisms when expressed form ally, making the overall model fairly parsimonious while still appearing to be broadly applicable to transfer research. The question then becomes not necessarily whether the theory is incorrect, but in which ways it is meaningfully wrong (Box, 1976). As h as been discussed through the r esults of the simulations above, there are at least a couple of ways in which the current version of the LTM is, or was, meaningfully wrong. For example, the effects of practice in the simulations was in the correct dire ction , but obviously not capable of reproducing the desired effects. 
This is problematic, as practice effects are some of our best-established tools for improving learning outcomes (e.g., Dunlosky et al., 2013). Additionally, even though in some cases the effects of policy value estimates in the LTM worked nearly perfectly, as with the recreation of the effect of efficacy on transfer, those value estimates only reproduced the effect of utility reactions at the condition level and not the individual level as expected. On the extreme end, it was shown that the initially proposed social learning mechanism for the LTM was inadequate for producing the desired social effects. Already within this paper two alternative mechanisms were proposed and explored, with both showing greater potential for illuminating social effects in the transfer process. Future iterations of the LTM, combined with targeted data collections, will be required to fine-tune these mechanisms.

In the case of value estimates in relation to utility reactions, the underlying mathematics will need adjusting. The current effect of initial value estimates quickly becomes swamped by the experience of the learning agent, and therefore does not substantially affect the willingness of the agent to continue engaging with a task in the face of initial failure. If this effect can be drawn out over time by changing the updating procedure for value perceptions, the initial estimate may be able to better approximate the effect of utility reactions that those initial estimates were thought to approximate. Similarly, the effect of practice within the LTM is not strong enough. It is not feasible, in most situations, for practice attempts to approximate the number of attempts an individual has had using the behavior they are trying to replace. Therefore, the mathematical effect of practice attempts must be increased in some way. One way to accomplish this would be a multiplier on the practice attempt variable indicating the relative effectiveness of those practice attempts. Low values of this moderator variable, such as the de facto 1 it is set to in the present model, would represent poor practice. Higher values could represent better practice, such as following recommendations for spaced practice, recall effects, and so on, that would improve the strength of those practice attempts. Future iterations of the model should explore these possibilities.
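One possible formalization of this multiplier idea, offered only as an illustrative sketch and not as the form a future version of the model must take, is to let each practice attempt carry a practice-quality weight in the incremental value update, so that higher-quality practice counts as more effective experience. The function name and the example weights are assumptions for illustration.

    def practiced_update(estimate, count, reward, practice_quality=1.0):
        """Weighted sample-average value update in which a practice attempt counts as
        practice_quality ordinary experiences: 1.0 mirrors the current model's de facto
        setting, while higher values (e.g., well-spaced practice with recall) let fewer
        practice attempts move the estimate, and the effective experience count, further.
        A sketch of the proposed extension, not the implemented model."""
        effective_count = count + practice_quality
        estimate += practice_quality * (reward - estimate) / effective_count
        return estimate, effective_count

    # Example: ten successful practice attempts starting from a weak prior on Policy B.
    for quality in (1.0, 3.0):
        est, n = 0.50, 1.0
        for _ in range(10):
            est, n = practiced_update(est, n, reward=1.0, practice_quality=quality)
        print("practice quality", quality, "estimate", round(est, 2), "effective experience", round(n))

Under such a scheme, a modest number of high-quality practice attempts could plausibly stand in for a much larger number of on-the-job attempts, which is exactly the gap the current version of the model cannot close.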
As for the social learning mechanisms, more modeling and data collection will be necessary to decide whether imitation, conformity, or a possible mix of both (e.g., Lopes et al., 2009) is needed to account for the social effects observed in transfer environments. Future empirical work should be partnered with versions of the social learning mechanisms tested in the LTM to ascertain which models better fit observed data regarding social interactions, learning, and how those lead to transfer, or the lack thereof, over time. Targeted data collections and further modeling should then trade off in an iterative way to refine the models and determine which has the stronger support in the real world. Doing this would be a prime example of strong theoretical development (Sutton & Staw, 1995), which is one of the primary draws of engaging in computational modeling.

More generally, studies will be required to begin directly parameterizing the model against real data and to go beyond the replication of general results. Several good examples of such approaches exist in the organizational sciences, ranging from the study of motivational phenomena (e.g., Vancouver, Weinhardt, & Vino, 2014) to the study of response processes to situational judgement tests (Grand, in press). However, it is unlikely that many opportunities exist to collect data within real organizations at the level of granularity required to fit the LTM to the kind of moment-to-moment decisions that are being proposed to drive transfer patterns. Such a collection would, almost of necessity, be highly intrusive and distracting to the point of overly interfering with normal organizational operations. For this reason, I reiterate the calls of other papers (e.g., Blume et al., 2019; Olenick et al., in press) to look for opportunities to use new technologies which can collect data on decisions and behaviors in situ in near real time. These include the ability to collect data on momentary use of electronic systems, or sociometric badges to study interaction patterns (e.g., Zhang, Olenick, Chang, Kozlowski, & Hung, 2018), which could provide windows into both individual and group behavioral norms. Alternatively, experimental paradigms will need to be adapted to study the mechanisms outlined in this paper. Existing options include: a) scheduling tasks which track decisions made over many time points to study motivational processes (e.g., DeShon & Rench, 2009; Schmidt & DeShon, 2007), and b) a radar simulation task called TANDEM which can track participant decisions down to individual clicks of a mouse and time spent on various tasks, in a difficult environment where much learning is possible (e.g., Bell & Kozlowski, 2008). A major drawback of such platforms, however, is that the odds of success on the task attached to any specific behavior are unknown and might not be knowable without extensive simulation or prior data collection. This poses a problem in testing the LTM, as it relies on the underlying
One example of such an effect was the change in behavioral transfer across level s of the threshold v ariable in Study 3C where behavioral transfer rates rapidly decreased from a threshold level of .60 to about .70. Such a pattern is not a complete discontinuity, but it suggests a pattern that may be better analyzed through nonline ar me thods. For example, a cusp catastrophe model could assess the likelihood of a target falling on either level of the observed rate of behavioral transfer while treating threshold level as a control variable for the location of that discontinuity. Such model s have long been use d in studies of animal and human learning (e.g., Baker & Frey, 1980; Guastello, 1987 ), and have 163 recently been suggested for greater use in the study of organizational training and transfer (Olenick et al., in press). The simulated resul ts of the LTM in thi s paper reaffirm this suggestion. Future iterations of the LTM should also seek to include other emerging research on human learning and decision making and its potential effects on transfer outcomes. For example, Spicer, Mitchell , Wil ls, and Jones (2020) suggest that humans protect their established causal beliefs instead of updating them when their predictions do not match observed outcomes , violati ng existing prediction error models. Their findings c ould be matched with the LTM to di scuss why in transfe r space learners/agents do not necessarily accurately update their beliefs regarding the value of their behavioral policies in the face of experience. For example, one of the biases operating in type 2 processing systems could be a disc ounting of the effec ts of failures for learning about the utility of Policy A . When the learner enters the transfer environment then, not only does their new policy have to outperform Policy A outright to convince the learner it is better for the task , but also overcome any b ias of the learner ignoring failures of Policy A in a protection of their prior beliefs. This is an intriguing idea that at least anecdotally fits with experience in real organizational environments and seems to be worth further ex plora tion. Another interesting possibility would be to combine with other computational models that explore pertinent aspects of the transfer process that are not yet included in the present model. For example, the LTM currently assumes that trainees can accur ately p erceive their environment in order to activate the relevant decision processes discussed here. This assumption can be relaxed by incorporating mechanisms in other models, such as Weichart, Turner, and nes d ecision making to understand how decisions The incorporation of similar mechanisms into the LTM would allow us to model how learners 164 might interact with thei r envir onment al cues to activate the relevant behavioral scripts represented by the policies used in the terminology of reinforcement learning. One interesting interaction would likely occur with the ability to identify the relevant environmental cues to f ully re alize the benefits of implementation intentions. As discussed previously, implementation intentions are described as if - then type rules where the learner applies the relevant response in the presence of the correct cue (Gollwitzer, 1999). For t his m echanis m to operate, the individual must be able to recognize the cue and doing so requires paying sufficient attention to the relevant environmental factors. 
Therefore, there is likely a moderating effect of attention on the effects of implementation inte ntions within transfer environments. Another frontier for the LTM will be to account for more and evolving behavioral options. Many tasks have specific ways they are supposed to be carried out, to which the current version of the LTM is most applicab le. H owever, many tasks are more open, allowing trainees greater discretion over how exactly they approach the task (e.g., Yelon & Ford, 1999). To incorporate many different behavioral options, the LTM should be expanded to utilize reinforcement principals for multipl e behaviors. The k - armed bandit approach used here is technically capable of assessing multiple policies at a time, but more sophisticated models exist (Sutton & Barto, 2018). Other reinforcement algorithms are likely better fits for different types of tra nsfer questions , and they should be systematically explored for that fit. Similarly, it may be possible that different types of learning, reinforcement or otherwise, are better fits for the learning mechanisms occurring within either type 1 pro cesse s or ty pe 2 processes during transfer events. The present approach was chosen as a starting point as historical research on animal learning and applied reinforcement learning models largely focuses on naĆÆve learners (see Sutton & Barto, 2018 for a dis cussi on), wh ile the specific question being addressed in the present paper 165 However, as suggested in the CLARION model (Sun et al., 2005), the type of experiential learn ing that lies at the heart of the reinforcement algorithms used in this paper (Sutton & Barto, 2018) are proposed to fit with type 1 processes but not necessarily with type 2 learning processes , although we are interested in more than the explicit inf ormin g of an individual regarding the usefulness of new KSAOs in the present case, thus tackling a different question than CLARION . Further research and modeling to refine these mechanisms to best fit the transfer environment will be required. In addition , the present paper has only focused on a single learning and transfer event, where a single old behavior must be overcome for transfer to occur. However, the development of individuals within organizations, and more broadly expertise, can be viewed as the cons tant breaking of these old habits and establishment of new ones (Ericsson, 2006; Olenick et al., in press). In traditional reinforcement learning problems, such as an agent discovering the most efficient way to navigate a maze, the agent generates sol ution s to its environment and learns their values over time (Sutton & Barto, 2018). In the same way, general employee development could be viewed as a series of pseudo - randomly generated solutions to organizational problems where the learner then chooses w hich to apply to their particular work situation or not, over time developing preferences for some behavioral policies over others and requiring new policies to overcome that preference in order for transfer to occur. Through such an approach we could go b eyond the study of the transfer of a single learning event to better understand sequential learning events. Simultaneously, such models can account for changing environments (Sutton & Barto, 2018) which would open the LTM to application further to question s of far transfer ( Beier & Kanfer, 2010 ), and problems of adaptability (e.g., Baard et al., 2014). 
A final key area for exploration, both within the present version of the LTM and across future versions, will be the many other potential combinations of interventions and effects that were not examined in this paper. For example, once practice effects are refined, how might they interact with implementation intentions? Much as we initially expected that improving engagement in type 2 processes would augment the effects of implementation intentions, and the model suggests that is incorrect, it would seem logical that practice and implementation intentions would each be beneficial and would augment each other. However, perhaps once one effect is accounted for the other provides no gain in transfer outcomes, and therefore it would not be worth the effort and cost to use both in a training intervention. The LTM could provide such guidance for future investigations into these interactive effects, and therefore guidance for the efficient practical application of research findings. It is to those more practical implications we now turn.

Practical Implications

Many practical implications of the individual models explored in this paper have been discussed throughout. However, there are a few overarching implications which warrant discussion. First, the LTM and the computational results have implications not only for how we measure transfer for research but also for how we measure transfer for training evaluation. In this paper, the outcomes tracked were at the behavioral and performance levels of the classic Kirkpatrick (1994) typology. To merely encourage organizations to evaluate training outcomes at these levels would be banal, although they should do so more frequently than is currently the standard. What the modeling in this paper further suggests is that the timing of the measurement of these outcomes is of great importance. It is commonly stated in the research literature that the timing of measurements should be chosen based on the timing of the phenomenon of interest (e.g., Hanges & Wang, 2012), and this clearly pertains to the estimation of transfer outcomes in the LTM. Specifically, if transfer measurements are taken too early, the outcomes of interest may not have had a chance to emerge and stabilize, which could lead to a drastic over- or underestimate of the final effect of a training event. To make matters worse, the models here suggest that transfer may be more likely to emerge later than one might expect, causing an underestimate of the effect of training and therefore potentially leading an organization to incorrectly conclude that its training was ineffective. Therefore, patience is urged in the timing of the collection of transfer data when possible, to improve the final estimates of the effect of training.

In fact, the timing of every aspect of training appears to be of incredible importance. Olenick et al. (in press) argue that the longer one waits to intervene, the harder it likely is to create lasting change (pagination not yet assigned), due to the formation over time of an attractor through the recurrent success of the targeted behavior. Their piece applied only a mathematical lens to training to make that suggestion, and the present paper further demonstrates their point via modeling. In the initial exploration of the LTM in Study 1, we saw a drastic effect on training outcomes according to how long the pre- and post-training time frames ran. What is occurring in the simulation is essentially the formation of the kinds of attractors Olenick et al. (in press) were discussing as the agent gained experience with the task.
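In the model's own terms, the attractor is simply the system 1 habit term growing toward 1 as Policy A accumulates uses. The illustrative reporter below (not part of the appendix code) strips that term down to the share of prior attempts that used Policy A and shows why a longer pre-training period makes the same amount of transfer-period experience count for less.

to-report habit-strength [a-attempts b-attempts]  ;illustrative only: system 1 pull toward Policy A
  ;with no practice attempts and no implementation intention, the habit term reduces to the
  ;share of all prior attempts that used Policy A, so a longer burn-in pushes it toward 1
  report a-attempts / (a-attempts + b-attempts + .000001)
end
;for example, habit-strength 100 10 is roughly .91, while habit-strength 500 10 is roughly .98,
;so the same ten post-training uses of Policy B barely move a habit built over a longer pre-training period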
The burn-in period used there was sufficient to create a strong enough attractor that agents struggled to form new patterns unless given five times as many attempts to change that behavior. Such difficulties only become greater the longer the pre-training period is allowed to extend, as we see in the difficulty of overcoming implicit biases through training when those biases are the result of years or decades of experience (Lai, Hoffman, & Nosek, 2013; Lai et al., 2016). Although the exact number of trials likely does not map cleanly onto any given real-world task, the overall message for the timing of training interventions is clear: the sooner, the better. The advice for any practitioner choosing when to hold a key training event, at least regarding a task the trainees are already completing in some way, is to implement the intervention as soon as feasible, as any delay is likely to make the task of causing permanent on-the-job change even more difficult.

Olenick et al. (in press) also suggest that the strength of the intervention will be critical in overcoming established KSAOs, especially when they are long-held patterns. One way to increase the strength of a single training event should theoretically lie in stacking multiple kinds of best practices or training enhancers into a learning event when possible. For example, a training designer might incorporate both spaced practice and implementation intentions and, following the present modeling, also target the transfer environment to improve trainees' use of type 2 cognitive processes. Independently, each of these additions should improve learning and transfer outcomes, so it seems logical that doing all of them would be even more beneficial. However, the modeling in this paper suggests that may not always be the case. Instead, some types of interventions may not effectively stack with each other to further improve outcomes and might even interfere with one another. In such a case, adding extra apparent enhancements to a training event could result in decreased return on investment for the event, as energy is wasted implementing unhelpful tools. Therefore, training designers should think carefully about which such tools will best fit with their planned training event to enhance desired outcomes.

Finally, the LTM suggests there may be other individual differences and environmental factors to consider when choosing who might be a good candidate for a given training event. It is already recommended that a person's readiness for training be assessed, which includes personal characteristics such as ability, attitudes, personality, and motivation, as well as whether their work environment will facilitate the desired outcomes (Langdon, 1997; Noe, 2017; Rummler, 1996). Some of these characteristics are directly informed by the LTM. For example, we saw an interesting interplay between goals and the outcomes of training which suggests that individuals with extremely high goals might not be good fits for trainings that do not allow them to reach said goals. Rather, the focus should be on individuals whose current goals match well with what the training is offering.
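A hedged sketch of how that goal mechanism operates in the Model 3 simulations is shown below; the parameter names goal_level, base_exploration, and exploration_boost stand in for T, E, and F from Table 11 and are illustrative, not taken from the appendix code.

to-report exploration-given-goal  ;sketch of the Model 3 self-regulation step for a single trainee
  ;goal_level, base_exploration, and exploration_boost are assumed parameters standing in for T, E, and F
  let current-performance (task_successes / (ticks + .000001))  ;Y: average reward experienced so far
  ifelse current-performance < goal_level                       ;J = 1 when the goal has not been met
    [report base_exploration + exploration_boost]               ;unmet goal: search more widely for better policies
    [report base_exploration]                                   ;goal met: keep the baseline error rate
end

A trainee whose goal sits far above what the trained policy can deliver would keep exploring indefinitely under this rule, which is consistent with the observation above that individuals with extremely high goals may not be good fits for a given training.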
Further, we know that individuals who are learning oriented (mastery oriented, in other nomenclature) are focused on increasing their ability on their targeted tasks, and this leads to improved performance outcomes over time (e.g., Dweck, 1986; Elliott, 1999; Payne, Youngcourt, & Beaubien, 2007). Part of doing so tends to be a greater willingness to explore the task for better solutions, leading to poorer performance early in those tasks but greater success over time (e.g., Bell & Kozlowski, 2008). In a similar vein, the present model shows that moderate levels of exploration in response to unmet goals were associated with better transfer outcomes. Thus, the model reinforces the potential importance of targeting individuals who are learning oriented for training interventions, or even of adding a new measure directed specifically at their willingness to search for better task approaches in the face of adversity. Finally, on the environmental side, we want to ensure not only that trainees have the theoretical opportunities to apply their training, in the sense that the correct situations present themselves, but also that those trainees have the time and ability to think more deeply about the situation and engage their type 2 cognitive processes, improving the chances that they will make the correct decision regarding whether or not to use their training.

Conclusion

The Learning Transfer Model introduced in this paper has four central aims. First and foremost, it provides a formal, process-oriented theory which has the potential to unify many current effects in the transfer literature under a single umbrella. Second, it further integrates multiple important theories across disciplines, both from within and outside of psychology. Additionally, the LTM brings important formal models of reinforcement learning, and dual process models of cognition, further into organizational psychology. Finally, the LTM was instantiated in a computational model to provide a powerful tool for future theoretical development and practical application. The present work is not meant to be the final word on any of the theories incorporated into the LTM, or even on the mechanisms driving transfer in organizational contexts over time. Instead, the LTM as presented here is meant to provide a plausible and parsimonious model of the transfer process to drive future research and practice. To that end, over the course of several virtual experiments, the overall generative sufficiency of the model was largely established, although pieces of the model were falsified and subsequently revised, and novel implications of the model were explored. Substantial work remains to fully validate the present model against real-world observations, which will inevitably lead to various tweaks to the underlying mathematics driving the proposed mechanisms in the LTM. However, the model established in this paper represents a substantial step toward a formal process model of transfer.
Table 1. Model 1 Variables.
  a        Policy A
  b        Policy B
  R_a      True reward for Policy A
  R_b      True reward for Policy B
  Q_t(a)   Value estimate for Policy A at time t
  Q_t(b)   Value estimate for Policy B at time t
  R_ta     Reward received at time t given Policy A
  R_tb     Reward received at time t given Policy B
  Q_1(a)   Initial value estimate for Policy A
  Q_1(b)   Initial value estimate for Policy B
  P_t      Policy chosen at time t
  E        Error rate in choosing the most valuable policy, also referred to as exploration
  S_2      Probability of activating the System 2 decision process
  Z_t(a)   Probability of choosing to apply Policy A automatically in System 1
  L        Number of times an agent has attempted their new policy in practice before entering the transfer environment
  I        Effect of forming an implementation intention to activate Policy B in the presented situation

Table 2. Model 1 Equations.
  $Q_{t+1}(a) = Q_t(a) + \frac{1}{t_a}\left[R_{t_a} - Q_t(a)\right]$    Value estimate at time t + 1 for Policy A, where t_a is the number of times Policy A has been applied
  $Q_{t+1}(b) = Q_t(b) + \frac{1}{t_b}\left[R_{t_b} - Q_t(b)\right]$    Value estimate at time t + 1 for Policy B, where t_b is the number of times Policy B has been applied
  $P_t = \arg\max_{p \in \{a,b\}} Q_t(p)$, with probability $1 - E$    Policy chosen at t is the policy with the maximum expected value from policies a and b, selected with probability 1 - E given the use of System 2
  $Z_t(a) = \frac{t_a}{t_a + t_b + L} - I$    Probability of choosing to apply Policy A automatically in System 1, calculated from the number of times that policy has been chosen out of possible applications and accounting for implementation intentions

Table 3. Overall results for practice effect on behavioral transfer and performance change in Model 1.
  Practice attempts   Behavioral transfer   Performance change
  0     .47   .43
  25    .49   .48
  50    .48   .31
  75    .50   .33
  100   .51   .38
  125   .54   .54
  150   .57   .54
  175   .55   .63
  200   .55   .54

Table 4. Experimental comparisons of practice conditions to control for behavioral transfer and performance change in Model 1.
  Practice attempts   Behavioral transfer   Performance change
  25    .25    .01
  50    .08    .04
  75    .51    .02
  100   .56    .05
  125   .97    .12
  150   1.50   .14
  175   1.15   .14
  200   1.25   .14

Table 5. Initial policy value estimate effects on behavioral transfer and performance change in Model 1.
  Initial Policy B estimate   Behavioral transfer   Pre-post performance (d)
  .00    .43   .17
  .05    .42   .21
  .10    .44   .23
  .15    .43   .19
  .20    .44   .28
  .25    .45   .43
  .30    .44   .38
  .35    .43   .33
  .40    .42   .30
  .45    .45   .27
  .50    .46   .35
  .55    .44   .40
  .60    .45   .28
  .65    .42   .27
  .70    .43   .31
  .75    .45   .36
  .80    .42   .30
  .85    .45   .36
  .90    .45   .31
  .95    .45   .39
  1.00   .47   .41

Table 6. Implementation level effects on behavioral transfer and performance change in Model 1.
  Implementation level   Behavioral transfer   Pre-post performance
  0     .43   .31
  .05   .45   .17
  .10   .47   .34
  .15   .50   .41
  .20   .51   .39
  .25   .54   .52
  .30   .56   .33
  .35   .55   .40
  .40   .57   .32
  .45   .59   .50
  .50   .60   .47

Table 7. Model 2 Variables.
  G_t(a)    Average value estimate of the other agents for Policy A at time t
  G_t(b)    Average value estimate of the other agents for Policy B at time t
  C         Level of connectedness to the group of co-learners
  wQ_t(a)   Weighted value estimate for Policy A
  wQ_t(b)   Weighted value estimate for Policy B

Table 8. Model 2 Equations.
  $G_t(a) = \frac{1}{N}\sum_{i=1}^{N} Q_t^{i}(a)$    Average value estimate of the other transfer agents 1 to N, calculated as the sum of the value estimates of each agent i divided by the number of agents, for Policy A
  $G_t(b) = \frac{1}{N}\sum_{i=1}^{N} Q_t^{i}(b)$    Average value estimate of the other transfer agents 1 to N, calculated as the sum of the value estimates of each agent i divided by the number of agents, for Policy B
  $wQ_t(a) = (1 - C)\,Q_t(a) + C\,G_t(a)$    Weighted value estimate for Policy A when N > 0
  $wQ_t(b) = (1 - C)\,Q_t(b) + C\,G_t(b)$    Weighted value estimate for Policy B when N > 0

Table 9. Effects of number of trainees on behavioral transfer and pre-post performance change in Model 2A.
  Trainees   Behavioral transfer   Pre-post performance change
  1    .43   .28
  2    .44   .52
  3    .42   .54
  4    .44   .68
  5    .45   .78
  6    .44   .91
  7    .44   .86
  8    .43   .84
  9    .44   1.03
  10   .43   .99
  11   .44   1.09
  12   .44   1.00
  13   .44   1.19
  14   .44   1.21
  15   .43   1.21
  16   .44   1.29
  17   .44   1.24
  18   .44   1.40
  19   .44   1.42
  20   .44   1.43

Table 10. Connectedness effects on behavioral transfer and pre-post performance change in Model 2A.
  Connectedness   Behavioral transfer   Pre-post performance change
  .00    .44   1.12
  .05    .44   1.12
  .10    .43   .95
  .15    .44   .84
  .20    .44   1.06
  .25    .44   1.06
  .30    .43   .84
  .35    .43   1.01
  .40    .43   .91
  .45    .44   1.05
  .50    .43   .92
  .55    .44   1.03
  .60    .43   .95
  .65    .44   1.10
  .70    .44   .94
  .75    .43   1.01
  .80    .44   1.08
  .85    .44   1.09
  .90    .44   1.02
  .95    .44   .96
  1.00   .44   .88

Table 11. Model 3 Variables.
  T   Goal of the target agent
  Y   Performance of the target agent
  D   Difference between performance and goal
  J   Decision mechanism; takes 0 if the goal is met, 1 if not
  F   How much exploration increases when the goal is not met
  V   Threshold below which the agent will not apply the policy

Table 12. Model 3 Equations.
  $Y = \frac{1}{t}\sum_{i=1}^{t} R_i$    Performance Y is the average of all previously experienced rewards
  $D = T - Y$    Difference calculated as the difference between the agent's goal and its performance
  $E_t = E + JF$    Error rate in choosing the highest-valued policy as changed by the comparison of current performance to the goal

Table 13. Three-way interaction models for Experiment 4A. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 296991.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .280, 554.85, <.001; .761, 6282.61, <.001; 1.157, 16.97, <.001
  Intentions: .060, .026, 20.28, <.001; .012, .020, 16.92, <.001; .233, .016, .58, .560
  Threshold: -.706, -.283, -221.41, <.001; -.107, -.168, -139.11, <.001; -1.973, -.125, -4.58, <.001
  Value Change: 1.983, .649, 507.87, <.001; .559, .721, 595.95, <.001; 14.149, .733, 26.80, <.001
  Intentions*Threshold: -.221, -.015, -11.84, <.001; -.029, -.008, -6.43, <.001; -.409, -.004, -.16, .871
  Intentions*Value Change: .361, .020, 15.79, <.001; .109, .024, 19.87, <.001; 2.597, .023, .84, .401
  Threshold*Value Change: -2.139, -.111, -86.65, <.001; -.603, -.123, -101.61, <.001; -11.008, -.090, -3.30, .001
  Intentions*Threshold*Value Change: -.275, -.002, -1.90, .057; -.128, -.004, -3.69, <.001; -1.576, -.002, -.08, .936

Table 14. Three-way interaction models for Experiment 4B. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 4409991.
  Predictor: Behavioral transfer; Post training performance
  Constant: .712, 4602.11, <.001; .247, 141606.60, <.001
  Trainees: .000, -.145, -699.16, <.001; -.007, -.199, -373.13, <.001
  Conformity: -.024, -.556, -2690.95, <.001; -.476, -.776, -1434.90, <.001
  Goals: .003, .068, 325.94, <.001; .058, .093, 174.56, <.001
  Trainees*Conformity: .000, -.055, -267.85, <.001; -.008, -.076, -141.43, <.001
  Trainees*Goals: .000, -.001, -153.55, <.001; .000, -.001, -81.51, <.001
  Conformity*Goals: -.004, -.032, -3.19, .001; -.090, -.044, -2.39, .017
  Trainees*Conformity*Goals: .000, -.007, -33.35, <.001; -.003, -.009, -17.26, <.001

Table 15. Three-way interaction models for Experiment 4C. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 661491.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .223, 2739.29, <.001; .726, 33146.55, <.001; .736, 44.91, <.001
  Conformity: -.473, -.670, -1754.87, <.001; -.056, -.369, -779.08, <.001; -2.690, -.381, -49.69, <.001
  Goals: .031, .044, 115.48, <.001; -.002, -.016, -34.19, <.001; -.070, -.010, -1.29, .196
  Value Change: .964, .553, 1448.33, <.001; .269, .711, 1502.67, <.001; 12.945, .742, 96.75, <.001
  Conformity*Goals: -.041, -.018, -46.54, <.001; .007, .014, 29.73, <.001; .260, .011, 1.46, .146
  Conformity*Value Change: -2.181, -.379, -991.81, <.001; -.572, -.459, -969.18, <.001; -27.448, -.476, -62.11, <.001
  Goals*Value Change: -.272, -.047, -123.84, <.001; .008, .006, 13.10, <.001; .940, .016, 2.13, .034
  Conformity*Goals*Value Change: .607, .032, 83.59, <.001; .010, .002, 5.27, <.001; -.673, -.004, -.46, .645

Table 16. Three-way interaction models for Experiment 4D. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 4630491.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .145, 3982.65, <.001; .707, 120782.07, <.001; -.137, -176.95, <.001
  Type 2: .288, .542, 2386.45, <.001; .014, .302, 743.90, <.001; .699, .608, 272.84, <.001
  Conformity: -.314, -.593, -2608.91, <.001; -.016, -.330, -811.81, <.001; -.763, -.663, -297.75, <.001
  Goals: .036, .068, 297.86, <.001; .002, .038, 92.34, <.001; .088, .077, 34.48, <.001
  Type 2*Conformity: -.580, -.331, -1456.80, <.001; -.029, -.185, -454.89, <.001; -1.401, -.369, -165.6, <.001
  Type 2*Goals: .053, .030, 133.91, <.001; .003, .017, 41.42, <.001; .143, .038, 16.88, <.001
  Conformity*Goals: -.061, -.035, -152.40, <.001; -.003, -.020, -48.21, <.001; -.141, -.037, -16.62, <.001
  Type 2*Conformity*Goals: -.070, -.012, -53.06, <.001; -.004, -.007, -17.09, <.001; -.184, -.015, -6.59, <.001

Figure 1. Conceptual model for initial LTM.
Figure 2. Behavioral Transfer for exploration of policy values in Model 1.
Figure 3.
Figure 4. Behavioral Transfer for exploration of policy value changes in Model 1.
Figure 5. Note: the white rectangle on the right is blank because the value is undefined; pretraining performance was always perfect, so there is no variability on which to calculate an effect size.
Figure 6. Behavioral Transfer for exploration of burn-in and transfer times in Model 1.
Figure 7. Performance change for exploration of burn-in and transfer times in Model 1.
Figure 8. Predicting behavioral transfer from type 2 processing likelihood in Model 1.
Figure 9. Predicting performance change from type 2 processing likelihood in Model 1.
Figure 10 A-D. Example transfer trajectories for Model 1.
Figure 11. Exploration rate effect on behavioral transfer in Model 1.
Figure 12. Exploration rate effect on performance change in Model 1.
Figure 13. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1.
Figure 14. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1.
Figure 15. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1 heat map.
Figure 16. Type 2 likelihood vs implementation intention experimental effect on post training performance in Model 1 heat map.
Figure 17. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1 heat map.
Figure 18. Proposed conceptual model for LTM with Social Learning.
Figure 19. Heatmap of interaction effect of number of trainees and connectedness on behavioral transfer in Model 2A.
Figure 20. Number of trainees and level of imitation predicting behavioral transfer in Model 2B (replication level).
Figure 21. Number of trainees and level of imitation predicting post training performance in Model 2B (replication level).
Figure 22. Number of trainees and level of imitation predicting pre-post training performance in Model 2B (condition level).
Figure 23. Heatmap of trainees and imitation predicting behavioral transfer in Model 2B.
Figure 24. Heatmap of trainees and imitation predicting post training performance in Model 2B.
Figure 25. Heatmap of trainees and imitation predicting pre-post performance change in Model 2B.
Figure 26. Number of trainees and level of conformity predicting behavioral transfer in Model 2C (replication level).
Figure 27. Number of trainees and level of conformity predicting post training performance in Model 2C (replication level).
Figure 28. Number of trainees and level of conformity predicting pre-post performance change in Model 2C (condition level).
Figure 29. Heat map of number of trainees and level of conformity predicting behavioral transfer in Model 2C.
Figure 30. Heat map of number of trainees and level of conformity predicting post training performance in Model 2C.
Figure 31. Heat map of number of trainees and level of conformity predicting pre-post performance change in Model 2C.
Figure 32. Conceptual model for LTM including self-regulation.
Figure 33. Goal level and exploration rate change predicting post training performance in Model 3A (replication level).
Figure 34. Goal level and exploration rate change predicting behavioral transfer in Model 3A (replication level).
Figure 35. Goal level and exploration rate change predicting pre-post performance change in Model 3A (condition level).
Figure 36. Heat map of goal level and exploration rate change predicting behavioral transfer in Model 3A.
Figure 37. Heat map of goal level and exploration rate change predicting post training performance in Model 3A.
Figure 38. Heat map of goal level and exploration rate change predicting pre-post performance change in Model 3A.
Figure 39. Observed post training performance by goal level in Model 3B-1. Note: scale intentionally not starting at 0 to show the sudden shift in percentages more clearly.
Figure 40. Observed behavioral transfer by goal level in Model 3B-1.
Figure 41. Observed pre-post performance change by goal level in Model 3B-1.
Figure 42. Goal level and policy value change predicting behavioral transfer in Model 3B-1 (replication level).
Figure 43. Goal level and policy value change predicting post training performance in Model 3B-1 (replication level).
Figure 44. Goal level and policy value change predicting pre-post performance change in Model 3B-1 (condition level).
Figure 45. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-1.
Figure 46. Heat map of goal level and policy value change predicting post training performance in Model 3B-1.
Figure 47. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-1.
Figure 48. Observed post training performance by goal level in Model 3B-2. Note: scale intentionally not starting at 0 to show the sudden shift in percentages more clearly.
Figure 49. Observed behavioral transfer by goal level in Model 3B-2.
Figure 50. Observed pre-post performance change by goal level in Model 3B-2.
Figure 51. Goal level and policy value change predicting behavioral transfer in Model 3B-2 (replication level).
Figure 52. Goal level and policy value change predicting post training performance in Model 3B-2 (replication level).
Figure 53. Goal level and policy value change predicting pre-post performance change in Model 3B-2 (condition level).
Figure 54. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-2.
Figure 55. Heat map of goal level and policy value change predicting post training performance in Model 3B-2.
Figure 56. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-2.
Figure 57. Observed and predicted behavioral transfer from threshold level in Model 3C.
Figure 58. Observed and predicted post training performance from threshold level in Model 3C. Note: scale intentionally not starting at 0 to show the shift in percentages more clearly.
Figure 59. Observed and predicted pre-post performance change from threshold level in Model 3C.
Figure 60. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 61. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 62. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 63. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 64. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 65. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 66. Three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 67. Three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level). Note: Y axis does not start at 0 to better illustrate the effect.
Figure 68. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 69. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 70. Three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 71. Three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 72. Three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 73. Heat map of three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 74. Heat map of three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 75. Heat map of three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 76. Three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 77. Three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 78. Three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 79. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 80. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 81. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).

APPENDICES

Appendix A: Study 1 Environment and Code

Figure 82. Snapshot of the modeling environment for Study 1 in NetLogo.

Algorithm 13. NetLogo Code for Study 1 Model.

breed [trainees trainee]  ;types of agents allowed in environment

trainees-own [
  value_estimate_a            ;estimated value of Policy A
  value_estimate_b            ;estimated value of Policy B
  system1_choose_a            ;likelihood of choosing Policy A as habitual response
  attempts_policy_a           ;number of times applied Policy A
  attempts_policy_b           ;number of times applied Policy B
  reward_a                    ;reward received on most recent attempt with Policy A
  reward_b                    ;reward received on most recent attempt with Policy B
  task_successes              ;number of times successful at task overall
  post_training_successes     ;number of times successful only post-training
  pretraining_success_rate    ;success rate pretraining only
  posttraining_success_rate   ;percentage of times successful in post-training environment
  behavioral_transfer_rate    ;rate of choosing Policy B in transfer environment
  transfer_time_count         ;ticks into transfer time
]

globals [
  mean_value_estimate_a             ;mean of agent value estimates for Policy A
  mean_value_estimate_b             ;mean of agent value estimates for Policy B
  mean_overall_task_success         ;task rate of success for full simulation
  mean_pretraining_success_rate     ;success rate pretraining only, all agents
  mean_posttraining_success_rate    ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate     ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward              ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all  ;clears environment from previous simulation
  create-trainees num-trainees [  ;place specified number of agents at center of grid
    set value_estimate_a initial_policy_a_estimate  ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate  ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0  ;number of times applied Policy A, initially set to 0
    set attempts_policy_b 0  ;number of times applied Policy B, initially set to 0
    set task_successes 0  ;number of task successes, initially set to 0
    set pretraining_success_rate 0  ;success rate pretraining only, initially set to 0
    set post_training_successes 0  ;number of successes for post training, initially set to 0
    set posttraining_success_rate 0  ;success rate in posttraining environment, initially set to 0
    set behavioral_transfer_rate 0  ;percentage of time choosing trained policy, initially set to 0
  ]
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks  ;reset time count to 0
end

to go  ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training]  ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop]  ;control length of sim
  tick  ;advance time
  if ticks <= burn_in [trainees-burn-in]  ;call subroutine to have trainee engage in task during burn in period
  if ticks > burn_in [trainees-transfer]  ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in]  ;call subroutine to save pretraining performance
  update-globals  ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in  ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
      [set reward_a 1 set task_successes (task_successes + 1)]
      [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals  ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer  ;call routine to choose which system will drive task
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
end

to system-choose  ;decide if system 2 will intervene; if not, rely on system 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision  ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention)  ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100  ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a [
    let success_a random 100 / 100  ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
      [set reward_a 1
;if successful receive reward set task_successes (task_successes + 1) ;update counts on task s uccess set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_p o licy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attemp ts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_succe sses (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on ta s k success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value _estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set attem p ts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end 273 to system2_decision ;default to system 2 using highest value estimated policy except at some error rate let e - greedy random 100 / 100 ifelse e - greedy < exploration _ rate [ run_low_value ] [ run_high_value ] end to save - burn - in ;save pretraining performa nce ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest es t imated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successe s (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_es t imate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_est imate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a ( value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A c hoice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if s uccessful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts o n task success set post_training_successes (post_training_successes + 1) ;update c ounts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (att e mpts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate 
for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_pol i cy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if val ue_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to run_high_v a lue ;subroutine to choose and execute policy with highest estimated value 274 ifelse value_estimate_ a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set rewar d _a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts o n task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_polic y _a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (att empts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 a n d update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_pol icy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attem p ts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b r andom 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_trainin g_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choic e set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;upd a te count on Policy B choice ] ] end to save - post - training ;save post training performance variables ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))] ask trainees [set behavioral_trans fer_rat e (attempts_policy_b / (transfer_time + .000001))] end 275 Appendix B : Study 2A Environment and Code Figure 83 . Snapshot of the modeling environment for Study 2A in NetLogo. 
276 trainees - own [ value_estimate_a ;estimated valu e of Policy A value_estimate_b ;estimated value of Policy B system1_choose_a ;liklihood of choosing Policy A as habitual response attempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward re c eived on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Policy B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining _ success_rate ;success rate pretraining only posttraining_su ccess_rate ; percentage of times successful in post - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time other_agent_estimate_a ;value estimate of other agents in model for Policy A other_agent_estimate_b ;value estimate of other agents in model for Policy B grouped_value_estimate_a ;combined value estimate of target agent and other agents for Pol i cy A grouped_value_estimate_b ;combined value estimate of t arget agent and other agents for Policy B ] globals [ mean_value_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Policy B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pretraining only all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rate ;rat e of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ ;place speci f i ed number of agents at center of grid set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each traine e set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 breed [trainees trainee] ;types of agents allowed in environment Algorithm 14 . 
NetLogo Code for Study 2A Model 277 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ; s uccess rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_tran s f er_rate 0 ;percentage of time choosing trained policy initial set to 0 ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_value) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if tru e _policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - training] ;call subroutine to save post training variables if tick s = (bur n _in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in period if ticks > burn_in [trainees - transfer] ;call subroutine for tr ainee de c isions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance ifelse num - trainees > 1 [pool_experiences] [no_pool_experiences] ;set group estimate depending on if more than 1 agent or not update - glob als ;cal l subroutine to calculate all global variables used to track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward _a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;c alculate all global variables used to track sim functioning set mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks s et mean_pretraining_success_rate mean [pretraining_success_rate] of trainees set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees 278 end to trainees - transfer ;call routine to choose which system will drive task system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set be havioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [se t posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] end to system - choose ;decide if system2 will intervene, if not, rely o n system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_ a ctivation_liklihood [system2_decision] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automati c decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_po l icy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avo id divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ; g enerate random number to determine which policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 
;if Policy A chosen, deter mine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful r eceive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy valu e estimate s et value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success _b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_tra ining_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice 279 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (rewa rd_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001 ) ) * (reward_b - val ue_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to system2_decision ;default to system 2 using highest value estimated policy excep t at some error rate let e - greedy random 100 / 100 ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_success_rate (task_successes / (burn_in + .0 0 0001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_p o licy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on t ask success set attempt s _policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuc c essful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if val ue_estimate_a < 0 [set value_es t imate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_ reward [set reward_b 1 ;if succ e ssful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy _b attempts_policy_b + 1 ;updat e count on Policy B choice set value_estimate_b 
(value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update poli c y value estimate 280 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b a t tempts_policy_b + 1 ;update count on Policy B choice ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A ch o sen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes ses + 1) ;update counts on task success set post_training_successes (post_training_succ e sses + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;updat e value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value esti m ate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ife l se success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task succe s s set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [se t reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .00000 1)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estima t e_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to save - post - training ;save post training performance variables ask trainees [set posttraining_success_rate (post_traini n g_successes / (transfer_time + .000001))] 281 ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))] end to pool_experiences ;pool experiences from all agents for decision making ask trainees [set other_agent_estimate _ a (mean [value_estimate_a] of other traine es)] ask trainees [set other_agent_estimate_b (mean [value_estimate_b] of other trainees)] ask trainees [set grouped_value_estimate_a (((1 - connectedness)*(value_estimate_a))+(connectedness * other_agent_esti m ate_a))] ask trainees [set grouped_value _estimate_b (((1 - connectedness)*(value_estimate_b))+(connectedness * other_agent_estimate_b))] end to no_pool_experiences ;if only 1 agent then group estimate is equal to personal estimate ask trainees [set g r ouped_value_estimate_a (value_estimate_a)] ask trainees [set grouped_value_estimate_b (value_estimate_b)] end 282 Appendix C : Study 2B Environment and Code Figure 84 . 
Snapshot of the modeling environment for Study 2B in NetLogo.

Algorithm 15. NetLogo Code for Study 2B Model

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  imitate_choice ;track decision to imitate on each time step
  other_chose_b ;behavioral choice of most successful other trainee
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A, initially 0
    set attempts_policy_b 0 ;number of times applied Policy B, initially 0
    set task_successes 0 ;number of task successes, initially 0
    set pretraining_success_rate 0 ;success rate pretraining only, initially 0
    set post_training_successes 0 ;number of successes post-training, initially 0
    set posttraining_success_rate 0 ;success rate in posttraining environment, initially 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy, initially 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;set up success rate of most successful other trainee
    set other_chose_b 0 ;set up choice made by most successful other trainee
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post-training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in the task during the burn-in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post-training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn-in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
      [set reward_a 1 set task_successes (task_successes + 1)]
      [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive the task
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy, except at some error rate
  ifelse num-trainees > 1
  [ run-imitate ;have the trainee choose whether it will imitate if there are other trainees
    if imitate_choice = 0
    [ let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
      ifelse e-greedy < exploration_rate [ run_low_value ] [ run_high_value ] ]
  ]
  [ let e-greedy random 100 / 100 ;run choice with some degree of error
    ifelse e-greedy < exploration_rate [ run_low_value ] [ run_high_value ]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to save-post-training ;save post-training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-imitate ;make the imitate decision based on the specified rate and execute
  let imitate_yes random 100 / 100
  ifelse imitate_yes <= imitate [set imitate_choice 1] [set imitate_choice 0]
  set other_chose_b [chose_b] of other trainees with-max [posttraining_success_rate]
  if imitate_choice = 1
  [ ifelse other_chose_b = 0
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix D: Study 2C Environment and Code

Figure 85. Snapshot of the modeling environment for Study 2C in NetLogo.
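The Study 2C listing below shares its decision core with the Study 2B model above, so the following compact restatement, distilled from the code (with n_p denoting attempts_policy_p and r_p the most recent 0/1 reward), may help in reading both. Each policy's value estimate is an incremental sample average,

V_p \leftarrow V_p + \frac{1}{n_p}\left(r_p - V_p\right),

where the code adds .000001 to n_p to avoid division by zero. Under System 1, the probability of habitually applying Policy A is

P(A \mid \text{System 1}) = \frac{n_A}{n_A + n_B + \text{practice\_attempts}} - \text{implementation\_intention},

and under System 2 choice is epsilon-greedy: with probability exploration_rate the lower-valued policy is executed (run_low_value); otherwise the higher-valued policy is executed (run_high_value).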
290 trainees - own [ value_estimate_a ;estimated value of Policy A value_estimate_b ;estimated value of Policy B system1_choose_a ;liklihood of choosing P o licy A as habitual response at tempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward received on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Polic y B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining_success_rate ;success rate pretraining only posttraining_success_rate ; percentage of times successful in post - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time chose_b ;track behavioral choice of last task attempt, 0 = chose a, 1 = chose b other_success_r a te ;success rate of most successful other trainee conform_choice ;track decision to conform on each time step other_chose_b ;behavioral choice of most successful other trainee ] globals [ mean_value_estimate_a ;mean of agent value estimates for P o licy A mean_value_estimate_b ;mean of agent value estimates for Policy B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pret raining only all agents mean_posttraining_success_rate ;su c cess rate posttraining only all agents mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting fo r policy value change ] to setup clear - all ;clears enviro n ment from previous simulation create - trainees num - trainees [ setxy random - xcor random - ycor ;place specified number of agents at random coordinates set value_estimate_a initial_policy_a_esti mate ;set initial value estimate for Policy A for each train e e set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial se t to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;success rate pretraining only initial set to 0 breed [trainees tra inee] ;types of agents allowed in environment Algorithm 16 . 
NetLogo Code for Study 2C Model 291 set post_training_successes 0 ;num ber of successes for post training initial set to 0 set p osttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initial set to 0 set chose _b 0 ;set choice tracker to default of Policy A set othe r _success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_ v alue) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - tr a ining] ;call subroutine to save post training variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in perio d if ticks > burn_in [trainees - transfer] ;call subroutine for trainee decisions post training if ticks = burn_in [save - bu rn - in] ;call subroutine to save pretraining performance update - globals ;call subroutine to calculate all global variables used to track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a rand om 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_po licy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;calculate all global variables used to track sim functioning s et mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate mean [pretraining_su c cess_rate] of trainees set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will dr i ve task 292 system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_activation_liklihood [system2_decision] if system_ch o ose >= system2_activati on_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .0000 0 1)) - implementation_in tention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ;generate random number to determine which policy to im p lement ifelse choos e_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < 
true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes s es + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e _estimate_a + ((1 / (at tempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e _a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_es timate_b + ((1 / (attempts_policy_b + .000 0 01)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate 293 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (r e ward_b - value_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;default to system 2 us i ng highest value estimated policy except at some error rate ifelse num - trainees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;if not imitating r u n egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_r ate [ run_low_value ] [ run_high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a < = value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on t ask success set post_training_successes (pos t_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attem p ts_policy_a + .000001)) * (reward_a - value_estimat e_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_polic y _a + .000001)) * (reward_a - value_estimate_a))) ;up date value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 10 0 / 100 ;if Policy B 
chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update co u nts on task success set post_training_succes ses (post_training_successes + 1) ;update counts on task success 294 set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value _estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attemp t s_policy_b + .000001)) * (reward_b - value_estimate_ b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;updat e choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a atte mpts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set rew ard to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_ e stimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if succes s ful ifelse success_b < true_policy_b _reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_polic y_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Poli c y B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B i f value_estimate_b < 0 [set value_estimate_ b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end 295 to save - post - training ;save post training performance variabl e s ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))] end to run - conform ;make conform decision based on specif i ed rate and execute let conform_yes random 100 / 100 ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees t h at 
applied b on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1
  [ ifelse majority_rule < .50
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix E: Study 3A Environment and Code

Figure 86. Snapshot of the modeling environment for Model 3A in NetLogo.

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of the goal or not?
exploration_rate ;each trainees have own exploration rate ] globals [ mean_value_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Pol icy B mean_overall_task_success ;task rate o f success for full simulation mean_pretraining_success_rate ;success rate pretraining only all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rat e ;rate of choosing Policy B in transfer envi r onment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ setxy random - xcor random - ycor ;place specified numb e r of agents at random coordinates set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee set value_estimate_b initial_policy_b_estimate ;set initial val ue estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 breed [trainees trainee] ;types of agents allowed in environment Algorithm 17 . NetLogo Code for Model 3A 299 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;succ e ss rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_transfer_ r ate 0 ;percentage of time choosing trained policy initial set to 0 set chose_b 0 ;set choice tracker to default of Policy A set other_success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee set exploration_rate exploration_rate_0 ;set initial exploration rate ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_value) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines a ctivated if ticks = (burn_in + transfer_time) [save - post - training] ;call subroutine to save post tra i ning variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call s ubroutine to have trainee engage in task during burn in period if ticks > burn_in [trainees - transfer ] ;call subroutine for trainee decisions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance update - globa ls ;call subroutine to calculate all global variables used to track sim functioning end to trainees - b u rn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_ a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (a t tempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;ca lculate all global variables used to track sim functioning set mean_value_estimate_a mean [value_est i mate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate 
mean [pretraining_success_rate] of trainees set mean_posttr a ining_success_rate mean [posttraining_success_rate] of trainees 300 set mean_behaviora l_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will drive task and update decision variables an d trackers system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] ask trainees [set goal_difference (perform_goa l - (task_successes / ticks))] ask trainees [ ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)] ] ask trainees [set exploration_rate (exp l oration_rate_0 + (explore_change * j_goal_check))] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_activation_liklihood [system2_decisio n ] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice _ attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as n ot to affect simulation let choose_a random 100 / 100 ;generate random number to determine w h ich policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success _a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_succe s ses (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value _ estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _ a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attemp ts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] 301 ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if s uccessful receive reward set task_successes (task_successes + 1) ;update counts on tas k success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;up date count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts _ policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update p olicy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choic e set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;defau l t to system 2 using highest value estimated policy except 
at some error rate ifelse num - trainees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;i f not imitating run egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_hig h_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_rate [ run_low_value ] [ ru n _high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_succ ess_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse v alue_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if su ccessful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ; u pdate counts on task success set post_training_successes (post_training_successes + 1) ;update co unts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate _ a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A 302 if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successe s + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_ e stimate_b + ((1 / (attempts_policy_b + . 
000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_ b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a ran dom 100 / 100 ;if Policy A chosen, determine i f successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_ successes (post_training_successes + 1) ;upda t e counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_est imate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, de t ermine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1 ) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + . 
000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
  ]
end

to save-post-training ;save post-training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make the conform decision based on the specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on the last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1
  [ ifelse majority_rule < .50
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix F: Studies 3B-1 and 3B-2 Environment and Code

Figure 87. Snapshot of the modeling environment for Models 3B-1 and 3B-2 in NetLogo.
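Model 3A (Appendix E above) and Models 3B-1 and 3B-2 (below) differ chiefly in how each trainee's exploration rate responds to the gap between its performance goal and realized performance. Written compactly from the trainees-transfer routines, with d = perform_goal - (task_successes / ticks) and j = 1 when d > 0 (otherwise j = 0):

Model 3A: \text{exploration\_rate} = \text{exploration\_rate\_0} + \text{explore\_change} \cdot j
Model 3B-1: \text{exploration\_rate} = \text{exploration\_rate\_0} + d
Model 3B-2: \text{exploration\_rate} = \text{exploration\_rate\_0} + (0.5 - d)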
306 trainees - own [ value_estimate_a ;estimated value of Policy A value_estimate_b ;estimated value of Policy B system1_ choose_a ;liklihood of choosing Policy A as habitual response attempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward received on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Policy B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining_success_rate ;success rate pretraining only posttraining_success_rate ; percentage of times successful in p o st - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time chose_b ;track behavioral choice of last task attempt, 0 = chose a, 1 = chose b other_success_rate ; s uccess rate of most successful other trainee conform_choice ;track decision to conform on each time step other_chose_b ;behavior al choice of most successful other trainee goal_difference ;difference between performance goal and actual performance j _goal_check ;is the agent short of goal or not? exploration_rate ;each trainees have own exploration rate ] globals [ mean_va lue_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Polic y B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pretraining o nly all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rate ; rate of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ setxy ra n dom - xcor random - ycor ;place specified number of agents at random coordinates set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee breed [trainees train ee] ;types of agents allowed in environment Algorithm 18 . 
NetLogo Code for Model 3B - 1 307 set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;s uccess rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment in i tial set to 0 set behavioral_transf er_rate 0 ;percentage of time choosing trained policy initial set to 0 set chose_b 0 ;set choice tracker to default of Policy A set other_success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee set exploration_rate exploration_rate_0 ;set initial exploration rate ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_i n _value) if true_policy_b_reward > 1 [ set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - t raining] ;call subroutine to save post training variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in per i od if ticks > burn_in [trainees - trans fer] ;call subroutine for trainee decisions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance update - globals ;call subroutine to calculate all global variables used t o track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_a 1 set task_successes (task_successes + 1)] [s e t reward_a 0] set attempts_p olicy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;calculate all global variables used to track sim functioning 308 set mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate mean [pretraining_ s uccess_rate] of trainees set m ean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will d rive task and update decision va riables and trackers system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttra i ning_success_rate (post_training _successes / (transfer_time_count + .000001))] ask trainees [set goal_difference (perform_goal - (task_successes / ticks))] ask trainees [ ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)] ] ask trainees [set exploration_rate (exploration_rate_0 + goal_difference)] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose 
(random 100 / 100) if system_choose < system2_activation _ l iklihood [system2_decision] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + a t t empts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ;generate r a n dom number to determine which policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive re w a rd set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A 309 [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on t ask success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_e s timate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set atte mpts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;default to system 2 using highest value estimated policy except at some error rate ifelse num - trai nees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;if not imitating run egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_ r ate [ run_low_value ] [ run_high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretr aining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward 310 set task_succe s ses (task_succes ses + 1) ;update counts on task success set 
post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value _ estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _ a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice s et chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set ta s k_successes (tas k_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice s e t value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_ e stimate_b (value _estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count o n Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if P olicy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes ses + 1) ;update counts on task success set post_training_successes (post_tr a ining_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a) ) ) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] 311 set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice s et chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 1 00 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (tas k_successes + 1) ;update counts on task success set post_training_successes ( post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_est i mate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value _estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value 
estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of goal or not?
  exploration_rate ;each trainee has its own exploration rate
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A initially set to 0
    set attempts_policy_b 0 ;number of times applied Policy B initially set to 0
    set task_successes 0 ;number of task successes initially set to 0
    set pretraining_success_rate 0 ;success rate pretraining only initially set to 0
    set post_training_successes 0 ;number of successes for post training initially set to 0
    set posttraining_success_rate 0 ;success rate in posttraining environment initially set to 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initially set to 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;setup success rate of most successful other trainee
    set other_chose_b 0 ;setup choice made by other most successful trainee
    set exploration_rate exploration_rate_0 ;set initial exploration rate
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in task during burn in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
    [set reward_a 1
     set task_successes (task_successes + 1)]
    [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive task and update decision variables and trackers
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
  ask trainees [set goal_difference (perform_goal - (task_successes / ticks))]
  ask trainees [
    ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)]
  ]
  ask trainees [set exploration_rate (exploration_rate_0 + (.5 - goal_difference))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy except at some error rate
  ifelse num-trainees > 1
  [run-conform ;have trainee choose if it will conform or not if there are other trainees
   if conform_choice = 0
   [let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
    ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]]
  ]
  [let e-greedy random 100 / 100 ;run choice with some degree of error
   ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

Algorithm 19. NetLogo Code for Model 3B-2
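As a reading aid, the two quantities that drive each agent's choices in the listing above can be restated compactly. The symbols below are shorthand introduced here for exposition rather than notation taken from the listing, and the .000001 terms that only guard against division by zero are omitted:

\[ V_{k,n} = V_{k,n-1} + \frac{1}{n}\bigl(R_{k,n} - V_{k,n-1}\bigr), \qquad k \in \{A, B\}, \]

\[ \Pr(\text{System 1 selects Policy A}) \approx \frac{n_A}{\,n_A + n_B + \text{practice\_attempts}\,} - \text{implementation\_intention}. \]

Here \(n_k\) is the number of attempts with policy \(k\) and \(R_{k,n} \in \{0,1\}\) is the reward on the \(n\)th attempt, so the first line is simply the sample-average (incremental mean) value update implemented in the value_estimate lines. The second line restates the habit-strength term computed in system1_decision and compared against a random draw between 0 and 1, which is why stronger implementation intentions directly lower the probability of habitually reverting to Policy A.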
Appendix G: Study 3C Environment and Code

Figure 88. Snapshot of the modeling environment for Model 3C in NetLogo.

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of goal or not?
  exploration_rate ;each trainee has its own exploration rate
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A initially set to 0
    set attempts_policy_b 0 ;number of times applied Policy B initially set to 0
    set task_successes 0 ;number of task successes initially set to 0
    set pretraining_success_rate 0 ;success rate pretraining only initially set to 0
    set post_training_successes 0 ;number of successes for post training initially set to 0
    set posttraining_success_rate 0 ;success rate in posttraining environment initially set to 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initially set to 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;setup success rate of most successful other trainee
    set other_chose_b 0 ;setup choice made by other most successful trainee
    set exploration_rate exploration_rate_0 ;set initial exploration rate
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in task during burn in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
    [set reward_a 1
     set task_successes (task_successes + 1)]
    [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive task and update decision variables and trackers
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
  ask trainees [set goal_difference (perform_goal - (task_successes / ticks))]
  ask trainees [
    ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)]
  ]
  ask trainees [set exploration_rate (exploration_rate_0 + (explore_change * j_goal_check))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    ifelse value_estimate_b < engagement_threshold
    [run_policy_a]
    [let system_choose (random 100 / 100)
     ifelse system_choose < system2_activation_liklihood [system2_decision] [system1_decision]
    ]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy except at some error rate
  ifelse num-trainees > 1
  [run-conform ;have trainee choose if it will conform or not if there are other trainees
   if conform_choice = 0
   [let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
    ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]]
  ]
  [let e-greedy random 100 / 100 ;run choice with some degree of error
   ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

to run_policy_a
  let success_a random 100 / 100 ;if Policy A chosen, determine if successful
  ifelse success_a < true_policy_a_reward
  [set reward_a 1 ;if successful receive reward
   set task_successes (task_successes + 1) ;update counts on task success
   set post_training_successes (post_training_successes + 1) ;update counts on task success
   set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
   set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
  [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
   set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
   if value_estimate_a < 0 [set value_estimate_a 0]
   set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
   set chose_b 0 ;update choice to Policy A
   ]
end

Algorithm 20. NetLogo Code for Model 3C
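Two points of comparison may help readers line up Model 3C with Model 3B-2. The symbols below are shorthand introduced here rather than notation from the listings: writing \(G\) for perform_goal, \(S_t\) for the running success rate task_successes / ticks, \(\varepsilon_0\) for exploration_rate_0, and \(c\) for explore_change, the exploration-rate updates in trainees-transfer are

\[ \varepsilon_t^{\,3B\text{-}2} = \varepsilon_0 + \bigl(0.5 - (G - S_t)\bigr), \qquad \varepsilon_t^{\,3C} = \varepsilon_0 + c \cdot \mathbf{1}\bigl[\,G - S_t > 0\,\bigr]. \]

That is, Model 3B-2 shifts exploration continuously with the size of the goal-performance discrepancy, whereas Model 3C adds a fixed increment only when the agent is short of its goal. Model 3C also inserts an engagement gate at the top of system-choose: an agent whose estimate of Policy B sits below engagement_threshold bypasses both systems and simply executes Policy A via run_policy_a. A minimal monitoring sketch is shown below; the reporter name is introduced here for illustration and does not appear in the dissertation's listings, and it assumes at least one trainee exists so the division is defined.

to-report share-disengaged ;fraction of trainees currently below the Model 3C engagement gate
  report (count trainees with [value_estimate_b < engagement_threshold]) / (count trainees)
end

Placed in the Model 3C code, such a reporter could be attached to an interface monitor or plot to track how quickly agents' Policy B estimates clear the threshold during the transfer period.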