STILL LEARNING: INTRODUCING THE LEARNING TRANSFER MODEL, A FORMAL MODEL OF TRANSFER

By

Jeffrey David Olenick

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Psychology - Doctor of Philosophy

2020

ABSTRACT

STILL LEARNING: INTRODUCING THE LEARNING TRANSFER MODEL, A FORMAL MODEL OF TRANSFER

By

Jeffrey David Olenick

Although training has been a key topic of study in organizational psychology for over a century, a century which has seen great progress in our understanding of what a quality training program entails, a substantial gap persists between what is trained and what is transferred to the job. Reduction of the training-transfer gap has driven research on transfer-focused interventions, which have proven effective. However, although we know a great deal about how individuals learn new material, and about correlates of whether they transfer that material back to their work environment, we know very little about how individuals choose whether to apply their new knowledge to typically previously encountered situations in their work environment, and how those decisions unfold over time. Improving our knowledge of how individuals transfer learned material will lead to new insights on how to support the transfer of organizationally directed training, or any learning event, back to the work environment. Thus, the present paper introduces a formal model of the transfer process, the Learning Transfer Model (LTM), which proposes a process for how transfer unfolds over time and gives rise to many of the findings accumulated in the transfer literature. This is accomplished by reconceptualizing transfer as its own learning process, one affected by the dual nature of self-regulatory processes. The LTM was then instantiated in a series of computational models for virtual experimentation. Findings and implications for research and practice are discussed throughout.

Copyright by
JEFFREY DAVID OLENICK
2020

This dissertation is dedicated to my loving family, without whom I could not have made it this far.

ACKNOWLEDGEMENTS

I would like to thank all those who have helped me over the years to get to this point. To my professors: Dr. Steve W.J. Kozlowski for being my primary mentor through my doctoral career and always pushing me further in my thinking; Dr. J. Kevin Ford for being a good mentor and friend, and for providing the developmental opportunities through which the ideas expressed within this dissertation were formed; Drs. Richard DeShon and Zachary Neal for being a part of my committee and introducing me to the kind of theory and models which provide the basis for my thinking in this paper; Dr. Ann Marie Ryan for guiding me when I wanted to switch careers and had no experience in this field; and Drs. Michael Stamm and Matthew Pauly for believing in me as a struggling undergraduate searching for my place in the world. Thank you to my friends who have always been there to provide a distraction from the pressures of life. A special thank you to my family, especially my mother and my grandparents, for providing me with the foundation I needed to reach the success I have. And my father: although you left my life far too soon, you have provided a lifetime of inspiration. Thank you to my son for providing a daily dose of motivation and levity; although it seems like a weird way to show it, this paper is very much a labor of love for you. Finally, thank you to my wonderful wife, Catherine.
You pulled me out of the darkest time of my life and helped get me back on track. All my accomplishments would be impossible without you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
Introduction
Review of Transfer Literature
Computational Modeling and the Modeling Cycle
Transfer Findings for Which to Account
Practice and Overlearning
Utility Reactions
Work Environment
Implementation intentions
Maintenance Curves
Self-efficacy
Skill type
Near versus Far Transfer, Adaptive Transfer and Adaptive Performance
Study 1: Base Learning Transfer Model
Dual Process Models and Habits
Reinforcement Learning
The Learning Transfer Model
Study 1: Method
Model outcome metrics
Analysis
Study 1: Simulation and Results
Model verification
Logical Consistency
Parameter Effects Check
Simulation Length
Policy Value
Policy Value Estimates
Exploration Rate
Generative Sufficiency, Sensitivity and Robustness
True policy values
Timing of interventions
Type 2 Processing
Practice and Overlearning
Utility reactions
Transfer trajectories
Implementation Intentions
Exploration rates
Exploratory experimentation
Study 1: Discussion
Theoretical Implications
Practical Implications
Conclusion
Study 2A: Adding Social Learning to the LTM
Social Learning Theory
The Formal Transfer Model with Social Learning
Study 2A: Method, Simulation and Results
Virtual Experimentation
Model verification
Number of Trainees
Connectedness
Interaction between Trainees and Connectedness
Study 2A: Discussion and Conclusion
Study 2B and 2C: Rethinking Social Learning Model
Model 2B Overview
Model 2C Overview
Study 2B: Method, Simulation and Results
Trainees Versus Imitation Experiment
Study 2C: Method, Simulation and Results
Trainees Versus Conformity Experiment
Study 2B and 2C: Discussion and Conclusion
Implications for Theory
Future modeling of social learning
Other modeling possibilities
Implications for Practice
Conclusion
Study 3A: Adding Self-Regulation to the Transfer Process Model
Self-Regulation
Hierarchical goal pursuit
Self-regulatory negative feedback systems
Self-Efficacy
The LTM with Self-Regulation
Study 3A: Method, Simulation, and Results
Virtual Experimentation
Study 3A: Discussion
Study 3B: Tweaking Goal Seeking
Model 3B-1
Model 3B-2
Study 3B: Methods, Simulation, and Results
Model 3B-1
Model 3B-2
Study 3B: Discussion
Study 3C: Engagement Thresholds
Discontinuous Self-Efficacy in the LTM
Study 3C: Methods, Simulation, and Results
Causal Effects of Self-Efficacy on Transfer and Performance
Effects of Engagement Threshold
Study 3C: Discussion
Theoretical and Research Implications
Practical Implications
Conclusion
Study 4: Exploring the Full LTM Model
Experiment 4A: Engagement Thresholds, Value Changes and Implementation Intentions
Methods
Results
Discussion
Experiment 4B: Number of Trainees, Conformity, and Goal Levels
Methods
Results
Discussion
Experiment 4C: Value Change, Conformity, and Goal Levels
Methods
Results
Discussion
Experiment 4D: Type 2 Likelihood, Conformity, and Goal Levels
Methods
Results
Discussion
Overall Discussion
Overall Discussion
Theoretical Implications and Future Research Directions
Practical Implications
Conclusion
APPENDICES
Appendix A: Study 1 Environment and Code
Appendix B: Study 2A Environment and Code
Appendix C: Study 2B Environment and Code
Appendix D: Study 2C Environment and Code
Appendix E: Study 3A Environment and Code
Appendix F: Studies 3B-1 and 3B-2 Environment and Code
Appendix G: Study 3C Environment and Code
REFERENCES

LIST OF TABLES

Table 1. Model 1 Variables.
Table 2. Model 1 Equations.
Table 3. Overall results for practice effect on behavioral transfer and performance change in Model 1.
Table 4. Experimental comparisons of practice conditions to control for behavioral transfer and performance change in Model 1.
Table 5. Initial policy value estimate effects on behavioral transfer and performance change in Model 1.
Table 6. Implementation level effects on behavioral transfer and performance change in Model 1.
Table 7. Model 2 Variables.
Table 8. Model 2 Equations.
Table 9. Effects of number of trainees on behavioral transfer and pre-post performance change in Model 2A.
Table 10. Connectedness effects on behavioral transfer and pre-post performance change in Model 2A.
Table 11. Model 3 Variables.
Table 12. Model 3 Equations.
Table 13. Three-way interaction models for Experiment 4A.
Table 14. Three-way interaction models for Experiment 4B.
Table 15. Three-way interaction models for Experiment 4C.
Table 16. Three-way interaction models for Experiment 4D.

LIST OF FIGURES

Figure 1. Conceptual model for initial LTM.
Figure 2. Behavioral Transfer for exploration of policy values in Model 1.
Figure 3. Performance change for exploration of policy values in Model 1.
Figure 4. Behavioral Transfer for exploration of policy value changes in Model 1.
Figure 5. Performance change for exploration of policy value changes in Model 1.
Figure 6. Behavioral Transfer for exploration of burn-in and transfer times in Model 1.
Figure 7. Performance change for exploration of burn-in and transfer times in Model 1.
Figure 8. Predicting behavioral transfer from type 2 processing likelihood in Model 1.
Figure 9. Predicting performance change from type 2 processing likelihood in Model 1.
Figure 10A-D. Example transfer trajectories for Model 1.
Figure 11. Exploration rate effect on behavioral transfer in Model 1.
Figure 12. Exploration rate effect on performance change in Model 1.
Figure 13. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1.
Figure 14. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1.
Figure 15. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1 heat map.
Figure 16. Type 2 likelihood vs implementation intention experimental effect on post training performance in Model 1 heat map.
Figure 17. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1 heat map.
Figure 18. Proposed conceptual model for LTM with Social Learning.
Figure 19. Heatmap of interaction effect of number of trainees and connectedness on behavioral transfer in Model 2A.
Figure 20. Number of trainees and level of imitation predicting behavioral transfer in Model 2B (replication level).
Figure 21. Number of trainees and level of imitation predicting post training performance in Model 2B (replication level).
Figure 22. Number of trainees and level of imitation predicting pre-post training performance in Model 2B (condition level).
Figure 23. Heatmap of trainees and imitation predicting behavioral transfer in Model 2B.
Figure 24. Heatmap of trainees and imitation predicting post training performance in Model 2B.
Figure 25. Heatmap of trainees and imitation predicting pre-post performance change in Model 2B.
Figure 26. Number of trainees and level of conformity predicting behavioral transfer in Model 2C (replication level).
Figure 27. Number of trainees and level of conformity predicting post training performance in Model 2C (replication level).
Figure 28. Number of trainees and level of conformity predicting pre-post performance change in Model 2C (condition level).
Figure 29. Heat map of number of trainees and level of conformity predicting behavioral transfer in Model 2C.
Figure 30. Heat map of number of trainees and level of conformity predicting post training performance in Model 2C.
Figure 31. Heat map of number of trainees and level of conformity predicting pre-post performance change in Model 2C.
Figure 32. Conceptual model for LTM including self-regulation.
Figure 33. Goal level and exploration rate change predicting post training performance in Model 3A (replication level).
Figure 34. Goal level and exploration rate change predicting behavioral transfer in Model 3A (replication level).
Figure 35. Goal level and exploration rate change predicting pre-post performance change in Model 3A (condition level).
Figure 36. Heat map of goal level and exploration rate change predicting behavioral transfer in Model 3A.
Figure 37. Heat map of goal level and exploration rate change predicting post training performance in Model 3A.
Figure 38. Heat map of goal level and exploration rate change predicting pre-post performance change in Model 3A.
Figure 39. Observed post training performance by goal level in Model 3B-1.
Figure 40. Observed behavioral transfer by goal level in Model 3B-1.
Figure 41. Observed pre-post performance change by goal level in Model 3B-1.
Figure 42. Goal level and policy value change predicting behavioral transfer in Model 3B-1 (replication level).
Figure 43. Goal level and policy value change predicting post training performance in Model 3B-1 (replication level).
Figure 44. Goal level and policy value change predicting pre-post performance change in Model 3B-1 (condition level).
Figure 45. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-1.
Figure 46. Heat map of goal level and policy value change predicting post training performance in Model 3B-1.
Figure 47. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-1.
Figure 48. Observed post training performance by goal level in Model 3B-2.
Figure 49. Observed behavioral transfer by goal level in Model 3B-2.
Figure 50. Observed pre-post performance change by goal level in Model 3B-2.
Figure 51. Goal level and policy value change predicting behavioral transfer in Model 3B-2 (replication level).
Figure 52. Goal level and policy value change predicting post training performance in Model 3B-2 (replication level).
Figure 53. Goal level and policy value change predicting pre-post performance change in Model 3B-2 (condition level).
Figure 54. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-2.
Figure 55. Heat map of goal level and policy value change predicting post training performance in Model 3B-2.
Figure 56. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-2.
Figure 57. Observed and predicted behavioral transfer from threshold level in Model 3C.
Figure 58. Observed and predicted post training performance from threshold level in Model 3C.
Figure 59. Observed and predicted pre-post performance change from threshold level in Model 3C.
Figure 60. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 61. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 62. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 63. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 64. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 65. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 66. Three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 67. Three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 68. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 69. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 70. Three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 71. Three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 72. Three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 73. Heat map of three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 74. Heat map of three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 75. Heat map of three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 76. Three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 77. Three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 78. Three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 79. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 80. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 81. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 82. Snapshot of the modeling environment for Study 1 in NetLogo.
Figure 83. Snapshot of the modeling environment for Study 2A in NetLogo.
Figure 84. Snapshot of the modeling environment for Study 2B in NetLogo.
Figure 85. Snapshot of the modeling environment for Study 2C in NetLogo.
Figure 86. Snapshot of the modeling environment for Model 3A in NetLogo.
Figure 87. Snapshot of the modeling environment for Models 3B-1 and 3B-2 in NetLogo.
Figure 88. Snapshot of the modeling environment for Model 3C in NetLogo.

LIST OF ALGORITHMS

Algorithm 1. Value Estimate Calculation
Algorithm 2. Type 1 Process Equation
Algorithm 3. Probability of Choosing Type 2 Processes
Algorithm 4. Type 1 Process with Implementation Intentions
Algorithm 5. Other Agent Value Estimation
Algorithm 6. Weighted Value Estimate
Algorithm 7. Agent Performance
Algorithm 8. Goal Discrepancy
Algorithm 9. Effector Mechanism 1
Algorithm 10. Effector Mechanism 2
Algorithm 11. Effector Mechanism 3
Algorithm 12. Effector Mechanism 4
Algorithm 13. NetLogo Code for Study 1 Model
Algorithm 14. NetLogo Code for Study 2A Model
Algorithm 15. NetLogo Code for Study 2B Model
Algorithm 16. NetLogo Code for Study 2C Model
Algorithm 17. NetLogo Code for Model 3A
Algorithm 18. NetLogo Code for Model 3B-1
Algorithm 19. NetLogo Code for Model 3B-2
Algorithm 20. NetLogo Code for Model 3C

Introduction

Continuous learning is a mantra of organizations, often directed towards employees to emphasize the need to continually improve their knowledge and skills so they can maintain or increase their ability to perform their roles and advance their careers (e.g., London, 2012). To achieve continuous learning, organizations spend ever-increasing amounts of money on training programs, averaging more than $1,296 per employee in 2017 (American Society of Training and Development, 2018). Thankfully, organizations benefit from this spending. For example, spending on training programs aids the development of knowledge, skills, attitudes, and other characteristics (KSAOs) that feed into the emergence of human capital resources for an organization (Ployhart & Moliterno, 2011), which in turn supports organizational profitability (Kim & Ployhart, 2014). Unfortunately, there remains a gap between what is taught in training programs and what gets transferred back to the work environment, sometimes referred to as the training-transfer gap (e.g., Vermeulen, 2002). Typical statements that only 10 percent of trained material is transferred to the job are not generally based in fact (Ford, Yelon, & Billington, 2011), but few would argue that no such gap exists. Transfer is the extent to which learning that results from a training experience transfers to the job and leads to meaningful changes in work performance (Blume, Ford, Baldwin, & Huang, 2010, p. 1066). Thus, any rate of transfer less than 100 percent theoretically results in wasted money on the part of the organization, as no meaningful changes in performance occur. What, then, can we do to improve rates of transfer and reduce the training-transfer gap?

Over the last 100 years, research has taken a multi-pronged approach to this question, seeking to improve training programs at each of their three stages of pre-training, training, and post-training (e.g., Jaidev & Chirayath, 2012), which has led to a great deal of knowledge regarding the functioning of training programs (Bell, Tannenbaum, Noe, & Kraiger, 2017). That knowledge has improved programs largely by introducing principles to the learning event that can improve knowledge retention (e.g., Donovan & Radosevich, 1999; Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). Though fewer principles exist within the post-training transfer stage of the training process, several consistent findings have emerged, including the use of implementation intentions (e.g., Gollwitzer, 1999) and the perceptions learners hold of the utility of their newly gained knowledge (e.g., Blume et al., 2010), among others.

Unfortunately, most studies on the transfer of learning in organizations are essentially correlational and/or cross-sectional in nature. Studies of training interventions typically measure learning and individual difference variables at the end of training, and then measure the transfer of learning at a single time point in the future. These tendencies can be seen in the types of studies available for meta-analyses of transfer (e.g., Blume et al., 2010; Blume, personal communication).
With scientific emphasis on understanding causal mechanisms, it is tempting to interpret findings with time lags as causal in nature, but temporal precedence is only one precondition for establishing causality. Unfortunately, the effects of variables in transfer environments are often hard to isolate, especially in real organizations where random assignment is often difficult or impossible to achieve, though such isolation is possible (see Hanges & Wang, 2012, for a discussion of causal models). Thus, even though we are interested in the mechanisms that explain transfer, we are generally only studying correlates of transfer, and correlation does not equal causation; put differently, prediction does not equal explanation (Muthukrishna & Henrich, 2019).

To better understand the causal mechanisms that lead to transfer, we must advance our understanding of transfer as it occurs over time and seek to discover the dynamic process that gives rise to what we currently observe as transfer. The study of such dynamic relationships is gaining increasing interest in our field (e.g., DeShon, 2012), and the study of person-level processes within the training context has been described as a frontier for research in this arena (Salas & Kozlowski, 2010); transfer-specific research must follow. A dynamic process here refers to the interactions of lower levels of analysis that give rise to a higher-level observed variable in a process of emergence (e.g., Grand, Braun, Kuljanin, Kozlowski, & Chao, 2016; Kozlowski & Klein, 2000), those levels in this paper being the cognitive processes of an individual and their output behaviors. Other researchers are interested in studying dynamic processes of transfer and are attempting to unpack them. This can be seen in the increase of longitudinal designs studying transfer (Baldwin, Ford, & Blume, 2009) and, for example, the use of within-person analyses to understand the interplay of motivational changes over time with changes in transfer (Huang, Ford, & Ryan, 2017). However, most studies that claim to be interested in such dynamics do not really study dynamic relationships, and are largely restricted to cross-sectional designs or versions of growth modeling with few time points (e.g., Ford, Bhatia, & Yelon, in press; Gist, Stevens, & Baveta, 1991; Cheng, 2016; Dierdorff & Surface, 2008; Zerres, Huffmeier, Freund, Backhaus, & Hertel, 2013). Even when longitudinal designs are utilized, the mere study of change over time does not constitute the study of explanatory dynamic processes, because such designs rely on time as a predictor and time is not explanatory (Dishop, Olenick, & DeShon, in press; Ployhart & Vandenberg, 2010). Training-transfer studies are motivated to show that change does occur, and thus that the program of interest is successfully affecting outcomes of interest. However, explaining change when it occurs is only half the battle; any process model should also be able to demonstrate when change will not occur, as a lack of change is still likely to be driven by a dynamic process and does not imply the absence of one (Dishop et al., in press).
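The contrast between describing change and explaining it can be made concrete with generic, purely illustrative notation (this is not the model developed later in this paper). A growth model treats time itself as the predictor of a person's observed transfer,

$y_{it} = \beta_{0} + \beta_{1} t + e_{it},$

whereas a dynamic process model specifies how the next state is generated from the current state and the inputs acting on it, for example

$y_{i,t+1} = y_{i,t} + f(y_{i,t}, x_{i,t}) + e_{i,t+1},$

where $y$ is an individual's transfer behavior, $t$ indexes performance episodes, $x$ stands for situational inputs such as feedback from a transfer attempt, and $f$ is a hypothesized generating mechanism. In the first equation time carries no mechanism; in the second, $f$ carries the explanatory claim, and depending on its form it can produce growth, decline, or exactly the stable lack of change just described.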
Moves to better understand the dynamic process of transfer are being made, but only a few existing models treat transfer as an iterative process that unfolds via repeated attempts which dynamically affect future attempts. Existing models of the training process that do consider time generally treat transfer as an outcome that does not explicitly feed into future attempts (e.g., Baldwin, Magjuka, & Loher, 1991; Bell & Kozlowski, 2009; Cannon-Bowers, Salas, Tannenbaum, & Mathieu, 1995a; Cheng & Hampson, 2008; Colquitt, Lepine, & Noe, 2000; Thayer & Teachout, 1995), though some consider transfer as an input to future training cycles (e.g., Salas, Weaver, & Shuffler, 2012; Goldstein, 1986). A more dynamic view of transfer can be found in Chen, Thomas, and Wallace (2005), who proposed a multi-level model of training outcomes that discusses the episodic nature of the post-training environment. Additionally, Blume, Ford, Surface, and Olenick (2019) introduced the Dynamic Transfer Model (DTM), which describes how trainees decide what to retain from their learning experience and apply their new KSAOs to their work environment in an iterative way. Their model is already shaping emerging research on how trainees transfer their new KSAOs over time (e.g., Vignoli & Depolo, 2019). However, the DTM has multiple weaknesses. One is that it is a verbal model rather than a mathematical formal model; although such models are good for describing processes, they generally lack specificity and can struggle to generate testable hypotheses (e.g., Vancouver, 2008). A second weakness is that the DTM relies heavily on self-regulation (e.g., Carver & Sheier, 1998) and person-situation interactionism (Hattrup & Jackson, 1996). Though these are good bases from which to begin building a dynamic theory of transfer, they leave out much of what we know about human cognition and social influence and therefore do not tell the whole story. Thus, further work is required to improve our theorizing regarding transfer as its own dynamic process.

To address these gaps, this paper is guided by two research questions. First, what is the learning and decision-making process through which individuals go when attempting to transfer new knowledge to an old situation? Second, can a single, relatively simple, formal model of the dynamic transfer process account for our current findings in the transfer literature? In addressing these two questions, the present paper makes four key contributions. First, a process-oriented theory of learning transfer is introduced by building a formal mathematical model of that process, called the Learning Transfer Model (LTM). The LTM begins to build a unifying theory of transfer in the workplace, partially answering calls for psychological science to move towards more unifying theories that improve the explanation of human behavior (Muthukrishna & Henrich, 2019). Second, the LTM integrates several disparate but related theories, specifically reinforcement learning (e.g., Sutton & Barto, 2018), Social Learning Theory (Bandura, 1977), and Control Theory (e.g., Carver & Sheier, 1998), within a dual process cognitive framework (e.g., Kahneman, 2011). Third, computational approaches to reinforcement learning and dual process models are more fully brought into the organizational literature.
Finally, that integrative formal theory is instantiated in a computational model, allowing for virtual experimentation to explore the implications of the theory, the formation of testable predictions that may later be evaluated against real-world data, and potentially novel insights into the transfer process from which to build future practical interventions.

Review of Transfer Literature

Prior to building a process theory of transfer, we must take stock of the current transfer literature. This review occurs in two parts. The first is an overview, not meant to be exhaustive, of where the field stands, particularly regarding our knowledge of transfer as a process. The primary points are, first, that despite calls for viewing transfer as a process (e.g., Foxon, 1997), transfer has largely been treated as an outcome or a product. Second, because of the transfer-as-outcome view, transfer is typically measured at one or very few time points, which largely forgoes the ability to study transfer as a process, with few exceptions. Third, the nature of existing research is largely correlational and cross-sectional, resulting in a field of inquiry that can be characterized as a set of potentially useful but unrelated empirical findings. Fourth, the existing longitudinal transfer research does not generally examine dynamic processes, even when the authors state they are interested in them. Fifth, emerging theory on the training and transfer process is moving in the right direction to unpack within-person transfer processes but has far to go. The discussion then moves towards introducing a transfer process theory by more specifically describing key concepts and findings that are critical to consider in the early stages of theory development.

Before diving into the review, we must define some terms. Broadly, this paper is interested in the learning process experienced by employees. Learning has been given many definitions (Salas, Weaver, & Shuffler, 2012); for example, learning can be viewed as a permanent change in the range of possible behaviors for an organism (Huber, 1991). Training, more specifically, is an organizationally directed learning experience aimed at introducing new knowledge, skills, attitudes, or other characteristics (KSAOs) that expand the range of possible behaviors an employee may exhibit on the job. Training experiences are typically divided into three phases: pre-training, training, and post-training, or variations thereof (e.g., Beier & Kanfer, 2010). The present paper is focused specifically on the processes within the post-training phase. However, not all learning by employees is organizationally directed. Instead, employees learn much about how to accomplish their jobs and navigate their work environments while engaging with the relevant tasks and environment through informal learning processes (Tannenbaum, Beard, McNall, & Salas, 2010). This paper is concerned with all learning events, formal or informal, so the terms learners and learning are used interchangeably with trainees and training, and the model is suggested to apply to the transfer of learning from either formal or informal learning events.

The primary outcome of interest in the post-learning phase is the transfer of trained material back to the work environment. According to Baldwin and Ford (1988), transfer consists of generalization and maintenance.
Generalization is taking that which was gained through training and applying it to more or less similar situations as experienced in training once back on the job. Maintenance is the continued application of those new KSAOs to the job over time. The goal of this paper is to unpack how, not merely whether, learners transfer new KSAOs to their work environment. To begin studying the how of transfer, we assume learners exit their learning experience with the ability to generalize that learning to their work, and must now find a way to actually commit to that transfer and maintain it over time. Therefore, the present paper is focused more directly on the maintenance portion of transfer than on the precondition of being able to generalize knowledge at all.

Historically, transfer has been treated as an outcome, or a product, instead of something that unfolds over time driven by a process. Foxon (1997) noted this tendency and its effects on our understanding of transfer, arguing that measuring transfer as a one-dimensional product, rather than assessing it in process terms, may have led practitioners to underestimate the extent of transfer that occurs (p. 43). Unfortunately, calls for more process-oriented transfer research have gone largely unheeded until recently. Theoretical models of the training process still almost universally treat transfer as a single outcome. Part of this problem may be traced to the way researchers have traditionally treated the organizing model utilized by Baldwin and Ford in their classic (1988) review. In their model, training inputs lead simply to transfer, mediated by training outputs. However, instead of using the model to organize a disparate literature, researchers in part treated it as something to be tested. Although much useful knowledge arose from research inspired by Baldwin and Ford (1988), the path that research took may have limited progress on understanding certain aspects of the training process by largely ignoring transfer over time. The treatment of transfer as a product or outcome is the key limitation in understanding transfer specifically as a process.

Interpreting transfer as an outcome or product led to the tendency to collect transfer measures at only one or a very limited number of time points. The typical study on transfer effects measures covariates of interest before, during, or at the end of a training event, or creates its experimental manipulations during training, and then measures transfer at some later point in time. This tendency can be seen in the studies available for meta-analyses focused on transfer effects (e.g., Blume et al., 2010; Blume, personal communication), even though transfer includes both generalization and maintenance (Baldwin & Ford, 1988). Generalization could be considered something that either happens or does not, and therefore could be measured at a single time point, but maintenance implies the continuance of transfer over time and thus requires multiple measurements to study. Baldwin, Ford, and Blume (2009), in their updated review, found that the number of time points examined in the transfer environment had improved, but that the number was still limited. The lack of stronger longitudinal designs, in which all measures of interest are measured at multiple time points, limits the analyses and knowledge we may gain.
For example, single measurements are inadequate for cross - lagged designs that can start to unpack dynamic process - like relationsh ips u nderlying phenomen a (e.g., Kenny, 2005 ). Thus, transfer research is not examining dynamic processes, even when researchers are interested in them. A recent example is a measurement piece by Ford, Bhatia and Yelon (2019) which reports a multidimensiona l mea sure of transfer as use. The authors state they are interested in t he dynamics of transfer, but their data collection is a single time point for each participant and conduct no dynamic analyses. Transfer research also lacks a general guiding theory. The l ack of a guiding theory or framework has resulted in a large set of potentially useful but largely unrelated empirical findings. Existing models of the training process are not scientific theories in that they do not posit universal mechanisms underlying t he process, especially in a way that can be applied directly to tra nsfe r . Instead, existing models are generally tools for organizing the vast set of empirical findings in a coherent way ; they do not tie those findings together into a unified whole. This c an be seen in any of a number of reviews which have expanded overti me to include more detail because the extent of empirical findings has also greatly expanded over the past three decades, but the essential structure remains highly similar (e.g., Baldwin & Ford, 1988; Salas et al., 2012). Within single studies , on the oth er hand , theory can be found to guide hypothesizing. For example, as r eviewed by Beier and Kanfer (2010), common theories of motivation utilized to study transfer include goal choice and se lf - efficacy from self - regulation (Bandura, 1977), expectancy theory ( a.k.a. Valence, Instrumentality, Expectancy (VIE) Theory; Vroom, 196 4), 10 individual differences such as the Big Five personality traits ( McCrae & Costa, 1987 ), and transfer of training cli mates (Rouiller & Goldstein, 1993). The predictions made based on t hese theories independently show positive effects on transfer outcomes (e.g., Blume et al., 2010), but are not integrated into any comprehensive whole, leaving the empirical findings scatte red and in need of some underlying scientific framework to unify th em. Such work also answers calls for scientific frameworks to enhance the rigor of psychological science (Muthukrishna & Henrich, 2019). As mentioned by Baldwin et al . (2009), the number of studies examining multiple time points has improved over the yea rs. In many ways, these studies are applications of more typical learn ing or performance studies which examine learning curves on a task of interest. A few examples will suffice. Gist, Stev ens, and Baveta (1991) tested post - training interventions to improv e maintenance and transfer, finding that pre - training self - efficacy relates to both initial and delayed performance on a target test. They also found that the effect of efficacy on maintena nce was moderated by th e type of training the learner received. Van couver and Kendall (2008) made the important point that relationships may differ when examined at the within - instead of between - person level, when they showed efficacy can be negatively re lated to performance an d motivation in some learning contexts at th e within - person level. Th eir finding opposes the common view that efficacy and performance are positively related, but which is typically studied at the between - person level. 
Dierdorff and Surface (2008) showed t hat skill - based pay was related to skill mai ntenance over a seven - year period with multiple measurement points. F inally, Scholz, Nagy, Schuz, and Ziegelmann (2008) studied 30 initially untrained runners for a year, taking 11 measurem ents of their running t endencies as they prepared for a marathon. A t the between - person level they found trend in efficacy predicted trend in the amount 11 of running, and that fluctuations in efficacy predicted fluctuations in running. At the within - person l evel, controlling for b etween - person trends, amount of running was predicted by efficacy and intentions , among other variables. Recent studies aim to unpack within - person effects specifically on training transfer. The best examples may be those of Huang a nd colleagues. Huang, B lume, Ford, and Baldwin (2015) showed in a m eta - analysis that maximal and typical transfer are weakly related, and that predictors of the two forms of transfer differ . Specifically, maximal transfer was better predicted by abilities , while motivational mea sures were better predictors of typical tran sfer. The ir findings suggest more research is necessary to unpack why those factors differentially predict aspects of transfer. Huang, Ford, and Ryan (2017) then studied within - person varia bility in transfer in a multi - wave design. They showed that initial attempts to transfer were best predicted by post - training self - efficacy and that motivation to transfer better predicted rates of change in transfer. Unfortunately , the basis for this stud y relies on growth modeling so is not truly dynamics (DiShop et al. , in press ), but it represents a significant step forward conceptually in understanding the within - person nature of transfer. However, a ll is not lost, and research ers are making theoretic al advances regarding the process underlying the transfer of learni ng. R epeated calls are being made to study the training process, including transfer, from a multi - level perspective. Such arguments center on not only the need to be tter understand higher l evel organizational effects on training and transfer, but also to c onsider the within - person nature of the training and transfer processes (e.g., Mathieu & Tesluk, 2010; Sitzman & Weinhardt, 2019 ) . Such calls have in part manifested in micro - level research , such as on preventing knowledge and skill decay (Cascio, 2019). T hese advances in part emphasize that transfer is an episodic process and theory aimed at unpacking that process is 12 emerging. Blume, Ford, Surface and Olenick (2019) described transfer as a self - regulat ory process where learners proceed through episodes of deciding to retain or discard new KSA O s in favor of their existing repertoire, attempt to apply those KSA O s, receive feedback on their attempts, and reiterate the de cision process. That pro cess interacts with organizational factors to determine how it unfo lds over time. Surface and Olenick (forthcoming) , are developing a mechanistic model of transfer seeking to unpack the cognitive processes underlying the general pro cess described by Blume et al . (2019). This new model (Surface & Olenick, forthcoming) desc ribes how the transfer process relies on cognitive processes and the overriding of automatic responses to transfer new, non - automatic KSA O s, and how the individual d evelops in this process over time. Both theories make substantial strides in describing tra nsfer as a process. 
However, they remain limited by their informal linguistic nature. Further work is required to build on these models to enhance formalization, increase the precision of predictions, and improve falsifiability. All these advances are important and have provided a wealth of useful information. However, much work remains to provide a process-oriented explanation for when and why learners transfer to their work environments. This paper argues that advancement may be made by reconceptualizing transfer as another learning process, rather than something theoretically removed from the processes which drive a learning event. By reframing transfer as learning, we can draw on existing process-oriented theories of learning to provide a strong foundation from which to begin, including both informal natural-language theories and more formal mathematical and computational approaches. For example, Tannenbaum et al. (2010) described a dynamic model of informal learning on the job where employees learn over time through an iterative process of intent, experience, feedback, and reflection, which is affected by organizational and individual factors. More formal conceptualizations of learning through experience can be found, such as reinforcement learning (e.g., Sutton & Barto, 2018). Some of the basic mechanisms of these theories and others, such as experience, will be evident in the model explicated below. However, the primary point is that transfer theory may be advanced by approaching transfer as the process by which individuals learn whether a new KSAO is a good fit for their job. The model presented here will be called the Learning Transfer Model (LTM) for the double meaning of transferring learning to a target job environment, and individuals going through what amounts to a process of learning to transfer their new KSAO to the target environment, or not. This conceptualization emphasizes the individualized nature of the transfer process, where learners' eventual transfer outcomes are largely a function of the ability of their training to fit the needs of their job, and of the learners discovering through experience the fit between their training and their needs.

Computational Modeling and the Modeling Cycle

Before beginning, it is important to set expectations regarding the approach to theory building undertaken in this paper and to discuss the implications that approach has for the theory outlined below. The present paper takes a computational approach to theory building. Computational modeling is a useful tool for building new process-oriented theory for multiple reasons. First, formalization forces the theorist to make the logic of the theory explicit, especially as it evolves over time (e.g., Vancouver, 2008; Vancouver & Weinhardt, 2015). Second, computational modeling allows for the exploration of the theory in a low-risk environment. Third, those virtual experiments allow for better understanding of phenomena of interest, and can, but do not have to, lead to novel insights that would not arise from informal theorizing or unguided data collections (e.g., Miller & Page, 2012), though such insights can then be tested using targeted data collections on real subjects (e.g., Vancouver, Weinhardt, & Vigo, 2012). Importantly, a formal theory can also provide specific point estimates for the effect sizes one would expect to observe in the real world.
Although making such specific predictions is not the historical norm in psychology, doing so is a stronger form of science in which we can support or refute an underlying theory by assessing the fit of observed effects to predicted ones using Bayesian inference (Dienes, 2019). More generally, when one makes the mechanisms of a theory explicit, as computational modeling requires, one can be certain of what has led to the outcomes of the model in a way not typically achieved in traditional theory building. That is, when we collect empirical data in our field, we often propose hypotheses regarding the direction of relationships between constructs of interest which we believe follow from the logic of some underlying theory we are drawing upon. For example, we might predict that self-efficacy and performance are positively related while drawing on Social Cognitive Theory (Bandura, 1977) to discuss why we should expect such a relationship. However, when we only measure self-efficacy and performance and find the predicted relationship, we have not actually tested the underlying mechanisms driving that relationship, such as effort (e.g., Vancouver & Kendall, 2006), and therefore cannot be certain that our underlying theory is the actual explanation for the relationship; we can only be sure that the relationship is consistent with our expectations. By contrast, when using a computational model of the type used in the present research, one can be sure that the specified mechanisms led to the relationships between any higher-level emergent properties of interest, because they are the only mechanisms in play. Finally, a computational approach to theory building allows for an iterative process whereby a relatively simple form of a theory can be built, explored, and then expanded over time as necessary to account for phenomena of interest. Researchers have argued that this approach is the direction in which our field should be evolving (e.g., Kozlowski & Chao, 2012), and it may be of particular use for studying the training process (Salas & Kozlowski, 2010). The iterative approach to theory building and modeling was described by Railsback and Grimm (2012) as the Modeling Cycle. The Modeling Cycle is composed of six steps: 1) formulate the question, 2) assemble hypotheses, 3) choose model structure, 4) implement the model, 5) analyze the model, and 6) communicate the model. The process is iterative in that step five feeds back to step one, except when the author decides the time has come for communication. Over time, the theory and associated model are developed and explored, becoming increasingly sophisticated and more representative of the phenomenon of interest. By starting simple, this paper acknowledges that the resulting theory will not be a perfect picture of the transfer process, but that is not the intent. Rather, this model provides a starting point for future development while hopefully providing useful insights into the transfer process. This approach stays true to the principle of theoretical parsimony as outlined by Box (1976, p. 792), who wrote that since all models are wrong, the scientist cannot obtain a "correct" one by excessive elaboration.
On the contrary, following William of Occam, the scientist should seek an economical description of natural phenomena; that is, a theory should be kept as simple as its explanatory purpose allows.

Transfer Findings for Which to Account

Along with his admonition for parsimony, Box (1976) also warned that since all models are wrong, the scientist must be alert to what is importantly wrong. In this section, potentially important concepts and findings will be discussed, with reasoning for why or why not they need to be included in the initial steps of building a transfer process theory. The discussions here are not meant to be in-depth reviews of each topic. The goal is to define the concept and general findings, based in meta-analytic evidence where possible. This approach is deliberate in considering the initial stages of development for the present theory. An overemphasis on examining nuance can inhibit the development of sound theories of human behavior because it stands in the way of the abstraction on which good theory depends (e.g., Healy, 2017). Within psychology, researchers are incentivized to focus on theoretical contributions in their work (e.g., Olenick, Walker, Bradburn, & DeShon, 2017), which for most studies means extending an existing theory by examining a new application or moderation of that theory. However, with no incentive to replicate findings, the supposed nuance gained by such studies can long go unchallenged and cloud the development of a core theory to unify those findings. Further, relying on single studies to build informal theory is treacherous at best, because interpretations and conclusions from single studies can differ greatly depending on who does the analysis and interpretation (Starns et al., 2019). Thus, it is imperative that a potentially unifying theory account for general findings before exploring more nuanced findings which may be misleading. To this end, the meta-analytic effects discussed here are not to be treated as precise targets for replication in the models explored in this paper. Instead, the meta-analytic effects are general guides for the patterns of relationships expected from the LTM, as there are limitations to the use of meta-analyses as exact targets, such as variability in the contexts in which their underlying studies were conducted, the measures used, and their theoretical underpinnings, among other between-study differences that are aggregated across when estimating meta-analytic effects. The model presented in this paper is meant to be a general theory of training transfer and should therefore represent the general findings of applicable meta-analyses, but the exact point estimates from those meta-analyses may be overly restrictive targets for a theory in the initial stages of development, as the LTM is, and future work should look to refine the LTM to better target precise effects in their applicable research contexts.

Practice and Overlearning

The effects of practice on important training outcomes are well established. Practice on tasks is related to important performance outcomes, as individuals tend to improve over time with exposure to a task. For example, Hausknecht, Di Paolo, and Moriarty Gerrard (2007) found that test scores increase upon retesting, with a meta-analytic effect of .26. Such practice effects are critical when considering personal outcomes, such as employment decisions (e.g., Olenick, Bhatia, & Ryan, 2016).
Similarly, practice is critical within learning contexts for improving important outcomes and is considered one of the best strategies for improving learning and retention (e.g., Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). Within the transfer environment, practice of skills is also essential for maintenance; Arthur, Bennett, Stanush, and McNelly (1998) found in a meta-analysis that skills deteriorated significantly over time without use. Relatedly, researchers have explored the use of overlearning as a design feature of training. Overlearning is essentially the use of extreme levels of practice to develop automaticity before the learner leaves the learning event. The development of automaticity is a key outcome in training and in the development of expertise (e.g., Ericsson, 2006; Goldstein & Ford, 2002). Meta-analytic investigation of the effects of overlearning on retention shows an uncorrected relationship between overlearning and retention of .298 (Driskell, Willis, & Copper, 1992). Thus, it is important for the transfer process theory to account for improvement in transfer when practice and overlearning are part of the training design, before the learner even enters the transfer environment.

Utility Reactions

Utility reactions are trainees' evaluations of the usefulness of their learning experience (e.g., Ruona, Leimbach, Holton, & Bates, 2002), typically collected via an affective reaction measure at the end of a training session. Researchers predict that when adult learners see new information as useful to them, they are more likely to utilize that information in the future. This prediction fits with training principles regarding the need to improve trainee motivation to learn or transfer by connecting the material to personal outcomes (e.g., Bauer, Orvis, Ely, & Surface, 2016). Interestingly, relatively few studies actually examine utility reactions despite their demonstrated strength in predicting transfer outcomes. In Blume et al.'s (2010) meta-analysis, only nine studies were found that met their inclusion parameters, but those studies demonstrated a corrected relationship with transfer of .46, making utility reactions one of the strongest overall predictors of transfer and important to account for in the present model.

Work Environment

Work environmental factors have long been considered an important driver of transfer, often referred to as transfer climate. Transfer climate includes aspects of supervisor and peer support, opportunity to use, supervisor sanctions, positive and negative personal outcomes, and resistance to change (Nijman et al., 2006; Rouiller & Goldstein, 1993; Holton et al., 1997; Holton et al., 2000). This paper focuses on supervisor and peer support and opportunity to use. Supervisor and peer support are important antecedents of training success (e.g., Baldwin & Ford, 1988). These two factors are part of social support, which is the ability to draw on the emotional and task resources of others (Steele-Johnson, Narayan, Delgado, & Cole, 2010). Social support has important effects on the stress and well-being of individuals, with perceptions of support being potentially more important than actual support (e.g., Kessler, 1992). The importance of support for the transfer of training has been confirmed via meta-analysis, with Blume et al. (2010) finding a corrected relationship of .21 between support and transfer.
Several studi es on the effects of supervisor support , specifically, are interested in exploring the mechanisms through which support operates to affect training outcomes . For example, Nijman et al . (2006) found that support affects transfer through p erceptions of trans fer climate and motivation to transfer. However, most studies in this area are cross - sectional in nature. Even Foxon (1997) , who argued for examining transfer as a proces s, examin es supervisor support but collected measures at a single t ime point in the transfer environment. Similarly, Nijman et al . (2006) develop a process model of transfer but are limited to a small sample and a cross - sectional design. Thus, it is import ant to consider the effects of support for transfer o f a new KSAO, but the development of support effects over time need further examination. The situations in which learners find themselves attempting to apply their new KSAOs also impact transfer. One im portant way situations differ is the degree to which they are weak or strong. Situations are strong to the extent they provide clear context clues on the appropriate courses of action to take (Meyer, Dalal, & Hermida, 2010). Strong situations dictate the a ctions that must be taken while weak ones allow more room for indiv idual differences to influence how to proceed, and thus affect related outcomes. For example, Judge and Zapata (2015) showed that the effects of personality traits on performance were highe r in weak contexts than in strong contexts. In transf er environment s situation strength manifest s in various ways, such as if the received training is the organizationally required way to carry out a task transfer would be more likely. Or , in the relations hip between supervisor and trainee, closer and less - a utonomous supe rvision should create a stronger situation and lead the trainee to transfer their new KSAOs in a way more consistent with the desires of their supervisor (e.g., Yelon & Ford, 1999). 20 All su ch higher - level factors fit with calls for multi - level investigatio ns of training and transfer effects (e.g., Mathieu & Tesluk, 2010; Sitzman & Weinhardt, 2019). Multi - level theory (Kozlowski & Klein, 2000) emphasizes the nested nature of phenomena in orga ni zational psychology. Namely, measurements across time are nested within individuals, individuals within teams, teams in organizations, and so on. N esting has implications for both how we study phenomena, and how phenomena are likely to manifest. It has b ee n argued that research should examine target phenomen a from a bra cketed perspective, including effects of both one level above and one level below the target phenomenon ( Hackman, 2003 ). In the present study this includes explication of an individual proc es s which occurs over time, and higher - level effects on that proces s imposed by such concepts as situation, opportunity, and climate . Overall, environmental effects including transfer climate, support, as well as constraints or opportunities for us e have a m eta - analytic relationship of .22 with transfer (Blume et al., 201 0). Implementation intentions Psychologists in several areas of inquiry have studied the potential for implementation intentions to reduce the intention - behavior gap (e.g., Schniehotta, Sh olz & Schwarzer, 2005). 
Implementation intentions link situational cues to intended responses, specifying that when situation X arises the person will respond by doing Y (Gollwitzer, 1999), and have been shown to have a substantial meta-analytic effect on goal attainment (Gollwitzer & Sheeran, 2006). Their benefits also appear to depend on the strength of the underlying goal intention (Sheeran, Webb, & Gollwitzer, 2005). Wieber, Thürmer, and Gollwitzer (2015) recently described the mechanisms underlying the functioning of implementation intentions. Implementation intentions form a strong association between the mental representation of the goal-relevant situation and the goal-directed action, delegating action control to a lower-order cognitive process and changing the normal top-down processing approach of goal attainment into a more automatic and efficient bottom-up process. Health psychologists have utilized implementation intentions to improve the effects of patient education programs. For example, Harris and colleagues showed that both implementation intentions and self-affirmation increased fruit and vegetable consumption at seven-day and four-month follow-ups (Harris et al., 2014). Kendzierski, Ritter, Stump, and Anglin (2015) showed the moderating effect of self-schemas on implementation intentions. In two studies they showed that implementation intentions increased healthy eating habits among individuals who already held a self-schema of being healthy eaters, meaning implementation intentions work better for individuals who already see themselves as approximating the end goal. A recent systematic review suggested that implementation intentions have a small but reliable effect on healthy eating behaviors (Turton, Bruidegom, Cardi, Hirsch, & Treasure, 2016). Thus, although not ubiquitous in organizational training studies, implementation intentions show important effects and should be accounted for in a model of transfer.

Maintenance Curves

Maintenance is one of the two primary aspects of transfer outlined by Baldwin and Ford (1988) and Baldwin, Ford, and Blume (2009). Baldwin and Ford (1988) describe possible trajectories a learner may take in displaying transfer, which are labeled maintenance curves. These potential trajectories range from an initial lack of transfer with later increases in transfer rates, to initially high levels of transfer that decrease over time. Such trajectories can be studied using growth modeling techniques, as accomplished in the study by Dierdorff and Surface (2008) on the effects of skill-based pay on maintenance. Unfortunately, because the study of maintenance curves requires several waves of data collection, they are rarely studied in primary research. A transfer process model should be able to explain why an individual may take any one of the potential general transfer trajectories. One advantage of using computational modeling to explore the present theory lies in the ability to explore such curves in an environment that does not necessitate large-scale data collections.

Self-efficacy

Self-efficacy is the belief of an individual in their ability to execute desired behaviors in the pursuit of some outcome (Bandura, 1977). Efficacy is a central variable in self-regulation theory, which will be more thoroughly introduced below. Importantly, according to Bandura, efficacy is the primary way in which individuals show agency in affecting their personal environments.
Within the learning context , efficacy drives outcomes through the amount of eff ort the individual is willing to place into the task i n question (e .g., Vancouver & Kendall, 2006). In examining the effect of efficacy on transfer, it is common to collect feelings of efficacy at the end of a training event to predict future use. Across s tudies efficacy has been shown to be a moderate predic tor of transf er (Blume et al., 2010). Given the centrality of efficacy to the key theory of self - regulation and the demonstrated effect of efficacy on transfer, efficacy is another variable which holds importance for the LTM . Skill type A potentially crit ical aspect t o consider is the nature of the skill targeted for transfer. A typical delineation between skill types is open versus closed. Closed skills have a relatively strictly defined way in which t hey may be applied, for example there may be only one way to succes sfully operate a machine. Open skills are those over which the trainee has more discretion regarding how they are applied to their job, for example how to handle an interpersonal interactio n (e.g., Yelon & Ford, 1999). Similarly , Laker (2011) introduce d so ft and hard skills. 23 Hard skills are technical skills or those that define how to do a given task. Soft skills are those that have a more inter or intrapersonal focus. These categories are l ike open and closed skills but are argued to go furthe r in differen tiating the skill types in question. The type of skill studied may have important implications for transfer on its own and affect parts of the LTM. For examp le, Laker (2011) argues that sof t skills are less likely to transfer because the trainee is more li kely to have prior experience that needs to be overcome, and that feedback is more difficult to receive accurately. In addition, the level of support for transfer may matter more for open a nd soft skills than closed and hard. For exam ple, Salas, Milham, an d Bowers (2003) argued that as the military moves towards more open - skills training programs a more supportive environment would be required to enhance transfer as trainees would have great er discretion over the implementation of thei r new skills. Yelon an d Ford (1999) further discuss the interplay between closed versus open skills and the level of autonomy a trainee has from their supervisor in determining transfer outcomes. However, it i s important to begin building explanatory the ories as simple as pos sible and later iterations may build in complexity. For that reason, the initial LTM will be more directly applicable to hard or closed skills because they are more straightforward. T his do es not mean the proposed theory is inapplicab le to more open - type s kills as the underlying process driving transfer is likely the same and future investigations will be required to unpack any nuance required to account for differences in transfer outcomes between the various skill types. Near versus Far Transfer , Adaptive Transfer and Adaptive Performance Near and far represent a key distinction in describing the nature of the transfer task . Near transfer is when tasks in the transfer environment closely r esemble those on which the learner received i nstruction , allowing m ore direct application of what was learned to the transfer 24 environment. 
Far transfer is when the task in the transfer environment is different in some larger degree from the task on which t he learner received instruction , requiring gr eater adaptation on th e ir part (Beier & Kanfer, 2010). The type of transfer has potentially differential effects on other important variables. For example, it was originally demonstrated that self - efficacy was o nly related to transfer when near transfer wa s required (e.g., Math ieu, Tannenbaum, & Salas, 1992; Martocchio, 1992). However, it was later shown that self - efficacy is important in determining far transfer as well , though potentially to a different degree (e.g., Kozlowski et al., 2001). Related to fa r transfer is adaptive transfer. Adaptive transfer occurs when knowledge from training is applied to a task which is not identical to that which was trained but is instead a n adaptation of that task. Adaptive tr ansfer can also involve the generation of nov el approaches to probl em solving (e.g., Beier & Kanfer, 2010; Smith, Ford, & Kozlowski, 1997). More broadly, Baard, Rench, and Kozlowski (2014) reviewed research on adaptation and adaptive performance, which ar e related to generalization. Based on their review, the field of ad aptive performance is largely unorganized, characterized by multiple appro aches which are not in agreement with one another. To provide some structure, the authors introduce a taxonomy of p erformance adaptation. The most relevant category they define for t he present purposes is that of domain - specificity, which is based in train ing and skill development. They write that key assumption of this approach is that specific capabilities underl ying performance adaptation can be learned and that their applicati on is specific to a knowledge and skill domain rather than general across a range of work situations. The primary target for this work is to develop knowledge, skills, and capabilities via training or other developmental experiences that can increase perfo rmance in 25 a task context that shifts in novelty, difficulty, and/or comple ; emphasis in original ) . Further, within adaptation research, decision - making and learning are importa nt topics of study, which are primary foci of the LTM . Examples of research in this domain include decision - making tasks (e.g., TANDEM), and how individuals adapt their decision making in changing situations which drives adaptive performance . However, ada ptation is more concerned with applying existing knowledge to new a nd changing situations, not with applying new knowledge to old situations, which is more the domain of transfer. This is a close but important distinction. The adaptation of existing knowle dge is important and interesting , but a large portion of actions un dertaken by typical employees are relatively routine, even in complex jobs (Susskind & Susskind, 201 7 ) . F urther, estimates based on experience samples are that 45 - percent of behaviors are r epeated in the same location every day ( Neal, Wood, & Quinn, 2006; Wood, Quinn, & Kashy, 2002). This paper most directly concern s situation s where the encountered situation is stable enough that the same general approach to the task may be applied , thus avoiding the complications of adaptation, skill type, near or far t ransfer, etc., for the time being. This is directly applicable to types of jobs that are very consistent in their nature but is also in line with the idea that teaching principles they can apply to a broad range of situations is beneficial . 
The argument is made that the same basic process of learning about the potential uses of a newly trained KSAO will be applicable to both situations. However, it is agreed that this process is complicated by attempts to apply training to more adaptive tasks. Thus, the initial LTM should be interpreted as directly applicable to transfer tasks which are broadly definable as near transfer, but with potential insights for the processes underlying far transfer as well.

Study 1: Base Learning Transfer Model

To investigate the noted gaps in the transfer literature, the remainder of this paper will be dedicated to introducing and exploring a formal model of the transfer process called the Learning Transfer Model (LTM). The complete model will be described and tested in multiple iterations, drawing on existing work in fields other than organizational psychology to form the basis of the proposed transfer process. The first model is primarily based on theories of Dual Process Cognition (e.g., Kahneman, 2011) and reinforcement learning (e.g., Sutton & Barto, 2018), and is informed by work on habit formation and change (e.g., Neal, Wood, & Quinn, 2006).

Dual Process Models and Habits

I argue that a primary shortcoming in the existing training and transfer literature for the study of transfer as a process is a lack of basis in established cognitive theory. One particularly underutilized framework, not just in the training literature but across organizational psychology more broadly, is that of Dual Process Cognition. By drawing on existing dual process theories we can provide an overarching framework from which to explain how learners may process their transfer situations and make decisions regarding how to respond. Once established, we can discuss how other important theories may further explicate key mechanisms within the dual processing framework. One thorough and accessible explanation of dual process theory comes from Nobel Laureate Daniel Kahneman (2011), though other versions exist (e.g., Pennycook, Fugelsang, & Koehler, 2015; Bago & De Neys, 2017). Kahneman (2011) explains that humans have two separate information processing and decision-making systems. The first system, conveniently labeled System 1, is characterized by fast, automatic information processing which requires little effort and makes decisions based on heuristics learned over time which tend to result in an acceptable level of success, whatever success may be. Automatic decisions allow humans to carry out most of their daily information processing and decision making without becoming cognitively overloaded, but these decisions also tend to be biased and suboptimal. On the other hand, System 2 is an effortful processing system which moves slower and requires conscious cognitive effort. System 2 tends to make more nuanced decisions but may lead to the same conclusion which would be made by System 1. Kahneman (2011) also argues that humans are lazy cognitive processors and will default to the use of their System 1 processing whenever possible. This approach to cognition and decision making has the added benefit of arising from behavioral economics, which tends to be more formal in its theorizing and replicates more frequently than traditional psychological research. It has been suggested that behavioral economics and dual processing theories show promise for the building of unifying, but falsifiable, psychological theory (Muthukrishna & Henrich, 2019; Popper, 1959).
Criticisms of dual processing theories have b een levied by many researchers. Evans and Stanovich (2013) outlined and responded to the five most common criticisms. Those criticisms include 1) dual process theorists have offered multiple and vague definition s of those processes, 2) proposed attribute c lusters are not reliably aligned, 3) the existence of a continuum o f processing styles and not discrete types, 4) single - process accounts may be offered for dual - process phenomena, and 5) evidence for dual proce ssing is ambiguous or unconvincing. Evans and Stanovich (2013) respond to each of these in turn, but generally s uch criticisms are levied against dual process theories en masse instead of against single theories, ignoring specific developments within dual process theories. Their points include that c haracterizing cognitive processing as strictly dichotomous is overs implified and processing should be viewed as more varied, with some processes being more automatic and others less so. Such a view overcomes the 28 continuing charge of unreliable alignment of attribute clusters (Melnikoff & Bargh, 2018a, Melnikoff & Bargh, 2 018b; Pennycook, De Neys, Evans, Stanovich, & Thompson, 2018). Evans and Stanovich (2013) further outlined that a dual process conceptual approa ch better fits the data patterns of cognition than any other explanation, such as a single process model, and th at it is largely nuances within the field of dual processing itself that remain to be fleshed out rather than disregarding the framework as a wh ole. The view of dual processing as the essen tial framework for cognition becomes stronger when organizing it fr om a default - interventionist perspective. The default - interventionist perspective views processing as being essentially automatic in nature for most instances, where we generate automatic r esponses and it is then up to the more deliberate processes to inte rvene or not. Finally, clarity may be brought by referring to these two processes as type 1 and type 2, which is meant to overcome the shortcomi ngs of using the system terminology that give s the false impression that there are two clearly identifiable proc essing systems. The current paper cannot clarify the nature of dual processes. Instead, this paper argues that the dual process framework, thou gh imperfect, is a useful dichotomization for forming parsimonious explanations for meso - level processes which a re driven by underlying cognitive systems. The dichotomy used here will refer to type 1 and type 2, with type 1 processes being generally more a utomatic and unconscious and type 2 being gen erally more deliberate and conscious, though it is understood this is not necessarily a perfect characterization. In addition, this paper adopts the view of Evans and Stanovich (2013) that the two processing typ es occur in a default - interventionist, sequen tial, fashion. Approaching dual processing from this general perspe ctive will provide a framework from which to approach the transfer process, representing an imperfect but significant step forward in understand ing that process. 29 As previously stated, this paper is fundamentally about learning, and r esearchers have previou sly described dual process models of knowledge and learning. For example, Dienes and Perner (1999) distinguished between implicit and explicit k nowledge. Implicit knowledge largely, but not exclusively, being that which is automatic, unconscious, nonverbal ized, and declarative. 
Implicit knowledge underlies explicit knowledge as knowing something explicitly implies you know the information underlyi ng it but knowing something implicitly does n ot necessitate being able to make it explicit. Sun, Slusarz, and Te rry (2005) built on the distinction between implicit and explicit knowledge by explicating the CLARION model of learning which includes both imp licit, bottom - up, and explicit, top - down, for ms of learning in skill acquisition. In implicit learning individua ls gain knowledge through direct experience, which a more unconscious form of learning and may not lead to knowledge which the individual can di rectly articulate. Such knowledge occurs in t he development of learning patterns in complex recognition tasks, o r in the learning of grammatical rules in real or made - up languages. Explicit knowledge acquisition can be delivered directly from the outside e nvironment, such as being told the decision r ules required for a given task. Over time, implicit knowledge can w ork its way up to become explicit where the learner can refine rules in a more conscious way. This exemplifies the split between more unconsciou s type 1 and more conscious type 2 processing in a learning environment. However, their model is directly concer ned with skill acquisition, so is more directly applicable to the training event itself in an organizational training process, and not to the pr ocess of transferring that skill. The automat ic nature of type 1 processing is of major importance for the prese nt paper. Successful training interventions have long attempted to develop a degree of automaticity in skills that are being targeted. Intervent ions have been able to develop automaticity p articularly 30 through overlearning approaches (e.g., Arthur, Bennett, Stanush, & McNelly, 1998), which effectively have the learner repeat a process until they are engrained to the point of an automated response. However, what we do not appreciate enough is that when we introduce a new KSAO to an employee there is likely so me existing KSAO that the new one must override which, at the very least, has a head start on the development of automaticity. We do study exper tise development where the essential process is the breaking of old automatic processes and replacing them with better processes (e.g., Ericsson, 2006). However, this process is covered at a very high level and does not reach the granularity of deciding to apply some new given approach over the old. In addition, the expertise literature is tangential to the training and transfer literature. Within the more traditional training literature we appreciate that adult learners come to their learning with a person al history (Knowles, 1984), and that this aff ects their outcomes from the learning event. This essential process of overcoming an existing automatic behavior to implement a new one is a central focus of the LTM. Focusing on overcoming existing automatic b ehaviors fits with a broader trend in psychol ogy: the re - emergence of interest in habits and habit change. Habit s are conceptualized in many ways in the literature, but can be categorized as tics, neural networks, conditioned responses, everyday activities , routines, customs or rituals, character, or habitus (Clark, Sanders, Carlson, Blanche, & Jackson, 2007). 
Of th ese, the most important forms of habits for the present discussion are 1) conditioned responses actions learned through reinforcement and cond itioning, 2) everyday activities things we do every day with little or no conscious thought, and 3) routines more complex than single activities, involving sequences and combinations to create order. A second typology of habits by Southerton (2013) def ines habits as either 1) 31 dispositions, 2) pro cedures, or 3) sequences. Dispositions are the most important here, which are propensities to act in a particular manner when suitable circumstances arise. Whether one takes the more macro, dispositional, or mo re micro response approach to habits, they fi t well with the dual process model. If one takes the broader , dispo sitional, any instantiation of an actio n could be driven by either type 1 or type 2 processes. This broader conceptualization of habits works because e ven an effortful process may arrive at the same conclusion as an automatic process. Thus, a habitual reaction in any given situation could be du e to an automatic reaction, or due to a more On the other hand, viewing habits specifically as behaviors which are in some way automatic firmly places habits as the outputs of type 1 processes. From thi s view, any habitual behavior in an organizat ion is that which an employee may default to through automatic proc esses. It may seem that few work behaviors would fall under such habitual responses with the increasing complexity of the work world, but Susski nd and Susskind (2017) argue most work acts, even by individuals in relatively complex jobs, are fairly repetiti ve and mundane, making them ripe for habituation. Further, the development of automaticity is essentially the development of habitual responses. It is that developed habitual response I arg ue we must overcome which we do not always account for explicitly i n training research, and especially in considering if newly trained KSAOs will transfer back to the work environment. Difficulties in overcomin g existing automatic responses of adult learn ers is evident in some topics of study within the training literatu re. A primary example can be seen in attempts to train for implicit (automatic) racial attitudes to reduce racially biased attitudes and connect ed behaviors, and this point is worth some ex ploration as it has implications for the LTM. 32 Greenwald and Banaji (1995) explained that much of social behavior is driven by implicit or unconscious processes which allow individuals to take the correct actions in social situations without effortful proce ssing. However, it makes changing social behavior difficult because many decisions in those situations occur outside of direct cognitive control. Relatedly, Wilson, Lindsey and Schooler (2000) proposed a dual mo del of attitudes specifying the relationship between implicit and explicit attitudes held towards an object or g roup. Specifically, individuals hold both implicit and explicit attitudes that do not necessarily agree with each other. Wilson and colleagues a rgue implicit attitudes are the product of lo ng - term learning processes and are usually rooted in childhood expe riences. Explicit attitudes may agree but are more susceptible to learning in adulthood. Which attitude determines behavioral outcomes is driven by the dual processes of cognition such that the implicitly learned attitude will drive behavior unless the ind ividual is given the opportunity and resources to call on their explicit attitudes. 
This model also explains why the average correlation between implicit and explicit measures of attitudes tends to be low (e.g., Brauer, Wasel, & Niedenthal, 2000). The dual nature of attitudes and cognitive processes poses problems when we attempt to change attitudes through training. That implicit attitudes are automatic judgements learned over long periods of time makes them habitual, suggesting there are deeply ingrained cognitive processes and structures which must be altered or overcome to cause lasting change. Changing existing habits is possible but difficult, and the longer one uses a given KSAO successfully, the harder it will be to change it. In the case of implicit attitudes, such as racial attitudes, an employee is likely coming to the learning event with decades of experience using that attitude. Organizations then attempt to affect such attitudes through diversity training, which often lasts four hours or less (Kalinoski, Steele-Johnson, Peyton, Leas, Steinke, & Bowling, 2013). Thus, it should be no surprise that training initiatives to change racial attitudes generally fail to cause lasting change in explicit and implicit attitudes, as well as in the outcome behaviors to which those attitudes lead (Lai, Hoffman, & Nosek, 2013; Lai et al., 2016). In most cases, at least pertaining to racial attitudes, individuals are not exposed to a strong enough shock to fundamentally alter their beginning set point; they fail to maintain the hoped-for change over time and merely return to their baseline tendency, in this case a habit, after some period (Olenick et al., in press; Baldwin & Ford, 1988). A similar, though potentially less extreme, effect likely occurs for many KSAOs, and the LTM can account for such an effect.

Reinforcement Learning

As mentioned above, one way to view habits is as the product of the reinforcement of actions through their past successful application. This framing makes reinforcement learning a natural place to look for an existing learning theory to explain learning mechanisms within the LTM. Reinforcement learning has been thoroughly researched by both psychologists and computer scientists, is informative regarding how individuals learn, and has the benefit of the level of formality and thoroughness required to form overarching theoretical frameworks (Muthukrishna & Henrich, 2019). Psychological study of reinforcement learning dates to at least the studies of Ivan Pavlov (1927) and what is now known as classical conditioning, in which the pairing of a stimulus and a reward could result in the later excitation of a response which had previously not been associated with the stimulus. For example, the initial presentation of a bell does not cause a dog to salivate. However, if over time food is presented in tandem with the bell, the dog will begin to salivate at the ringing of the bell alone. More formally, an initial unconditioned response (salivation) is normally paired with a natural trigger (unconditioned stimulus), but later can become a predictable response (conditioned response) to an unnatural trigger (conditioned stimulus). Over time, the dog comes to expect food at the ringing of a bell because of previous experience. Formalized versions of classical conditioning exist, such as the Rescorla-Wagner model (Wagner, 2008), which proposes, in part, that the weighting of stimulus-response connections is updated when animals are surprised by outcomes (e.g., Kamin, 1969).
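To give a flavor of the kind of formalization these conditioning models provide, the following is a minimal, illustrative sketch (not code from the dissertation) of a single-cue, prediction-error update in the spirit of the Rescorla-Wagner model; the learning-rate value, reward magnitude, and trial structure are arbitrary assumptions chosen only for demonstration.

```python
import random

def rescorla_wagner(trials, alpha=0.3, reward_magnitude=1.0):
    """Minimal prediction-error learning: associative strength v moves toward
    the obtained outcome by a fraction (alpha) of the surprise on each trial."""
    v = 0.0                              # associative strength (expected outcome)
    history = []
    for rewarded in trials:              # trials: sequence of True/False (reward present?)
        outcome = reward_magnitude if rewarded else 0.0
        prediction_error = outcome - v   # "surprise": obtained minus expected
        v += alpha * prediction_error    # update the weight in proportion to surprise
        history.append(v)
    return history

# Example: a stimulus rewarded on roughly 80% of 20 trials; v climbs toward ~0.8.
random.seed(1)
print(rescorla_wagner([random.random() < 0.8 for _ in range(20)]))
```

The key property the sketch illustrates is that learning is driven by surprise: when outcomes match expectations, the prediction error shrinks and the associative weight stops changing.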
Another model of reinforcement learning can be found in the operant conditioning approach of Skinner (1938, 1963), or instrumental conditioning in the language of Thorndike (1898). Both study behavior-contingent reinforcement and the subsequent effects of that reinforcement on future behaviors. Classic experiments by Thorndike include the use of puzzle boxes in which cats were placed and required to escape, for example by pushing a lever or pulling a string. Initially the cats would struggle and often only escaped by solving the puzzle by chance, but their ability to escape increased as they gained more practice at performing the required action and were reinforced by being able to escape their confinement. Computer science drew inspiration from the original research on animal learning completed by psychologists to develop reinforcement learning algorithms (Sutton & Barto, 2018). The essential function of a learning agent in a reinforcement problem is to identify the best behavioral strategy, labeled a policy, to apply in a given situation to maximize the reward it receives from its environment. As an agent encounters its environment, it applies some policy available to it and receives rewards based on the success of that policy. Over time, the agent estimates the expected value of that policy and can compare the expected values of multiple policies. The agent thus applies increasingly valuable policies to its task and improves its performance. Through this iterative action, feedback, and learning process, agents can develop novel and powerful solutions to complex problems which are often more efficient and complete than those which humans develop on their own. Examples include robots navigating an environment (Sutton & Barto, 2018), and games as varied as checkers (e.g., Samuel, 1967), Jeopardy! (Tesauro, Lechner, Fan, & Prager, 2013), and backgammon (e.g., Tesauro, 2002). Algorithms of varying complexity for reinforcement learning exist depending on the type of learning problem (Sutton & Barto, 2018). Regardless of the complexity of the chosen algorithm, some of their essential features can be directly tied to the types of psychological conditioning described previously. For example, one of the laws of learning discovered by Thorndike (1898) was the Law of Effect, which states that behaviors which produce satisfying outcomes are more likely to occur again when the same situation is presented, and those which produce unsatisfying outcomes are less likely to occur again in that situation. Sutton and Barto (2018) connect reinforcement algorithms to the Law of Effect: reinforcement learning algorithms are selectional, meaning that they try alternatives and select among them by comparing their consequences, and they are associative, meaning that the alternatives found by selection become linked to the situations in which they were tried. As described by the Law of Effect, reinforcement learning "is not just the process of finding actions that produce a lot of reward, but also of connecting" those actions to the situations in which they pay off (pp. 358-359, emphasis in original). Although computer science applications of reinforcement learning are designed for agents in idealized environments, their algorithms are useful for understanding and modeling animal learning in psychology (Sutton & Barto, 2018), and may hold the key for understanding transfer as a learning process.
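As a loose, hypothetical illustration of those two properties (again, not code from the dissertation): selection happens by comparing estimated consequences, and association happens by keeping those estimates separate for each situation. The action names, exploration rate, and update rule below are assumptions made for the sake of the example.

```python
import random
from collections import defaultdict

# values[situation][action]: estimated payoff of an action *within* a situation,
# so rewarded actions become more likely specifically where they paid off.
values = defaultdict(lambda: defaultdict(float))
counts = defaultdict(lambda: defaultdict(int))

def choose(situation, actions, epsilon=0.1):
    """Selectional: pick the action with the best estimated consequence,
    exploring at random a small fraction of the time."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: values[situation][a])

def learn(situation, action, reward):
    """Associative: update the estimate for this action in this situation only."""
    counts[situation][action] += 1
    n = counts[situation][action]
    values[situation][action] += (reward - values[situation][action]) / n
```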
The Learning Transfer Model

Using the background of dual process theory and reinforcement learning, I propose the Learning Transfer Model (LTM) as a process theory which may account for common effects observed in transfer research. The LTM proposes that learners exit training with a new KSAO, and they must then learn whether it is a better fit for their work tasks than their previously used KSAOs. Once in the transfer environment, learners encounter relevant tasks and must choose which of their available KSAOs to apply. Based on dual processing, the learner will have an initial automatic response based on how habitual that KSAO is at that time. Once this initial automatic response occurs, it may be intervened upon by more deliberate decision processes if the learner can engage in such processes. However, even in cases where more deliberate processing is possible, the learner may still apply their old KSAO instead of their new one. Over time, the learner gains experience which will inform their future transfer decisions and, with many applications, develop their new KSAO into a new automatic response. The basic outline of the LTM can be found in Figure 1. This description represents the general form of the LTM, but to build strong theory for testing and future development, a key point of this paper is to develop a formal model. The rest of this section will be dedicated to explicating that formal model. The backbone of the formal LTM is based on the algorithms of k-armed bandit problems, and unless otherwise noted all information presented in the following discussion is based on Sutton and Barto's (2018) introductory text on reinforcement learning. In k-armed bandit problems, a learning agent, synonymous with an individual transferring knowledge in the current theory, attempts to choose the optimal solution from a number (k) of pre-defined behavioral options. That choice is made by estimating the long-run value of each available policy through an iterated sampling and feedback process. K-armed bandits have four important components. First, each behavioral option available to the agent is called a policy. In the LTM, each agent has access to two policies representing their pre-training KSAO (Policy A) relevant to the theoretical work situation targeted by the training intervention, and the organizationally introduced KSAO relevant to that situation (Policy B). The assumption that only two policies are of interest for transfer questions makes the approach used here a 2-armed bandit problem. Second, each policy has a reward function, or true value, which dictates the distribution of rewards the agent receives when it chooses to apply that policy. Third, the agent maintains an estimate of the value of each policy, representing the predicted reward of that policy according to the agent's experiences applying it. Thus, the agent is estimating the reward of each policy and attempting to discover the best policy to apply at each time step. Fourth, the agent does not always exploit the policy which it currently deems the most valuable, and sometimes explores other potential policies instead. The inclusion of a minor amount of exploration gives rise to methods such as E-greedy, where the agent greedily exploits the currently most valued policy but explores with some rate of error. Several important aspects of this approach to reinforcement learning are worth mentioning. First, agents learn based on the evaluation of actual actions they take, not from instruction by outside entities.
This is one point which separates the current model from the CLARION model (Sun et al., 2005) previously discussed. Second, learning in such agents is limited to a single, unchanging situation. That is, the value of each policy is fixed because the environment to which the policies are applicable is unchanging. Sutton and Barto (2018) describe such approaches as non-associative, because the agent does not need to choose which policy to use in different situations. There are more sophisticated reinforcement learning approaches that can be applied to changing situations, but these are more complex than necessary at this stage of developing the LTM; they could be utilized in the future to study adaptive transfer. For now, we will assume the transfer situation is stable enough for learners to apply their newly learned policy. Third, the k-armed bandit approach assumes that the goal of the agent is to maximize the long-term value of its actions. Fourth, events in bandit problems are episodic as opposed to continuous. Finally, the reward received by the agent at each episode is randomly chosen from a stationary distribution of the rewards associated with that policy.
The application of k-armed bandits to humans in transfer environments requires at least three other assumptions. First, individuals/agents exit their learning experience with the ability to apply the targeted KSAO represented in their new policy. This assumption suggests that this model is currently more applicable to maintenance than to generalization within the transfer space. Second, the learner will not alter the given policy to fit their own needs once that policy is created. Third, the agent must possess perfect recall of their experiences when attempting to apply the available policies in order to accurately calculate the expected value of each policy.
Given this background, we can fully describe the formal LTM and outline how a computational instantiation of that model would operate. Agents, synonymous with learners from here forward, are presented with an abstract task at each time point. For our purposes, the task itself does not matter and will remain undefined, other than that the agent can only be successful or unsuccessful on it. The probability of success on any given attempt is defined by the policy which the agent chooses for that attempt and is equal to the true value of that policy. For example, if a given policy has a true value of .80, the agent will have an 80-percent chance of succeeding on the task when applying that policy. In the computational version of the model, success on an attempt is determined by a random draw from a uniform distribution from 0 to 1, with any number below the true value of the policy being considered successful. If successful, the agent is rewarded with 1 point; otherwise it receives 0. In this way, the mean of a large enough sample of rewards received by the agent will approximate the true value of the policy. The true values of Policies A and B will be represented by the variables R_a and R_b respectively. The random component here adds a crucial stochastic element to the model (e.g., Railsback & Grimm, 2012), making the model non-deterministic and making Monte Carlo simulation important for exploration. This stochastic component represents the idea that any single attempt at a task is essentially a random draw from all possible attempts of that task. As an agent attempts its task, it must estimate the value of its policies.
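Before turning to value estimation, the stochastic task attempt just described can be sketched in a few lines. The sketch below is in Python for illustration only (the actual model was implemented in NetLogo, as described in the Method section); the names attempt_task, R_a, and R_b are invented here.

import random

def attempt_task(true_value):
    """One task attempt under a given policy: success when a Uniform(0, 1) draw
    falls below the policy's true value. Success is rewarded with 1, failure with 0."""
    return 1 if random.random() < true_value else 0

R_a, R_b = 0.70, 0.75    # illustrative true values for the old and new policies

# Over many attempts, the mean reward approximates the policy's true value,
# which is why Monte Carlo replication matters for exploring the model.
rewards = [attempt_task(R_b) for _ in range(10000)]
print(sum(rewards) / len(rewards))   # close to 0.75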
Estimated values for a policy form a dynamic process by which the estimate of the value at any time t + 1 is a function of the value estimate at time t, the difference between the expected value and the reward on a given application of the policy, and a step-size parameter that defines the rate of learning for the agent. The essential framework for reinforcement learning algorithms is
NewEstimate <- OldEstimate + StepSize[Target - OldEstimate]
where Target is the reward at a given time step (Sutton & Barto, 2018). In k-armed bandits, those value estimates can be obtained through action-value methods, which use the experience of the agent to drive the estimation. A simple calculation of the value estimate is to average the received rewards up to that point in time, thus:
Q_t(a) = (sum of rewards when a taken prior to t) / (number of times a taken prior to t)
where Q_t(a) is the value function for Policy A. A more sophisticated way to track the value estimate is as a function of the nth reward:
Q_{t+1}(a) = Q_t(a) + (1/n)[R_t - Q_t(a)]     (Algorithm 1. Value Estimate Calculation)
where the expected value of Policy A at step t + 1 is a function of the estimate after n applications plus a weighted function of that prior estimate and the received reward R_t at that time. Estimating values this way defines the value as a dynamic process underlying the primary transfer decision process in this model (Dishop et al., in press). In addition, this equation defines the learning rate as the inverse of the number of steps taken, meaning learning will decrease over time, fitting with the power law of learning (Newell & Rosenbloom, 1981). The above equation provides a means of updating the agent's value estimates over time, but the agent also can be given an initial estimate of each policy. The initial values given to an agent can affect the behavioral decisions of that agent over time and can improve long-term results under certain conditions (Sutton & Barto, 2018). In the LTM, those initial estimates are defined by Q_1(a) and Q_1(b) for Policies A and B respectively.
Tracking the expected value of each policy is only part of the learning process. Whenever the agent encounters its defined problem, the agent must choose which policy it will apply. Typically, this occurs through action-value methods of selection, where the chosen policy is the one with the highest estimated value, Q_t(a) or Q_t(b). Let P_t represent the policy the agent chooses at a given time point. By choosing the highest-value policy, the agent is choosing the policy which it believes will offer the greatest reward at that time point. However, always choosing the highest-value policy does not allow the agent to effectively test other potential solutions. Instead, the agent can be allowed to explore policies it does not currently see as the most valuable in order to find other, potentially better, policies. This is the classic exploration-versus-exploitation choice seen in studies within the organizational literature (e.g., March, 1991). The rate of exploration can be defined by a variable E; this approach is referred to in reinforcement learning as an E-greedy method (Sutton & Barto, 2018). In the transfer case of choosing between two possible policies, the chosen policy P_t is defined as the greater of the two value functions Q_t(a) and Q_t(b) with some probability 1 - E. The process described so far is very rational on the part of the agent. However, not all choices by individuals are so clearly logical.
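A minimal sketch of this value-tracking and E-greedy choice process is shown below, again in Python for illustration only; the class and function names (PolicyValue, deliberate_choice) are invented here and are not part of the formal model's notation.

import random

class PolicyValue:
    """Running value estimate Q(p) for one policy, starting from an initial estimate Q_1(p)."""
    def __init__(self, initial_estimate):
        self.q = initial_estimate   # current value estimate
        self.n = 0                  # number of rewarded applications observed so far

    def update(self, reward):
        # Q_{t+1} = Q_t + (1/n)[R_t - Q_t]: the step size shrinks as experience grows,
        # so learning slows over time in line with the power law of learning.
        self.n += 1
        self.q += (reward - self.q) / self.n

def deliberate_choice(q_a, q_b, exploration_rate):
    """E-greedy selection: exploit the higher-valued policy with probability 1 - E,
    otherwise explore by choosing a policy at random."""
    if random.random() < exploration_rate:
        return random.choice(["a", "b"])
    return "a" if q_a.q >= q_b.q else "b"

# Example: both policies start with an initial estimate of .5.
q_a, q_b = PolicyValue(0.5), PolicyValue(0.5)
q_b.update(1)                                               # Policy B just produced a reward
print(deliberate_choice(q_a, q_b, exploration_rate=0.10))   # usually "b"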
The form of choice outlined thus far more closely aligns with the type 2 processing systems from dual processing; however, type 2 systems are not always engaged and are theorized to intervene, or not, in decisions already made by type 1 processes (e.g., Evans & Stanovich, 2013). Thus, we must expand the LTM to include an initial automatic decision and learning process to represent type 1 processes, and a mechanism to determine whether the type 2 processes will intervene in that decision. The type 1 process hypothesized here is based on the number of times a policy has been applied. This is the idea that repetition leads to automaticity and that the more times a stimulus and response are paired, the more likely they are to be activated together in the future. Let Z_t(a) be the probability of choosing Policy A over B. Z_t(a) is a function of the number of times that policy has been chosen out of the potential times it could have been chosen from A and B. In addition, the agent in a learning transfer context will likely have experience with Policy A prior to entering the learning event where Policy B is introduced. Thus, the agent should already have some value estimate of that policy based on their experiences and an associated number of times they have applied it. However, it is also possible that the agent receives some actual experience with their new Policy B prior to entering the transfer environment, such as in the learning event itself. To account for those applications, let L represent the number of practice attempts the agent has had with the new Policy B. The calculation of Z_t(a) is then:
Z_t(a) = N_t(a) / [N_t(a) + N_t(b) + L]     (Algorithm 2. Type 1 Process Equation)
where N_t(a) and N_t(b) are the number of times Policies A and B have been applied prior to time t. The default choice of the agent at time t is Policy A at the rate Z_t(a), and Policy B at the rate 1 - Z_t(a).
Once the type 1 process has chosen a policy, it is then up to a type 2 process to intervene. However, type 2 processes do not always do so because they are not always able. For example, the agent may not have the necessary resources, whether those resources are cognitive or exterior to the agent, such as time. It would be possible to theorize about the specific effects of various factors that may affect the likelihood of employing type 2 processes. However, for simplicity the present model will cover all such effects in a percentage chance that type 2 processes are implemented. The chance of engaging in type 2 processing at any time point will be defined as S_2 and ranges from 0 to 1 (Algorithm 3. Probability of Choosing Type 2 Processes). If the agent engages in type 2 processes, then the decision process outlined previously is utilized, which may or may not result in the same decision arrived at by type 1 processes, refining the policy choice of type 2 processes to be:
P_t = max[Q_t(a), Q_t(b)] with probability 1 - E
which represents that the policy P chosen at time t, given type 2 processing, is the policy with the maximum value of Policies A and B, with a likelihood dependent on the amount of exploration desired, E. If the agent does not apply type 2 processes, the type 1 decision is utilized. In either case the agent updates the relevant equations based on the outcome of their action and moves on to the next attempt.
All parameters and equations for the model can be found in Table 1 and Table 2 respectively. It is important to note that almost all aspects of this process could draw on more complicated conceptualizations from their respective theories; however, that is not the point of starting a modeling cycle in an area that has never been covered before.
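Pulling these pieces together, the sketch below shows one way a single transfer decision could be coded. It is an illustrative Python rendering of the process just described, not the NetLogo implementation reported later; all function and argument names (type1_choice, decision_step, s2, and so on) are invented for this sketch.

import random

def type1_choice(n_a, n_b, practice_b):
    """Type 1 (habitual) default: Policy A is chosen with probability
    Z(a) = N(a) / [N(a) + N(b) + L], its share of past applications, where pre-transfer
    practice attempts with Policy B (L) count toward the denominator."""
    total = n_a + n_b + practice_b
    z_a = n_a / total if total else 0.0
    return "a" if random.random() < z_a else "b"

def type2_choice(q_a, q_b, exploration_rate):
    """Type 2 (deliberate) choice: exploit the higher value estimate with probability
    1 - E, otherwise explore a policy at random."""
    if random.random() < exploration_rate:
        return random.choice(["a", "b"])
    return "a" if q_a >= q_b else "b"

def decision_step(n_a, n_b, practice_b, q_a, q_b, s2, exploration_rate):
    """One transfer decision: a habitual default is generated first, and type 2
    processing intervenes (replacing that default) with probability S2."""
    default = type1_choice(n_a, n_b, practice_b)
    if random.random() < s2:
        return type2_choice(q_a, q_b, exploration_rate)
    return default

# Example: 100 prior applications of Policy A, 10 practice attempts with Policy B,
# equal value estimates, a 50% chance of type 2 engagement, and 10% exploration.
print(decision_step(n_a=100, n_b=0, practice_b=10,
                    q_a=0.5, q_b=0.5, s2=0.5, exploration_rate=0.10))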
The present model should be viewed as a building block for future theoretical development.
Study 1: Method
The model described above was instantiated as an agent-based model using the simulation program NetLogo (Wilensky, 1999). Although NetLogo does not offer as much flexibility as other programs such as R, it is a platform specially designed for implementing agent-based simulations. Although only a single agent is studied in the present model, utilizing this platform allows for easy expansion in later iterations to examine multiple agents in networks, teams, organizations, and so on. The equations outlined above were used to determine the learning and behavior of the agent modeled over time. A snapshot of the modeling environment and the code for use in NetLogo are available in Appendix A, and a copy of the program itself is available from the author upon request.
Model outcome metrics
To analyze the potential of the model to account for the important training effects described above, two primary outcomes were chosen to track within the modeling environment. Much has been written about what aspects of training outcomes are important to measure to describe training success. Kirkpatrick's classic typology describes important outcomes at four levels: reactions, learning, behavior, and results. Much research in organizations is limited to reactions to training, despite reactions being probably the least informative level. Other emphasis has been placed on cognitive outcomes of training, such as learning, which has driven much research over the last couple of decades (Kraiger, Ford, & Salas, 1993; Ford, Kraiger, & Merritt, 2010). These two levels of outcomes have implications for the present models. Utility perceptions are a type of reaction to training, but learning outcomes take a background role in the LTM because the agent having successfully learned the new policy is an assumption made for simplicity.
To measure important outcomes in the modeling for this paper, a reemphasis must be placed on measuring behavior and outcomes. The shift in emphasis toward cognitive outcomes of training moved the field away from a focus on behavioral change (Kraiger & Ford, 2007), but those outcomes are focused on effects emerging from the training event itself. The study of the transfer of those learning outcomes to on-the-job behavior is an area of needed research (Ford et al., 2010), and the present model is intended to help describe the process of that transference. Behavior and performance outcomes of the agents in the models therefore become the key variables of interest. A behavioral measure was created as the percentage of time the target policy, Policy B, is implemented by the agent. Measuring behavioral choice outcomes in this way also aligns with definitions of learning which focus directly on behavioral change (e.g., Myers, 2004). In addition, performance of the agent was tracked over time and was defined as the percentage of times the agent successfully completes its abstract task. Additionally, each agent stored its performance after an initial burn-in period, which represents the pre-training phase and is a time when the agent can only apply its first policy. The agent then stored its performance at the end of the defined transfer period, both for its overall performance and its performance just within the transfer period. Performance in this model is thus equal to the percentage of time the agent successfully completes its task.
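To make these outcome definitions concrete, the sketch below strings the earlier pieces into one illustrative run: a burn-in period in which only Policy A exists, followed by a transfer period in which both policies compete. It is a compact Python stand-in for the NetLogo program in Appendix A, not a reproduction of it, and every name and default value shown (run_agent, r_a, s2, and so on) is invented here for illustration.

import random

def run_agent(r_a=0.70, r_b=0.75, burn_in=100, transfer=500,
              q0_a=0.5, q0_b=0.5, practice_b=0, s2=0.5, explore=0.10):
    """One illustrative run: returns pre-training performance, the behavioral transfer
    rate (share of transfer-period attempts using Policy B), and transfer-period
    performance (share of successful attempts), all as proportions."""
    q = {"a": q0_a, "b": q0_b}          # value estimates, starting at the initial estimates
    n = {"a": 0, "b": 0}                # times each policy has actually been applied
    true_value = {"a": r_a, "b": r_b}

    def attempt(policy):
        # Success is a uniform draw against the policy's true value; reward is 1 or 0.
        reward = 1 if random.random() < true_value[policy] else 0
        n[policy] += 1
        q[policy] += (reward - q[policy]) / n[policy]   # incremental value update
        return reward

    pre_success = sum(attempt("a") for _ in range(burn_in))   # burn-in: only Policy A exists

    chose_b = post_success = 0
    for _ in range(transfer):
        denom = n["a"] + n["b"] + practice_b
        z_a = n["a"] / denom if denom else 0.0               # type 1 habit strength for A
        choice = "a" if random.random() < z_a else "b"
        if random.random() < s2:                             # type 2 processing intervenes
            if random.random() < explore:
                choice = random.choice(["a", "b"])           # exploration
            else:
                choice = "a" if q["a"] >= q["b"] else "b"    # exploit the higher estimate
        chose_b += choice == "b"
        post_success += attempt(choice)

    return pre_success / burn_in, chose_b / transfer, post_success / transfer

print(run_agent())   # prints (pre-training performance, behavioral transfer rate, transfer performance)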
Further, saving performance both pre- and post-training allowed the model to be analyzed as a pre-post intervention design, providing greater insight into the causal effects of adding a defined intervention (the introduction of Policy B). Doing such a pre-post performance comparison also aligns with our adopted definition of transfer as the extent to which "the learning that results from a training experience transfers to the job and leads to meaningful changes in work performance" (Blume et al., 2010, p. 1066; emphasis added). This will be accomplished by calculating effect sizes (d) for conditions comparing pre-training and post-training performance, allowing both easier comparison to existing effect sizes in the research literature and the placement of results into a standardized metric to help correct for any idiosyncrasies that may make the interpretation of raw effects misleading.
Analysis
Analysis of computational models does not follow the typical procedure of empirical research. Instead of testing traditional statistical models, testing of the LTM followed common cycles of computational model exploration (e.g., Railsback & Grimm, 2012). Important steps include verification, showing generative sufficiency, and exploring sensitivity and robustness. Verification includes confirmation that the implemented model is consistent with the proposed theory (Banks, Carson, Nelson, & Nicol, 2010). This was accomplished via logical consistency checks by the author and testing of the mechanisms of the model to ensure the basic relationships expected occur when the model is executed. Generative sufficiency entails confirming that the model can recreate general effects known from real data. Achieving generative sufficiency does not confirm that the proposed model is the explanation of the process being studied, but it does confirm the model is a possible explanation of that process (Epstein, 1999). Finally, sensitivity and robustness entail an exploration of the model parameters to determine how sensitive the model is to changes in initial conditions and violations of assumptions (Railsback & Grimm, 2012), which achieves three goals. First, it is not clear at which levels of various parameters common effects seen in the literature may manifest; exploring the model allows the tuning of parameters to more accurately reflect reality. Second, model exploration allows the discovery of potential discontinuous effects of parameters, where the results of the model change rapidly as initial conditions for that parameter change. Third, it may reveal unexpected or interesting findings, which is not the goal of the model but can be useful for providing insight into real-world phenomena or guiding future research. All four of these steps were executed and will be outlined below.
Since computational models create simulated data, output statistics need to be interpretable without traditional significance tests, because such tests lose meaning when one can simulate as much data as desired and most primary effects are programmed in (e.g., Railsback & Grimm, 2012). Instead, we must use summary statistics and correlations to describe effects of interest. We can use these to calculate effect sizes of parameter changes on model outcomes and compare those to meta-analytic effects. Another key tool in the computational modeler's toolkit is the heat map, which can provide visualizations of parameter effects that are easily interpretable and can show transition points in model parameters that drastically impact model outcomes. The same basic approach was utilized for analyzing all models in this paper.
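As one concrete illustration of this approach, the sketch below computes a standardized pre-post difference (d) from replication-level performance scores in a single condition. It is written in Python for illustration, uses a pooled-standard-deviation form of d (the exact computation used for the dissertation's analyses is not shown in the text), and the data values are made up.

import statistics

def cohens_d(pre, post):
    """Standardized mean difference between post- and pre-training performance across
    replications, using a pooled standard deviation (one common variant of d)."""
    m_pre, m_post = statistics.mean(pre), statistics.mean(post)
    pooled_sd = ((statistics.stdev(pre) ** 2 + statistics.stdev(post) ** 2) / 2) ** 0.5
    return (m_post - m_pre) / pooled_sd

# Made-up replication-level performance proportions for one condition.
pre_perf = [0.68, 0.71, 0.69, 0.72, 0.70]
post_perf = [0.73, 0.75, 0.71, 0.74, 0.76]
print(round(cohens_d(pre_perf, post_perf), 2))   # standardized pre-post difference for this condition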
Study 1: Simulation and Results
In this section I outline simulations directed at the four main steps in model exploration: verification, generative sufficiency, sensitivity, and robustness checks.
Model verification
Prior to beginning any simulation, the model was subjected to a series of verification checks (Banks, Carson, Nelson, & Nicol, 2010), outlined here.
Logical Consistency
The model was executed in NetLogo as outlined in the theoretical development and research methods. The only alteration of the theory made for computational efficiency was to code the relationship between type 1 and type 2 processing slightly differently than the default-interventionist approach outlined in the theory. Instead of making a default decision and then choosing whether the agent will use its type 2 processes to intervene, the agent decides first whether it will use its type 2 processes. If not, it makes a habitual choice and implements it; if it does use its type 2 processes, it makes the more rational decision as if it were intervening in an existing, but now inconsequential, habitual reaction. Although the code is not strictly default-interventionist, its outcomes should be identical while avoiding the computational inefficiency of performing a default judgment when it would be overridden anyway.
Parameter Effects Check
Once implemented in NetLogo, a series of tests was run to ensure that when parameters were adjusted, corresponding and expected changes occurred within the model. The following outlines a series of tests showing that the adjustment of each parameter corresponds with the desired effects.
Simulation Length
The first test confirmed the desired lengths of the pretraining and post-training simulations. Parameters besides the pretraining and transfer lengths are of no consequence for these simulations and were held at constant levels. To test the length of the pretraining periods, one simulation each was run with a length of 250 and 500 time steps with no transfer time allowed. These returned the 250- and 500-time-step lengths expected. A similar test was then completed with transfer lengths of 250 and 500 time steps but no pretraining period. These again returned the expected lengths of 250 and 500.
Policy Value
To test the effect of the true value of the policies, the success rates of the policies were checked across a series of simulations. To check the value of Policy A, the simulations focused on the pretraining period because only Policy A is available to the agent there. Simulations were run for 500 time steps, with true policy values of .50 and .75. Success rates for these simulations were .50 and .76. Given that each was a single simulation run, this confirms the expected effect of the true value on the success rate of Policy A. A test of the value of Policy B is more complicated because it is only available in the transfer environment. To isolate the effect of Policy B, the value of Policy A was set to 0 and no pretraining time was allowed. In addition, no exploration was allowed and type 2 processing was always employed. This should force the agent to apply Policy B alone. Values of Policy B were tested at .50 and .75, with 1000 transfer attempts. True success rates in these conditions for a single run were .51 and .75, in line with expectations.
Policy Value Estimates
Two tests were completed to check the veracity of policy value estimates, corresponding to initial and final estimates.
Initial estimates should correspond to the set initial value estimate for the defined policy, such as .50 or .75. To check this, models were executed and the value of the policy estimate at the first time point was verified to be equal to the value set for the simulation. Additionally, at the end of the simulation we should expect value estimates to approximate the true value of the underlying policy, representing an accurate judgment on the part of the agent under ideal conditions. To assess this, models were run to isolate the effects of both Policy A and Policy B at levels of .50 and .75. For Policy A, only pretraining time was allowed, run for 500 steps (the maximum allowed in the simulation). The model was run 10 times at each level. For these 10 runs, results ranged from .472 to .546 with a mean of .51, and from .742 to .802 with a mean of .769, for the .50 and .75 levels respectively. For Policy B, the transfer environment was isolated and run for 1000 steps, the maximum allowed in this simulation. For these runs, the range for the .50 policy was from .487 to .518, with a mean of .50, and for the .75 policy the range was .734 to .761, with a mean of .75.
Exploration Rate
To check that type 2 processes are willing to explore at a defined rate, the simulation was set up with a value for Policy B of 0, a value for Policy A of 1, and a 100% chance of type 2 processes engaging. This should result in Policy A being chosen on nearly every task attempt, except at a rate approximating the defined exploration rate. This simulation was run 10 times with an exploration rate of 10%. These simulations ranged from .088 to .106 in rates of choosing Policy B, with a mean of .096. This is in line with the expected value of .10. Given the results observed in these checks, it appears the simulation is operating as expected.
Generative Sufficiency, Sensitivity and Robustness
Following model verification, a series of experiments was conducted to assess the model for generative sufficiency. This section outlines the attempts to determine whether the model could generally account for existing findings in the training and transfer literature. Due to the nature of the experimentation, the model was essentially simultaneously checked for sensitivity and robustness as parameters were tuned to better represent naturally observed phenomena. To accomplish this, parameters were manipulated initially via coarse sweeps of the available space for the parameter of interest, holding all other parameters constant, to determine the effects of the parameter and to ensure that the model code reliably changes the levels of parameters (which is in some ways a continuation of the verification process). As modeling proceeded, experimentation became iteratively more complex and focused on potentially interesting facets of the model in a way guided by the emerging findings of the modeling process. In addition, although the generally desired end results were known from meta-analyses, little if any guidance exists on how strong a given manipulation is from a mathematical standpoint, making it difficult to determine a priori the size of the manipulation to make in the experimental code. Therefore, initial exploration aimed to tune the model parameters to create reasonable transfer outcomes, for example obtaining ds on pre-post measures of performance of .3 to .5, rather than exceedingly large effects such as 2 or more.
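The sketch below illustrates how such a coarse, fully crossed sweep might be organized: loop over two parameters, run many replications at each combination, and store the condition means in a grid that can be drawn as a heat map. It is an illustrative Python harness, not the procedure actually used; the stand-in outcome function (with its arbitrary internal formula) would be replaced by calls to the real simulation, for example through NetLogo's BehaviorSpace tool, and all names here are invented.

import random
import statistics

def simulated_transfer_rate(policy_a_value, policy_b_value, replications=100):
    """Stand-in for the full agent simulation: returns a noisy, bounded outcome that
    depends on the two swept parameters. A real sweep would call the actual model here."""
    runs = []
    for _ in range(replications):
        raw = 0.25 + 0.5 * (policy_b_value - policy_a_value) + random.gauss(0, 0.05)
        runs.append(min(1.0, max(0.0, raw)))
    return statistics.mean(runs)

def coarse_sweep():
    """Fully crossed sweep of both policy values from 0 to 1 in .05 increments; each
    cell of the returned grid is a condition mean, which is what a heat map visualizes."""
    levels = [round(0.05 * i, 2) for i in range(21)]
    grid = [[simulated_transfer_rate(a, b) for b in levels] for a in levels]
    return levels, grid

levels, grid = coarse_sweep()
print(len(levels), "x", len(levels), "conditions; example cell:", round(grid[0][-1], 2))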
True policy values
The first set of models aimed to tune the model into a reasonable parameter space regarding the values of both Policy A and Policy B. Important considerations here are defining policy values that are representative of the types of tasks in which we may be interested in the real world, and finding the separation in policy values which will reproduce reasonable transfer effects. For example, if we were interested in improving baseball batting skills, the success rate of each policy should be very low, such as .25, to approximate the batting average of Major League Baseball players. On the other hand, success rates for performing well-defined tasks on an assembly line are likely .95 or higher. Most closed skills in regular organizations probably exist at this high end of the value continuum, but open skills may be much lower. Two slightly different ways of parameterizing the policy values were explored here.
In the first version, the true policy values of A and B were independently set. Based on the above discussion of the possible range of relevant values, the true values of both Policy A and Policy B were swept from 0 to 1 in .05 increments, fully crossed, with 500 replications each. Runs used a 250-step burn-in and transfer period, an exploration rate of 10%, system 2 activation of 50%, and initial policy value estimates of .5. To analyze the results, heat maps were generated of the effects on behavioral transfer rates and on pretraining to post-training changes in performance as measured in d. Results for behavioral transfer and performance change can be found in Figures 2 and 3 respectively. In examining these results, we can see that behavioral transfer rates range from about 5% to about 55%. These low numbers make sense given the effect of the habitual response and the time allowed for Policy B to override that previously habitual response. This low rate of transfer also aligns well with expectations given the low amount of transfer commonly cited in the research literature (Ford et al., 2011). Performance change also shows the generally expected pattern. We see a diagonal where equal policy values lead to no performance change, as expected. There are negative performance changes below that diagonal, representing training a policy that is less valuable than the existing policy, and positive values above the diagonal, representing improvements of the new policy over the old. In addition, improvements appear to be stronger than corresponding decrements, which makes sense because agents should abandon the new policy if they do not see it as an improvement. The sudden change in the magnitude of effects across this diagonal suggests a sensitive area of the model where values change suddenly and dramatically. Changes in d range from -3.28 to 11.69. Obviously, the upper and lower portions of this range are well outside of what we might expect in the training literature, indicating that some areas of policy values are essentially out of bounds regarding their ability to replicate reality. However, along the diagonal where values of Policy B are just barely higher than the values of Policy A, we see performance effects of about d = .30, indicating that when the value of Policy B is slightly greater than that of Policy A the model is able to reproduce the essential effect of training we expect from research experience.
Although this initial result is promising, the way these parameters are defined limits the ability to vary policy values along with other variables in the future while maintaining low enough dimensionality that the results may be interpreted. Thus, the decision was made to redefine the true value of Policy B in direct relation to the true value of Policy A. This was accomplished by the inclusion of a parameter indicating the change in the true value of the policies moving from Policy A to Policy B. So, for example, if Policy A was given a true value of .50 and the change in policy value was defined as .10, Policy B would have a true value of .60. In the first set of models we saw that what appears to matter in making the model a plausible representation of the real world is Policy B having a slightly greater value than Policy A. By reconfiguring the model to define Policy B in direct relation to Policy A, we can better home in on the difference between the two policies which best represents reality.
In the updated model, simulations were run sweeping the Policy A value from 0 to 1 in .05 increments, with the policy value change swept from -1 to 1 in .05 increments. Runs used a 250-step burn-in, an exploration rate of 10%, system 2 activation of 50%, and initial policy value estimates of .50, with 500 replications of each condition. Behavioral transfer and performance change results can be seen in Figures 4 and 5 respectively. Results show little behavioral transfer when the policy change is negative. This is expected and indicates agents are discarding the new policy when it is worse than their old one, except for a small amount of use due to the exploration factor. This is a pattern we would hope to observe in the real world, as we would not want employees using a worse behavior if they do not have to. Overall, transfer rates range from about 6% to 55%. Interestingly, some nonlinearity appears to be occurring: the highest transfer happens when policy values start low and change a lot (as would be expected), but when policies start low and only improve a little, transfer actually does not occur as much as when policies are already valuable and change upwards a little bit. Transfer rates of about 30% run in a line from a change of .40 when Policy A starts at 0 to a change of .10 when Policy A starts at .70. We see similar patterns in performance change. There are slight performance decrements when the new policy is worse than the old, as we would expect because a worse policy does get applied sometimes. We also see the expected, extremely high performance improvements when the existing policy is low and the new one is high. However, higher ds are seen with higher starting policies in many instances, as with the behavioral change above. The kinds of effect sizes we tend to see, or at least expect, for performance improvement occur along a similar diagonal to behavioral transfer, where less improvement is required when prior policy values are higher. A good example is that a Policy A value of .70 with an improvement of only .05 yields a d of .34. Given the convergence of behavioral transfer rates and performance improvement to reasonable ranges when Policy A is .70 and the policy change is .05, these values were selected for use in further modeling efforts.
Timing of interventions
With a defensible level of policy values with which to define the model, it was also important to explore the effects of pre-training and transfer times on the model to ensure proper time was allotted for each. History with Policy A represents the length of time the agent applied that policy before the introduction of Policy B. That history was modeled by a burn-in period during which the only policy available to the agent is Policy A. Exploring the timing of an intervention importantly accounts for the history adult learners bring to their learning events (Knowles, 1984) and begins to account for overcoming established automatic responses once the learner returns to their work environment. It was expected that longer periods of time in which the agent could only access Policy A would result in reduced transfer of Policy B, as A will be more likely to be activated by type 1 processes, and that this effect would hold longer in the face of repeated application of Policy B.
To explore the effects of training and transfer time, simulations were run sweeping burn-in and transfer time each from 25 to 500 time points in 25-step increments. Based on the levels identified as interesting and applicable in the simulations above, Policy A was set to a reward of .70 and the policy change to B at .05. The exploration rate was set to 10%, system 2 activation to 50%, and initial policy value estimates to .50, with 500 replications of each condition. At the condition level, pretraining (burn-in) time was correlated with behavioral transfer at r(200000) = -.48 (p < .001)¹ and with performance change at r(200000) = -.31 (p < .001), indicating less transfer when the agent had used its old policy for longer prior to training, as expected. On the other hand, transfer time was related at r(200000) = .80 (p < .001) to behavioral transfer and at r(200000) = .58 (p < .001) to performance change, indicating that the longer the agent had to attempt transfer, the more likely they were to do so. Condition-level behavioral transfer rates and performance change (d) were plotted in heat maps, which can be found in Figures 6 and 7. In these depictions, we see that earlier training improves transfer rates. There also appears to be a possible augmentation effect where the combination of early training and a long time to adopt the new behavior leads to much greater transfer rates. Interestingly, it is also apparent that pre-training time quickly overwhelms the effect of longer transfer time. From these results, a burn-in of 100 with a transfer period of 500 might be reasonable to use for future exploration of other parameters to produce transfer and performance improvement levels commensurate with real-world levels.
¹ Sample sizes, degrees of freedom, and significance have been reported for statistical analyses, but it should be reiterated that traditional interpretation of significance holds no meaning in the context of computational models. Sample sizes and associated degrees of freedom for statistical tests are arbitrary when one has control over the modeling environment, as more data can always be simulated. Therefore, readers should focus on reported effect sizes and interpret any associated significance conclusions with extreme caution. See Cumming (2014) for further discussion of the limits of null-hypothesis significance testing and the move toward the use of effect sizes to improve research in general.
These results also seem to suggest that although the present process may be a good approximation of a possible transfer process, the effect of habits might be too strong. This could be caused either by the habit process itself or by the low level of agent ability to engage in type 2 thinking. As such, that was explored next.
Type 2 Processing
Next, the effect of being able to engage in type 2 processing was examined independently of other variables. Here, a greater ability to engage in type 2 processes equates to a greater opportunity to use one's new skills, or a situation strength in which the trainee is free to make the transfer choice more independently. For these simulations, the Policy A value was held at .70, Policy B was .05 better, burn-in time was 100, transfer time was 500, the exploration rate was .10, and initial value estimates were .50. Type 2 likelihood was swept from 0 to 1 at .01 intervals, with 500 replications chosen for resolution. Because only one variable was being examined here, a slightly different approach was utilized to examine the results. First, a correlation between type 2 likelihood and behavioral transfer at the replication level revealed a relationship of r(50500) = .51 (p < .001). As the likelihood of engaging in type 2 processes has been argued here to be akin to the opportunity to use one's training, the comparable meta-analytic effect size was around .30 to .40 (Blume et al., 2010). This suggests the model was able to essentially replicate the expected pattern of results. For further analysis, instead of heat maps, a linear regression was utilized to examine both linear and curvilinear relationships between the likelihood of engaging in type 2 processing and behavioral transfer and performance change. Through this analysis it was found that behavioral transfer was predicted by type 2 likelihood, at the condition level, at a linear rate of .756 (β = 1.43, t = 58.253, p < .001) and a curvilinear rate of -.233 (β = -.46, t = -18.520, p < .001); the intercept was -.016 (t = -5.848, p < .001; F(2, 98) = 12949.35, p < .001, R² = .998). The d of performance change was predicted from type 2 likelihood at a linear rate of .973 (β = 1.28, t = 10.888, p < .001) and a curvilinear rate of -.249 (β = -.34, t = -2.882, p = .005), with an intercept of -.292 (t = -15.117, p < .001; F(2, 98) = 519.44, p < .001, R² = .96). Graphs of predicted and observed behavioral transfer and performance change can be found in Figures 8 and 9. From these you can see that as type 2 likelihood improves, so do behavioral transfer and performance change. Interestingly, performance change displays a negative effect at low levels of type 2 likelihood but turns positive once likelihood is above about 33%. In addition, a likelihood of .80 seems important, as there is a spike in performance improvement and the transfer rate improves to near .40. Due to this effect, further models used likelihoods of .80 unless stated otherwise.
Practice and Overlearning
We know that practice with a new skill can improve transfer outcomes, especially when that skill is practiced to the point of overlearning. An estimate of the comparable meta-analytic effect of overlearning is .298 (Driskell et al., 1992). Although the Driskell et al. meta-analysis uses retention as an outcome, retention may be approximated by whether the agent is still applying the new policy at the end of the simulation run. The effect of overlearning in the present model was tested by manipulating the level of L from 0 to 200 in steps of 25.
The high end of 200 was chosen because it represents up to twice the number of pretraining task attempts. For these simulations, Policy A was set to .70, the change in value to .05, system 2 activation to .80, pretraining to 100 time steps, post-training to 500 time steps, and exploration to .10, with 500 replications of each condition. When examined at the replication level, that is, for individual agents, the correlation between practice attempts and behavioral transfer is r(5500) = .118 (p < .001), and the correlation with post-training performance (performance after the training event only) is r(5500) = .074 (p < .001). These relationships are in the expected direction but substantially lower than the comparable meta-analytic effect. For further analysis, condition-level results were calculated for behavioral transfer and performance change; these results can be found in Table 3. There, you can see that there is a clear benefit to practice, as we would expect, though it is difficult to tell the strength of the effect. To remedy this, the data were reanalyzed as a series of experiments comparing a control condition with no practice attempts to ever-increasing amounts of practice, with differences from the control condition expressed as d for both behavioral transfer and performance change in Table 4. Through these analyses we see that performance improvement remains relatively low, even in stronger conditions, while behavioral transfer improves quite substantially as practice increases. Despite this, it is evident that although the general positive effect of practice is obtained, it may not be easily tuned to better approximate typical research findings.
Utility reactions
Utility reactions are trainees' evaluations regarding the usefulness of their learning experience (e.g., Ruona et al., 2002) and are strong predictors of transfer (Blume et al., 2010). Utility reactions can be equated to the initial value estimates of a learning agent in reinforcement models. For example, Sutton and Barto (2018) describe how optimistic initial value estimates encourage early exploration by the agent. In the same way, a transfer agent in the LTM that has a higher initial expectation of the value of their new policy should be more likely to transfer that policy because they are willing to explore its potential. To test this effect, the model was explored by varying the initial policy value estimate for Policy B from 0 to 1 in .05 steps. Policy A was set to .70, the change in value to .05, type 2 activation to .80, pretraining to 100 time points, post-training to 500, and exploration to .10, with 500 replications per condition. At the replication level, results reveal almost no relationship between the initial value estimate and our outcomes of interest. Specifically, the relationship between the initial value estimate and behavioral transfer was only r(10500) = .02 (p = .025), and only r(10500) = .01 (p = .254) with post-training performance. These results are not in line with what was hoped for regarding existing effects of utility reactions. One possibility is that there was too much noise at the individual level regarding outcomes, so results were also examined at the condition level. There, the relationship between initial value estimates and behavioral transfer was r(21) = .476 (p = .029) and was r(21) = .570 (p = .007) for performance change. The condition-level results for this exploration can also be found in Table 5.
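The replication-level versus condition-level contrast described here is easy to illustrate: correlate the swept parameter with the outcome across individual agents, then again after averaging agents within each condition. The sketch below does this in Python with made-up data and invented helper names (pearson_r, condition_level); it is illustrative only and is not the analysis code used for the dissertation.

import statistics

def pearson_r(x, y):
    """Plain Pearson correlation (no p-values, which carry little meaning for simulated data)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

def condition_level(conditions, values):
    """Average a replication-level outcome within each condition (e.g., each level of
    the initial value estimate for Policy B)."""
    sums, counts = {}, {}
    for c, v in zip(conditions, values):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    levels = sorted(sums)
    return levels, [sums[c] / counts[c] for c in levels]

# Hypothetical replication-level data: the swept parameter and the observed outcome.
initial_estimate = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
transfer_rate = [0.21, 0.35, 0.30, 0.26, 0.33, 0.40]

print(pearson_r(initial_estimate, transfer_rate))     # replication (individual agent) level
levels, means = condition_level(initial_estimate, transfer_rate)
print(pearson_r(levels, means))                       # condition level, after averaging out noise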
Transfer trajectories
A subset of models was run to examine the development of transfer over time, to explain the various trajectories described by Baldwin and Ford (1988). To examine these trajectories, models were run using the baseline parameters of a Policy A value of .70, a change in policy value of .05, type 2 activation of .80, 100 pre-training time points, and 500 post-training time points. The goal here is only to show that the transfer trajectories described by Baldwin and Ford (1988) are possible within this model. Thus, the model was run several times, examining the shape of the behavioral transfer rates within the modeling environment. Some examples of transfer trajectories from the model can be seen in Figures 10A-D. These examples show a variety of transfer trajectories, such as (A) initially high levels of transfer later tapering off; (B) initial failure to transfer with later increased transfer; (C) immediate and consistent transfer; and (D) a general failure to transfer.
Implementation Intentions
As discussed, implementation intentions are used to establish an automatic link between situation and response to improve the automaticity of that response (e.g., Gollwitzer, 1999). Although they also impact the automaticity of applying Policy B, implementation intentions are not the same as practice or overlearning, which is already included in the model. To account for the improved automaticity brought by implementation intentions, let us instead define a variable I as the percentage increase in the chance of applying Policy B when type 1 processes are enacted. This changes our calculation of Z_t(a) to be:
Z_t(a) = N_t(a) / [N_t(a) + N_t(b) + L] - I     (Algorithm 4. Type 1 Process with Implementation Intentions)
We can then manipulate the level of I to explore the effects of implementation intentions. This tweak was coded into the model for exploration. Due to the underlying math behind the simulation, when the agent has no history of engaging in Policy A, the likelihood of doing so when only type 1 processes are available should be equal to 0 minus the defined level of implementation intentions. This was verified with a level of implementation intentions of .10, which returned a simulated critical value for automatically applying Policy A of -.10, as expected. Implementation intentions were explored from 0 to .50 in .05 increments, with 500 replications each. The likelihood of type 2 processing was set to .80, the Policy A value was held at .70, Policy B was .05 better, burn-in was 100, transfer was 500, the exploration rate was .10, and initial value estimates were .50. From these runs, the replication-level correlation between implementation intentions and post-training performance was r(5500) = .111 (p < .001) and was r(5500) = .193 (p < .001) with behavioral transfer. Condition-level results for this experiment can be found in Table 6, which shows a steady improvement in behavioral transfer as implementation intentions increase. However, the effect on performance improvement is much less consistent.
Exploration rates
It is typical in both reinforcement learning problems (Sutton & Barto, 2018) and in organizational research (e.g., March, 1991) to have an exploration parameter in the model. Within this model, exploration represents the degree to which an agent is willing to explore behavioral policies that they do not currently see as their most valuable. In the real world, such exploration would be akin to an employee searching for a better way to do their job than their current dominant approach.
Typically, there is a trade-off between exploration and exploitation for overall performance, where some degree of exploration is beneficial but too much can hinder performance (e.g., March, 1991). One possible implication of this model is the identification of the degree to which trainees should be willing to explore new task approaches in order to maximize their performance. To explore this possibility, while holding all other parameters constant, the exploration parameter was swept in .01 increments from 0 to 1.0. In examining this simulation, we find a negative overall relationship between exploration and behavioral transfer (r(50500) = -.368, p < .001) and post-training performance (r(50500) = -.168, p < .001). Similar results were seen at the condition level with behavioral transfer (r(101) = -.600, p < .001) and performance change (r(101) = -.514, p < .001). Such relationships are initially surprising, as it was expected that a willingness to explore would allow the agent to find more optimal solutions. To understand this relationship better, a regression was run examining both the linear and curvilinear effects of exploration on behavioral transfer and performance change. In doing so, we find exploration to have a linear relationship with behavioral transfer of .882 (β = 2.08, t = 13.333, p < .001) and a curvilinear relationship of -1.136 (β = -2.77, t = -17.754, p < .001), with an intercept of .322 (t = 22.492, p < .001; F(2, 98) = 273.85, p < .001, R² = .92). With performance change we see a linear relationship of 1.335 (β = 2.16, t = 10.757, p < .001) and a curvilinear relationship of -1.653 (β = -2.76, t = -13.767, p < .001), with an intercept of .145. In addition, predicted and observed values of behavioral transfer and performance change may be found in Figures 11 and 12. These results show that the effects of exploration peak at some moderate level, and further exploration proves detrimental for transfer outcomes.
Exploratory experimentation
Finally, one strength of building a computational model of a proposed theory lies in the ability to execute virtual experiments which can guide future real-world data collections. This allows us to test novel moderations or interventions which would be difficult to justify spending the resources to test in real data collections without some prior empirical guidance. In addition, it could lead to the discovery of novel interactions which can lead to targeted data collections to help further support or refute the veracity of the proposed theory. Given the positive relationships found in the present model between implementation intentions, type 2 likelihood, and the studied outcomes, a virtual experiment was designed to test the mutual effects of implementation intentions and type 2 likelihood on those outcomes. Given their independent positive effects on behavioral transfer and performance change, it was expected the two would have an augmenting effect where high levels of both would result in the highest outcome levels. To explore this possibility, a virtual experiment was designed where the parameters for both implementation intentions and type 2 likelihood were swept from 0 to 1.0 in .05 increments, fully crossed with each other.
The other parameters were held constant at the levels settled on above: 100 pre-training burn-in time periods, 500 post-training time points, a Policy A value of .70, a change in value of .05, an exploration rate of .10, and initial policy value estimates of .50. To explore these effects, both a more traditional multiple regression approach to analyzing interactions and heat maps were employed. Predictors were mean-centered prior to estimating the regression, and an interaction term was created from the product of the two predictors. In predicting behavioral transfer (F(3, 2020496) = 57700.917, p < .001, R² = .663; b0 = .646, t = 1457.75, p < .001), it was found that type 2 likelihood actually had a negative main effect (b1 = -.217, β1 = -.236, t = -148.16, p < .001), but intentions had the expected positive main effect (b2 = .496, β2 = .539, t = 338.45, p < .001), along with a negative interaction effect (b3 = -.925, β3 = -.305, t = -191.32, p < .001). A similar pattern of results was found in predicting post-training performance (b0 = .732, t = 14774.36, p < .001; b1 = -.011, β1 = -.133, t = -67.09, p < .001; b2 = .025, β2 = .302, t = 152.50, p < .001; b3 = -.046, β3 = -.169, t = -85.29, p < .001). These interactions have been graphed in Figures 13 and 14 respectively. From these visualizations, we can see that the expected augmentation effect does not emerge. Instead, we see that the best transfer and performance occur when intentions are high but type 2 processing is low. To better understand this effect, heat maps were created examining the condition-level results for behavioral transfer, post-training performance, and performance change, which can be seen in Figures 15-17. These heat maps confirm that the best outcomes occur with high intentions but low type 2 likelihood. They also show the interaction effect is more nuanced than suggested by the traditional analyses, in that the worst outcomes only occur when both implementation intentions and type 2 likelihood are low, as we originally expected. However, the fact that both the worst and best outcomes occur when type 2 likelihood is low, combined with some non-linearity in the change in effects across implementation intentions as type 2 likelihood increases, obscures the benefits of type 2 likelihood in this experiment.
Study 1: Discussion
The primary goals of this paper are to: 1) build a process-oriented theory of training transfer, 2) further integrate disparate related theories, 3) incorporate dual process cognition and reinforcement learning more fully into the organizational sciences, and 4) provide a computational model for virtual experimentation which may provide novel insights for both theory and practice. Over the course of several rounds of virtual experimentation, progress has been made toward all of these goals. Let us discuss a few of the more important theoretical and practical implications uncovered thus far.
Theoretical Implications
The primary goal of exploring a computational version of the LTM was to show that the theory can reproduce common findings in the broader research literature. This is the process of showing generative sufficiency (Epstein, 1999). In the explorations discussed here, it appears that the model can reproduce general patterns of findings in the literature for several important effects, especially regarding the direction of those effects, if not their precise magnitude.
However, not all expected effects were observed, indicating that the model, although promising, is not yet complete. Here, I will review the model's standing on some of those effects. First, the explorations here show that the proposed theory can reproduce the generally expected effects of training on behavioral transfer and performance outcomes we see in the literature. For example, behavioral transfer rates fall in the 10-50% range, which covers typical estimates of transfer in organizations (e.g., Ford et al., 2011). However, plausible effects for observed training outcomes only occur in the model within relatively narrow ranges, especially regarding the parameters governing the true value of the behavioral policies. This limitation could exist for at least two reasons. First, it is possible the model breaks down outside of this narrow band of policy values and does not necessarily operate in a way clearly mappable onto real-world phenomena outside of this band. Such a limitation would not itself invalidate the theory; it would merely place limitations on its generalizability, as occurs with any model. Second, it could be an indication of the narrow range of situations we tend to study in the research literature, which is likely to be at least part of the explanation. In studying training interventions in organizations, we typically enter an organization to deliver a training program to employees who have some degree of experience on the job and are already successful to a greater or lesser extent. The intervention delivered is likely to be a slight improvement on however they were trained, or however they discovered, to do the job prior to our arrival, despite any organizational claims of the great improvement individuals are likely to see. This naturally creates only slight differences in policy values in the terms of the presented model, and therefore it makes sense that such small differences are where the model best matches existing data. Rarely, if ever, in research would we encounter a situation where we are training individuals who are completely incompetent at a task, and collecting the necessary data, while providing them with the skills to be almost perfectly successful on that task. This situation obviously occurs to some extent when new employees are trained from scratch, but this is not the focus of the kinds of individual studies which are generally conducted. If we were to compare completely novice performance to later post-training performance, it is more likely that we would see the kind of extreme effects demonstrated at the edges of the present model. As such, the presented model may in some regions be more broadly applicable to studies of the development of expertise than to training transfer alone (e.g., Benner, 1982).
In addition, implementation intentions (Gollwitzer, 1999) appear to work well in comparison to the limited body of research on their use in organizational training interventions. The observed effects in the present model were in the expected direction, and plausibly scaled effects were found for both behavioral transfer and performance. Unfortunately, there is not yet a meta-analytic estimate of this effect known to the author of this paper, but typical training results for implementation intentions appear to fall in the medium to large effect range (e.g., Friedman & Ronen, 2015), much as observed here. Thus, the present model appears to account for the general effect of intentions.
Unfortunately, the effect of practice in the present model creates effects in the desired direction but does not really work as would be expected in real training situations. Namely, although we know practice and overlearning opportunities are a key driver of training success, the present model only creates substantial effects when the level of practice approaches and subsequently exceeds the level of experience the agent previously had with the task, and those effects do not come close to the recorded meta-analytic effect (Driskell et al., 1992). Such levels of practice in a real training environment are obviously impractical, as I have argued that the degree of prior experience an individual has with the task is a major driver of training outcomes and most individuals will enter training with large amounts of experience. In such situations, small amounts of practice should have large effects in order to better match research findings. Future iterations of the model should examine how to better account for the practice effect. One idea would be to count practice attempts as more impactful than regular attempts; if we assume training-based practice attempts count for more than regular attempts, this could fit with the idea of deliberate and focused practice being a key to skill development (e.g., Ericsson, Krampe, & Tesch-Romer, 1992).
Utility reactions in this model are an interesting case. The comparable meta-analytic effect targeted was the .46 corrected relationship between utility reactions and transfer described by Blume et al. (2010). When analyzed at the replication level, which in this case represents individual agents, there was essentially no relationship between the initial value estimate of Policy B, the stand-in here for utility reactions, and our outcomes. This initially seemed to indicate that the model did not work regarding utility reactions. However, when the data were analyzed at the condition level, the correlation between the initial value estimate of Policy B and transfer was .476, almost perfectly matching the meta-analytic estimate. In no other experiment run here was there such a great disparity in observed relationships at the individual versus the condition level. It might be the case that in this model the effect of utility reactions gets drowned out by random noise when examining individuals. This explanation makes sense when investigating a series of individual value estimate trajectories, where it becomes obvious that the initial estimate for B quickly becomes overwhelmed by the weight of experience and ceases to have drastic effects. However, once we study hundreds of individuals, that noise averages out and the effect of the initial estimate becomes more obvious. Thus, the effect of initial estimates for Policy B matches the literature better than any other parameter examined in this study, but it does not do so in the way initially expected and may need further examination.
Along similar lines, a set of models was run looking only at the behavioral transfer trajectories of single agents in individual runs of the model. Through these, as displayed in Figures 10A-D, even within the same base parameters of the model, agents can follow several types of trajectories in their transfer over time. These trajectories display many of the types outlined for maintenance by Baldwin and Ford (1988).
Thus, even the simplest version of the LTM appears capable of generating a classic effect from the transfer literature, even without substantial empirical guidance given that such trajectories are rarely studied in practice.

As a demonstration of the potential for the present model to guide future research, the interaction between implementation intentions and type 2 processing on our chosen outcomes was explored. In this experiment, it was expected that the two would have an augmenting effect where high levels of each would result in the best outcomes. However, this was not the case. Instead, we saw the best outcomes when implementation intentions were high but type 2 likelihood was low. One reason for this might be that the effect of automaticity in the model is driving the interaction, since both variables affect the automatic process either by forcing the agent to engage in that process or by directly altering it. Thus, when type 2 likelihood is low, implementation intentions can have a more direct effect on outcomes because they have a chance to work, whereas their impact becomes diluted when agents can more often engage in type 2 processes. This is the kind of initially counter-intuitive finding which can be brought to light by computational models. Future research should now test this effect in either laboratory or real-world situations. If the same interaction effect predicted by the model is found, then more support will be lent to the theory proposed in this paper; if the opposite is found, then the current theory would be falsified.

Practical Implications

A primary goal for the LTM and the associated computational models explored here is the ability to provide useful insight for real-world application. The first practical takeaway is that for jobs at any level of current performance, even small improvements, as long as the trainees are able to discern that the new training is an improvement on whatever they currently do, can lead to fairly substantial gains in performance. In addition, it does not necessarily take incredibly large amounts of behavioral transfer to result in substantial performance gains. There are many conditions in the simulations presented here where behavioral transfer is 50% or less, but performance improvements display effect sizes we would consider large in the traditional research literature. Thus, while it is true that a substantial training-transfer gap exists, we should take heart in the ability of even moderate transfer rates to have substantial effects on important performance outcomes.

One interesting finding in this model is the apparent strong effect of pre-training time on the inability of agents to successfully change their behaviors and performance, as we see in Figures 6 and 7. This finding aligns with viewing training from a nonlinear dynamics perspective (Olenick, Blume, & Ford, in press), which in part suggests that training effects are governed by attractors which develop with experience over time, and that stronger interventions would be required to effect permanent change on the job for employees who had been doing the job a certain way for longer prior to the intervention. In the present model, that pre-training time allows for the development of such an attractor, which the relatively mild intervention studied here (seen in the policy change being set at .05) is unable to overcome.
Such a finding reemphasizes the need to consider the timing of our organizational training interventions, as delay in such training is likely to lead to sub-optimal outcomes.

The modeling of type 2 likelihood for effects on training outcomes may be of particular importance. As discussed in the introduction to this paper, the general failure of training interventions to result in expected outcomes is of great concern to organizations. The LTM suggests that one reason for this failure may be not only a lack of opportunities to use the training, but also a lack of opportunity for the trainee to make the kind of effortful decisions that are more likely to lead to them applying their training instead of reverting to their old practices. In fact, there appears to be a critical level below which positive effects are essentially impossible, and this critical threshold (at 33% likelihood of type 2 thinking in this model) must be passed in the transfer environment for positive transfer to occur and result in performance improvements. This could be especially important for environments where such time for thinking is not necessarily always available, such as fast-moving assembly lines. One implication of this model, then, is that organizations should not only make sure trainees can use their training, but also ensure they have the opportunity to think about the tasks on which they have been trained.

Another interesting implication is the degree to which it is useful to encourage exploration in the transfer environment. The results show that observed rates of behavioral transfer peak when the exploration rate is about 25%, and performance change follows a similar, but noisier, pattern. Such a curvilinear relationship in general is not surprising, as we would expect exploration rates above 50% to be detrimental because agents are then purposely not exploiting their better-perceived policy. However, it is mildly surprising that the optimal exploration rate is so much lower than 50%, suggesting that it is better for an agent to err on the side of exploiting their currently perceived better policy than to explore to a greater degree. This finding potentially informs the implementation of existing tools, such as the popular Error Management Training, where learners are encouraged to make errors as they explore a new KSAO (e.g., Keith & Frese, 2008). Such an approach to training helps improve outcomes as trainees learn from their mistakes and push through initial struggles with a new skill. According to the present model, this error-based approach extended to the entire post-training period would likely be beneficial for outcomes as well, but only to a point. Therefore, we should encourage trainees to continue trying a newly trained task approach, but only to a moderate degree, because we do want them to settle on a behavioral approach for the long term, and we want them to discard approaches which do not actually improve performance outcomes.

Finally, the virtual experiment exploring the mutual effects of implementation intentions and type 2 likelihood can provide some guidance on how best to include implementation intentions in our training designs. The results do suggest that implementation intentions are generally beneficial regardless of the environment (assuming here that the intentions formed are sufficiently strong), but that the required strength to have a substantial effect, and the ability of implementation intentions to improve our outcomes, change based on that environment.
When jobs are such that the use of type 2 processes is highly likely, the inclusion of implementation intentions is unlikely to have a substantial impact on our desired outcomes, though they would still be useful. However, if we are conducting training in an environment where type 2 processing is especially unlikely, such as a fast-paced assembly line or similar environment, implementation intentions are likely to be highly beneficial to include in our training programs. However, we must work to ensure these intentions are as strong as possible, as weak intentions are also unlikely to have a great effect on outcomes.

Conclusion

The above represents the first iteration of modeling in the building of the LTM, which has the goal of becoming a unifying theory to explicate the moment-to-moment process underlying training transfer. Although not perfect, the general patterns of results appear to align well with existing findings. We know the model is wrong, but the degree to which it is meaningfully wrong (Box, 1976) could be contended to be rather small for the time being, as exactly replicating existing meta-analytic effects is not completely necessary. Future iterations of this portion of the model should attempt to refine the operation of parameters and the math governing their effects to better match their real-world counterparts, such as the effects of practice attempts, but that is a task for another time. For now, I argue this is a reasonable first iteration of the LTM with apparent implications for both theory and practice. With that, we know that there are substantial ways in which the model as it stands is meaningfully incorrect, such as not accounting for social learning mechanisms. To rectify this shortcoming, another iteration of theorizing and modeling was undertaken.

Study 2A: Adding Social Learning to the LTM

The first iteration of the LTM appears to have done well in describing the transfer decision and learning process of a single agent/learner, but people in the real world do not learn in isolation. Instead, they also learn from the models around them. Thus, the model was iterated to include a social learning (Bandura, 1977) process allowing agents to learn from other agents in their environment.

Social Learning Theory

Bandura (1977) introduced Social Learning Theory (SLT) partly as a reaction to the then-dominant behavioral approaches to learning exemplified by early reinforcement learning. Bandura (1977) posits that individuals not only learn from their own experiences, but that they also learn from others. In fact, Bandura argued that most learning occurs through observing others in action, a process called modeling. Once the learner observes a model complete an action, they can form an idea of how the new behavior is to be performed, and later use that as a guide to their own actions. Learning through observing others is more efficient than learning only through individual experience, as less trial and error is required to learn a given behavior. SLT also emphasizes reciprocal determinism between cognitive, behavioral, and environmental influences. At the risk of oversimplifying these influences, the cognitive processes of the individual affect their behaviors, which affect their environment. The individual receives feedback from the environment based on the effects of their behavior, which leads to changes in cognition and behavior in the future.
The essential proposed change in the LTM once we consider social learning is that there is an effect of other individuals on the learning process of our target learner. The LTM accounts for this effect by considering multiple learners engaging in the transfer process simultaneously. Obviously, this does not represent all potential social learning influences on our target learner. Instead, the current approach best represents an idealized version of a work team or community of practice whose members are all exposed to the same learning intervention and then must attempt to transfer it back to their work environment. Though this conceptualization is admittedly simple, it aligns with arguments that involvement in communities of practice can enhance training transfer through the sharing of information across the community network (e.g., Tentin, 2001). The new conceptual model, seen in Figure 18, incorporates any number of learners in addition to a target learner. Every learner in the model is assumed to have access to the same two policies, and to proceed individually through the basic decision and learning process described above. However, the updating of each learner's value estimates is no longer based solely on their own experience but includes feedback from the experiences of all other agents. That is, it is a mechanism whereby other agents in the environment have modeled for the target learner the behaviors represented by the two policies, and the agent is then informed of their effectiveness through that observation. In this way, the perceived value of each policy becomes a type of pooled estimate from all learners. This pooling procedure is not assumed to take all experiences of learners equally. Instead, the pooling for any individual is such that they weight their own experiences differently than those of any of their co-learners, to an extent we can control via a parameter in the model. Once policy values are updated for each learner, the decision and learning process iterates. It is expected that additional learners in the model will improve transfer and performance of target learners because they reduce uncertainty in the value estimates through the more rapid reduction in sampling error created by more learners gaining experience. The faster accrual of experience as a group should allow for more quickly discarding new policies when they are poor, and a decreased likelihood of incorrectly discarding a policy when it is good based on initial random error that could underestimate policy value.

The Formal Transfer Model with Social Learning

To expand the formal version of the LTM to include social learning, three essential changes must be made to the model: additional learning agents, a way to pool experiences of the agents, and a way to weight the importance of group experiences against those of the target agent. The first change is simply conceptual: instead of assuming we are only interested in one agent engaged in the transfer process, multiple agents engage in this process simultaneously. To build on the previous reinforcement learning approach from computer science and account for all agent experiences, it could be possible to draw on algorithms designed for multiple agents (Sutton & Barto, 2018). However, those algorithms are designed for multiple agents attempting to solve a single problem, giving each agent a chance to explore more possible solutions. Such approaches are not necessarily the best fit for a model of transfer where multiple learners each work on their own tasks, but only have limited solutions which they could apply.
Future extensions of the present model could explore other options along these lines, but they are not the most parsimonious potential approach, and parsimony is a primary goal of theorizing (Box, 1976). A simpler approach is to pool the experiences of multiple learning agents to affect the value estimates of each individual agent and thereby affect application decisions. The easiest way to pool the experiences of other agents is simply to average their value estimates. The average value estimate of the other agents, for the jth agent in the model, can then be defined as

\bar{V}_{jA} = \frac{1}{N} \sum_{i=1}^{N} V_{iA},

where N is the number of other agents in the model, and V_{iA} is the value estimate of the ith other agent for Policy A. Calculating at the value estimate level avoids an assumption that the target agent knows the outcomes of the individual attempts of the other agents, and only assumes they can observe the other agents' overall evaluations of each policy. However, a simple averaging of the value estimates of the other agents does assume an equal weighting of the opinions of all other agents, so it does not account for network effects and varying strengths of ties to those agents. Exploring the effects of networks will be an interesting avenue for future work.

Regardless of the assumptions, the value estimates of the group must be combined with the estimate of the target agent in some way. The LTM proposes a weighting approach that can vary the degree to which the target agent weights their own value estimate over that of the others in their group. This approach results in an ability to vary the degree of connectedness between the target agent and the rest of the group, making the target agent and their group a type of loosely coupled system (e.g., Weick, 1976). Let this level of connectedness be defined by C. The variable C represents a weighting factor such that when levels of connectedness are high the value estimate of the group will be weighted more heavily than that of the individual. Thus, we can define the weighted value estimate as

V'_{jA} = (1 - C)\, V_{jA} + C\, \bar{V}_{jA}

(Algorithm 5, Other Agent Value Estimation; Algorithm 6, Weighted Value Estimate). However, since \bar{V}_{jA} is only calculated when there are multiple agents in the model, V'_{jA} = V_{jA} when only a target agent exists. Variables and equations introduced in this section can be found in Tables 7 and 8, respectively.

Study 2A: Method, Simulation and Results

As with the first model, the extension of the LTM was instantiated in a computational model by expanding on the model from Study 1 in NetLogo. A visual of the modeling environment and associated code can be found in Appendix B. Otherwise, the methods outlined for Study 1 apply to this model as well.

Virtual Experimentation

A primary goal of model exploration at this stage was to ascertain the effects of having multiple agents learning simultaneously, and the degree of influence of those agents on each other in determining transfer outcomes. To explore these effects, two simple verification checks were made, then three experiments were run to simultaneously check for generative sufficiency, sensitivity, and robustness.

Model verification

The two primary changes in this model are the addition of trainees to the modeling environment, and the mechanism for combining effects of experience from multiple agents for use by each individual agent. The check to ensure multiple agents are populated into the environment is simply visual in NetLogo, and it was affirmed that the proper number of agents were generated as specified. To check the pooling procedure, levels of connectedness were set at 0 and 1 and the model run for 500 time steps with 20 agents.
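To make the pooling and weighting procedure concrete before turning to those checks, the minimal Python sketch below implements the two equations above. The dissertation's models are written in NetLogo; the function names and example values here are illustrative assumptions rather than the author's code.

```python
# Minimal sketch of the pooling and connectedness-weighting procedure.
# Illustrative only; names and sample values are assumptions.

def pooled_other_estimate(other_estimates):
    """Average value estimate held by the N other agents for a given policy."""
    return sum(other_estimates) / len(other_estimates)

def weighted_estimate(own_estimate, other_estimates, connectedness):
    """Combine an agent's own estimate with the group's pooled estimate.

    connectedness (C) weights the group estimate; when no other agents
    exist, the agent simply keeps its own estimate.
    """
    if not other_estimates:
        return own_estimate
    group = pooled_other_estimate(other_estimates)
    return (1.0 - connectedness) * own_estimate + connectedness * group

# Boundary checks analogous to the verification runs described below:
others = [0.70, 0.75, 0.74]                   # other agents' estimates for Policy B
print(weighted_estimate(0.50, others, 1.0))   # C = 1 -> group mean (~.73)
print(weighted_estimate(0.70, others, 0.0))   # C = 0 -> agent's own estimate (.70)
```

Setting connectedness to its two extremes in this sketch reproduces the boundary conditions used for verification, as reported next.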
When connectedness is 1, the pooled estimate for each agent should be equal to the pooled estimate of all other agents in the model. For example, for agent 1, when all other agents have an average estimate for Policy B of .73, a fully connected model should have a pooled estimate of .73 for agent 1. This is indeed the case. On the other hand, when connectedness is 0, the pooled estimate for Policy B for agent 1 should simply equal agent 1's own value estimate for that policy. For example, if that agent estimates the value is .70, the pooled estimate should also be .70. This is also indeed the case in testing the model. Therefore, the model appears to be operating as planned.

Number of Trainees

To initially understand the effect of the number of learners/trainees in the transfer environment, a series of simulations was completed manipulating the number of trainees from 1 to 20, with 500 replications each. Other variables were held at the levels decided upon in the first model (type 2 likelihood at .80, initial policy estimates at .50, burn-in time 100, transfer time 500, no practice, no implementation intentions, true Policy A reward .70, change in policy value .05). Additionally, the new connectedness variable was set at .50. It was expected that more agents in the model would improve transfer outcomes by improving the value estimates of each agent through a more rapid increase in sample stability. In examining these results, it was found that there was almost no relationship at the replication level between number of trainees and either behavioral transfer (r(10000) = .007, p = .484) or post-training performance (r(10000) = .005, p = .617). At the condition level there was a slight positive effect of the number of trainees on both behavior (r(20) = .14, p = .556) and performance (r(20) = .13, p = .585). This condition-level effect can be further examined by looking at the condition-level results for behavioral transfer and pre-post performance change effect size (Cohen's d) in Table 9. Despite the small positive correlation between the number of trainees and behavioral transfer, examination of the descriptive statistics in Table 9 makes it obvious this effect is of little consequence, with transfer only increasing from 43% to 44% as the number of trainees increases from 1 to 20. On the other hand, there is a substantial impact on the observed d for pre-post performance change: as the number of trainees increases from 1 to 20, the observed effect size increases from .28 to 1.43.
Connectedness

A second set of simulations was run to explore the potential effects of the connectedness parameter. The expected effects of manipulating the degree of connection between the individual and group were less clear than for the number of trainees. Assuming having other trainees in the model is beneficial to the agent, it would be expected that more connectedness would also be beneficial, as the potential detrimental effects of sampling error leading an individual agent down a sub-optimal path should be diluted the more they take into account the experiences of other agents. To test this, the connectedness parameter was swept from 0 to 1.0 in .05 increments, with 10 agents simulated, holding all other parameters constant at the same levels as in the above simulation. Unfortunately, results for this simulation were even less impressive than those for the number of trainees. Relationships between connectedness and both behavioral transfer and post-training performance were nonexistent at the replication level (r(10500) = .009, p = .356, and r(10500) = .001, p = .918, respectively), and were mixed at the condition level (r(10500) = .217, p = .345 with behavior; r(10500) = -.031, p = .894 with post-training performance; and r(10500) = -.078, p = .737 with pre-post performance change). Table 10 displays the condition-level outcomes for behavioral transfer and pre-post performance change effect size. Examination of these data confirms no effect of note, as behavioral transfer remained ~44% regardless of condition, and pre-post performance change was around d = 1.00 with some random error.

Interaction between Trainees and Connectedness

Given the results of the above simulations, it was not expected that an interaction effect would be enlightening. However, such a model was proposed for this project, and given the intricacies of computational models such as these, it is possible traditional analyses obscure meaningful relationships. Thus, the potential interactive effect of trainees and connectedness on transfer outcomes was still explored. It was predicted that the positive effect of having multiple agents in the model would increase as the degree of connectedness increased. This effect was expected because the agent should benefit from taking advantage of the extra experiences of their colleagues through greater weighting of those experiences and their combined increased sampling rate. To test for this, the number of trainees was swept from 1 to 20, and connectedness from 0 to 1.0 in .05 steps, fully crossed, while holding all other variables constant at the levels chosen from Model 1. Moderated multiple regression was used to examine the effects of the number of trainees and connectedness on behavioral transfer and post-training success. In alignment with the previous results from this model, there are no discernable main or interaction effects from this experiment. In predicting behavioral transfer, neither the number of trainees (b0 = .436; b1 = -.00004) nor connectedness (b2 = .001), nor their interaction (b3 = -.00006), demonstrated substantial effects. Similar results were found in predicting post-training performance (b0 = .722; b1 = -.000007; b2 = .00004; b3 = .000007). In accordance with other analyses in this paper, heatmaps were also generated to examine potential effects missed by more standard analyses. These reconfirmed no substantial effects, with differences between conditions largely attributable to noise. An example of this can be seen in Figure 19.

Study 2A: Discussion and Conclusion

The goal of this iteration of the LTM was to account for general effects of social learning in a training transfer environment in a parsimonious manner. The initial modeling discussed here suggests this attempt was a general failure, as the expected effects of the primary variables failed to emerge. Specifically, it was expected that the number of trainees would improve transfer outcomes because the greater numbers would essentially smooth out the sampling errors that could lead single agents to suboptimal transfer. This prediction does not appear to have quite been the case.
At best, there appears to be only a slight improvement in behavioral transfer and post-training performance as the number of agents increases, and nothing like the strong effects expected based on SLT (Bandura, 1977) or existing meta-analytic social effects in training transfer (Blume et al., 2010). A misleading exception to this failure lies in the observed effect sizes comparing pre- and post-training performance, which range from d = .28 to 1.43. However, given the lack of improvement in behavioral transfer and the slight relationships between the number of trainees and the outcomes, this increase in d appears to be an artifact of its calculation: d is in part computed using the pooled standard deviation of the two groups being compared (i.e., d = (Mpost - Mpre) / SDpooled). When the number of agents increases, the observed standard deviations of performance within those groups decrease as sampling error becomes less problematic. Then, when the effects are compared, the pooled standard deviation utilized is smaller when there are more agents, making the effect size appear artificially large. Thus, there is an effect of more agents in the model due to their effect on sampling error, as was predicted, but the effect is not the one which was expected. Further, the results for connectedness effects were even more disappointing. It was hoped that connectedness would be a simple way to recreate the social support which has a meta-analytic effect of .21 on transfer (Blume et al., 2010), but this appears to clearly not be the case. More sophisticated forms of social learning will need to be explored to see if they can account for such social effects. Given the clear failure of this integration of social learning into the LTM, we are forced to explore other options. It is to the exploration of these other options we shall now turn.

Study 2B and 2C: Rethinking Social Learning Model

Unfortunately, the initial attempt to include a social learning mechanism in the LTM failed. In order to assess other potential options for social learning in the present context, a search was conducted for existing models of social learning mechanisms. Multiple potential mechanisms have previously been modeled to answer various research questions, such as the use of genetic algorithms (e.g., Yeh & Chen, 2001), imitation (Richerson & Boyd, 2005), and emulation (Lopes, Melo, Kenward, & Santos-Victor, 2009), some of which have been applied to organizational research, such as coordination within teams (e.g., Singh, Dong, & Gero, 2013). Although all these approaches, and likely others, could prove fruitful, the present modeling will focus on imitation. The choice to focus on imitation lies in its use in studying the mutual development of culture and genetics in human populations. Richerson and Boyd (2005) described the process by which human culture and genetics mutually reinforced each other over thousands of years to produce the societies, and the actual humans within them, that we know today. A primary mechanism within models of this relationship is social learning, as the actions of groups over time are largely dictated by the pressures exerted upon the individuals within those groups by the other people around them. Over time, these pressures lead to the success and spread of certain cultural artifacts and the elimination of others, and the ability to acquire novel behaviors via social learning is a prerequisite for cumulative change.
We can see the outcomes of hundreds of generations of such pressures in the emergence of complex cultures representing the sum of the socially selected actions, beliefs, values, and so on that were adaptive for success in the social groups within which they emerged. I argue this view is particularly relevant to examining training in organizations. As has been discussed above, social effects are among the most important factors in the success or failure of training and subsequent transfer. These social effects come in several guises, such as manager and coworker support, climate for transfer, and organizational culture (e.g., Blume et al., 2010). Much as Richerson and Boyd (2005) view social learning as a mechanism through which culture is developed and reinforced, we can view the social effects within training and transfer as a form of culture that both affects and is reinforced by the actions of the individuals within the organization. In our first attempt here at a social learning mechanism we can already see this form of mutual causality. Namely, agents within the model have experiences with their task, and their experiences combine to form a collective view of the task which is an emergent property of the simulated work group. This emergent view then acts as the cultural context within which the agents act, and this context impacts the decisions the agents make. These mutually causal, simultaneous top-down and bottom-up (Kozlowski & Klein, 2000) effects then dynamically play out over time. Thus, the underlying causal relationship modeled in the prior iteration of the LTM appears to be in line with other models of culture in the scientific literature, but the actual social learning mechanism was too simplistic to have the expected effects. To rectify this shortcoming, two models were built and explored to study the potential effects of imitation.

In Richerson and Boyd (2005), imitation occurs when organisms copy others in their environment as a way to navigate that environment. As in SLT (Bandura, 1977), imitation allows organisms, in this case humans, to learn new behaviors and the consequences of those behaviors through observation. This observation improves the rate and outcomes of individual learning, all else being equal. Two predominant forms of imitation are pertinent here. The first form is imitation of the successful, where individuals will tend to do the actions they see successful individuals around them doing on the same tasks. Such a mechanism can be seen throughout our modern world. For example, many people play and watch major sports with a dream of someday being as good as the professionals they see on television. To improve at their own games, a common approach is for individuals to attempt to emulate the athletes they see succeeding at the same game. Simple searches for tips on golf, for example, return hits on how to drive the ball like Rory McIlroy, or putt like Phil Mickelson. Richerson and Boyd's (2005) second form of social learning occurs through a frequency bias, where learners tend to do the things that the majority of their peers are doing. That is, when in a group, people will tend to do the things that the individuals around them are doing, using a simple majority heuristic. This approach is especially adaptive to learners in unfamiliar situations in that they can take cues from those around them on how to navigate the novel environment. Richerson and Boyd (2005) argue this bias helps learners to survive in their environment.
In either case, Richerson and Boyd (2005) argue these social learning mechanisms are fast and frugal forms of learning which offload the burden of learning much information through direct experience. In addition, these learning mechanisms are biases in favor of following the successful or following the lead of one's social group. Thus, their framing of the benefit of these mechanisms aligns with dual-process views of social cognition (e.g., Kahneman, 2011). However, for clarity we need a way to readily distinguish between the two types of social learning Richerson and Boyd (2005) describe. In their description, conformity is more a form of coercion than it is vicarious learning in the sense of SLT (Bandura, 1977). Therefore, in our nomenclature it does not seem quite correct to label it directly as a form of imitation. On the other hand, the imitation of the successful does fit well with SLT. Thus, it seems proper to let imitation mean specifically imitation of the successful, while relabeling the coercive form as conformity. Therefore, the rest of this paper will use imitation and conformity to refer to these types instead of imitation alone. To examine the potential effects of these mechanisms on the LTM, two independent iterations of the theory and associated computational model were made. The first, referred to as Model 2B, focused on imitation, while Model 2C focused on conformity.

Model 2B Overview

This iteration of the LTM explores the effects of social learning through a tendency for learners to imitate other successful learners in their environment. For this mechanism, it is proposed that learners observe others in their environment, judge their performance and their behavior, and have some likelihood of following the same behavior that high-performing other agents are enacting. To include this mechanism in the formal version of the LTM and its associated computational model, we need to make two adjustments to Model 2A. The first change lies in the observational and pooling procedure originally proposed. Instead of pooling the value estimates of all other learners in their environment, learners must instead track the actual performance of their fellow learners. From that set of learners, they must then judge which one exhibits the highest performance in order to make a judgement about how to imitate that high performer. This approach does assume that learners have a perfect ability to judge the performance of others, an assumption that was not present in Model 2A, but this assumption can be relaxed and explored in future modeling endeavors. Once performance and behavioral judgements have been made, a mechanism needs to be created for those observations to affect the behavioral choices of the observing agents. Within the dual processing framework, the decision to imitate other agents is proposed to fit more cleanly into type 2 processes due to the level of cognitive effort required to make accurate judgements of others' performance. It is possible that the mechanism could instead be placed in type 1 processes, which would fit with future explorations where the assumption of perfect observation is relaxed, but for now we will assume the decision to imitate the successful is a conscious and effortful one rather than a more automatic one. Thus, when individuals engage in their type 2 processes, they first must decide whether to imitate someone else or not. Within the computational version of this model, the likelihood of imitating another agent is controlled with a parameter labeled imitate.
If a learner does choose to imitate someone else, they must scan their environment for other learners and judge their performance to identify the one with the highest level of performance. Once identified, they must then observe the behavioral choices that learner is making and then apply the same behavior. In the computational model, agents carry out this observation process and choose to apply the behavioral policy enacted by the most successful other agent in their environment on the previous task attempt. Therefore, the behavior of any agent i in the model at time t + 1 is the behavior of the highest performing other agent in the model at time t, when agent i chooses to imitate during task attempt t + 1.

Model 2C Overview

Model 2C represents a modification of the theory and model in 2B to change the social learning mechanism from imitation to conformity. Thus, instead of a tendency to do the same behaviors as successful learners around them, under this model learners tend to do the behaviors that the majority of the learners around them are doing. In this case, individuals do not need to track the performance of others around them, only their behavioral choices. From these observations the learner can use a simple voting procedure to determine which behavioral choice is the most common among their group. In the computational version of this proposal, when each agent decides to conform to their group, their behavior on the task at time t + 1 is equivalent to the behavior displayed by the majority of the other agents in the environment at time t. The tendency to conform to the group on any given task attempt is controlled via a conform parameter.

Study 2B: Method, Simulation and Results

The described theoretical additions to the LTM regarding the use of imitation as a social learning mechanism were instantiated in a computational model, expanding on the base model explored in Study 1. A screenshot of the modeling environment and a copy of the associated simulation code in NetLogo can be found in Appendix C. The new mechanisms in this model were verified by examining the tracking mechanisms of agents to ensure they were correctly identifying the performance and behavior of the top performing other agents in their environment, and that they would follow the indicated behavior under conditions where they always engage in type 2 processes and always imitate the best performers. Once this was completed, a small experiment was run to study the effects the number of trainees in the model and the level of imitation have on our outcomes of behavioral transfer and task performance. It was expected that both the number of trainees and the level of imitation would improve transfer outcomes, as agents would benefit from the increased sampling rate of the agents around them, making it more likely that at least one agent discovers that Policy B is indeed the better policy and that this finding would propagate through the rest of the agents.

Trainees Versus Imitation Experiment

To study the effects of the number of trainees and level of imitation in this version of the LTM, a simulation was conducted crossing the number of trainees, swept from 1 to 20, with the level of imitation swept from 0 to 1 in .05 increments.
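Before turning to the results of this experiment, the brief sketch below illustrates the imitation (Model 2B) and conformity (Model 2C) decision rules described above. It is a minimal Python sketch; the actual models were written in NetLogo, and the data structures and tie-breaking choice here are illustrative assumptions.

```python
import random

# Illustrative behavior-selection rules for Models 2B and 2C.
# Assumptions: each observed agent exposes the policy it used and the
# performance it achieved on the previous task attempt (time t).

def imitate_best(other_agents):
    """Model 2B: copy the policy used at time t by the highest performer."""
    best = max(other_agents, key=lambda a: a["performance"])
    return best["policy"]

def conform_to_majority(other_agents):
    """Model 2C: copy the policy used at time t by the majority of others."""
    votes_b = sum(1 for a in other_agents if a["policy"] == "B")
    votes_a = len(other_agents) - votes_b
    if votes_a == votes_b:                  # ties broken at random (assumption)
        return random.choice(["A", "B"])
    return "B" if votes_b > votes_a else "A"

# Example: three co-workers observed on the previous task attempt.
others = [
    {"performance": 0.62, "policy": "A"},
    {"performance": 0.78, "policy": "B"},
    {"performance": 0.66, "policy": "A"},
]
print(imitate_best(others))         # -> "B" (highest performer used Policy B)
print(conform_to_majority(others))  # -> "A" (most co-workers used Policy A)
```

Note that the two rules can point in opposite directions for the same group, which foreshadows the diverging results reported for the two models below.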
Other variables were held at the levels chosen in Model 1: 100 pre-training time steps, 500 transfer time steps, type 2 likelihood set at .80, initial policy estimates of .50, true value of Policy A of .70 with a change in value of .05, and 500 replications of each condition. In examining the results of this simulation, we see the directions of relationships expected both within this model and more broadly as guided by social effects in the training literature. Namely, at the replication level, the number of trainees in the model was positively related to both behavioral transfer (r(210000) = .103, p < .001) and post-training performance (r(210000) = .156, p < .001). In addition, the level of imitation was also positively related to both behavioral transfer (r(210000) = .495, p < .001) and post-training performance (r(210000) = .730, p < .001). Then, a multiple regression analysis was performed to test the joint effects of the number of trainees and imitation level on the transfer outcomes. In predicting behavioral transfer, both the number of trainees (F(3, 209996) = 89557.07, p < .001, R² = .75; b0 = .728, t = 3096.22, p < .001; b1 = .004, β = .156, t = 107.61, p < .001) and imitation rates (b2 = .392, β = .730, t = 505.13, p < .001) show positive relationships, and a positive interaction (b3 = .006, β = .064, t = 44.04, p < .001). In predicting post-training performance, both the number of trainees (F(3, 209996) = 24351.15, p < .001, R² = .51; b0 = .736, t = 32701.20, p < .001; b1 = .0002, β = .106, t = 54.78, p < .001) and imitation rates (b2 = .020, β = .495, t = 263.55, p < .001) show positive relationships, and a positive interaction (b3 = .0003, β = .046, t = 24.36, p < .001). Finally, predicting pre-post performance change effect size (Cohen's d) across conditions, both the number of trainees (F(3, 416) = 1816.19, p < .001, R² = .96; b0 = 2.346, t = 175.72, p < .001; b1 = .137, β = .770, t = 59.00, p < .001) and imitation rates (b2 = 1.849, β = .548, t = 41.93, p < .001) show positive relationships, and a positive interaction (b3 = .111, β = .189, t = 14.47, p < .001). These interaction effects, displaying an augmenting effect between the number of trainees and imitation rates, were graphed in Figures 20-22. In addition, to better understand the nuances of these effects, heat maps were generated and can be found in Figures 23-25. In examining these heat maps, we see almost no actual effect of the number of trainees beyond that gained by adding even one agent to the model. On the other hand, we see a steady improvement in behavioral transfer and post-training performance as imitation rates increase. In the calculation of pre-post performance effects, we see the highest effect sizes when trainees and imitation are high, but this is again likely inflated by the pooled standard deviation artifact described for Model 2A.

Study 2C: Method, Simulation and Results

The theoretical additions to the LTM described above regarding the use of conformity as a social learning mechanism were instantiated in a computational model, expanding on the base model introduced in Study 1. A screenshot of the modeling environment and a copy of the associated simulation code in NetLogo can be found in Appendix D.
The new mechanisms in this model were verified by examining the tracking mechanisms of agents to ensure they were correctly identifying the behaviors of their fellow agents, and that they would follow the indicated behavior of the majority under conditions where they always engage in type 2 processes and always conform. Once this was completed, an experiment was run to study the effects the number of trainees in the model and the level of conformity have on our outcomes of behavioral transfer and task performance. It was expected that both the number of trainees and the level of conformity would again improve transfer outcomes, as agents would benefit from the increased sampling rate of the agents around them, making it more likely that at least one agent discovers that Policy B is indeed the better policy and that this finding would spread through the rest of the agents. Thus, as with the imitation model, it was expected that we would see positive relationships between the number of trainees, the level of conformity, and the transfer outcomes.

Trainees Versus Conformity Experiment

To study the effects of the number of trainees and level of conformity in this version of the LTM, a simulation was conducted crossing the number of trainees, swept from 1 to 20, with the level of conformity swept from 0 to 1 in .05 increments. Other variables were held at the levels chosen in Model 1: 100 pre-training time steps, 500 transfer time steps, type 2 likelihood set at .80, initial policy estimates of .50, true value of Policy A of .70 with a change in value of .05, and 500 replications of each condition. In examining the results of this simulation, we see the opposite of the general relationships expected within this model. At the replication level, the number of trainees in the model was negatively related to both behavioral transfer (r(210000) = -.203, p < .001) and post-training performance (r(210000) = -.144, p < .001). In addition, the level of conformity was also negatively related to both behavioral transfer (r(210000) = -.742, p < .001) and post-training performance (r(210000) = -.528, p < .001). A multiple regression analysis was performed to test the joint effects of the number of trainees and conformity level on the transfer outcomes. In predicting behavioral transfer, both the number of trainees (F(3, 299996) = 103802.16, p < .001, R² = .77; b0 = .232, t = 907.59, p < .001; b1 = -.006, β = -.203, t = -146.30, p < .001) and conformity rates (b2 = -.452, β = -.742, t = -536.13, p < .001) show negative relationships, and a negative interaction (b3 = -.007, β = -.070, t = -50.73, p < .001). In predicting post-training performance, both the number of trainees (F(3, 299996) = 30371.68, p < .001, R² = .55; b0 = .712, t = 30279.36, p < .001; b1 = -.0003, β = -.144, t = -78.84, p < .001) and conformity rates (b2 = -.023, β = -.528, t = -289.96, p < .001) show negative relationships, and a negative interaction (b3 = -.0004, β = -.052, t = -28.63, p < .001). Finally, predicting pre-post performance change effect size (Cohen's d) across conditions, both the number of trainees (F(3, 416) = 3150.12, p < .001, R² = .55; b0 = .058, t = 8.78, p < .001; b1 = -.012, β = -.104, t = -10.28, p < .001) and conformity rates (b2 = -1.979, β = -.918, t = -91.21, p < .001) show negative relationships, and a negative interaction (b3 = -.121, β = -.323, t = -32.03, p < .001).
These interaction effects, displaying a depressive effect between the number of trainees and conformity rates, were graphed in Figures 26-28. In addition, to better understand the nuances of these effects, heat maps were generated and can be found in Figures 29-31. These heat maps show that the best outcomes indeed occur when the number of trainees and rates of conformity are low. In addition, they also show a clear sensitive area in the model and some non-linear effects. Specifically, outcomes suddenly improve as conformity rates drop below about .45. However, the level at which this change occurs depends on the number of trainees in the model, such that the transition occurs at higher levels of conformity when there are fewer trainees in the model. An interesting pattern also emerges when comparing an odd number of trainees in the model versus an even number, such that the transition point to better outcomes occurs at a higher level of conformity for even numbers of trainees than for the odd numbers around them. This is likely a statistical artifact of the voting process in the model rather than something of great significance.

Study 2B and 2C: Discussion and Conclusion

The models introduced in Studies 2B and 2C were meant to explore other potential mechanisms to account for social effects in training transfer studies, which the initially proposed model failed to do. From the initial experimenting outlined here, it appears that either mechanism has the ability to provide interesting insights into the transfer process, at least beyond those obtained in the original theory. Here, let us briefly discuss the implications of these models for both theory and practice.

Implications for Theory

The primary effects on transfer from a social standpoint lie in the environmental factors of perceived support and climate for transfer. Corrected meta-analytic estimates for the effects of these on transfer are .21 and .27, respectively (Blume et al., 2010). In Model 2B we studied the potential effects of imitation on transfer, which is the tendency to engage in the behaviors other successful learners are engaged in. In Model 2C we viewed the social learning mechanism from the standpoint of conforming to the behavioral tendencies of the majority of the other learners around a target learner. These mechanisms could be argued to fit conceptually with what is occurring in producing the effects we see for support and climate. Namely, these two effects are based on learners' perceptions of the actions of those around them in providing the necessary social and physical conditions in which they can transfer their training. In both present models, as argued above, we have created an emergent environment from the actual behaviors of agents, which in turn produces a social context in which the agents must act. When the social environment created by the agents is such that it promotes the transfer of the trained policy, this is akin to the agents perceiving an environment supportive of their transfer attempts. In addition, each of the two models might fit conceptually a little better with a specific effect. For example, Model 2B relies on target agents seeing other successful agents apply a behavior that the target agent can then mimic. This is more of a one-on-one interaction where the successful agent essentially either serves as a role model for that behavior or does not.
This conceptually fits with recommendations for managers to promote transfer by modeling desired behaviors to their employees (e.g., Lancaster, Di Milia, & Cameron, 2013). Although not labeled as managers in the model here, managers could be seen as successful employees whom their followers are likely to view as role models, as occurs in the mechanism outlined in Model 2B. To that end, examination of Model 2B shows an ability to create relationships in the desired direction, but the effects of the actual imitation mechanism are of a substantially larger magnitude than observed in meta-analyses of support effects on transfer (Blume et al., 2010). One reason the effect sizes observed in the present model are so much larger than the target effect sizes may be the agents' current ability to perfectly view the performance and behaviors of their models. Adding noise to their observations to reflect imperfect observation in real life may correct this deficiency and bring findings more in line with research. For now, this model appears to be a great step forward over the previous iteration, one that fits better with existing research both within our field and with scientific efforts around social learning in general.

The case surrounding Model 2C is considerably more complicated. Conceptually, Model 2C fits more cleanly with effects of transfer climate because the underlying mechanism in this model, conformity, is no longer a one-on-one modeling case; rather, it is truly a group-level consideration. That is, here the agents form a social environment regarding their collective use of the behaviors available to them. That social tendency to use, or not use, either behavior available to them could be considered the climate for the use of that behavior. Therefore, if the group tends to use the trained Policy B, the climate would be one that is positive for transfer. If the group tends to use the pre-existing Policy A, there would be a negative climate for transfer. Initially it was expected that conformity would be positively related to transfer, as the group would be more likely to collectively realize the trained policy is beneficial for their use and therefore create that positive climate. However, that is not what we find in simulation. Instead, we find that conformity has a substantial negative effect on transfer, although the absolute magnitudes of those effects are closer to the meta-analytic effects of climate than those observed for support in Model 2B (Blume et al., 2010). This finding was initially surprising. However, in retrospect, because of the dynamic nature of the mechanism, the negative relationship perhaps should be expected. Specifically, the behaviors of agents at any time t + 1 are a function of the behaviors of the group at time t. This in effect creates a heavy bias, especially when conformity is high, towards the continued use of Policy A, because when the agent begins the transfer period the behavior of all agents at the previous time point was Policy A. To overcome this, it takes agents independently choosing to apply Policy B and slowly changing the balance of the group until a majority are applying Policy B. This process is harder and takes longer when there are more agents in the environment, which also accounts for the now negative effect of the number of trainees in the model.
Therefore, I argue that initially low pressure to conform is actually akin to a climate allowing transfer, because the agents are free to explore the benefits of their training rather than being pressured to avoid doing so, and an inverse directional relationship relative to how we operationalize this effect in the literature should therefore be expected. Within actual work groups there can be substantial pressure placed on workers not to comply with organizational interventions such as training, thereby creating a negative climate for transfer, whereas the absence of such pressure can be interpreted as a positive climate for transfer. Over time, as more individuals pick up the new behavior, that pressure could reverse and lead to improved transfer. This essential switch in pressure could be a reason why we see the sudden improvement in transfer outcomes when conformity drops below .45 in the present model. Early in transfer attempts the agents need that freedom to do their own exploring, but as transfer goes on there is some benefit to social pressure helping bring late adopters over to begin transferring at a greater rate. Thus, it appears that with some reconceptualization of what the parameters represent, this model also provides a potential window into existing effects in the transfer literature, though more work will be required to explore those effects and verify the mechanisms are correct. For the time being, given the closer absolute magnitude of the effects in this model and the more nuanced relationships observed between this model's parameters and the outcomes of interest, this model may provide more potential insights than Model 2B for future work. It is for this reason that this model was chosen to provide the basis for the next round of modeling outlined in Study 3, although it is acknowledged much work remains to verify this model for long-term use.

Future modeling of social learning

As mentioned in the introduction to these two models, there are other versions of social learning which could be interesting to explore in future efforts to model the employee learning and transfer process. For example, a genetic algorithm approach (e.g., Yeh & Chen, 2001) could provide interesting insights for skill development if we were interested in how employees might generate their own novel solutions to work tasks and then propagate their discoveries to their work groups and the organization at large. Furthermore, existing models sometimes model tradeoffs among several dimensions of preference at the same time. Along these lines, Lopes et al. (2009) modeled tradeoffs between making decisions based on individual preferences versus making those choices based on social pressures from either imitation or emulation. In their modeling, momentary choices are based on a tripartite tradeoff between these three pressures, where increased weight on any one type lowers the weight of the other two. In the present modeling we focused on only a single type of social learning, imitation or conformity, and a tradeoff with choices based on personal experience. Future iterations should explore the potential simultaneous effects of these pressures. In addition, models where imitation and conformity are placed in type 1 processes instead of type 2 processes should be explored.
The argument for why they were placed in type 2 processes for this model was laid out above, and it seems likely that placing these mechanisms in type 1 processes would only exacerbate the overly strong relationships they displayed with transfer and performance compared to meta-analytic estimates (Blume et al., 2010), although such a placement would arguably fit Richerson and Boyd's (2005) description of these mechanisms as fast and frugal, though it is not clear whether they are discussing cognitive load rather than general effectiveness. This expectation is due to such a placement furthering their ability to override type 2 processes before any further conscious judgement can be made. On the other hand, the way they were included in the model may de facto place them somewhere between the clean separation of type 1 and type 2 processes otherwise followed in this paper, as they are only enacted when type 2 processes are called upon, but they do not make the same level of logical judgement as the other type 2 processes modeled and instead override those processes to automatically imitate or conform. Thus, the decision to imitate or conform is treated as conscious, but its execution is more automatic in nature. The middle ground occupied in the present model by the imitation and conformity mechanisms does potentially fit with the view of cognitive systems as lying on a continuum from conscious and effortful to automatic (Evans & Stanovich, 2013). Regardless, it might be more fruitful to relax the assumption of perfect observation in the model, which would add noise to the imitation and conformity decisions and should therefore lower the observed relationships to more closely match existing meta-analytic effects. These are future explorations which should be undertaken to refine the present model.

Other modeling possibilities

It is also possible that the models here could be combined with other theories to study group effects on training transfer. One intriguing example would be to incorporate Diffusion of Innovation Theory (e.g., Rogers, 2003) to study the propagation of transfer through a work group. As discussed regarding the reasons for the unexpected negative relationship between conformity and outcomes in Model 2C, a key to overcoming the momentum of the group is for individual agents to adopt the target behavior and slowly bring other agents on board. Eventually, a tipping point is reached where it becomes acceptable for the group to use the trained behavior as a critical mass will do so; then, over time, straggling agents will be pressured to adopt the new behavior instead of clinging to the old one. Conceptually this fits quite well with the diffusion of innovation, where research suggests new innovations are adopted in stages across populations. Initially, new innovations are adopted by only a few individuals, called innovators. Some individuals will follow these first adopters eagerly, representing early adopters but still making up a minority of the population. Once this group has opened the door, more and more individuals pick up the innovation in rapid succession as a critical mass is reached, and soon most of the population uses the innovation. Eventually, only a few holdouts remain, representing the laggards in the population (see Rogers, 2003, for a broader overview). The conformity model here may inadvertently speak to this process in a transfer environment where the innovation is the newly introduced behavioral possibilities.
Expanding on the proposed mechanisms and explor ing others from the Diffusion of Innovation literature could provide new and useful insights into the social pressures governing transfer rates in organizations. In addition, it would be interesting to pair this approach with network effects to understand how training adaptation might propagate across a network of employees. Social networks have gained much traction in organizational psychology of late (e.g., Soltis, Brass, & Lepak, 201 8 ), but networks have long been of interest in related fields. One of t he benefits of using NetLogo as the base for the present simulation is the existence of easily accessible network models which could be integrated with the present theory. For example, NetLogo comes with a model to study diffusion of information across dir ected networks (Stonedahl & Wilensky, 2008). Further, related theories could be drawn on to more broadly study the long run development of employees as suggested above, but in the ecolo gical context of their work group as has previously been suggested with Ecological Systems Theory (Bronfrenbrenner, 1977, 1979) in understanding child development (Neal & Neal, 2013). By drawing on such existing modeling efforts we could greatly enrich our understanding of the social effects on the transfer of training. Such work would also provide potential extensions of recent research on vicarious learning mechanisms within teams (Myers, in press). Implications for Practice The combined effects of the above discussions and modeling here have some implications for practice as well. When it comes to promoting transfer , it is especially important to promote a climate wherein learners are free to explore their training and not succumb to group pressure to r evert to pre - training behaviors. This is important to have esta blished at the very end 104 of training so that the dynamic effect of observing continued use of old behaviors and the pressure that use can create is broken to allow greater time for exploration. Once that climate is established, practitioners should encourag e learners to observe the most effective individuals in their work groups and follow their lead. Through this sequence, one may be able to unlock the benefits of both types of social learning e xplored in the present models. Conclusion It seems apparent th at the models explored in studies 2B and 2C are substantial improvements upon the LTM over both the baseline model, and the first attempt to include a social learning mechanism. These models ha ve a stronger basis in the broader scientific literature and ap pear to conceptually fit with ways we think about groups in the organizational sciences. However, much work will remain to establish which of these mechanisms, or which mix of them, best accoun ts for transfer effects. For the time being, they appear to off er potential theoretical and practical insights and more work along these lines is encouraged. 105 Study 3A: Adding Self - Regulation to the Transfer Process Model Unfortunately, the originally prop osed social learning mechanism for the LTM did not operate as e xpected. However, it appear s that the alternate models exploring conformity and imitation show greater promise for examining the effects of social groups in the transfer process. It was also ar gued that the conformity model fit better conceptually with the social environment effect we study through transfer climate, despite having the opposite direction effect initially expected. 
Therefore, the conformity model was used as a basis for a third iteration of the LTM which integrates a perspective that no within-person process model could, or at least should, avoid addressing: self-regulation (e.g., Vancouver, 2008).
Self-Regulation
Self-regulation is the dominant theory of motivation in organizational psychology (Vancouver & Day, 2005) and describes how individuals guide their actions towards goals over time (Karoly, 1993).
Hierarchical goal pursuit
Goals are internally represented desired states of being held by individuals (Lord, Diefendorff, Schmidt, & Hall, 2010) and are the central construct in self-regulatory systems. Goals exist in a hierarchy, such that individuals possess both short- and long-term goals, with short-term goals nested underneath longer-term goals. As lower-level goals are completed, individuals move closer to attaining higher-level goals. This hierarchical system is the basis of theories of self-regulation, and of the application of self-regulation to organizational phenomena such as training transfer (e.g., Carver & Scheier, 1998; Powers, 1973; Blume et al., 2019). At each level of the goal hierarchy a self-regulatory system monitors goal progress and adjusts system outputs to maintain the desired goal level (Vancouver & Day, 2005).
Self-regulatory negative feedback systems
Self-regulatory systems are based around negative feedback loops which attempt to minimize discrepancies between the goal of the system and perceived progress towards it (Vancouver & Day, 2005). Two major versions of self-regulation exist: Social Cognitive Theory (SCT; Bandura, 1991; Schunk & Usher, 2012), which grew out of Social Learning Theory (SLT), and Control Theory (CT; Powers, 1973; Carver & Scheier, 1998). The Social Cognitive and Control versions of self-regulation do not differ all that substantially and make diverging predictions only in specific circumstances (Vancouver, Gullekson, Morse, & Warren, 2014). However, CT has the distinct advantage of relying on more formal forms of logic than SCT, which is reliant on the narrative approach to theorizing, whereas CT is more computational in nature due to its historical roots. As discussed previously, narrative theory is useful for conveying ideas but suffers in making formal predictions (Vancouver, 2012; Adner, Polos, Ryall, & Sorenson, 2009). Thus, because this paper is building a formal model of transfer and CT is a more formal version of self-regulation, CT will provide the basis of the present theorizing. Many specific versions of CT exist (e.g., Powers, 1973; Campion & Lord, 1982; Carver & Scheier, 1998; Vancouver, 2008), but all have crucial elements in common. Broadly, they all consider the nested goal hierarchy previously mentioned, but more specifically the basic negative feedback system of regulation relies on just a few key ideas. Lord and Hanges (1987) argue that the negative feedback systems of CT all have five elements: 1) some standard (goal) which the system seeks to maintain, 2) a sensor which monitors the state of the environment, 3) a comparator which compares the standard to the sensed environment, 4) a decision mechanism to decide whether something should be done to reduce any perceived discrepancy, and 5) an effector mechanism which produces some behavior meant to reduce the perceived discrepancy. As goal striving unfolds over time, the regulatory system monitors progress towards that goal and works to lower discrepancies through multiple mechanisms.
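To make the five elements concrete, one cycle of such a negative feedback loop is sketched below. This is an illustrative Python sketch rather than the LTM's NetLogo implementation; the function name, the binary decision rule, and the fixed effort increment are assumptions made purely for illustration.

```python
def negative_feedback_step(standard, perceived_state, effort, step_size=0.1):
    """One cycle of a basic negative feedback loop (illustrative only).

    standard        -- the goal the system seeks to maintain (element 1)
    perceived_state -- what the sensor reports about the environment (element 2)
    """
    discrepancy = standard - perceived_state      # comparator (element 3)
    should_act = discrepancy > 0                  # decision mechanism (element 4)
    if should_act:
        effort = min(1.0, effort + step_size)     # effector raises output (element 5)
    return effort, discrepancy

# Example: a goal of .80 against perceived progress of .55 triggers more effort.
new_effort, gap = negative_feedback_step(standard=0.80, perceived_state=0.55, effort=0.30)
print(new_effort, gap)  # prints the raised effort and the remaining discrepancy
```

Repeating this cycle until the discrepancy disappears is the core dynamic the remainder of this section builds on.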
The two primary pathways to reducing discrepancies are 1) changing output behaviors that affect the perceived environment, such as increasing effort, and 2) changing the set-point of the goal in question (Campion & Lord, 1982). Within the CT literature it has been a point of contention which option is most likely to occur, with some theorists arguing that goals are more easily adjusted than behavior.
Self-Efficacy
Arising from the Social Learning and Social Cognitive perspective of Bandura (1977, 1991), self-efficacy represents the other central variable in self-regulation. Self-efficacy is the belief individuals hold regarding their ability to execute desired behaviors in the pursuit of some outcome (Bandura, 1977), and is the primary mechanism through which individuals exert agency (Bandura, 1977, 1989, 1991). That exertion of agency is represented in which environments people choose to enter (Bandura, 1989) and in choosing the tasks with which they will engage (Bandura & Cervone, 1987). Control theorists generally agree that efficacy is an important construct (Vancouver, 2012), but they dispute its nature, which complicates how efficacy will be instantiated in the present modeling. Much research on self-efficacy has followed the predictions of Bandura (1977, 1989, 1991), who describes efficacy as having a (nearly) uniform positive effect on performance. Through its influence on task and environmental choices, efficacy has a .38 meta-analytic relationship with performance between individuals (Stajkovic & Luthans, 1998). Theory and research on the within-person nature of efficacy is less uniform in its findings. In explicating their theories, multiple observers, including Bandura (1977), have discussed how extremely low levels of efficacy will lead to complete task disengagement as a cognitive defense mechanism (e.g., Lindsley, Brass, & Thomas, 1995). Additionally, Vancouver and colleagues have argued that the relationship between efficacy and performance is not always positive (Vancouver et al., 2014) and, in a series of experiments, have suggested that the relationship between efficacy and task engagement is discontinuous in nature. Specifically, at very low levels of efficacy individuals are unlikely to engage in a task at all, but as their efficacy increases they will suddenly choose to engage in the task and will need to put forth maximum effort in order to succeed. Then, as efficacy continues to increase, the individual will reduce effort because they do not feel full effort is required to ensure success, conserving those resources for when they are more necessary (Vancouver, Moore, & Yoder, 2008). The location of that discontinuity can be moderated by the nature of the task in question as well, such as by manipulating the value attached to said task (Sun, Vancouver, & Weinhardt, 2014).
The LTM with Self-Regulation
In some ways, motivational aspects of goal pursuit are already a part of the LTM. Sutton and Barto (2018) explain that internal state components of learning agents correspond to animal motivational states. In addition, learning agents have a built-in drive to ascend the gradient of their value function, that is, to select actions expected to lead to the most highly valued states (Sutton & Barto, 2018, p. 361). In a normal reinforcement learning problem, the maximum attainable goal is defined by the environment and the ways in which the programmer encodes rewards.
In a basic reinforcement problem, the agent will continuously attempt to improve its value states because it does not know what the maximum is. Humans, on the other hand, can decide that they have reached a goal and stop pursuing higher levels of attainment. This ability to decide that an acceptable level of performance has been reached, and to voluntarily forgo further improvement, can be achieved through self-regulatory systems. There are approaches to modeling goal-directed behaviors in learning agents seeking to navigate an environment in a typical reinforcement learning problem (Sutton & Barto, 2018), but those approaches are beyond what is required for the present purposes and the framework of a simple 2-armed bandit problem. How, then, can we account for specific goal-directed behavior within the LTM? First, we must define a performance goal for the agent, T. To understand the relationship between performance and the performance goal, we must also define and track performance. Performance in this model will be represented by the variable Y and calculated as the average performance across all task attempts, regardless of the policy applied. In the present model, where any successful attempt is rewarded with 1 point, the average performance expressed as the percentage of successful attempts is equivalent to the average reward received on task attempts. Thus, Y may be calculated as
Y_{t+1} = \frac{1}{t} \sum_{i=1}^{t} R_i
where Y at time t + 1 is the average of all rewards R received by the agent through time t.
Algorithm 7. Agent Performance
A comparison must then be made between the level of performance and the stated goal. This can be done through a simple difference variable we shall define as D, calculated as
D_{t+1} = T - Y_{t+1}
Algorithm 8. Goal Discrepancy
The learner must then decide whether they are short of their performance goal or not. This decision is defined by a variable J, equal to 0 if the goal is met and 1 if it is not. That decision then feeds into an effector mechanism: if performance is short of the goal, the agent chooses to change their behavior to reach it. In other computational models of the self-regulatory system, the effector mechanism is defined in terms of the actions taken to close the perceived gap between goals and perceptions, where the acts taken are relevant to the goals being monitored. For example, if an overarching goal is to write a paper, the acts could be a series of steps such as doing research, outlining, etc., that are all governed by their own regulatory systems. Given the undefined nature of the tasks modeled in this paper, it makes most sense to define the actions an effector mechanism may take in terms of the actions currently available to the learner/agents. Thus, the LTM hypothesizes that agents are more likely to explore their policy options if they are currently short of their performance goals. To account for this effect, let F represent the degree to which they are more likely to explore on a given task attempt. The variable F then modulates the exploration parameter E as a function of whether the agent is reaching their goal or not, resulting in the calculation of E as
E_{t+1} = E_{base} + (J_{t+1} \cdot F)
where E_base is the baseline exploration rate.
Algorithm 9. Effector Mechanism 1
The newly added variables and equations for all of Study 3 can be found in Tables 11 and 12.
Study 3A: Method, Simulation, and Results
The outlined model for including self-regulatory mechanisms was instantiated in a new model in NetLogo. A screen capture of the modeling environment and code for this model can be found in Appendix E.
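The pieces just defined (Algorithms 7 through 9) can be consolidated into a short sketch. The Python below is an illustration of my reading of those definitions, not the NetLogo code in Appendix E; in particular, the additive update E = E_base + J * F is an assumption implied by the parameter sweep described in the next section, and the class and variable names are hypothetical.

```python
class GoalRegulatedExplorer:
    """Sketch of the Study 3A self-regulatory additions (Algorithms 7-9).

    Assumes the additive reading E = E_base + J * F; the dissertation's
    NetLogo code (Appendix E) is the authoritative implementation.
    """

    def __init__(self, goal_T, base_exploration=0.10, boost_F=0.10):
        self.goal_T = goal_T                  # performance goal, T
        self.base_exploration = base_exploration
        self.boost_F = boost_F                # extra exploration when short of the goal, F
        self.total_reward = 0.0
        self.attempts = 0

    def record_attempt(self, reward):
        """Update average performance Y after a task attempt (Algorithm 7)."""
        self.total_reward += reward
        self.attempts += 1

    def performance_Y(self):
        return self.total_reward / self.attempts if self.attempts else 0.0

    def exploration_E(self):
        """Compare Y to T and adjust exploration (Algorithms 8 and 9)."""
        D = self.goal_T - self.performance_Y()   # goal discrepancy
        J = 1 if D > 0 else 0                    # 1 = short of the goal
        E = self.base_exploration + J * self.boost_F
        return min(max(E, 0.0), 1.0)             # keep the rate a valid probability

# Example: an agent short of a .80 goal explores at .20 instead of .10.
agent = GoalRegulatedExplorer(goal_T=0.80)
agent.record_attempt(1); agent.record_attempt(0)
print(agent.performance_Y(), agent.exploration_E())  # 0.5 0.2
```

Sweeping goal_T and boost_F across a grid of values would reproduce, in miniature, the virtual experiment described next.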
Virtual Experimentation
Although some key findings connected broadly to the self-regulatory effects underlying transfer findings were already explored in the experiments above, two key effects require illumination here: the effects of goal setting and of efficacy. Following the nature of self-regulatory systems, it stands to reason that the higher the goal set in a system, the higher we would expect the performance outputs of that system to be. A higher set goal creates a larger discrepancy between that goal and perceived reality. When individuals sense this discrepancy, they act to reduce it (Carver & Scheier, 1998). This basic finding, that higher goals lead to higher performance, is the essence of goal theory as outlined by Locke and his colleagues (Locke, 1968, 1975; Locke & Latham, 1990). Although it is often suggested that goals be specific and challenging yet attainable, even especially difficult goals can enhance performance if feedback regarding that performance is provided (Campion & Lord, 1982). Within the training literature, goal choice plays a key motivational role in the post-learning transfer phase (e.g., Beier & Kanfer, 2010). Post-training goals show a small meta-analytic relationship with transfer of .08 (Blume et al., 2010). To test for the effect of goal setting in the LTM, T was systematically manipulated. In addition, any discrepancy between observed performance and the set goal affects behavior in the present model through the level of exploration the agent is willing to engage in. There was no a priori expectation of which levels of change in the baseline exploration rate would produce the expected relationships between goals and outcomes; therefore the level of exploration change when short of the goal, F, was simultaneously explored. To explore the joint effects of goal level and change in exploration rate, a simulation was executed that crossed goal level, swept from 0 to 1.0 in .05 increments, against change in exploration from -.10 to 1.0 in .05 increments. F began at -.10 because the baseline exploration rate was held at .10, as established in prior simulations, and it was decided to cover the full range of final exploration rates. All other variables were held constant: likelihood of type 2 processing at .80, value of Policy A at .70, change in value at .05, initial policy estimates at .50, 100 pre-training time steps, and 500 transfer time steps. Initial results of this simulation were surprising, as the relationships between goal level and the outcomes of behavioral transfer (r(241500) = -.126, p < .001) and post-training performance (r(241500) = -.065, p < .001) were negative at the replication level and at the condition level (r(483) = -.335, p < .001, and r(483) = -.326, p < .001, respectively). Similarly, relationships between exploration rate change and behavioral transfer (r(241500) = -.097, p < .001) and post-training performance (r(241500) = -.050, p < .001) were also negative at the replication and condition levels (r(483) = -.249, p < .001, and r(483) = -.257, p < .001). Additionally, at the condition level, both goal level (r(483) = -.259, p < .001) and exploration rate change (r(483) = -.234, p < .001) were negatively related to the effect size of pre-post performance change. Given the surprising nature of these relationships, further analyses were completed in an effort to better understand them.
Moderated mult iple regression analyses showed the combined effects of goal level ( F (3, 241496) = 3503.46, p < .0 01, R 2 = .20; b 0 = .411 , t = 783.4 8, p < .001 ; b 1 = - .110, 1 = - .126 , t = - 63.48, p < .001 ) and exploration rate change ( b 2 = - .077, 2 = - .097 , t = - 48.69, p < .001 ) had negative main effects on behavioral transfer, and a negative interaction ( b 3 = - .335, 3 113 = - .128 , t = - 64.11, p < .001 ). Similarly, in predicti ng post training performance, goal level ( F (3, 241496) = 61.56 , p < .001, R 2 = . 53; b 0 = .721 , t = 3725.10, p < .001 ; b 1 = - .005, 1 = - .065 , t = - 8.40, p < .001 ) and exploration rate change ( b 2 = - .004, 2 = - .050 , t = - 6.41, p < .001 ) had negative main e ffects, and a negative interaction ( b 3 = - .016, 3 = - .067 , t = - 8.55, p < .001 ). Finally, in predicting pre - post performance change, goal level ( F (3, 479) = 43.54, p < .001, R 2 = .46; b 0 = .289 , t = 46.86, p < .001 ; b 1 = - .130, 1 = - .259 , t = - 6.40, p < .001 ) and explorati on rate change ( b 2 = - .107, 2 = - .234 , t = - 5.78, p < .001 ) had negative main effects, and a negative interaction ( b 3 = - .460, 3 = - .304 , t = - 7.50, p < .001 ). These interaction effects, showing the mutually de pressive effects of these variables on our outcomes, have been graphed in Figures 33 - 35. For further analysis, heat maps of these results were generated and can be found in Figures 36 - 3 8. These heat maps are especially informative as they show that tradit ional statistics are unab le to fully describe the underlying data pattern . In these simulations, the heat maps suggest that goal level had no effect when it was below .70, that is when t he goal was below the true value of Policy A . However, when the goal l evel was above .70, the e ffects of goals on the outcomes depended on the rate of exploration change. When the exploration change was very low, or very high, outcomes were worse than when exploration changes were low to moderate in magnitude. Thus, when sho rt of their goal, it was beneficial for agents to explore to some degree, but not too much. The extreme negative effects when exploration changes were very low or very high may be maskin g the expected positive effect of goals. To test this, correlations be tween goal level and outc omes only across replications where the change in exploration rate was .10 were calculated. These relationships were indeed positive ( r ( 21) = .105 , p = .651, for behavior, r (21) = 114 .061 , p = .793, for post training performance), whi ch at least matches the e xpected direction of the effect of goal set point. To explore the possibility that this model works within certain parameter ranges, a second simulation was run holding the change in exploration rate to .10 while sweeping goal leve l from 0 to 1.0 in .01 in crements, 500 replications each. These results reveal a similar r (21) = . 095 , p = .682, relationship between goals and behavioral transfer, and an r (21) = .05 , p = .830, relatio nship between goals and post training performance . 115 St udy 3A: Discussion The r esults of the initial exploration of the effects of goal level on the LTM did not match expectations. Instead of the small positive relationships expected between goals and transfer outcomes (Blume et al., 2010) the overall observe d relationship was n egati ve. 
However, it appears this negative effect is driven by especially bad outcomes when the change in exploration rate is either very low or very high. That very high changes in exploration are detrimental does fit with previous findings in this paper, in that especially high levels of exploration do not allow the agent to exploit the policies they do happen to find more productive. On the other hand, especially low, and in this simulation negative, changes in the exploration rate represent a degree of disengagement from trying to transfer, and therefore would not result in positive transfer outcomes because the agent has in effect stopped trying to do so. When investigated further, it may be that the effects of goals in the model only work as expected within certain ranges of the change in exploration rate. This was shown to be plausible in that the relationship between goals and behavioral transfer approximates the meta-analytic effect (Blume et al., 2010) when only examining the effect where the change in exploration rate is .10. Given these findings, it was concluded overall that this is a plausible model provided parameters are held within certain ranges. However, prior to exploring the model further it was decided to investigate other potential mechanisms of action to ascertain which may be more broadly applicable to the transfer environment.
Study 3B: Tweaking Goal Seeking
Prior to fully accepting Model 3A it was decided that two other potential implementations of the self-regulatory system should be explored for the LTM. Model 3A relied on an effector mechanism that blindly makes the same adjustment to behavior regardless of the degree to which one is short of the desired goal. However, this does not necessarily align completely with reality. One other potential interpretation is that when individuals are further from their desired goal, they will take more drastic actions to close that gap. Leaving aside, for the time being, the issue of disengagement in the face of extreme deficits between goals and current states, such an increase in motivation generally fits with the CT view of self-regulation, in which, as an individual approaches their goal, motivation would only be maintained through an increase of their goal level in order to maintain a deficit, or motivation would be redirected towards the completion of other goals (Carver & Scheier, 1998). A second potential effector mechanism would be one which raises exploration as an individual nears their goal. Thus, when an individual is close to their goal but not quite there, they may work harder to find a way to push their current state to finally come in line with their desired one instead of backing off. This would imply an inverse relationship between goal distance and exploration. This view fits with other theories such as Temporal Motivation Theory (Steel & König, 2006), which states that the expected value of engaging in a task increases as the temporal distance to that task decreases, raising the motivation of the individual to engage in that task. Additionally, some research has shown that levels of motivation increase as subjective judgement of how close one is to their goals increases, and that close goals specifically increase focus on the process of meeting that goal (e.g., Peetz, Wilson, & Strahan, 2009). Such a relationship would fit with an effector mechanism that increases exploration to a greater extent when goals are close but unreached than when those goals are further away.
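Before these alternatives are formalized in Models 3B-1 and 3B-2 below, the contrast between the original effector mechanism and the two variants just motivated can be sketched as follows. This is a hedged Python illustration: the function names are hypothetical, and the bounded inverse form 1 / (D + 2) is an assumed stand-in chosen only to respect the .5 cap described in Model 3B-2, not the dissertation's exact function.

```python
def effector_static(D, F=0.10):
    """Model 3A: a fixed exploration boost whenever the agent is short of the goal."""
    return F if D > 0 else 0.0

def effector_proportional(D):
    """Model 3B-1 style: the boost grows directly with the goal discrepancy D."""
    return max(D, 0.0)

def effector_inverse_bounded(D):
    """Model 3B-2 style: the boost is largest just short of the goal and tapers off.

    The 1 / (D + 2) form is an illustrative assumption; it merely caps the boost
    near .5 as D approaches zero, matching the bound described below.
    """
    return 1.0 / (D + 2.0) if D > 0 else 0.0

# Compare the three mechanisms at a small and a large discrepancy.
for D in (0.05, 0.60):
    print(D, effector_static(D), effector_proportional(D), round(effector_inverse_bounded(D), 3))
```

The proportional variant pushes exploration hardest when the agent is far from its goal, while the bounded inverse variant pushes hardest when the agent is just short of it; these are the two behaviors the next two models test.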
Given these two other possibilities for effects of goal deficits, two alternative effector mechanisms were explored for the LTM.
Model 3B-1
The first alternate effector mechanism proposes a direct link between the perceived discrepancy between the goal and the current state and the degree to which agents are willing to explore their behavioral options. Specifically, agents' desire to explore increases to the extent that they see themselves as short of their goal. Mathematically, this makes the variable F outlined in Model 3A a dynamic variable instead of a static one. Now, F will be calculated as
F_{t+1} = D_{t+1}
stating that F at time t + 1 is equal to the observed difference, D, between the agent's goal and their observed state at time t.
Algorithm 10. Effector Mechanism 2
Model 3B-2
The second alternate effector mechanism proposes an inverse relationship such that exploration will be greatest when one is just short of the set goal, and that rate will taper off as the distance to the goal increases. The simplest way to create such a relationship is to adjust F to be the inverse of the discrepancy,
F_{t+1} = \frac{1}{D_{t+1}}
However, this will create extremely large relative values of F, making exploration essentially 1 every time an individual agent is short of their goal. As seen in the simulations for Model 3A, this is not ideal or realistic.
Algorithm 11. Effector Mechanism 3
To place an upper limit on that change in exploration, F was instead calculated as a bounded inverse function of D, which limits exploration rates to .5 plus the baseline exploration rate (which we typically set at .10) when individual agents are just short of their goals.
Algorithm 12. Effector Mechanism 4
Study 3B: Methods, Simulation, and Results
The two mechanisms outlined above were instantiated in two mirrored computational models in NetLogo, where the only difference is the calculation of F. A snapshot of the modeling environment and copies of the simulation code for each can be found in Appendix F.
Model 3B-1
To explore the effect of treating F as a direct positive function of the difference between the goal and the perceived state, an initial simulation swept the performance goal variable from 0 to 1.0 in .01 steps. Other variables were held constant: type 2 likelihood at .80, Policy A value at .70, change in value at .05, baseline exploration rate at .10, initial policy estimates at .50, 1 trainee, 100 pre-training time points and 500 post-training time points. Initial results suggest correlations between goals and behavioral transfer (r(101) = .701, p < .001) and post-training performance (r(101) = .392, p < .001) are positive, as would be expected. To better understand the nature of the effect, graphic depictions were created of the mean observed behavior, post-training performance, and pre-post performance improvement in Figures 39-41. These results show the relationship between goals and these outcomes is not uniform. Instead, goals have no real effect when they are well below the set value of Policy A. Then, as goals approach and pass the value of Policy A, outcomes rapidly improve until leveling out once goals reach a level just higher than the value of Policy A. For pre-post performance change specifically, positive outcomes begin to occur around a goal level of .60. Given the observed positive relationship between goals and outcomes in this model, and the potentially interesting effects of the semi-discontinuous nature of those effects, a small experiment was run to explore these effects further.
Specifically, goal level (varied from 0 to 1.0 in .05 increments) was crossed with changes in value from Policy A to Policy B (varied from - 120 1.0 to 1.0 in .05 increments) to study the effect of changing goals again st changing behavioral options. In this simulation, it was found that goals again h ad a positive effect on b oth behavioral transfer ( r (430500) = .638 , p < .001 ) and post training performance ( r (430500) = .032 , p < .001 ) , as well as pre - post performance change ( r (861) = .094 , p = .006 ) . In addition, value cha nge also had positive effects on behavior ( r (430500) = .336 , p < .001 ) , post training performance ( r (430500) = .589 , p < .001 ), and pre - post performance change ( r (861) = .611 , p < .001 ) as would be expected. A moderated multiple regression analysis was then completed. In predicting beh avioral transfer it w as f ound that goal level ( F (3, 430496) = 181108.91, p < .001, R 2 = .75; b 0 = .224 , t = 688.04, p < .001 ; b 1 = .183, 1 = .336 , t = 331.21, p < .001 ), and value change ( b 2 = .678, 2 = .638 , t = 629.60, p < .001 ) had positive main effects, and a positive interaction ( b 3 = .351, 3 = .196 , t = 192.95, p < .001 ). Similarly, in predicting post training performance it was fou nd that goal level ( F (3, 430496) = 301747.87, p < .001, R 2 = .82; b 0 = .720 , t = 6484.78, p < .001 ; b 1 = .128, 1 = .589 , t = 681.17, p < .001 ), and value change ( b 2 = .013, 2 = .032 , t = 36.73, p < .001 ) had positive main effects, although the effect of value change was very small after controlling for the effect of goal level, and a positive interaction ( b 3 = .411, 3 = .574 , t = 663.25, p < .001 ). Fina lly, pre - post performance change displayed similar positive relationships with goals ( F (3, 857 ) = 9 08.08 , p < .001, R 2 = .87; b 0 = .331 , t = 6.32, p < .001 ; b 1 = 3.236, 1 = .611 , t = 36.57, p < .001 ), and value change ( b 2 = .972, 2 = .094 , t = 5.62, p < .001 ), and a positive interaction ( b 3 = 10.761, 3 = .615 , t = 36.82, p < .001 ). These positive interacti ons a re depicted in Figures 42 - 44. For further investigation, heat maps of these effects are depicted in Figure 45 - 47. As with the simulation of goals alone for this model, goal level had basically no effect on either behavioral transfer, performance, or p erfor mance improvement when goals were below about .60. When goals reach .60, there is a sudden and rapid change in the pattern of results where the 121 best outcomes occur when goals are slightly above the baseline value of Policy A , and the change in value o f the policies is moderately positive. If goals become too high, outcomes become worse as the agent begins to search for a better option than those available instead of exploiting the available options. In addition, outcomes are only especially bad when go als a re high , and the new policy is substantially worse than the existing policy. Model 3B - 2 To explore the effect of treating F goal and perceived state, as with Model 3B - 1, an initial simulation swe pt the performance goal variable from 0 to 1.0 in .01 steps. Other variables were held constant: type 2 likelihood at .80, Policy A value at .70, change in value at .05, baseline exploration rate at .10, initial policy estimates at .5 0 , 1 trainee, 100 pre - training time points and 500 post - training time points. Initial results suggest correlations between goals and behavioral transfer ( r (50500) = . 802 , p < .001 ) post training performance ( r (50500) = . 
307 , p < .001 ) a re positive, as previously observed . Visua ls were created of the mean observed behavior, post training performance, and pre - post performance improvement in Figures 48 - 50 . T hese results also show the relationship between goals and these outcomes is not uniform but in a different way than with Model 3B - 1 . Here, goals still do not have a noticeable effect at extremely low levels, but they begin to impact outcomes at a lower lev el than in Model 3B - 1. Additionally, their effect on behavior and performance does not come so suddenly and drastically. Inste ad, as goals increase behavioral transfer and post training performance gradually increase until leveling out around .50 and .73, respectively. For pre - post performance change we only begin to observe positive effects when goals reach at least .40. 122 Having found a positive relationship between goals and outcomes in this mode l, the same experiment run for Model 3B - 1 was executed for this model as well . I t was found that goals again had a positive effect on both behavioral transfer ( r (430500) = . 123 , p < .001 ) and post training performance ( r (430500) = . 315 , p < .001 ), as well as pre - post per formance change ( r (861) = .094 , p = .006 ). In addition, value change also had post training performance ( r (430500) = . 827 , p < .001 ), and pre - post performance change ( r (43 0500) = .611 , p < .001 ) as would be expected , but a negative effect on behavioral transfer positive effects on behav ior ( r (861) = - .525 , p < .001 ) . A moderated multiple regression analysis was then completed. In predicting behavioral transfer it was found that goal level ( F (3, 430496) = 218770.33, p < .001, R 2 = .78; b 0 = . 347 , t = 1372.82, p < .001 ; b 1 = . 107 , 1 = . 123 , t = - 547.17, p < .001 ) had a positive main effect , but value change ( b 2 = - . 233 , 2 = - .525 , t = 127.98, p < .001 ) had a negative main ef fect, and a positive interaction ( b 3 = . 822 , 3 = . 560 , t = 583.56, p < .001 ). In predicting post training performance it was found that goal level ( F (3, 430496) = 567084 . 97 , p < .001, R 2 = .89; b 0 = . 607 , t = 4698.99, p < .001 ; b 1 = . 196 , 1 = . 315 , t = 12 08.04, p < .001 ), and value change ( b 2 = . 264 , 2 = . 827 , t = 459.89, p < .001 ) had positive main effects , and a negative interaction ( b 3 = - .126 , 3 = - .119 , t = - 174.33, p < .001 ). Finally, pre - post performance chang e displayed positive relationships with goals ( F (3, 857 ) = 844.33 , p < .001, R 2 = . 86 ; b 0 = - 2.523 , t = - 29.02, p < .001 ; b 1 = 4.081 , 1 = . 244 , t = 48.20, p < .001 ), and value change ( b 2 = 7.083 , 2 = . 828 , t = 14.21, p < .001 ), and a negative interaction ( b 3 = - 1.397 , 3 = - .049 , t = - 2.88, p = .004 ). These interactions are depicted in Figures 51 - 53 . For further investigation, heat maps of these effects are depicted in Figure 54 - 56 . Interestingly, these analyses show that the best performance outcomes occur when value changes and goals are both high, which we would expect. However, 123 a greater degree of beha vioral transfer occurs when goals and value changes are low, counter to expectations. 124 Study 3B: Discussion Models 3B - 1 and 3B - 2 were meant to explore other potential effector mechanisms with in the self - regulatory processes of the LTM which are consistent with existing theory and research findings . 
This was undertaken after finding that the originally proposed mechanism explored in Model 3A may only approximate meta-analytic estimates in the transfer literature under a limited range of parameters. The present models changed the value of F from an a priori set effect to one dependent on the perceived difference between one's goal and one's current state. Initial simulation results for these models are mixed. First, Model 3B-1 did display the expected overall positive relationships between goal level and the transfer outcomes of behavior and performance. This represents an improvement over the overall model of 3A, where initial results suggested overall negative effects of goals instead of positive ones. However, the magnitude of the goal effects in Model 3B-1 is much larger than the suggested .08 in transfer research (Blume et al., 2010). The same can be said of Model 3B-2, where relationships were in the expected directions but of abnormally large magnitude. Even so, there are some potentially intriguing results. For example, the finding that behavioral transfer rates actually reverse at very high goal levels in this model may be a sign of agents finding that a roughly 50-percent exploration rate is optimal given two behavioral choices as they desperately search for an option which may complete their goal. The combined lack of transfer at low goal levels and this upper limit on transfer may provide an explanation for the low transfer rates commonly cited in the literature (Ford et al., 2010). For workers with low goals for their personal performance, all these models (3A, 3B-1, and 3B-2) suggest we will not see high degrees of behavioral transfer, although the exact amount differs by model. Further, when goals are very much higher than the performance achievable through available means, transfer does not occur to an extreme extent because the agent does not simply settle and acquiesce to use the best available policy, but keeps searching for an option which will fulfill their goals. For real-world employees, the same calculus could be in play: among workers with high goals, a failure to directly transfer received training may not reflect a failure to recognize the improvement of the training over whatever their old approach is, but may instead represent a recognition that the training is not good enough and a result of their personal pursuit of other, not necessarily organizationally directed, options to achieve their goals. Such an insight could provide guidance to future research projects. Further, this set of models may provide practical guidance on the post-training setting of goals. Following goal theory (Locke & Latham, 1990), goals for transfer are set following training to, ideally, be specific, challenging, and attainable. The relationship between those goals and actual transfer could be said to be disappointing given the weak meta-analytic relationship between goals and transfer (Blume et al., 2010). The findings here suggest that we may need to focus more on those post-training goals being attainable, to keep them in the range where they can have a substantial effect on later transfer. Further, on the research side, when we study those goals, we may need to change the way we analyze their effects.
We traditionally rely on ordinary least squares regression and correlational approaches to study these effects, but it has been suggested that more advanced analytic techniques could improve our understanding of transfer (Olenick et al., in press), and the effect of goals on that transfer is a good example. In selection research it has recently been shown that taking a fit approach and associated polynomial regression techniques can greatly improve the ability of interests to predict work performance (Nye, Prasad, Bradburn, & Elizondo, 2018). Similarly, given the interplay between goal levels and the value of the received training, which indicates that different goals may work better with different levels of value for the trained behavior, we could study the congruence between set goals and the value of the training. In this way we may better estimate the value of goal setting within the training and transfer field. Despite the potential insights gained from these models in total, the results of the simulations run for this paper in Studies 3A and 3B suggest the best model for use in transfer may be that proposed in Model 3A. This conclusion results from the very close match between the simulated effects of goal level and meta-analytic estimates for Model 3A, provided the model is kept within certain ranges of parameters, whereas the effects observed in Models 3B-1 and 3B-2 are larger than the meta-analytic effect of .08 (Blume et al., 2010). Further, the mechanism tested in Model 3A is more parsimonious than those tested in 3B-1 and 3B-2, and it provides a degree of control for further simulation. The approach Model 3A takes is more akin to treating not just goals, but the effects those goals have on decisions, as an individual difference within the transfer environment, adding to existing studies of individual differences such as personality, goal orientations, need for cognition, and implicit theories of learning (e.g., Jaeggi, Buschkuehl, Shah, & Jonides, 2014). Given the results showing Model 3A closely replicates the effects of goals on transfer, provided F is set to plausible ranges, and the potential it implies for future research, Model 3A was retained for use in the full LTM. However, it is acknowledged that future work will be required to explore this and other effector mechanisms, especially in regard to applying the model to any particular task of interest.
Study 3C: Engagement Thresholds
Having established that the originally proposed self-regulatory system in Model 3A generally outperforms the two other alternatives, shown in Models 3B-1 and 3B-2, in recreating regulatory effects in transfer research, another set of self-regulation findings and implications was explored. Thus far in exploring self-regulation in training transfer we have focused on the effects of goals. Now, we must explore the effect of self-efficacy, which has long been a central variable in self-regulatory models. Self-efficacy, the belief individuals hold regarding their ability to execute desired behaviors (Bandura, 1977), has been argued to be the central motivational variable by which individuals exert agency over their environments (Bandura, 1989). Decades of research have established a clear general pattern of higher efficacy relating to higher task performance (Stajkovic & Luthans, 1998), and this effect has been meta-analytically established in training transfer, where post-training efficacy has a corrected relationship with transfer of .22 (Blume et al., 2010).
However, in the last decade, some minor but important disagreements have arisen over the nature of self-efficacy. Importantly, it has been argued that when studied in a causal manner, self-efficacy is a product of performance, and not necessarily the other way around. In this case, Sitzmann and Yeo (2013) found that within individuals, performance predicted self-efficacy at β = .30 when controlling for linear trajectories, but self-efficacy only predicted performance at β = .06 under the same conditions. Thus, it would be beneficial for the present model to display a general positive relationship between efficacy and transfer, and performance, but also to replicate the differences in the causal strength of the efficacy-performance relationship. Further work by Vancouver and colleagues has challenged the traditional view of self-efficacy having a monotonically positive effect on important outcomes such as performance and task engagement. Over various studies, they have found that self-efficacy can have negative effects in learning tasks under some conditions (Vancouver, Gullekson et al., 2014), and that the relationship between efficacy and task engagement is actually discontinuous in nature (Vancouver et al., 2008; Sun et al., 2014). This discontinuous relationship suggests that at very low levels of efficacy for a task, individuals will refrain from engaging in that task and instead conserve their resources for tasks they are more confident in. As efficacy levels for a task increase, eventually a threshold is passed where suddenly those individuals will choose to engage in the task and will outlay substantial resources in order to improve their odds of success. In the transfer environment it is possible that learners would not even attempt to transfer their learning if they do not believe they can succeed at the application of that learning, which would drastically reduce transfer rates and provide another potential explanation for the common belief that transfer rates are disappointingly low. To my knowledge, no studies have examined the effect of efficacy on training transfer from a discontinuous perspective. Therefore, the present model will explore the potential effects of a discontinuous model of efficacy on transfer to guide future research.
Discontinuous Self-Efficacy in the LTM
As currently conceived, the LTM and its computational equivalent do not directly incorporate a variable labeled efficacy. However, since efficacy is a perception of the individual regarding their ability to complete a task (Bandura, 1977), and efficacy is the product of past performance (Sitzmann & Yeo, 2013), the equivalent of an efficacy evaluation is already present within the LTM. The underlying value of each policy which the learner may apply to their encountered situation is the percentage probability of that policy succeeding in that situation. The agents estimate that value as they apply the policy and receive feedback that informs the estimate. Thus, their estimate of a policy's value is the theoretical equivalent of efficacy, because it is their estimate of the likelihood of succeeding at applying the policy. Therefore, nothing needs to be directly changed in the existing model to incorporate efficacy as a construct. However, as discussed, the relationship between efficacy and task engagement is not actually linear (Vancouver et al., 2008; Sun et al., 2014). As it exists in the present model, the likelihood of using any policy available to the learner is a positively linear function of the estimated value of that policy.
That is, even though the exact choices made by an individual on a given task attempt is dependent on several dynamic variables, the underlying relationship is that as the estimated value of a policy inc reases the likelihood o f using that policy will increase. If the relationship between efficacy and engagement is non - linear, then the existing underlying relationship is incorrect. To remedy this, a single variable needs to be added to our overall model. W e will call this variab le the engagement threshold, labeled V, and will represent the value estimate below which the learner will not choose to implement that p olicy a nd will instead opt for the other p olicy a vailable to them. As tasks are encountered and policy decisions are ma de by the agents, each learning agent in the model independently compares their value estimate of that policy to the cutoff level defined by V . If that policy has a lower value estimate than that threshold level the agent will choose the other policy, but only if the other policy option lies above the threshold, otherwise the original policy choice will be implemented. 130 Study 3C: Methods, Simulation, and Results The addition of an engagement threshold and necessary code to ensure a gents only applied beha viors above that threshold when possible was made to the expanding computational model of the LTM. A screen capture of the modeling environment and associated code can be found in Appendix G. Causal Effects of Self - Efficacy on Transf er and Performance Sin ce efficacy is a value in this model which develops of its own accord , the effects of efficacy were not explored via direct manipulation. Instead a model was executed which track ed the level of the estimated value of Policy B over time to attain a measure for that policy. Given the mechanics of the model as presented in this paper, it was expected that the value estimate will be related to outcomes of interest in the way efficacy is found to be in the ps ychological literature. Specif ically, the estimated value of a policy will be positively related to the likelihood that the learner will choose that policy on a given task attempt and thus transfer it to the task from their theoretical learning environment , and the perceived value of t he policy will be positively related to task performance (Blume et al., 2010; Sitzmann & Yeo, 2013; Stajkovic & Luthans, 1998). To study the dynamic relationships between efficacy, performance, and transfer , 1000 replications of a model with one agent wer e run for the established 500 time point transfer length. At each time point, the value estimate of Policy B , whether or not Policy B was applied at that time point, and the reward (representing task success or failure) for th at time point were saved. All other variables were held constant as before, with type 2 likelihood at .80, value of Policy A at .70, change in policy value of .05, exploration rate of .10, and starting policy value estimates of .50. 131 To analyze this data, correlations were computed bet ween the saved variables, adjusting the data set to account for causal ordering. Only time points within the transfer period were analyzed to remove any biasing effects of the data regarding Policy B during the pretraining pha se when those values were not affecting the behavior of the agent. Additionally, only the value of Policy B is of interest as it is the target of transfer and therefore the subject of the efficacy measurements typically taken at the end of training. 
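The engagement threshold rule described above can be expressed as a short decision function. The Python sketch below is illustrative rather than the NetLogo code in Appendix G; the dictionary-based representation of the two policy value estimates and the function name are assumptions of convenience.

```python
def choose_policy(value_estimates, intended, threshold_V):
    """Apply the engagement threshold V to a tentative policy choice.

    value_estimates -- current value estimates, e.g. {"A": 0.72, "B": 0.55}
    intended        -- the policy the agent would otherwise have chosen
    If the intended policy's estimate falls below V, the agent switches to the
    alternative only when that alternative clears the threshold; otherwise the
    original choice stands, as described in the text.
    """
    if value_estimates[intended] >= threshold_V:
        return intended
    other = "B" if intended == "A" else "A"
    return other if value_estimates[other] >= threshold_V else intended

# Example: with V = .60, a low estimate for Policy B sends the agent back to Policy A.
print(choose_policy({"A": 0.72, "B": 0.55}, intended="B", threshold_V=0.60))  # prints A
```

This gate sits on top of the ordinary policy-selection process, so efficacy (the value estimate) only affects behavior once it clears the threshold.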
First, the estimated value of Policy B at each time point is causally related to performance at that time point. The relationship between these two variables was found to be r(500000) = .065, p < .001. This relationship nearly perfectly replicates the relationship found by Sitzmann and Yeo (2013) for the same effect. Next, the value of Policy B was related to the tendency to choose Policy B at that time point, representing behavioral transfer. This relationship was found to be r(500000) = .369, p < .001, which is in the correct direction for transfer as found by Blume et al. (2010). Finally, a lag variable was required to test the effect of performance, aligning performance on one task attempt with the value estimate of Policy B on the next attempt. When Policy B is chosen at time t, the resulting performance at that time point should have a causal relationship with the value estimate of Policy B at time t + 1. To isolate these effects, only time points where Policy B was applied in the transfer environment were analyzed. Among these time points, the relationship between performance and the value estimate of Policy B was r(207936) = .048, p < .001, which is in the expected direction according to meta-analytic estimates but substantially smaller in magnitude (Sitzmann & Yeo, 2013). It is possible that the length of the transfer run obfuscates the relationship between these two variables, as the value estimate of Policy B stabilizes over time and therefore would not be greatly affected by a single performance. Most research studies are unable to examine any length of time close to 500 data points long; instead the dynamic relationship between efficacy and performance is based on a much shorter time period. To test whether a shorter time period would better approximate the expected relationship, the correlation between performance and the value estimate of Policy B on the next time point was estimated for both the first 100 transfer attempts and the first 25 transfer attempts. In the first 100 transfer attempts the relationship was r(41587) = .064, p < .001, and it was r(5199) = .056, p < .001, in the first 25. These relationships suggest it is not merely the time period examined which accounts for the difference between the meta-analytic relationships and those generated by the present model.
Effects of Engagement Threshold
Unlike with the general effects of our efficacy stand-in, the policy value estimate, an experiment was completed to explore the effects of the discontinuous model of efficacy as it applies to transfer. To test the effect of the engagement threshold variable, V, a simulation swept the parameter from 0 to 1.0 in .01 increments. It was expected that transfer would diminish as the threshold level increases. The logic of that relationship is that not only will it be more likely overall for the value estimate of the target policy to fall below the threshold, but that effect is exacerbated by the instability of small samples, where even policies with high true values will often have lower estimates of that value in the initial stages of transfer purely because of sampling error. This incorrect early judgement would sometimes result in the abandonment of a policy before its true value is revealed to the learner. Such an effect would seem logically consistent with experience, given that some learners will not apply their new KSAO because they feel it is too difficult.
As such, this also represents an initial relaxation of the assumption that learners enter the transfer environment with the ability to successfully apply their new KSAO. 133 In running the full parameter sweep of the engagement threshold, all other parameters were held constant at our established levels: type 2 likelihood was set to .80, .10 exploration rate, true value of Policy A .70, a .05 change in value to Policy B , 100 pre - training time poi nts, 500 transfer time points, 500 replications each, with one agent in each model. However, unlike previously, the initial estimate for the value of Policy B was set to 1.0 instead of .50. This change was made to refrain from artificially limiting initial transfer attempts by a parameter which in this simulation was not our focus, instea d allowing any reluctance from the agent in applying Policy B to arise from its own experience. Initial examination of the results of this experiment confirmed expectation s outlined above. The relationship between threshold level and behavioral transfer ( r (50500) = - .365 , p < .001 ), and post training p erformance ( r (50500) = - .214 , p < .001 ) were both negative. To further understand the relationship between the engagement th reshold and transfer outcomes, mean results for each condition for behavioral transfer, post training performance, and the effect size of pre - post training performance change have been plotted in Figures 57 - 59. In addition, best fitting trend lines with a quadratic term were plotted to better visually illustrate the general pattern. The pattern of all these findings indicate that transfer outcomes are relatively high when the engagement thresho ld is low. However, when the threshold reaches about .50, transf er outcomes begin to deteriorate rapidly as they transition to a lower set point starting around .80 where those outcomes display essentially no transfer. In addition, performance change as ex pressed in d becomes negative when threshold levels exce ed about .60, which is well below the .75 true value of Policy B . 134 Study 3C: Discussion The present study explored the effects of self - efficacy within the LTM. Specifically, it suggested that the value perceptions for the behavioral policy representing th e targeted transfer behavior would display relationships with outcome variables that have been observed in the literature. Further, it explored the effects of the discontinuous model of self - e fficacy (e.g., Vancouver et al., 2008) on transfer. Here I shall discuss the implications of these simulations for both theory and practice. Theoretical and Research Implications Overall, the effects of the value estimate for Policy B in the present model continue to be mixed. As you will recall, it was argued in a pr evious simulation that the effect of the initial value estimate for Policy B should approximate the effect of utilit y reactions we observe in the transfer literature (Blume et al., 2010). However, the expected relationship did not emerge at the replication level, but did to some degree at the condition level, leaving the support for the expected effect as plausible but needing some future refinement. Similarly, the results for the effect of and on Policy B value estimates were mixed in the present study. On the one hand, all the relationships between Policy B value estimates, transfer, and performance were in the expecte d direction. 
In addition, the magnitude of the causal effect of the policy estimate and performance was essentially identical to that observ ed in the research literature (Blume et al., 2010). Thus, it could be argued that the general pattern of results fro m this model fits that which was expected, and generative sufficiency has been achieved . In addition , the general effects of the inclusion the present model fit expectations . Overall, the effect of having a thr eshold for when to apply a given policy was such that high thresholds decreased behavioral transfer and performance 135 outcomes. Unfortunately, there are no known studies to which the effect observed here can be directly compared, although the observed effect fits with general expectations from the work by Vancouver and colleagues on the nuanced effects of self - efficacy. However, we cannot direct ly compare the effect sizes observed here to theirs to enhance the claim of generative sufficiency as the tasks used in their work are not transfer related, nor do they collect data in a comparable way. For example, Vancouver et al. (2008) use a task calle d the Hurricane Game where participants must click on squares of various sizes, representing different levels of eff icacy for doing so, as they randomly jump around a computer screen. There is no real learning component to this task, and they do not collec t and report data on the behavioral strategies employed by their participants to compare how those strategies to each other. Therefore, future work is needed to apply the prese nt simulation to more applicable learning and transfer related tasks which are d esigned to study the discontinuous nature of self - efficacy. Along with applying the present theory to more directly comparable data, the nature of the discontinuous effect of self - efficacy in the present model needs to be further tuned and explored. As im plemented in this version of the LTM, effort is assumed to be constant across all levels of self - efficacy if the agent has decided to engage in the targeted behavior. That is, the agents either fully engage with the behavior or they do not. The discontinuo us model of self - efficacy (Vancouver et al., 2008) suggests that this is not quite the case. The discontinuous model does suggest that individuals completely disengage from tas ks which are below that above that threshold there is a negative relationship between efficacy and effort. In their studies on this phenomenon (Vancouver et al., 2008; Sun et al., 2014), Vancouver an d colleagues use time allocation as a measure of effort applied to the task, but in the LTM it is currently assumed all resources are applied as long as the 136 threshold is met. Future iterations of the present model should examine the effects of resource all ocation to relax the assumption that individuals always fully engage or do not a nd explore the impact of a tapering off resource allocation by agents at high levels of efficacy. It could be the case that very high levels of efficacy are then detrimental to transfer while the highest levels of transfer occur when efficacy is just high enough to get a learner to engage. Such a finding would provide a potential explanation for the surprisingly low relationship found in the literature between efficacy and trans fer (Blume et al, 2010) as the negative effect of high levels of efficacy would mask its overall benefits. One surprising outcome of the discontinuous effect of the threshold model explored here is worth some discussion. 
Specifically, although expected to a lesser degree than was observed, it is surprising to see the threshold have negative effects on transfer at levels so far below the true value of Policy B. The reason for this likely has to do with sampling error by the agents. In the early stages of transfer, the value estimate for Policy B can fluctuate quite wildly as the agent does not have much experience with that policy. On the other hand, even in early transfer attempts the same agent has at least 100 experiences with Policy A and therefore already has a relatively stable and accurate estimate of the value of Policy A. This results in a situation where, in early transfer attempts, the agent will have a good idea of the true value of Policy A, and therefore of their theoretical efficacy for that behavior (as it has been argued that the value of the policy and efficacy are equivalent in this model), and of whether that true value is above the threshold at which they are willing to use that behavior. Simultaneously, they are unsure of the true value, and therefore of their efficacy, for Policy B, and just a couple of poor experiences with Policy B can easily lead to their value estimate falling below the threshold and to them discarding the policy before ever truly giving it a fair chance. It is worth noting that this discarding of Policy B based on these experiences again fits with general predictions of recent narrative theorizing around the transfer process (Blume et al., 2019). The negative effect of the threshold then occurs at a lower level than the true value of Policy B because even relatively low thresholds will sometimes lead the agent to erroneously discard Policy B based on few experiences. Combined with this effect, sometimes the value of Policy A will be overestimated based on pre-training experience, making it even less likely the agent will decide to transfer Policy B. Then, in the transfer environment, that agent discards Policy B only to potentially learn over time that Policy A is not as valuable as it believed, and the overperformance of Policy A observed in the pre-training environment will tend to even out over the course of the extra time simulated in the transfer environment. This overestimation-then-correction likely explains the observed negative effects seen in the pre-post training performance comparisons here. Despite the initially surprising nature of this effect, it would again help explain the general belief in low levels of training transfer if individuals are giving up on that transfer in part due to a misreading of the benefits of their training compared to their personal willingness to employ that training.

Overall, given these results, the present model is potentially viable for studying the basic patterns of relationships we might expect in the transfer environment. Future research should fine-tune the way in which the policy value estimates operate to better match real-world observations. Alternatively, the model will require further exploration to understand under which parameter combinations the expected relationships may be reproduced. For example, it could be that when the difference in policy values from Policy A to B is even smaller than .05, the relationship between the value estimate and transfer may similarly decrease as the agent would erroneously choose to apply Policy A more often. However, this would also decrease the overall rate of transfer and potentially move the model out of acceptable ranges in other ways.
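To make the sampling-error argument above concrete, the following is a minimal sketch, written in Python rather than the NetLogo used for the actual model, of how an engagement threshold interacting with incremental value updating can produce early, erroneous discarding of Policy B. The parameter values mirror those reported above (Policy A at .70, Policy B at .75, 100 pre-training and 500 transfer time points, a .10 exploration rate), but the function names, the fallback to Policy A when nothing clears the threshold, and the omission of the type 2 likelihood and social mechanisms are simplifying assumptions for illustration only, not the dissertation's implementation.

    import random

    # Illustrative sketch (not the model's NetLogo code) of a threshold-gated
    # two-policy learner. True success probabilities follow the reported setup.
    TRUE_VALUE = {"A": 0.70, "B": 0.75}

    def run_transfer(threshold, pre_training=100, transfer=500, epsilon=0.10):
        estimates = {"A": 0.50, "B": 1.0}   # optimistic initial estimate for Policy B
        counts = {"A": 0, "B": 0}

        def update(policy, reward):
            # Sample-average update: estimates converge on the true value, but are
            # volatile while counts are small (the sampling-error issue discussed above).
            counts[policy] += 1
            estimates[policy] += (reward - estimates[policy]) / counts[policy]

        def act(options):
            # Engagement threshold: only policies whose perceived value clears the
            # threshold are considered; epsilon-greedy choice among the survivors.
            viable = [p for p in options if estimates[p] >= threshold] or ["A"]
            if random.random() < epsilon:
                return random.choice(viable)
            return max(viable, key=lambda p: estimates[p])

        for _ in range(pre_training):               # only Policy A exists before training
            update("A", float(random.random() < TRUE_VALUE["A"]))

        b_choices = 0
        for _ in range(transfer):                   # both policies available after training
            policy = act(["A", "B"])
            b_choices += (policy == "B")
            update(policy, float(random.random() < TRUE_VALUE[policy]))
        return b_choices / transfer                 # behavioral transfer rate

    # Example usage: sweep a few threshold levels.
    for th in (0.3, 0.6, 0.8):
        rate = sum(run_transfer(th) for _ in range(200)) / 200
        print("threshold", th, "mean transfer rate", round(rate, 2))

Sweeping the threshold in a sketch of this kind should reproduce the qualitative pattern discussed above: transfer stays high at low thresholds and collapses well before the threshold reaches Policy B's true value, because the volatile early estimates for Policy B dip below the bar far more often than Policy A's stable estimate does.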
Practical Implications

The interesting finding that negative transfer outcomes begin at threshold levels well below the true value of a trained behavioral policy has significant implications for how we approach transfer in real organizations. In the present simulations we see that it is possible for learners to abandon a newly trained behavior even when their engagement threshold is well below the level that should theoretically be required for them to do so. Therefore, in our training interventions we should take extra care to ensure trainees are willing to try their new training back on the job multiple times before judging whether to retain or discard it for future use. This could include measures taken within the training program itself, such as providing examples of the training working to provide evidence that it should be useful, or during the transfer phase, such as check-ins on trainees' progress and supervisor support early in the transfer process before the learner has a chance to discard the training as not being useful.

Conclusion

The model explored in this study represents the final iteration of the LTM for the present paper. Given the pattern of observed results, it appears the model can be defensibly applied to the study of training transfer, as it is able to largely reproduce expected patterns of relations and results. However, more work will need to be done in the future to fine-tune aspects of the model to better fit existing data. For now, the model appears to be a useful first step towards accounting for transfer effects with a dynamic process theory, and it could provide potentially novel and useful insights for future research and practice.

Study 4: Exploring the Full LTM Model

Over the course of the present paper, we have explored several iterations of a process-oriented theory of learning transfer called the Learning Transfer Model. This evolving theory was instantiated in a series of computational models and explored to establish generative sufficiency for existing research findings in the transfer literature. Based on the simulations presented here, it appears that this process has largely, though not completely, been successful. However, the work is not yet done. One strength of computational models is the ability to run novel experiments in a low-risk environment to provide insights for theory and practice that would not normally be feasible, if not completely impossible, in a traditional research environment. Therefore, the final study of this paper takes advantage of the developed modeling platform to demonstrate some of the types of experiments that can be conducted in this environment and discusses some of the implications of those findings. The experiments executed here were chosen a priori for the apparent potential novelty of effects that we do not typically study in the transfer literature, as well as for their ability to demonstrate effects we may not be able to easily study in real-world environments without prior guidance.

Experiment 4A: Engagement Thresholds, Value Changes, and Implementation Intentions

The first exploratory experiment pitted level of engagement threshold, value changes, and implementation intentions against each other. We saw in Study 3C that engagement thresholds have an overall negative effect on transfer outcomes, with a rapid change in outcomes as those thresholds approach the values of the available behavioral policies. One possible implication of this finding is that thresholds for trainees need to be surprisingly low to ensure positive transfer outcomes given the trained KSAO.
On the other hand, it might suggest that individuals with especially high thresholds for engagement would require especially valuable new KSAOs from a training event to ensure successful transfer outcomes. In part, the present experiment explores the tradeoffs between these two factors in order to guide decisions regarding training for individuals based on their likely willingness to engage with the given task using their trained KSAO and the theoretical performance value of that KSAO. However, one way to overcome the reluctance to engage the task with the trained KSAO may be to pair that training with implementation intentions to make the response more automatic (Gollwitzer & Sheeran, 2006). The positive effects of implementation intentions were demonstrated in Study 1. It was expected that implementation intentions would reduce the negative effects of thresholds on transfer. It was further expected that implementation intentions would have a larger effect on transfer outcomes when engagement thresholds are high but the value of improvement for the new policy is low. This was expected because when the value of the new policy is already high it should more often be able to overcome the threshold without the need for the extra intervention of implementation intentions.

Methods

To explore these effects, a three-way experiment was designed using the computational version of the LTM settled upon in Study 3C. To limit the number of runs required, the ranges of parameters simulated were limited to ranges where effects were most salient in previous simulations. To this end, engagement threshold was limited to the range of .50 to 1.0, swept in .05 increments; implementation intentions were swept in .05 increments from 0 to .25; and value change to Policy B was limited to -.10 to .30, in .05 increments. Other variables were held constant as before, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, and 500 transfer time points, but initial policy value estimates were again set to 1.0 to ensure no artificial limiting of transfer due to the threshold variable, and one agent was simulated in each run. 500 replications were created for each condition.

Results

Initial analyses suggest the effects of all three variables explored here are in the expected direction across replications on our outcomes of interest. Implementation intentions had small but positive relationships with behavioral transfer (r(297000) = .026, p < .001) and post-training performance (r(297000) = .020, p < .001), while changes in policy value had substantial positive relationships with both behavioral transfer (r(297000) = .649, p < .001) and post-training performance (r(297000) = .721, p < .001). On the other hand, engagement thresholds were negatively related to both behavioral transfer (r(297000) = -.283, p < .001) and post-training performance (r(297000) = -.168, p < .001). Given the nature of the present experiment, it is not advisable to interpret the strength of these correlations, as the targeted conditions could be either enhancing or truncating them, but it is notable that they are in the expected directions. Next, moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and the effect size for pre-post performance change from the three-way interaction of implementation intentions, engagement thresholds, and value change.
The resulting parameters for these models can be found in Table 13, and graphs of the interactions in Figures 60-62. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 63-65. These analyses reveal that when the value of a policy is low, transfer is generally poor unless the threshold for engagement is low and implementation intentions are high. Such a pattern is acceptable, though, because when the value is low we generally do not actually want transfer to occur, unless there is a non-performance reason to do so, as it will reduce performance. When the new policy has a high value, the effect of threshold level dominates the rate of transfer such that low threshold levels are very beneficial and high levels are very detrimental. Beyond the effect of thresholds, strong implementation intentions only have a noticeable effect when thresholds are already low. Patterns of results for both post-training performance and pre-post training performance change are similar.

Discussion

The results for this experiment were somewhat surprising, especially when it came to the effect of implementation intentions. It was expected that implementation intentions would have a stronger effect when policy values were low but threshold levels were high, essentially acting as a way to overcome the detrimental effects of high engagement thresholds. This was not the case. Instead, implementation intentions showed their strongest effects when engagement thresholds were already low, suggesting implementation intentions did not act as a way to overcome high thresholds so much as a way to boost transfer among agents already willing to engage. This is the type of surprising finding that a model such as this can put forth to guide future research, and it opens the model to falsification. If this unexpected finding holds up to further scrutiny, it would suggest that in designing training events one should first focus on encouraging trainees to lower their engagement threshold before worrying about the use of implementation intentions. We know implementation intentions are generally effective additions to training events (Friedman & Ronen, 2015), but their use may be for naught if our learners are unwilling to engage with the trained KSAO anyway.

Experiment 4B: Number of Trainees, Conformity, and Goal Levels

A primary strength of the modeling platform built in this paper is the ability to explore social effects on transfer outcomes without requiring the hundreds, or even thousands, of individuals that would be needed just to explore these ideas using real-world data. This allows us to look for potential effects of interest from the theory and use that modeling to guide future targeted data collections, utilizing our limited resources more judiciously. To this end, the rest of the exploratory simulations discussed here focus on the social effects of the conformity mechanism established in Study 2C. The simulations in Study 2C showed that high levels of conformity were extremely detrimental to transfer outcomes, especially after the number of agents reached about 3 or 4. One possible way to overcome the pressures of the group to conform is for individuals to have higher goals that will lead them to explore behavioral possibilities more, even in the face of that pressure. To test this possibility, the initial simulation from Study 2C crossing number of trainees with level of conformity was extended to include an effect of goals, which was introduced in Study 3A.
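For readers less familiar with the conformist-transmission idea borrowed from Richerson and Boyd (2005), the following minimal sketch illustrates one simple way such a mechanism can be expressed: with probability equal to the conformity parameter, an agent abandons its own policy choice in favor of the majority behavior of its work group. The function name and this exact probabilistic form are illustrative assumptions, not the NetLogo rule actually used in the model.

    import random
    from collections import Counter

    def apply_conformity(own_choice, group_choices, conformity):
        """With probability equal to the conformity parameter, the agent copies the
        majority behavior of its work group; otherwise it keeps the choice produced
        by its own learning. A simplified reading of conformist transmission, not
        the dissertation's implemented rule."""
        if not group_choices or random.random() >= conformity:
            return own_choice
        majority, _ = Counter(group_choices).most_common(1)[0]
        return majority

    # Example: an agent leaning toward Policy B in a group still using Policy A.
    group = ["A", "A", "B"]
    for c in (0.0, 0.45, 0.9):
        reverted = sum(apply_conformity("B", group, c) == "A" for _ in range(10000))
        print("conformity", c, "reverted to Policy A on", reverted / 10000, "of choices")

Under a rule of this kind, the depressing effect of conformity described above follows directly: when the group majority is still performing the old behavior, high conformity repeatedly overrides an individual agent's inclination to try the new policy before its value can be learned.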
It was expected that conformity would still have a negative effect, especially as the number of trainees increased, but that this negative effect would be tempered by increased goals.

Methods

The final model from Study 3C was again used to conduct this exploration. Trainees were swept from 1 to 20 in increments of 1, conformity from 0 to 1.0 in .05 increments, and goals from 0 to 1.0 in .05 increments. Other variables were held constant as before, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, 500 transfer time points, and initial policy value estimates of .50. 500 replications were completed for each condition.

Results

Initial results largely produce the expected relationships between the variables of interest here and behavioral transfer and post-training performance across all replications. The number of trainees in the model was negatively related to both transfer (r(4410000) = -.199, p < .001) and post-training performance (r(4410000) = -.145, p < .001). The same was found for the relationships between conformity and transfer (r(4410000) = -.766, p < .001) and post-training performance (r(4410000) = -.556, p < .001). However, goal levels were positively related to both transfer (r(4410000) = .093, p < .001) and post-training performance (r(4410000) = .068, p < .001). To further understand these simulated effects, moderated multiple regression analyses were completed testing the three-way interaction of trainees, conformity, and goals on behavioral transfer and post-training performance. Parameter estimates for these models can be found in Table 14. Graphic depictions of these interactions can be found in Figures 66 and 67, and heat maps of these results at the condition level are depicted in Figures 68 and 69. Due to the misleading results with changing numbers of trainees observed in previous simulations, effect sizes for pre-post performance change were not computed for this experiment. The general effect of conformity in this experiment is identical to that observed in Study 2C, where conformity levels above about .45 largely eliminate the transfer of the new policy. However, we do see that goals have an effect where they essentially push this boundary slightly higher, such that it now occurs around .50 conformity. We also see an example of a potentially misleading result when relying on only traditional methods to examine these results, where the regression model and simple slopes analysis suggest an effect of the number of trainees such that fewer trainees are very detrimental when goals are low, but more trainees are detrimental when goals are high. When we examine the heat maps of the results instead, we see that the effect of the number of trainees across levels of goals is largely the same, and this apparent interaction should not be overinterpreted.

Discussion

As in Study 2C, conformity severely depressed transfer outcomes once the degree of conformity reached about .45. The likely reason for this is that the default behavior is to not transfer, so the pressure to follow along at the next time step will tend to keep transfer low. Once conformity is low enough to allow exploration, the agents are much more likely to explore and discover the benefits of their training and therefore begin to transfer. What we see that is new here is a tempering effect of high goals on the depressive effect of conformity.
Specifically, it appears that high goals shift the sensitive area between failure to transfer and where transfer begins improving from a conformity level of about .45 to about .50. This is a small but potentially very important effect, suggesting that good goal setting may help push some individuals who would otherwise be on the fence regarding successfully transferring their training back to their work environment towards overcoming the pressures of the social world around them and doing so.

Experiment 4C: Value Change, Conformity, and Goal Levels

Another way to potentially overcome the negative effects of conformity on transfer outcomes would be to improve the performative value of the newly trained KSAO represented by Policy B. Doing so should provide extra incentive initially for individuals to break from their work groups and begin using their newly trained KSAO. Then, once transfer has begun, the pressure to conform should benefit high-valued KSAOs by spreading that tendency quickly through the group and improving overall outcomes. Similarly, especially low-valued KSAOs should quickly be discarded by the group in favor of keeping the old KSAO in place. Therefore, it is expected that outcomes will be made more extreme, positively and negatively, by different levels of value change. It is also expected that the positive effects seen when values are high will be further enhanced when goals are moderately high, due to the increased exploration undertaken by agents, but not when goals are so high that individuals are unwilling to exploit the better policy once it is found.

Methods

The final model from Study 3C was again used as the base model to conduct this exploration. Goals were swept from 0 to 1.0 in .05 increments, conformity from 0 to 1.0 in .05 increments, and value change across three levels at -.10, .05, and .20. These conditions for value change provide equidistant conditions of one negative behavior we should want the agents to discard, one representing the typical change we have discussed throughout this paper, and one especially beneficial training event. Other variables were held constant, with the true value of Policy A being .70, a type 2 likelihood of .80, 100 pre-training time points, 500 transfer time points, and initial policy value estimates of .50. 500 replications were completed for each condition. However, given the results from the exploration in 4B, and previous simulations of the number of trainees in the model in Study 2, it was decided to choose a constant number of agents for the simulated work group. Based on those results, it was decided to simulate groups of 3, as it appears that results largely stabilize once this number is reached. Limiting the simulation to 3 agents also has the benefit of being large enough to traditionally be considered a team (Tannenbaum, Mathieu, Salas, & Cohen, 2012) while going beyond the study of dyadic relationships. In addition, limiting the teams to 3 instead of a larger number would reduce the burden on participant recruitment for any future attempts to apply the results of the present simulations to empirical investigations.

Results

Findings suggest the effects of all three variables explored here are generally in the expected direction across replications on our outcomes of interest. Value change was positively related to behavioral transfer (r(661500) = .533, p < .001) and post-training performance (r(661500) = .711, p < .001).
Conformity had negative relationships with both behavioral transfer (r(661500) = -.670, p < .001) and post-training performance (r(661500) = -.369, p < .001). However, goal level showed a positive relationship with behavioral transfer (r(661500) = .044, p < .001) but a negative one with post-training performance (r(661500) = -.016, p < .001). Given the small size of this negative relationship and the possibility of negative interactions with the other variables here, this finding should not outweigh the other effects of goals observed in this paper. Moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and pre-post performance change from the three-way interaction of conformity, goal level, and value change. The resulting parameters for these models can be found in Table 15, and graphs of the interactions in Figures 70-72. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 73-75. As before, high levels of conformity have substantial negative effects on transfer outcomes. It also does not appear in the regression analysis that high policy values can overcome those negative effects of conformity, but we do potentially gain some nuance on the effects of goals and see that they have a slight effect only when both conformity and value changes are low. In examining the heat maps, we gain a greater understanding of the effects, especially an effect such that when value change is especially low, behavioral transfer is best when goals are high, but when values are high the best transfer occurs when goals are lower. We see essentially the opposite pattern for performance, in that performance is worst at high goal levels when value change is low, and best when values are high with low conformity and moderate goal levels. In the heat maps, it does appear that high values shift the discontinuity for conformity slightly, provided goals are not extremely high, such that positive outcomes occur at slightly higher levels of conformity.

Discussion

These results do not show the expected ability of value change to overcome the negative consequences of social pressure in the model. There are slight positive effects of having highly valued new KSAOs in overcoming the detrimental effects of conformity, but these are similarly weak as those seen for goals overall in the previous experiments. Further, the effects of goals in the model become clearer, as agents again explore sub-optimally under many conditions; a positive effect of conformity, if there is one, is that agents do not improperly explore undesirable policies if their social group does not allow them to do so. Along the same lines, the transfer that does occur when values are low tends to be maladaptive, as agents make the mistake of continuing to apply their training when they should not, largely as a function of high goals and the freedom to do such exploration. An interesting implication here is for training which an organization knows will reduce performance but may have other necessities, such as legal compliance. In such cases it is apparent that the organization will need to work to overcome substantial individual and group processes to make the new training successfully transfer back to the work environment. It is in such cases where physical tools, such as checklists or software, to assist with compliance seem likely to be of extra value.
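Before turning to the final experiment, it is worth noting that Experiments 4A through 4C, and Experiment 4D below, all share the same basic design: a full factorial sweep of the manipulated parameters with 500 independent replications per condition, analyzed at both the replication and condition level. A minimal sketch of how such a sweep might be organized is given below; run_replication is a stand-in for a full run of the LTM and, like the other names and increments shown, is a hypothetical placeholder rather than the dissertation's actual code.

    from itertools import product
    import random

    def frange(start, stop, step):
        """Inclusive range of floats, e.g., 0 to 1.0 in .05 increments."""
        values, v = [], start
        while v <= stop + 1e-9:
            values.append(round(v, 2))
            v += step
        return values

    def run_replication(goals, conformity, value_change):
        """Placeholder for one run of the LTM; returns (transfer, performance)."""
        return random.random(), random.random()

    def sweep(reps=500):
        records = []
        conditions = product(frange(0, 1.0, 0.05),     # goals
                             frange(0, 1.0, 0.05),     # conformity
                             (-0.10, 0.05, 0.20))      # value change (Experiment 4C levels)
        for goals, conformity, value_change in conditions:
            for _ in range(reps):
                transfer, performance = run_replication(goals, conformity, value_change)
                records.append((goals, conformity, value_change, transfer, performance))
        return records   # 21 x 21 x 3 x reps rows, analyzed at replication or condition level

    # Example usage: a tiny demonstration sweep.
    rows = sweep(reps=2)
    print(len(rows), "replications generated")

Organizing the output this way is what makes both the replication-level correlations and the condition-level heat maps reported in these experiments straightforward to compute from a single record set.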
Experiment 4D: Type 2 Likelihood, Conformity, and Goal Levels

A final exploratory simulation examined the effect of the ability of individuals to engage in type 2 cognitive processes on observed transfer outcomes across conformity and goal levels. In this experiment, no direct predictions were made a priori, as it is unclear what the effect of changing levels of type 2 likelihood might be in this complex simulation. One might think that allowing individuals to engage in deeper cognitive processing would better allow them to think about the benefits of their newly trained KSAOs, but it would also allow them to think more about the potential consequences of not conforming to their social group. This counteractive effect could wash out any gains from improving cognitive processing. At the same time, lower type 2 processing would lead to initial difficulties in transfer as trainees habitually apply their old KSAOs to the presented task, but would provide potential benefits in countering the effects of their social groups if they are able to establish their newly trained KSAO as their habitual response. These contradictory possibilities were explored in this experiment.

Methods

For a final time, the model coming from Study 3C was used to explore the joint effects of type 2 likelihood, conformity, and goal levels. For this experiment, conformity, goals, and type 2 likelihood were each swept from 0 to 1.0 in .05 increments. Other variables were held constant, with the true value of Policy A being .70, 100 pre-training time points, 500 transfer time points, initial policy value estimates of .50, and 3 trainees per simulation. 500 replications were completed for each condition.

Results

Initial analyses suggest all three variables explored here have effects in the expected direction across replications on our outcomes of interest. Goal levels again had small but positive relationships with behavioral transfer (r(4630500) = .068, p < .001) and post-training performance (r(4630500) = .038, p < .001), while type 2 likelihood had positive relationships with both behavioral transfer (r(4630500) = .542, p < .001) and post-training performance (r(4630500) = .302, p < .001). Conformity again showed negative relationships with both behavioral transfer (r(4630500) = -.593, p < .001) and post-training performance (r(4630500) = -.330, p < .001). Moderated multiple regression analyses were completed predicting behavioral transfer, post-training performance, and pre-post training performance change from the three-way interaction of type 2 likelihood, conformity, and goals. The resulting parameters for these models can be found in Table 16, and graphs of the interactions in Figures 76-78. Heat maps were then generated at the condition level to explore these effects further and can be found in Figures 79-81. The moderation results initially suggest a typical moderation effect where we see the best transfer outcomes when conformity is low and type 2 likelihood is high, largely regardless of goal level, and all other combinations result in poor outcomes. Our heat maps generally confirm this effect with little else to add, with the exception that very high levels of type 2 likelihood are the only levels which substantially overcome the effects of conformity.
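Because this experiment turns on the type 2 likelihood parameter, a minimal sketch of how a dual-process choice gate of this kind can be expressed may help make the manipulated variable concrete. The specific type 1 default used here (simply repeating the most practiced, habitual policy), the epsilon-greedy deliberation rule, and the function names are illustrative assumptions rather than the model's actual decision rule.

    import random

    def choose_policy(estimates, counts, threshold, type2_likelihood, epsilon=0.10):
        """Dual-process sketch: with probability equal to type2_likelihood the agent
        deliberates (type 2), weighing value estimates against its engagement
        threshold; otherwise a fast type 1 response simply repeats the most
        practiced (habitual) policy. Illustrative only."""
        if random.random() >= type2_likelihood:
            # Type 1: habitual response -- whichever policy has the most experience.
            return max(counts, key=counts.get)
        # Type 2: deliberate over policies whose perceived value clears the threshold.
        viable = [p for p, v in estimates.items() if v >= threshold]
        if not viable:
            return max(counts, key=counts.get)          # nothing clears the bar
        if random.random() < epsilon:
            return random.choice(viable)                 # occasional exploration
        return max(viable, key=lambda p: estimates[p])   # exploit the best estimate

    # Example: a learner with far more experience using Policy A than Policy B.
    estimates = {"A": 0.70, "B": 0.78}
    counts = {"A": 100, "B": 3}
    picks = [choose_policy(estimates, counts, threshold=0.5, type2_likelihood=0.2)
             for _ in range(10000)]
    print("Policy B chosen on", picks.count("B") / 10000, "of trials")

A gate of this kind makes the result above intuitive: when type 2 likelihood is low, the habitual (old) policy dominates regardless of what the learner believes about the new one, so only high type 2 likelihood gives the value estimates, and therefore the training, a chance to assert themselves against both habit and conformity.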
Discussion

It was unclear what to expect a priori for the present simulation, and it was found that potential beneficial effects of goals and type 2 likelihood were essentially wiped out at all levels of conformity, with the only exception being the ability of high type 2 likelihood to lead to positive outcomes. Importantly, the effect of goals in overcoming the effects of conformity was almost non-existent once controlling for the effect of type 2 likelihood. Interestingly, type 2 likelihood appears to do a better job than any other intervention tested here in overcoming the negative effects of conformity, but type 2 likelihood must be high. This effect suggests that in designing training interventions, attending to environmental characteristics will be of great concern, particularly ensuring that trainees return to an environment that allows them to engage in the kind of cognitive processes and exploration required to lead them to discover their training is beneficial to completing the relevant task.

Overall Discussion

The four experiments described here were meant to be demonstrations of the potential of the modeling platform developed throughout this paper to provide novel insights and guidance for future research and practice in organizational training and transfer. One of the primary strengths of computationally modeling theories such as the LTM lies in the ability to conduct such explorations in a low-cost and risk-free environment prior to committing the resources necessary to do similar explorations in empirical data collections. In these experiments, results suggested that the power of social learning, as seen in the mechanism of conformity, exerts a powerful depressing effect on transfer outcomes. Unfortunately, overcoming this effect is not necessarily easy, though goals and the ability to engage in type 2 cognitive processes show some promise. These results can be used to guide future data collections to continue testing the present model, and potentially for guidance in designing and supporting effective organizational training events.

Overall Discussion

Training represents one of the classic areas of inquiry and practice in organizational psychology, with over 100 years of research to show for it (Bell et al., 2017). In that time, we have developed a substantial body of knowledge which has allowed us to continuously improve the way we deliver training interventions in organizations and thereby improve training outcomes (Bell et al., 2017; Salas et al., 2012). Unfortunately, this base of knowledge focuses largely on the training event itself and generally treats the transfer of that training as a cross-sectional outcome (Foxon, 1997). This typical approach necessarily limits our knowledge because we are not generally studying transfer as a process that itself unfolds over time. The failure to study transfer as a process is unfortunate, as we have acknowledged it to be a longitudinal phenomenon for at least 30 years (Baldwin & Ford, 1988). However, in practice, few studies measure transfer longitudinally, and even fewer unpack the dynamic processes driving that transfer, with few notable exceptions (e.g., Dierdorff & Surface, 2008; Huang et al., 2015; Huang et al., 2017). Recently, a group of researchers, including the present author, has begun more substantially to attempt to unpack the processes underlying training transfer. Most prominently, Blume et al.
(2019) described training transfer as a self-regulatory-driven process, labeled the Dynamic Transfer Model (DTM), where trainees iteratively attempt to transfer their learning to their work environment and subsequently keep or discard their newly acquired KSAOs based upon the feedback they received. The primary drawbacks to their model lie in its narrative nature and its failure to unpack the cognitive and learning mechanisms underlying the proposed feedback process. Surface and Olenick (forthcoming) are attempting to push the DTM to a lower level of abstraction and begin theorizing about how the transfer process may be driven by the interpretation of environmental cues and subsequent execution of available behavioral scripts, based largely in the same Dual Processing framework used in the present paper. However, their advancement still relies on narrative theorizing. Then, Olenick et al. (in press) began to push transfer research towards using more mathematical bases by applying non-linear dynamics to discuss training and transfer as a process of discontinuous shifts where old patterns of behavior, represented by attractors in a mathematical sense, must be broken free from and new patterns formed. Their lens demonstrates how transfer trajectories can be modeled as dynamic processes that unfold over time as governed by mathematical attractors, which provides a more formal framework from which to build future research.

The Learning Transfer Model presented in this paper represents a culmination, of sorts, of these efforts. The LTM takes the step of fully formalizing the learning and decision mechanisms I propose underlie the process of learning/training transfer in organizations. In doing so, the LTM integrates theories from across psychology, using Dual Process Cognition (e.g., Kahneman, 2011) as a broad framework, along with self-regulation (e.g., Carver & Scheier, 1998) and Social Learning Theory (Bandura, 1977), with theories from outside of psychology, such as computational reinforcement learning (Sutton & Barto, 2018). Further, computational approaches to social learning were borrowed from studies of gene-culture coevolution (Richerson & Boyd, 2005) to discuss the effects of social learning on transfer through the lens of the simultaneous emergence of the social transfer environment from the behavior of the individuals within it.

The final model, demonstrated via experiments in Study 4, broadly suggests that learners return to their work environment and must apply some new KSAO to their work instead of some existing KSAO they were already using. When encountering the applicable task, the learner initially decides quickly and automatically, via type 1 cognitive processes, which KSAO to apply based on previous experience. In some cases, the individual will have the opportunity to engage in deeper levels of cognitive processing and make a more conscious and informed decision regarding which available KSAO they should apply; these decisions are governed by type 2 cognitive processes. Once an approach is chosen, the learner applies that choice to their task and receives feedback regarding the successfulness of their attempt. That feedback allows them to learn over time which of their available KSAOs best allows them to perform the task to their desired level. If the new KSAO is perceived to be better than their previous KSAOs, regardless of whether it actually is better or not, the learner will transfer that new KSAO over the long term.
Complicating matters, individuals do not always actually attempt tasks, because when they lack confidence in their ability to succeed they may decide not to even attempt to transfer their learning. Further, these learning and decision processes do not take place in a vacuum, as learners are often embedded in work groups. The environment for transfer is then a simultaneously emergent phenomenon governed by the individual experiences of all the learners in their transfer attempts, which in turn acts as a causal climate around them through either conforming or imitating mechanisms. As these decisions and learning events play out over time, an individual may follow any one of a nearly infinite set of transfer trajectories that, in the end, result in what we traditionally observe as successful transfer or not.

This overall theory was formalized and instantiated into a computational model in NetLogo, building from existing mathematical frameworks such as computational reinforcement learning. A series of simulations then explored the models and developed them in an iterative fashion. The goal for this iterative process was to explore each model and, following established modeling steps, check the models for verification, generative sufficiency, robustness, and sensitivity (Railsback & Grimm, 2012). In addition, this process importantly opened the theory to an initial round of falsification (Popper, 1959). Overall, this process suggested the LTM, as originally proposed, was in many respects successful in its initial attempts to account for broad patterns of findings within the transfer literature, but not completely so. For example, the LTM was able to reproduce a range of behavioral transfer rates typically discussed in the literature (e.g., Ford et al., 2011), and general effect sizes for performance improvement we may expect in real-world situations. However, it was also found that these findings held only for some areas of the potential parameter space covered by the model, which were used in later simulations for further exploration. Such findings do not invalidate the present theory any more than do traditional tests of narrative theories in organizational psychology to establish boundary conditions (e.g., Grant, 2008; Hollenbeck, Colquitt, Ilgen, LePine, & Hedlund, 1998; Yammarino & Dubinsky, 1994). Instead, it appears that the LTM is a plausible process explanation for general transfer findings provided the model is within certain parameters. Outside of those parameters the model may not apply to the phenomena of interest, for at least two reasons. First, it may be that the theory itself breaks down outside of the established parameter ranges which produce the kinds of relationships and results we are used to seeing in the research literature. If this is the case, the model would be falsified for those conditions and would need to be further refined to operate under them if deemed necessary, much as we would iterate a narrative theory. Second, as argued previously, it could be that it is not the theory that breaks down, but rather the limited range of conditions in which we tend to do our research. The model may be able to simulate conditions outside the bounds of reality, and therefore would not need to be applicable to them, and the breakdown in these ranges is therefore not a shortcoming.
However, one of the strengths of formal theorizing and computational modeling is the greater ability to falsify and iterate theories than is achieved through traditional narrative theory building. This strength is clearly shown in Study 2, where the initially proposed pooling mechanism was incapable of replicating the expected social effects observed in the transfer literature. This model, being overly parsimonious and subsequently falsified via virtual experimentation, was able to be iterated by testing two alternate models of social learning borrowed from modeling of cultural effects on populations (Richerson & Boyd, 2005), which utilized mechanisms of imitation and conformity. Unlike the originally proposed mechanism in the LTM, both mechanisms appeared to provide plausible results and novel insights into the nature of social effects in the transfer environment. Following some exploration, it was argued that, with some reconsideration of how we operationalize culture and climate for transfer, the conformity model may fit current findings in the research literature better, and it was retained for further exploration. Over the course of the iterative theorizing and model-building approach outlined throughout this paper, a final version of the LTM was accepted, for now, and more fully explored in Study 4. Through this process, it is argued that the present paper has accomplished its primary goals of 1) providing a formal, process-oriented theory of training transfer, 2) integrating multiple disparate theories to explain that process, 3) bringing outside theories, such as computational reinforcement learning and dual process cognition, more into the organizational psychology literature, and 4) building a modeling platform that allows for the thorough exploration of the proposed theory for both theoretical and practical implications. It is to these implications we now turn.

Theoretical Implications and Future Research Directions

It has long been observed that there is nothing quite so practical as a good theory. In that spirit, the present paper sought to further our understanding of one of the most practically impactful research areas in all of organizational psychology, training and transfer, by introducing a mechanistic process theory of transfer. To support the veracity of this theory, the Learning Transfer Model, a computational model was generated and explored to account for existing general findings in the research literature, a process referred to as establishing generative sufficiency. As discussed throughout this paper, these simulations suggest that the LTM can reproduce the general patterns of many research findings in this space. Therefore, it is argued that the LTM, as currently specified, generally provides a plausible process explanation for training transfer. The ability of the present model to broadly account for many findings in the transfer literature is a critical first step in building a unifying theory for this area of our science and continuing to improve our scientific rigor (Muthukrishna & Henrich, 2019).

The general success of the LTM displayed in this paper has a couple of interesting implications for how we think about training and transfer in our literature. First, Blume et al. (in press) recently suggested the need for more work on transfer as an individualized process where trajectories between individuals are likely to be highly idiosyncratic. Modeling the LTM reaffirms this case, as it was evident that individual trajectories of agents can vary substantially.
On e fur ther implication of the LTM in terms of that individualization process is the importance of viewing transfer from a perspective of need fulfillment. Throughout this paper we have seen agents are only likely to transfer their training when that trainin g rep resents an improvement over the ir old behaviors, the training allows them to meet their personal goals, and they are allowed the ability to ascertain that benefit. Thus, if an individual is unable to discern how or whether 158 their training meets their o wn ne eds then transfer is unlikely. Future work should continue to unpack this individualized nature of training transfer. Further, the development of the LTM in this paper should encourage other researchers to look more closely at other fields as they be gin t o develop formal models of thei r own processes of interest. As a field, organizational psychology has not been on the forefront of the development of formal models and many other fields, from computer science, to biology, to economics, have been using math ematical tools to model their p rocesses for decades. We could likely draw on their already existing models and associated mathematical approaches to inform much of our own work on the organizational processes in which we are interested. Being willing to us e their work will keep us from reinventing the wheel when it comes to discovering many of the same essential processes. Similarly, through integrating models from across the sciences we can likely help place a break on continued construct and theoreti cal p roliferation where many researc hers from many different fields all study the same essential phenomenon but develop their own theories and constructs to explain and describe those phenomena. The historically siloed approach to science has likely slowed our knowledge accumulation and led to sprawling and confused literatures passing each other like ships in the night as each independently seek to solve similar problems. The ability of the LTM to provide a process capable of largely reproducing typical tr ansfe r findings in a relatively pars imonious model by integrating knowledge from across several disparate fields should provide further impetus for interdisciplinary work in the future. However, i t is not contended that the present paper has established t he LT M as the correct model of train ing transfer, only that it is a plausible explanation, or at least a plausible step in establishing such a theory. Perfection was never the goal of the present theorizing, and the 159 LTM cannot be evaluated against such a s tanda rd. As Box (1976) contends, all theories are wrong, the goal is to remain parsimonious while providing an explanation for the phenomena at hand. The LTM, although integrating multiple disparate theories, only has a few actual mechanisms when expressed form ally, making the overall model fairly parsimonious while still appearing to be broadly applicable to transfer research. The question then becomes not necessarily whether the theory is incorrect, but in which ways it is meaningfully wrong (Box, 1976). As h as been discussed through the r esults of the simulations above, there are at least a couple of ways in which the current version of the LTM is, or was, meaningfully wrong. For example, the effects of practice in the simulations was in the correct dire ction , but obviously not capable of reproducing the desired effects. 
This is problematic, as practice effects are some of our best-established tools for improving learning outcomes (e.g., Dunlosky et al., 2013). Additionally, even though in some cases the effects of policy value estimates in the LTM worked nearly perfectly, as with the recreation of the effect of efficacy on transfer, those value estimates only reproduced the effect of utility reactions at the condition level and not the individual level as expected. On the extreme end, it was shown that the initially proposed social learning mechanism for the LTM was inadequate for producing the desired social effects. Already within this paper two alternative mechanisms were proposed and explored, with both showing greater potential for illuminating social effects in the transfer process. Future iterations of the LTM, combined with targeted data collections, will be required to fine-tune these mechanisms.

In the case of value estimates in relation to utility reactions, the underlying mathematics will need adjusting. The current effect of initial value estimates quickly becomes swamped by the experience of the learning agent, and therefore does not substantially affect the willingness of the agent to continue engaging with a task in the face of initial failure. If this effect can be drawn out over time by changing the updating procedure for value perceptions, the initial estimate may be able to better approximate the effect of utility reactions that those initial estimates were thought to approximate. Similarly, the effect of practice within the LTM is not strong enough. It is not feasible, in most situations, for practice attempts to approximate the number of attempts an individual has had using the behavior they are trying to replace. Therefore, the mathematical effect of practice attempts must be increased in some way. One way to accomplish this would be a multiplier on the practice attempt variable indicating the relative effectiveness of those practice attempts. Low values of this moderator variable, such as the de facto 1 it is set to in the present model, would represent poor practice. Higher values could represent better practice, such as following recommendations for spaced practice, recall effects, and so on, that would improve the strength of those practice attempts. Future iterations of the model should explore these possibilities.
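One possible formalization of this multiplier idea, offered only as an illustrative sketch and not as the form a future version of the model must take, is to let each practice attempt carry a practice-quality weight in the incremental value update, so that higher-quality practice counts as more effective experience. The function name and the example weights are assumptions for illustration.

    def practiced_update(estimate, count, reward, practice_quality=1.0):
        """Weighted sample-average value update in which a practice attempt counts as
        practice_quality ordinary experiences: 1.0 mirrors the current model's de facto
        setting, while higher values (e.g., well-spaced practice with recall) let fewer
        practice attempts move the estimate, and the effective experience count, further.
        A sketch of the proposed extension, not the implemented model."""
        effective_count = count + practice_quality
        estimate += practice_quality * (reward - estimate) / effective_count
        return estimate, effective_count

    # Example: ten successful practice attempts starting from a weak prior on Policy B.
    for quality in (1.0, 3.0):
        est, n = 0.50, 1.0
        for _ in range(10):
            est, n = practiced_update(est, n, reward=1.0, practice_quality=quality)
        print("practice quality", quality, "estimate", round(est, 2), "effective experience", round(n))

Under such a scheme, a modest number of high-quality practice attempts could plausibly stand in for a much larger number of on-the-job attempts, which is exactly the gap the current version of the model cannot close.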
As for the social learning mechanisms, more modeling and data collection will be necessary to decide whether imitation, conformity, or a possible mix of both (e.g., Lopes et al., 2009) is needed to account for the social effects observed in transfer environments. Future empirical work should be partnered with versions of the social learning mechanisms tested in the LTM to ascertain which models better fit observed data regarding social interactions, learning, and how those lead to transfer, or the lack thereof, over time. Targeted data collections and further modeling should then trade off in an iterative way to refine the models and determine which has the stronger support in the real world. Doing this would be a prime example of strong theoretical development (Sutton & Staw, 1995), which is one of the primary draws of engaging in computational modeling.

More generally, studies will be required to begin directly parameterizing the model against real data and to go beyond the replication of general results. Several good examples of such approaches exist in the organizational sciences, ranging from the study of motivational phenomena (e.g., Vancouver, Weinhardt, & Vino, 2014) to the study of response processes to situational judgement tests (Grand, in press). However, it is unlikely that many opportunities exist to collect data within real organizations at the level of granularity required to fit the LTM to the kind of moment-to-moment decisions that are being proposed to drive transfer patterns. Such a collection would, almost of necessity, be highly intrusive and distracting to the point of overly interfering with normal organizational operations. For this reason, I reiterate the calls of other papers (e.g., Blume et al., 2019; Olenick et al., in press) to look for opportunities to use new technologies which can collect data on decisions and behaviors in situ in near real time. These include the ability to collect data on momentary use of electronic systems, or sociometric badges to study interaction patterns (e.g., Zhang, Olenick, Chang, Kozlowski, & Hung, 2018), which could provide windows into both individual and group behavioral norms. Alternatively, experimental paradigms will need to be adapted to study the mechanisms outlined in this paper. Existing options include: a) scheduling tasks which track decisions made over many time points to study motivational processes (e.g., DeShon & Rench, 2009; Schmidt & DeShon, 2007), and b) a radar simulation task called TANDEM which can track participant decisions down to individual clicks of a mouse and time spent on various tasks, in a difficult environment where much learning is possible (e.g., Bell & Kozlowski, 2008). A major drawback of such platforms, however, is that the odds of success on the task attached to any specific behavior are unknown and might not be knowable without extensive simulation or prior data collection. This poses a problem in testing the LTM, as it relies on the underlying
One example of such an effect was the change in behavioral transfer across level s of the threshold v ariable in Study 3C where behavioral transfer rates rapidly decreased from a threshold level of .60 to about .70. Such a pattern is not a complete discontinuity, but it suggests a pattern that may be better analyzed through nonline ar me thods. For example, a cusp catastrophe model could assess the likelihood of a target falling on either level of the observed rate of behavioral transfer while treating threshold level as a control variable for the location of that discontinuity. Such model s have long been use d in studies of animal and human learning (e.g., Baker & Frey, 1980; Guastello, 1987 ), and have 163 recently been suggested for greater use in the study of organizational training and transfer (Olenick et al., in press). The simulated resul ts of the LTM in thi s paper reaffirm this suggestion. Future iterations of the LTM should also seek to include other emerging research on human learning and decision making and its potential effects on transfer outcomes. For example, Spicer, Mitchell , Wil ls, and Jones (2020) suggest that humans protect their established causal beliefs instead of updating them when their predictions do not match observed outcomes , violati ng existing prediction error models. Their findings c ould be matched with the LTM to di scuss why in transfe r space learners/agents do not necessarily accurately update their beliefs regarding the value of their behavioral policies in the face of experience. For example, one of the biases operating in type 2 processing systems could be a disc ounting of the effec ts of failures for learning about the utility of Policy A . When the learner enters the transfer environment then, not only does their new policy have to outperform Policy A outright to convince the learner it is better for the task , but also overcome any b ias of the learner ignoring failures of Policy A in a protection of their prior beliefs. This is an intriguing idea that at least anecdotally fits with experience in real organizational environments and seems to be worth further ex plora tion. Another interesting possibility would be to combine with other computational models that explore pertinent aspects of the transfer process that are not yet included in the present model. For example, the LTM currently assumes that trainees can accur ately p erceive their environment in order to activate the relevant decision processes discussed here. This assumption can be relaxed by incorporating mechanisms in other models, such as Weichart, Turner, and nes d ecision making to understand how decisions The incorporation of similar mechanisms into the LTM would allow us to model how learners 164 might interact with thei r envir onment al cues to activate the relevant behavioral scripts represented by the policies used in the terminology of reinforcement learning. One interesting interaction would likely occur with the ability to identify the relevant environmental cues to f ully re alize the benefits of implementation intentions. As discussed previously, implementation intentions are described as if - then type rules where the learner applies the relevant response in the presence of the correct cue (Gollwitzer, 1999). For t his m echanis m to operate, the individual must be able to recognize the cue and doing so requires paying sufficient attention to the relevant environmental factors. 
Therefore, there is likely a moderating effect of attention on the effects of implementation inte ntions within transfer environments. Another frontier for the LTM will be to account for more and evolving behavioral options. Many tasks have specific ways they are supposed to be carried out, to which the current version of the LTM is most applicab le. H owever, many tasks are more open, allowing trainees greater discretion over how exactly they approach the task (e.g., Yelon & Ford, 1999). To incorporate many different behavioral options, the LTM should be expanded to utilize reinforcement principals for multipl e behaviors. The k - armed bandit approach used here is technically capable of assessing multiple policies at a time, but more sophisticated models exist (Sutton & Barto, 2018). Other reinforcement algorithms are likely better fits for different types of tra nsfer questions , and they should be systematically explored for that fit. Similarly, it may be possible that different types of learning, reinforcement or otherwise, are better fits for the learning mechanisms occurring within either type 1 pro cesse s or ty pe 2 processes during transfer events. The present approach was chosen as a starting point as historical research on animal learning and applied reinforcement learning models largely focuses on naĆÆve learners (see Sutton & Barto, 2018 for a dis cussi on), wh ile the specific question being addressed in the present paper 165 However, as suggested in the CLARION model (Sun et al., 2005), the type of experiential learn ing that lies at the heart of the reinforcement algorithms used in this paper (Sutton & Barto, 2018) are proposed to fit with type 1 processes but not necessarily with type 2 learning processes , although we are interested in more than the explicit inf ormin g of an individual regarding the usefulness of new KSAOs in the present case, thus tackling a different question than CLARION . Further research and modeling to refine these mechanisms to best fit the transfer environment will be required. In addition , the present paper has only focused on a single learning and transfer event, where a single old behavior must be overcome for transfer to occur. However, the development of individuals within organizations, and more broadly expertise, can be viewed as the cons tant breaking of these old habits and establishment of new ones (Ericsson, 2006; Olenick et al., in press). In traditional reinforcement learning problems, such as an agent discovering the most efficient way to navigate a maze, the agent generates sol ution s to its environment and learns their values over time (Sutton & Barto, 2018). In the same way, general employee development could be viewed as a series of pseudo - randomly generated solutions to organizational problems where the learner then chooses w hich to apply to their particular work situation or not, over time developing preferences for some behavioral policies over others and requiring new policies to overcome that preference in order for transfer to occur. Through such an approach we could go b eyond the study of the transfer of a single learning event to better understand sequential learning events. Simultaneously, such models can account for changing environments (Sutton & Barto, 2018) which would open the LTM to application further to question s of far transfer ( Beier & Kanfer, 2010 ), and problems of adaptability (e.g., Baard et al., 2014). 
A final key area for exploration, both within the present version of the LTM and across future versions, will be the many other potential combinations of interventions and effects that were not examined in this paper. For example, once practice effects are refined, how might they interact with implementation intentions? Much as we initially expected that improving engagement in type 2 processes would augment the effects of implementation intentions, and the model suggests that is incorrect, it would seem logical that practice and implementation intentions would each be beneficial and would augment each other. However, perhaps once one effect is accounted for the other provides no gain in transfer outcomes, and therefore it would not be worth the effort and cost to use both in a training intervention. The LTM could provide such guidance for future investigations into these interactive effects, and therefore guidance for the efficient practical application of research findings. It is to those more practical implications we now turn.

Practical Implications

Many practical implications of the individual models explored in this paper have been discussed throughout. However, there are a few overarching implications which warrant discussion. First, the LTM and the computational results have implications not only for how we measure transfer for research but also for how we measure transfer for training evaluation. In this paper, the outcomes tracked were at the behavioral and performance levels of the classic Kirkpatrick (1994) typology. To merely encourage organizations to evaluate training outcomes at these levels would be banal, although they should do so more frequently than is currently the standard. What the modeling in this paper further suggests is that the timing of the measurement of these outcomes is of great importance. It is commonly stated in the research literature that the timing of measurements should be chosen based on the timing of the phenomenon of interest (e.g., Hanges & Wang, 2012), and this clearly pertains to the estimation of transfer outcomes in the LTM. Specifically, if transfer measurements are taken too early, the outcomes of interest may not have had a chance to emerge and stabilize, which could lead to a drastic over- or underestimate of the final effect of a training event. To make matters worse, the models here suggest that transfer may be more likely to emerge later than one might expect, causing an underestimate of the effect of training and therefore potentially leading an organization to incorrectly conclude that its training was ineffective. Therefore, patience is urged in the timing of the collection of transfer data when possible, to improve the final estimates of the effect of training.

In fact, the timing of every aspect of training appears to be of incredible importance. Olenick et al. (in press) argue that the longer one waits to intervene, the harder it likely is to create lasting change (pagination not yet assigned), due to the formation over time of an attractor through the recurrent success of the targeted behavior. Their piece applied only a mathematical lens to training to make that suggestion, and the present paper further demonstrates their point via modeling. In the initial exploration of the LTM in Study 1, we saw a drastic effect on training outcomes according to how long the pre- and post-training time frames ran. What is occurring in the simulation is essentially the formation of the kinds of attractors Olenick et al. (in press) were discussing as the agent gained experience with the task.
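In the model's own terms, the attractor is simply the system 1 habit term growing toward 1 as Policy A accumulates uses. The illustrative reporter below (not part of the appendix code) strips that term down to the share of prior attempts that used Policy A and shows why a longer pre-training period makes the same amount of transfer-period experience count for less.

to-report habit-strength [a-attempts b-attempts]  ;illustrative only: system 1 pull toward Policy A
  ;with no practice attempts and no implementation intention, the habit term reduces to the
  ;share of all prior attempts that used Policy A, so a longer burn-in pushes it toward 1
  report a-attempts / (a-attempts + b-attempts + .000001)
end
;for example, habit-strength 100 10 is roughly .91, while habit-strength 500 10 is roughly .98,
;so the same ten post-training uses of Policy B barely move a habit built over a longer pre-training period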
The burn-in period used there was sufficient to create a strong enough attractor that agents struggled to form new patterns unless given five times as many attempts to change that behavior. Such difficulties only become greater the longer the pre-training period is allowed to extend, as we see in the difficulty of overcoming implicit biases through training when those biases are the result of years or decades of experience (Lai, Hoffman, & Nosek, 2013; Lai et al., 2016). Although the exact number of trials likely does not map cleanly onto any given real-world task, the overall message for the timing of training interventions is clear: the sooner, the better. The advice for any practitioner choosing when to hold a key training event, at least regarding a task the trainees are already completing in some way, is to implement the intervention as soon as feasible, as any delay is likely to make the task of causing permanent on-the-job change even more difficult.

Olenick et al. (in press) also suggest that the strength of the intervention will be critical in overcoming established KSAOs, especially when they are long-held patterns. One way to increase the strength of a single training event should theoretically lie in stacking multiple kinds of best practices or training enhancers into a learning event when possible. For example, a training designer might incorporate both spaced practice and implementation intentions and, following the present modeling, also target the transfer environment to improve trainees' use of type 2 cognitive processes. Independently, each of these additions should improve learning and transfer outcomes, so it seems logical that doing all of them would be even more beneficial. However, the modeling in this paper suggests that may not always be the case. Instead, some types of interventions may not effectively stack with each other to further improve outcomes and might even interfere with one another. In such a case, adding extra apparent enhancements to a training event could result in decreased return on investment for the event, as energy is wasted implementing unhelpful tools. Therefore, training designers should think carefully about which such tools will best fit with their planned training event to enhance desired outcomes.

Finally, the LTM suggests there may be other individual differences and environmental factors to consider when choosing who might be a good candidate for a given training event. It is already recommended that a person's readiness for training be assessed, which includes personal characteristics such as ability, attitudes, personality, and motivation, as well as whether their work environment will facilitate the desired outcomes (Langdon, 1997; Noe, 2017; Rummler, 1996). Some of these characteristics are directly informed by the LTM. For example, we saw an interesting interplay between goals and the outcomes of training which suggests that individuals with extremely high goals might not be good fits for trainings that do not allow them to reach said goals. Rather, the focus should be on individuals whose current goals match well with what the training is offering.
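A hedged sketch of how that goal mechanism operates in the Model 3 simulations is shown below; the parameter names goal_level, base_exploration, and exploration_boost stand in for T, E, and F from Table 11 and are illustrative, not taken from the appendix code.

to-report exploration-given-goal  ;sketch of the Model 3 self-regulation step for a single trainee
  ;goal_level, base_exploration, and exploration_boost are assumed parameters standing in for T, E, and F
  let current-performance (task_successes / (ticks + .000001))  ;Y: average reward experienced so far
  ifelse current-performance < goal_level                       ;J = 1 when the goal has not been met
    [report base_exploration + exploration_boost]               ;unmet goal: search more widely for better policies
    [report base_exploration]                                   ;goal met: keep the baseline error rate
end

A trainee whose goal sits far above what the trained policy can deliver would keep exploring indefinitely under this rule, which is consistent with the observation above that individuals with extremely high goals may not be good fits for a given training.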
Further, we know that individuals who are learning oriented (mastery oriented, in other nomenclature) are focused on increasing their ability on their targeted tasks, and this leads to improved performance outcomes over time (e.g., Dweck, 1986; Elliott, 1999; Payne, Youngcourt, & Beaubien, 2007). Part of doing so tends to be a greater willingness to explore the task for better solutions, leading to poorer performance early in those tasks but greater success over time (e.g., Bell & Kozlowski, 2008). In a similar vein, the present model shows that moderate levels of exploration in response to unmet goals were associated with better transfer outcomes. Thus, the model reinforces the potential importance of targeting individuals who are learning oriented for training interventions, or even of adding a new measure directed specifically at their willingness to search for better task approaches in the face of adversity. Finally, on the environmental side, we want to ensure not only that trainees have the theoretical opportunities to apply their training, in the sense that the correct situations present themselves, but also that those trainees have the time and ability to think more deeply about the situation and engage their type 2 cognitive processes, improving the chances that they will make the correct decision regarding whether or not to use their training.

Conclusion

The Learning Transfer Model introduced in this paper has four central aims. First and foremost, it provides a formal, process-oriented theory which has the potential to unify many current effects in the transfer literature under a single umbrella. Second, it further integrates multiple important theories across disciplines, both from within and outside of psychology. Additionally, the LTM brings important formal models of reinforcement learning, and dual process models of cognition, further into organizational psychology. Finally, the LTM was instantiated in a computational model to provide a powerful tool for future theoretical development and practical application. The present work is not meant to be the final word on any of the theories incorporated into the LTM, or even on the mechanisms driving transfer in organizational contexts over time. Instead, the LTM as presented here is meant to provide a plausible and parsimonious model of the transfer process to drive future research and practice. To that end, over the course of several virtual experiments, the overall generative sufficiency of the model was largely established, although pieces of the model were falsified and subsequently revised, and novel implications of the model were explored. Substantial work remains to fully validate the present model against real-world observations, which will inevitably lead to various tweaks to the underlying mathematics driving the proposed mechanisms in the LTM. However, the model established in this paper represents a substantial step toward a formal process model of transfer.
Table 1. Model 1 Variables.
  a        Policy A
  b        Policy B
  R_a      True reward for Policy A
  R_b      True reward for Policy B
  Q_t(a)   Value estimate for Policy A at time t
  Q_t(b)   Value estimate for Policy B at time t
  R_ta     Reward received at time t given Policy A
  R_tb     Reward received at time t given Policy B
  Q_1(a)   Initial value estimate for Policy A
  Q_1(b)   Initial value estimate for Policy B
  P_t      Policy chosen at time t
  E        Error rate in choosing the most valuable policy, also referred to as exploration
  S_2      Probability of activating the System 2 decision process
  Z_t(a)   Probability of choosing to apply Policy A automatically in System 1
  L        Number of times an agent has attempted their new policy in practice before entering the transfer environment
  I        Effect of forming an implementation intention to activate Policy B in the presented situation

Table 2. Model 1 Equations.
  $Q_{t+1}(a) = Q_t(a) + \frac{1}{t_a}\left[R_{t_a} - Q_t(a)\right]$    Value estimate at time t + 1 for Policy A, where t_a is the number of times Policy A has been applied
  $Q_{t+1}(b) = Q_t(b) + \frac{1}{t_b}\left[R_{t_b} - Q_t(b)\right]$    Value estimate at time t + 1 for Policy B, where t_b is the number of times Policy B has been applied
  $P_t = \arg\max_{p \in \{a,b\}} Q_t(p)$, with probability $1 - E$    Policy chosen at t is the policy with the maximum expected value from policies a and b, selected with probability 1 - E given the use of System 2
  $Z_t(a) = \frac{t_a}{t_a + t_b + L} - I$    Probability of choosing to apply Policy A automatically in System 1, calculated from the number of times that policy has been chosen out of possible applications and accounting for implementation intentions

Table 3. Overall results for practice effect on behavioral transfer and performance change in Model 1.
  Practice attempts   Behavioral transfer   Performance change
  0     .47   .43
  25    .49   .48
  50    .48   .31
  75    .50   .33
  100   .51   .38
  125   .54   .54
  150   .57   .54
  175   .55   .63
  200   .55   .54

Table 4. Experimental comparisons of practice conditions to control for behavioral transfer and performance change in Model 1.
  Practice attempts   Behavioral transfer   Performance change
  25    .25    .01
  50    .08    .04
  75    .51    .02
  100   .56    .05
  125   .97    .12
  150   1.50   .14
  175   1.15   .14
  200   1.25   .14

Table 5. Initial policy value estimate effects on behavioral transfer and performance change in Model 1.
  Initial Policy B estimate   Behavioral transfer   Pre-post performance (d)
  .00    .43   .17
  .05    .42   .21
  .10    .44   .23
  .15    .43   .19
  .20    .44   .28
  .25    .45   .43
  .30    .44   .38
  .35    .43   .33
  .40    .42   .30
  .45    .45   .27
  .50    .46   .35
  .55    .44   .40
  .60    .45   .28
  .65    .42   .27
  .70    .43   .31
  .75    .45   .36
  .80    .42   .30
  .85    .45   .36
  .90    .45   .31
  .95    .45   .39
  1.00   .47   .41

Table 6. Implementation level effects on behavioral transfer and performance change in Model 1.
  Implementation level   Behavioral transfer   Pre-post performance
  0     .43   .31
  .05   .45   .17
  .10   .47   .34
  .15   .50   .41
  .20   .51   .39
  .25   .54   .52
  .30   .56   .33
  .35   .55   .40
  .40   .57   .32
  .45   .59   .50
  .50   .60   .47

Table 7. Model 2 Variables.
  G_t(a)    Average value estimate of the other agents for Policy A at time t
  G_t(b)    Average value estimate of the other agents for Policy B at time t
  C         Level of connectedness to the group of co-learners
  wQ_t(a)   Weighted value estimate for Policy A
  wQ_t(b)   Weighted value estimate for Policy B

Table 8. Model 2 Equations.
  $G_t(a) = \frac{1}{N}\sum_{i=1}^{N} Q_t^{i}(a)$    Average value estimate of the other transfer agents 1 to N, calculated as the sum of the value estimates of each agent i divided by the number of agents, for Policy A
  $G_t(b) = \frac{1}{N}\sum_{i=1}^{N} Q_t^{i}(b)$    Average value estimate of the other transfer agents 1 to N, calculated as the sum of the value estimates of each agent i divided by the number of agents, for Policy B
  $wQ_t(a) = (1 - C)\,Q_t(a) + C\,G_t(a)$    Weighted value estimate for Policy A when N > 0
  $wQ_t(b) = (1 - C)\,Q_t(b) + C\,G_t(b)$    Weighted value estimate for Policy B when N > 0

Table 9. Effects of number of trainees on behavioral transfer and pre-post performance change in Model 2A.
  Trainees   Behavioral transfer   Pre-post performance change
  1    .43   .28
  2    .44   .52
  3    .42   .54
  4    .44   .68
  5    .45   .78
  6    .44   .91
  7    .44   .86
  8    .43   .84
  9    .44   1.03
  10   .43   .99
  11   .44   1.09
  12   .44   1.00
  13   .44   1.19
  14   .44   1.21
  15   .43   1.21
  16   .44   1.29
  17   .44   1.24
  18   .44   1.40
  19   .44   1.42
  20   .44   1.43

Table 10. Connectedness effects on behavioral transfer and pre-post performance change in Model 2A.
  Connectedness   Behavioral transfer   Pre-post performance change
  .00    .44   1.12
  .05    .44   1.12
  .10    .43   .95
  .15    .44   .84
  .20    .44   1.06
  .25    .44   1.06
  .30    .43   .84
  .35    .43   1.01
  .40    .43   .91
  .45    .44   1.05
  .50    .43   .92
  .55    .44   1.03
  .60    .43   .95
  .65    .44   1.10
  .70    .44   .94
  .75    .43   1.01
  .80    .44   1.08
  .85    .44   1.09
  .90    .44   1.02
  .95    .44   .96
  1.00   .44   .88

Table 11. Model 3 Variables.
  T   Goal of the target agent
  Y   Performance of the target agent
  D   Difference between performance and goal
  J   Decision mechanism; takes 0 if the goal is met, 1 if not
  F   How much exploration increases when the goal is not met
  V   Threshold below which the agent will not apply the policy

Table 12. Model 3 Equations.
  $Y = \frac{1}{t}\sum_{i=1}^{t} R_i$    Performance Y is the average of all previously experienced rewards
  $D = T - Y$    Difference calculated as the difference between the agent's goal and its performance
  $E_t = E + JF$    Error rate in choosing the highest-valued policy as changed by the comparison of current performance to the goal

Table 13. Three-way interaction models for Experiment 4A. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 296991.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .280, 554.85, <.001; .761, 6282.61, <.001; 1.157, 16.97, <.001
  Intentions: .060, .026, 20.28, <.001; .012, .020, 16.92, <.001; .233, .016, .58, .560
  Threshold: -.706, -.283, -221.41, <.001; -.107, -.168, -139.11, <.001; -1.973, -.125, -4.58, <.001
  Value Change: 1.983, .649, 507.87, <.001; .559, .721, 595.95, <.001; 14.149, .733, 26.80, <.001
  Intentions*Threshold: -.221, -.015, -11.84, <.001; -.029, -.008, -6.43, <.001; -.409, -.004, -.16, .871
  Intentions*Value Change: .361, .020, 15.79, <.001; .109, .024, 19.87, <.001; 2.597, .023, .84, .401
  Threshold*Value Change: -2.139, -.111, -86.65, <.001; -.603, -.123, -101.61, <.001; -11.008, -.090, -3.30, .001
  Intentions*Threshold*Value Change: -.275, -.002, -1.90, .057; -.128, -.004, -3.69, <.001; -1.576, -.002, -.08, .936

Table 14. Three-way interaction models for Experiment 4B. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 4409991.
  Predictor: Behavioral transfer; Post training performance
  Constant: .712, 4602.11, <.001; .247, 141606.60, <.001
  Trainees: .000, -.145, -699.16, <.001; -.007, -.199, -373.13, <.001
  Conformity: -.024, -.556, -2690.95, <.001; -.476, -.776, -1434.90, <.001
  Goals: .003, .068, 325.94, <.001; .058, .093, 174.56, <.001
  Trainees*Conformity: .000, -.055, -267.85, <.001; -.008, -.076, -141.43, <.001
  Trainees*Goals: .000, -.001, -153.55, <.001; .000, -.001, -81.51, <.001
  Conformity*Goals: -.004, -.032, -3.19, .001; -.090, -.044, -2.39, .017
  Trainees*Conformity*Goals: .000, -.007, -33.35, <.001; -.003, -.009, -17.26, <.001

Table 15. Three-way interaction models for Experiment 4C. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 661491.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .223, 2739.29, <.001; .726, 33146.55, <.001; .736, 44.91, <.001
  Conformity: -.473, -.670, -1754.87, <.001; -.056, -.369, -779.08, <.001; -2.690, -.381, -49.69, <.001
  Goals: .031, .044, 115.48, <.001; -.002, -.016, -34.19, <.001; -.070, -.010, -1.29, .196
  Value Change: .964, .553, 1448.33, <.001; .269, .711, 1502.67, <.001; 12.945, .742, 96.75, <.001
  Conformity*Goals: -.041, -.018, -46.54, <.001; .007, .014, 29.73, <.001; .260, .011, 1.46, .146
  Conformity*Value Change: -2.181, -.379, -991.81, <.001; -.572, -.459, -969.18, <.001; -27.448, -.476, -62.11, <.001
  Goals*Value Change: -.272, -.047, -123.84, <.001; .008, .006, 13.10, <.001; .940, .016, 2.13, .034
  Conformity*Goals*Value Change: .607, .032, 83.59, <.001; .010, .002, 5.27, <.001; -.673, -.004, -.46, .645

Table 16. Three-way interaction models for Experiment 4D. Entries within each outcome are b, standardized coefficient, t, and p (b, t, and p for the constant). Dfs for all models are 8, 4630491.
  Predictor: Behavioral transfer; Post training performance; Pre-post d
  Constant: .145, 3982.65, <.001; .707, 120782.07, <.001; -.137, -176.95, <.001
  Type 2: .288, .542, 2386.45, <.001; .014, .302, 743.90, <.001; .699, .608, 272.84, <.001
  Conformity: -.314, -.593, -2608.91, <.001; -.016, -.330, -811.81, <.001; -.763, -.663, -297.75, <.001
  Goals: .036, .068, 297.86, <.001; .002, .038, 92.34, <.001; .088, .077, 34.48, <.001
  Type 2*Conformity: -.580, -.331, -1456.80, <.001; -.029, -.185, -454.89, <.001; -1.401, -.369, -165.6, <.001
  Type 2*Goals: .053, .030, 133.91, <.001; .003, .017, 41.42, <.001; .143, .038, 16.88, <.001
  Conformity*Goals: -.061, -.035, -152.40, <.001; -.003, -.020, -48.21, <.001; -.141, -.037, -16.62, <.001
  Type 2*Conformity*Goals: -.070, -.012, -53.06, <.001; -.004, -.007, -17.09, <.001; -.184, -.015, -6.59, <.001

Figure 1. Conceptual model for initial LTM.
Figure 2. Behavioral Transfer for exploration of policy values in Model 1.
Figure 3.
Figure 4. Behavioral Transfer for exploration of policy value changes in Model 1.
Figure 5. Note: the white rectangle on the right is blank because the value is undefined; pretraining performance was always perfect, so there is no variability on which to calculate an effect size.
Figure 6. Behavioral Transfer for exploration of burn-in and transfer times in Model 1.
Figure 7. Performance change for exploration of burn-in and transfer times in Model 1.
Figure 8. Predicting behavioral transfer from type 2 processing likelihood in Model 1.
Figure 9. Predicting performance change from type 2 processing likelihood in Model 1.
Figure 10 A-D. Example transfer trajectories for Model 1.
Figure 11. Exploration rate effect on behavioral transfer in Model 1.
Figure 12. Exploration rate effect on performance change in Model 1.
Figure 13. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1.
Figure 14. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1.
Figure 15. Type 2 likelihood vs implementation intention experimental effect on behavioral transfer in Model 1 heat map.
Figure 16. Type 2 likelihood vs implementation intention experimental effect on post training performance in Model 1 heat map.
Figure 17. Type 2 likelihood vs implementation intention experimental effect on performance change in Model 1 heat map.
Figure 18. Proposed conceptual model for LTM with Social Learning.
Figure 19. Heatmap of interaction effect of number of trainees and connectedness on behavioral transfer in Model 2A.
Figure 20. Number of trainees and level of imitation predicting behavioral transfer in Model 2B (replication level).
Figure 21. Number of trainees and level of imitation predicting post training performance in Model 2B (replication level).
Figure 22. Number of trainees and level of imitation predicting pre-post training performance in Model 2B (condition level).
Figure 23. Heatmap of trainees and imitation predicting behavioral transfer in Model 2B.
Figure 24. Heatmap of trainees and imitation predicting post training performance in Model 2B.
Figure 25. Heatmap of trainees and imitation predicting pre-post performance change in Model 2B.
Figure 26. Number of trainees and level of conformity predicting behavioral transfer in Model 2C (replication level).
Figure 27. Number of trainees and level of conformity predicting post training performance in Model 2C (replication level).
Figure 28. Number of trainees and level of conformity predicting pre-post performance change in Model 2C (condition level).
Figure 29. Heat map of number of trainees and level of conformity predicting behavioral transfer in Model 2C.
Figure 30. Heat map of number of trainees and level of conformity predicting post training performance in Model 2C.
Figure 31. Heat map of number of trainees and level of conformity predicting pre-post performance change in Model 2C.
Figure 32. Conceptual model for LTM including self-regulation.
Figure 33. Goal level and exploration rate change predicting post training performance in Model 3A (replication level).
Figure 34. Goal level and exploration rate change predicting behavioral transfer in Model 3A (replication level).
Figure 35. Goal level and exploration rate change predicting pre-post performance change in Model 3A (condition level).
Figure 36. Heat map of goal level and exploration rate change predicting behavioral transfer in Model 3A.
Figure 37. Heat map of goal level and exploration rate change predicting post training performance in Model 3A.
Figure 38. Heat map of goal level and exploration rate change predicting pre-post performance change in Model 3A.
Figure 39. Observed post training performance by goal level in Model 3B-1. Note: scale intentionally not starting at 0 to show the sudden shift in percentages more clearly.
Figure 40. Observed behavioral transfer by goal level in Model 3B-1.
Figure 41. Observed pre-post performance change by goal level in Model 3B-1.
Figure 42. Goal level and policy value change predicting behavioral transfer in Model 3B-1 (replication level).
Figure 43. Goal level and policy value change predicting post training performance in Model 3B-1 (replication level).
Figure 44. Goal level and policy value change predicting pre-post performance change in Model 3B-1 (condition level).
Figure 45. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-1.
Figure 46. Heat map of goal level and policy value change predicting post training performance in Model 3B-1.
Figure 47. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-1.
Figure 48. Observed post training performance by goal level in Model 3B-2. Note: scale intentionally not starting at 0 to show the sudden shift in percentages more clearly.
Figure 49. Observed behavioral transfer by goal level in Model 3B-2.
Figure 50. Observed pre-post performance change by goal level in Model 3B-2.
Figure 51. Goal level and policy value change predicting behavioral transfer in Model 3B-2 (replication level).
Figure 52. Goal level and policy value change predicting post training performance in Model 3B-2 (replication level).
Figure 53. Goal level and policy value change predicting pre-post performance change in Model 3B-2 (condition level).
Figure 54. Heat map of goal level and policy value change predicting behavioral transfer in Model 3B-2.
Figure 55. Heat map of goal level and policy value change predicting post training performance in Model 3B-2.
Figure 56. Heat map of goal level and policy value change predicting pre-post performance change in Model 3B-2.
Figure 57. Observed and predicted behavioral transfer from threshold level in Model 3C.
Figure 58. Observed and predicted post training performance from threshold level in Model 3C. Note: scale intentionally not starting at 0 to show the shift in percentages more clearly.
Figure 59. Observed and predicted pre-post performance change from threshold level in Model 3C.
Figure 60. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 61. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 62. Three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 63. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting behavioral transfer in Experiment 4A (replication level).
Figure 64. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting post training performance in Experiment 4A (replication level).
Figure 65. Heat map of three-way interaction of engagement thresholds, implementation intentions, and value change predicting pre-post training performance change in Experiment 4A (condition level).
Figure 66. Three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 67. Three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level). Note: Y axis does not start at 0 to better illustrate the effect.
Figure 68. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting behavioral transfer in Experiment 4B (replication level).
Figure 69. Heat maps of three-way interaction of number of trainees, conformity, and goals predicting post training performance in Experiment 4B (replication level).
Figure 70. Three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 71. Three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 72. Three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 73. Heat map of three-way interaction of conformity, goals, and value change predicting behavioral transfer in Experiment 4C (replication level).
Figure 74. Heat map of three-way interaction of conformity, goals, and value change predicting post training performance in Experiment 4C (replication level).
Figure 75. Heat map of three-way interaction of conformity, goals, and value change predicting pre-post training performance change in Experiment 4C (condition level).
Figure 76. Three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 77. Three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level). Note: Y axis does not start at 0 to better highlight the effect.
Figure 78. Three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).
Figure 79. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting behavioral transfer in Experiment 4D (replication level).
Figure 80. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting post training performance in Experiment 4D (replication level).
Figure 81. Heat map of three-way interaction of type 2 likelihood, conformity, and goals predicting pre-post training performance change in Experiment 4D (condition level).

APPENDICES

Appendix A: Study 1 Environment and Code

Figure 82. Snapshot of the modeling environment for Study 1 in NetLogo.

Algorithm 13. NetLogo Code for Study 1 Model.

breed [trainees trainee]  ;types of agents allowed in environment

trainees-own [
  value_estimate_a            ;estimated value of Policy A
  value_estimate_b            ;estimated value of Policy B
  system1_choose_a            ;likelihood of choosing Policy A as habitual response
  attempts_policy_a           ;number of times applied Policy A
  attempts_policy_b           ;number of times applied Policy B
  reward_a                    ;reward received on most recent attempt with Policy A
  reward_b                    ;reward received on most recent attempt with Policy B
  task_successes              ;number of times successful at task overall
  post_training_successes     ;number of times successful only post-training
  pretraining_success_rate    ;success rate pretraining only
  posttraining_success_rate   ;percentage of times successful in post-training environment
  behavioral_transfer_rate    ;rate of choosing Policy B in transfer environment
  transfer_time_count         ;ticks into transfer time
]

globals [
  mean_value_estimate_a             ;mean of agent value estimates for Policy A
  mean_value_estimate_b             ;mean of agent value estimates for Policy B
  mean_overall_task_success         ;task rate of success for full simulation
  mean_pretraining_success_rate     ;success rate pretraining only, all agents
  mean_posttraining_success_rate    ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate     ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward              ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all  ;clears environment from previous simulation
  create-trainees num-trainees [  ;place specified number of agents at center of grid
    set value_estimate_a initial_policy_a_estimate  ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate  ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0  ;number of times applied Policy A, initially set to 0
    set attempts_policy_b 0  ;number of times applied Policy B, initially set to 0
    set task_successes 0  ;number of task successes, initially set to 0
    set pretraining_success_rate 0  ;success rate pretraining only, initially set to 0
    set post_training_successes 0  ;number of successes for post training, initially set to 0
    set posttraining_success_rate 0  ;success rate in posttraining environment, initially set to 0
    set behavioral_transfer_rate 0  ;percentage of time choosing trained policy, initially set to 0
  ]
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks  ;reset time count to 0
end

to go  ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training]  ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop]  ;control length of sim
  tick  ;advance time
  if ticks <= burn_in [trainees-burn-in]  ;call subroutine to have trainee engage in task during burn in period
  if ticks > burn_in [trainees-transfer]  ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in]  ;call subroutine to save pretraining performance
  update-globals  ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in  ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
      [set reward_a 1 set task_successes (task_successes + 1)]
      [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals  ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer  ;call routine to choose which system will drive task
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
end

to system-choose  ;decide if system 2 will intervene; if not, rely on system 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision  ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention)  ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100  ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a [
    let success_a random 100 / 100  ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
      [set reward_a 1
;if successful receive reward set task_successes (task_successes + 1) ;update counts on task s uccess set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_p o licy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attemp ts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_succe sses (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on ta s k success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value _estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set attem p ts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end 273 to system2_decision ;default to system 2 using highest value estimated policy except at some error rate let e - greedy random 100 / 100 ifelse e - greedy < exploration _ rate [ run_low_value ] [ run_high_value ] end to save - burn - in ;save pretraining performa nce ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest es t imated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successe s (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_es t imate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_est imate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a ( value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A c hoice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if s uccessful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts o n task success set post_training_successes (post_training_successes + 1) ;update c ounts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (att e mpts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate 
for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_pol i cy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if val ue_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to run_high_v a lue ;subroutine to choose and execute policy with highest estimated value 274 ifelse value_estimate_ a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set rewar d _a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts o n task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_polic y _a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (att empts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 a n d update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_pol icy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attem p ts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b r andom 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_trainin g_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choic e set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;upd a te count on Policy B choice ] ] end to save - post - training ;save post training performance variables ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))] ask trainees [set behavioral_trans fer_rat e (attempts_policy_b / (transfer_time + .000001))] end 275 Appendix B : Study 2A Environment and Code Figure 83 . Snapshot of the modeling environment for Study 2A in NetLogo. 
276 trainees - own [ value_estimate_a ;estimated valu e of Policy A value_estimate_b ;estimated value of Policy B system1_choose_a ;liklihood of choosing Policy A as habitual response attempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward re c eived on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Policy B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining _ success_rate ;success rate pretraining only posttraining_su ccess_rate ; percentage of times successful in post - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time other_agent_estimate_a ;value estimate of other agents in model for Policy A other_agent_estimate_b ;value estimate of other agents in model for Policy B grouped_value_estimate_a ;combined value estimate of target agent and other agents for Pol i cy A grouped_value_estimate_b ;combined value estimate of t arget agent and other agents for Policy B ] globals [ mean_value_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Policy B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pretraining only all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rate ;rat e of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ ;place speci f i ed number of agents at center of grid set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each traine e set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 breed [trainees trainee] ;types of agents allowed in environment Algorithm 14 . 
NetLogo Code for Study 2A Model 277 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ; s uccess rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_tran s f er_rate 0 ;percentage of time choosing trained policy initial set to 0 ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_value) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if tru e _policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - training] ;call subroutine to save post training variables if tick s = (bur n _in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in period if ticks > burn_in [trainees - transfer] ;call subroutine for tr ainee de c isions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance ifelse num - trainees > 1 [pool_experiences] [no_pool_experiences] ;set group estimate depending on if more than 1 agent or not update - glob als ;cal l subroutine to calculate all global variables used to track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward _a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;c alculate all global variables used to track sim functioning set mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks s et mean_pretraining_success_rate mean [pretraining_success_rate] of trainees set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees 278 end to trainees - transfer ;call routine to choose which system will drive task system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set be havioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [se t posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] end to system - choose ;decide if system2 will intervene, if not, rely o n system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_ a ctivation_liklihood [system2_decision] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automati c decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_po l icy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avo id divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ; g enerate random number to determine which policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 
;if Policy A chosen, deter mine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful r eceive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy valu e estimate s et value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success _b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_tra ining_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice 279 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (rewa rd_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001 ) ) * (reward_b - val ue_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to system2_decision ;default to system 2 using highest value estimated policy excep t at some error rate let e - greedy random 100 / 100 ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_success_rate (task_successes / (burn_in + .0 0 0001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_p o licy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on t ask success set attempt s _policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuc c essful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if val ue_estimate_a < 0 [set value_es t imate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_ reward [set reward_b 1 ;if succ e ssful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy _b attempts_policy_b + 1 ;updat e count on Policy B choice set value_estimate_b 
(value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update poli c y value estimate 280 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b a t tempts_policy_b + 1 ;update count on Policy B choice ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A ch o sen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes ses + 1) ;update counts on task success set post_training_successes (post_training_succ e sses + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;updat e value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value esti m ate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ife l se success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task succe s s set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [se t reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .00000 1)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estima t e_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice ] ] end to save - post - training ;save post training performance variables ask trainees [set posttraining_success_rate (post_traini n g_successes / (transfer_time + .000001))] 281 ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))] end to pool_experiences ;pool experiences from all agents for decision making ask trainees [set other_agent_estimate _ a (mean [value_estimate_a] of other traine es)] ask trainees [set other_agent_estimate_b (mean [value_estimate_b] of other trainees)] ask trainees [set grouped_value_estimate_a (((1 - connectedness)*(value_estimate_a))+(connectedness * other_agent_esti m ate_a))] ask trainees [set grouped_value _estimate_b (((1 - connectedness)*(value_estimate_b))+(connectedness * other_agent_estimate_b))] end to no_pool_experiences ;if only 1 agent then group estimate is equal to personal estimate ask trainees [set g r ouped_value_estimate_a (value_estimate_a)] ask trainees [set grouped_value_estimate_b (value_estimate_b)] end 282 Appendix C : Study 2B Environment and Code Figure 84 . 
Snapshot of the modeling environment for Study 2B in NetLogo.

Algorithm 15. NetLogo Code for Study 2B Model

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  imitate_choice ;track decision to imitate on each time step
  other_chose_b ;behavioral choice of most successful other trainee
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A, initially 0
    set attempts_policy_b 0 ;number of times applied Policy B, initially 0
    set task_successes 0 ;number of task successes, initially 0
    set pretraining_success_rate 0 ;success rate pretraining only, initially 0
    set post_training_successes 0 ;number of successes post-training, initially 0
    set posttraining_success_rate 0 ;success rate in posttraining environment, initially 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy, initially 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;set up success rate of most successful other trainee
    set other_chose_b 0 ;set up choice made by most successful other trainee
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post-training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in the task during the burn-in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post-training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn-in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
      [set reward_a 1 set task_successes (task_successes + 1)]
      [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive the task
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy, except at some error rate
  ifelse num-trainees > 1
  [ run-imitate ;have the trainee choose whether it will imitate if there are other trainees
    if imitate_choice = 0
    [ let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
      ifelse e-greedy < exploration_rate [ run_low_value ] [ run_high_value ] ]
  ]
  [ let e-greedy random 100 / 100 ;run choice with some degree of error
    ifelse e-greedy < exploration_rate [ run_low_value ] [ run_high_value ]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
    ifelse success_a < true_policy_a_reward
    [ set reward_a 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
    [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ] ;update choice to Policy A
  ]
  [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
    ifelse success_b < true_policy_b_reward
    [ set reward_b 1 ;if successful, receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
    [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ] ;update choice to Policy B
  ]
end

to save-post-training ;save post-training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-imitate ;make the imitate decision based on the specified rate and execute
  let imitate_yes random 100 / 100
  ifelse imitate_yes <= imitate [set imitate_choice 1] [set imitate_choice 0]
  set other_chose_b [chose_b] of other trainees with-max [posttraining_success_rate]
  if imitate_choice = 1
  [ ifelse other_chose_b = 0
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix D: Study 2C Environment and Code

Figure 85. Snapshot of the modeling environment for Study 2C in NetLogo.
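The Study 2C listing below shares its decision core with the Study 2B model above, so the following compact restatement, distilled from the code (with n_p denoting attempts_policy_p and r_p the most recent 0/1 reward), may help in reading both. Each policy's value estimate is an incremental sample average,

V_p \leftarrow V_p + \frac{1}{n_p}\left(r_p - V_p\right),

where the code adds .000001 to n_p to avoid division by zero. Under System 1, the probability of habitually applying Policy A is

P(A \mid \text{System 1}) = \frac{n_A}{n_A + n_B + \text{practice\_attempts}} - \text{implementation\_intention},

and under System 2 choice is epsilon-greedy: with probability exploration_rate the lower-valued policy is executed (run_low_value); otherwise the higher-valued policy is executed (run_high_value).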
290 trainees - own [ value_estimate_a ;estimated value of Policy A value_estimate_b ;estimated value of Policy B system1_choose_a ;liklihood of choosing P o licy A as habitual response at tempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward received on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Polic y B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining_success_rate ;success rate pretraining only posttraining_success_rate ; percentage of times successful in post - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time chose_b ;track behavioral choice of last task attempt, 0 = chose a, 1 = chose b other_success_r a te ;success rate of most successful other trainee conform_choice ;track decision to conform on each time step other_chose_b ;behavioral choice of most successful other trainee ] globals [ mean_value_estimate_a ;mean of agent value estimates for P o licy A mean_value_estimate_b ;mean of agent value estimates for Policy B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pret raining only all agents mean_posttraining_success_rate ;su c cess rate posttraining only all agents mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting fo r policy value change ] to setup clear - all ;clears enviro n ment from previous simulation create - trainees num - trainees [ setxy random - xcor random - ycor ;place specified number of agents at random coordinates set value_estimate_a initial_policy_a_esti mate ;set initial value estimate for Policy A for each train e e set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial se t to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;success rate pretraining only initial set to 0 breed [trainees tra inee] ;types of agents allowed in environment Algorithm 16 . 
NetLogo Code for Study 2C Model 291 set post_training_successes 0 ;num ber of successes for post training initial set to 0 set p osttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initial set to 0 set chose _b 0 ;set choice tracker to default of Policy A set othe r _success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_ v alue) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - tr a ining] ;call subroutine to save post training variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in perio d if ticks > burn_in [trainees - transfer] ;call subroutine for trainee decisions post training if ticks = burn_in [save - bu rn - in] ;call subroutine to save pretraining performance update - globals ;call subroutine to calculate all global variables used to track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a rand om 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_po licy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;calculate all global variables used to track sim functioning s et mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate mean [pretraining_su c cess_rate] of trainees set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will dr i ve task 292 system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_activation_liklihood [system2_decision] if system_ch o ose >= system2_activati on_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .0000 0 1)) - implementation_in tention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ;generate random number to determine which policy to im p lement ifelse choos e_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < 
true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes s es + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e _estimate_a + ((1 / (at tempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e _a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_es timate_b + ((1 / (attempts_policy_b + .000 0 01)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate 293 set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (r e ward_b - value_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;default to system 2 us i ng highest value estimated policy except at some error rate ifelse num - trainees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;if not imitating r u n egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_r ate [ run_low_value ] [ run_high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a < = value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on t ask success set post_training_successes (pos t_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attem p ts_policy_a + .000001)) * (reward_a - value_estimat e_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_polic y _a + .000001)) * (reward_a - value_estimate_a))) ;up date value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 10 0 / 100 ;if Policy B 
chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update co u nts on task success set post_training_succes ses (post_training_successes + 1) ;update counts on task success 294 set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value _estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attemp t s_policy_b + .000001)) * (reward_b - value_estimate_ b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;updat e choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a atte mpts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set rew ard to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_ e stimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if succes s ful ifelse success_b < true_policy_b _reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_polic y_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Poli c y B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B i f value_estimate_b < 0 [set value_estimate_ b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end 295 to save - post - training ;save post training performance variabl e s ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))] end to run - conform ;make conform decision based on specif i ed rate and execute let conform_yes random 100 / 100 ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees t h at 
applied b on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1
  [ ifelse majority_rule < .50
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix E: Study 3A Environment and Code

Figure 86. Snapshot of the modeling environment for Model 3A in NetLogo.

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of the goal or not?
exploration_rate ;each trainees have own exploration rate ] globals [ mean_value_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Pol icy B mean_overall_task_success ;task rate o f success for full simulation mean_pretraining_success_rate ;success rate pretraining only all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rat e ;rate of choosing Policy B in transfer envi r onment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ setxy random - xcor random - ycor ;place specified numb e r of agents at random coordinates set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee set value_estimate_b initial_policy_b_estimate ;set initial val ue estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 breed [trainees trainee] ;types of agents allowed in environment Algorithm 17 . NetLogo Code for Model 3A 299 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;succ e ss rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment initial set to 0 set behavioral_transfer_ r ate 0 ;percentage of time choosing trained policy initial set to 0 set chose_b 0 ;set choice tracker to default of Policy A set other_success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee set exploration_rate exploration_rate_0 ;set initial exploration rate ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_in_value) if true_policy_b_reward > 1 [set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines a ctivated if ticks = (burn_in + transfer_time) [save - post - training] ;call subroutine to save post tra i ning variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call s ubroutine to have trainee engage in task during burn in period if ticks > burn_in [trainees - transfer ] ;call subroutine for trainee decisions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance update - globa ls ;call subroutine to calculate all global variables used to track sim functioning end to trainees - b u rn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_ a 1 set task_successes (task_successes + 1)] [set reward_a 0] set attempts_policy_a (a t tempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;ca lculate all global variables used to track sim functioning set mean_value_estimate_a mean [value_est i mate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate 
mean [pretraining_success_rate] of trainees set mean_posttr a ining_success_rate mean [posttraining_success_rate] of trainees 300 set mean_behaviora l_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will drive task and update decision variables an d trackers system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))] ask trainees [set goal_difference (perform_goa l - (task_successes / ticks))] ask trainees [ ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)] ] ask trainees [set exploration_rate (exp l oration_rate_0 + (explore_change * j_goal_check))] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose (random 100 / 100) if system_choose < system2_activation_liklihood [system2_decisio n ] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice _ attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as n ot to affect simulation let choose_a random 100 / 100 ;generate random number to determine w h ich policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success _a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_succe s ses (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value _ estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _ a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attemp ts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] 301 ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if s uccessful receive reward set task_successes (task_successes + 1) ;update counts on tas k success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;up date count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts _ policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update p olicy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choic e set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;defau l t to system 2 using highest value estimated policy except 
at some error rate ifelse num - trainees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;i f not imitating run egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_hig h_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_rate [ run_low_value ] [ ru n _high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretraining_succ ess_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse v alue_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if su ccessful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ; u pdate counts on task success set post_training_successes (post_training_successes + 1) ;update co unts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate _ a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A 302 if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successe s + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_ e stimate_b + ((1 / (attempts_policy_b + . 
000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_ b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a ran dom 100 / 100 ;if Policy A chosen, determine i f successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_ successes (post_training_successes + 1) ;upda t e counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_est imate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, de t ermine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1 ) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + . 
000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
  ]
end

to save-post-training ;save post-training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make the conform decision based on the specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on the last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1
  [ ifelse majority_rule < .50
    [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful
      ifelse success_a < true_policy_a_reward
      [ set reward_a 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A
      [ set reward_a 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
        if value_estimate_a < 0 [set value_estimate_a 0]
        set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
        set chose_b 0 ] ;update choice to Policy A
    ]
    [ let success_b random 100 / 100 ;if Policy B chosen, determine if successful
      ifelse success_b < true_policy_b_reward
      [ set reward_b 1 ;if successful, receive reward
        set task_successes (task_successes + 1) ;update counts on task success
        set post_training_successes (post_training_successes + 1) ;update counts on task success
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B
      [ set reward_b 0 ;if unsuccessful, set reward to 0 and update policy value estimate
        set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
        if value_estimate_b < 0 [set value_estimate_b 0]
        set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
        set chose_b 1 ] ;update choice to Policy B
    ]
  ]
end

Appendix F: Studies 3B-1 and 3B-2 Environment and Code

Figure 87. Snapshot of the modeling environment for Models 3B-1 and 3B-2 in NetLogo.
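Model 3A (Appendix E above) and Models 3B-1 and 3B-2 (below) differ chiefly in how each trainee's exploration rate responds to the gap between its performance goal and realized performance. Written compactly from the trainees-transfer routines, with d = perform_goal - (task_successes / ticks) and j = 1 when d > 0 (otherwise j = 0):

Model 3A: \text{exploration\_rate} = \text{exploration\_rate\_0} + \text{explore\_change} \cdot j
Model 3B-1: \text{exploration\_rate} = \text{exploration\_rate\_0} + d
Model 3B-2: \text{exploration\_rate} = \text{exploration\_rate\_0} + (0.5 - d)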
306 trainees - own [ value_estimate_a ;estimated value of Policy A value_estimate_b ;estimated value of Policy B system1_ choose_a ;liklihood of choosing Policy A as habitual response attempts_policy_a ;number times applied Policy A attempts_policy_b ;number time applied Policy B reward_a ;reward received on most recent attempt with Policy A reward_b ;reward received on most recent attempt with Policy B task_successes ;number of times successful at task overall post_training_successes ;number of times successful only post - training pretraining_success_rate ;success rate pretraining only posttraining_success_rate ; percentage of times successful in p o st - training environment behavioral_transfer_rate ;rate of choosing Policy B in transfer environment transfer_time_count ; ticks into transfer time chose_b ;track behavioral choice of last task attempt, 0 = chose a, 1 = chose b other_success_rate ; s uccess rate of most successful other trainee conform_choice ;track decision to conform on each time step other_chose_b ;behavior al choice of most successful other trainee goal_difference ;difference between performance goal and actual performance j _goal_check ;is the agent short of goal or not? exploration_rate ;each trainees have own exploration rate ] globals [ mean_va lue_estimate_a ;mean of agent value estimates for Policy A mean_value_estimate_b ;mean of agent value estimates for Polic y B mean_overall_task_success ;task rate of success for full simulation mean_pretraining_success_rate ;success rate pretraining o nly all agents mean_posttraining_success_rate ;success rate posttraining only all agents mean_behavioral_transfer_rate ; rate of choosing Policy B in transfer environment all agents true_policy_b_reward ;reward for Policy B after adjusting for policy value change ] to setup clear - all ;clears environment from previous simulation create - trainees num - trainees [ setxy ra n dom - xcor random - ycor ;place specified number of agents at random coordinates set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee breed [trainees train ee] ;types of agents allowed in environment Algorithm 18 . 
NetLogo Code for Model 3B - 1 307 set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee set attempts_policy_a 0 ;number times applied Policy A initial set to 0 set attempts_policy_b 0 ;number time applied Policy B initial set to 0 set task_successes 0 ;number of task successes initial set to 0 set pretraining_success_rate 0 ;s uccess rate pretraining only initial set to 0 set post_training_successes 0 ;number of successes for post training initial set to 0 set posttraining_success_rate 0 ;success rate in posttraining environment in i tial set to 0 set behavioral_transf er_rate 0 ;percentage of time choosing trained policy initial set to 0 set chose_b 0 ;set choice tracker to default of Policy A set other_success_rate 0 ;setup success rate of most successful other trainee set other_chose_b 0 ;setup choice made by other most successful trainee set exploration_rate exploration_rate_0 ;set initial exploration rate ] layout - circle (sort turtles) max - pxcor - 3 set true_policy_b_reward (true_policy_a_reward + change_i n _value) if true_policy_b_reward > 1 [ set true_policy_b_reward 1] if true_policy_b_reward < 0 [set true_policy_b_reward 0] reset - ticks ;reset time count to 0 end to go ;primary subroutines activated if ticks = (burn_in + transfer_time) [save - post - t raining] ;call subroutine to save post training variables if ticks = (burn_in + transfer_time) [stop] ;control length of sim tick ;advance time if ticks <= burn_in [trainees - burn - in] ;call subroutine to have trainee engage in task during burn in per i od if ticks > burn_in [trainees - trans fer] ;call subroutine for trainee decisions post training if ticks = burn_in [save - burn - in] ;call subroutine to save pretraining performance update - globals ;call subroutine to calculate all global variables used t o track sim functioning end to trainees - burn - in ;agents engage in work task during burn in ask trainees [let success_a random 100 / 100 ifelse success_a <= true_policy_a_reward [set reward_a 1 set task_successes (task_successes + 1)] [s e t reward_a 0] set attempts_p olicy_a (attempts_policy_a + 1) set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a))) ] end to update - globals ;calculate all global variables used to track sim functioning 308 set mean_value_estimate_a mean [value_estimate_a] of trainees set mean_value_estimate_b mean [value_estimate_b] of trainees set mean_overall_task_success mean [task_successes] of trainees / ticks set mean_pretraining_success_rate mean [pretraining_ s uccess_rate] of trainees set m ean_posttraining_success_rate mean [posttraining_success_rate] of trainees set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees end to trainees - transfer ;call routine to choose which system will d rive task and update decision va riables and trackers system - choose ask trainees [set transfer_time_count (ticks - burn_in)] ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))] ask trainees [set posttra i ning_success_rate (post_training _successes / (transfer_time_count + .000001))] ask trainees [set goal_difference (perform_goal - (task_successes / ticks))] ask trainees [ ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)] ] ask trainees [set exploration_rate (exploration_rate_0 + goal_difference)] end to system - choose ;decide if system2 will intervene, if not, rely on system 1 ask trainees [ let system_choose 
(random 100 / 100) if system_choose < system2_activation _ l iklihood [system2_decision] if system_choose >= system2_activation_liklihood [system1_decision] ] end to system1_decision ;agent makes automatic decision about which policy to apply set system1_choose_a ((attempts_policy_a / (attempts_policy_a + a t t empts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate ;note: all additions of .000001 are to avoid divisions by 0, number small so as not to affect simulation let choose_a random 100 / 100 ;generate r a n dom number to determine which policy to implement ifelse choose_a < system1_choose_a [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive re w a rd set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A 309 [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (task_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on t ask success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate_b (value_e s timate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B set atte mpts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to system2_decision ;default to system 2 using highest value estimated policy except at some error rate ifelse num - trai nees > 1 [ run - conform ;have trainee choose if it will conform or not if there are other trainees if conform_choice = 0 [let e - greedy random 100 / 100 ;if not imitating run egreedy as normal ifelse e - greedy < exploration_rate [ run_low_value ] [ run_high_value ]] ] [ let e - greedy random 100 / 100 ;run choice with some degree of error ifelse e - greedy < exploration_ r ate [ run_low_value ] [ run_high_value ] ] end to save - burn - in ;save pretraining performance ask trainees [set pretr aining_success_rate (task_successes / (burn_in + .000001))] end to run_low_value ;subroutine to choose and execute policy with lowest estimated value ifelse value_estimate_a <= value_estimate_b [ let success_a random 100 / 100 ;if Policy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward 310 set task_succe s ses (task_succes ses + 1) ;update counts on task success set 
post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value _ estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate _ a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice s et chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 100 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set ta s k_successes (tas k_successes + 1) ;update counts on task success set post_training_successes (post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice s e t value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_ e stimate_b (value _estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B if value_estimate_b < 0 [set value_estimate_b 0] set attempts_policy_b attempts_policy_b + 1 ;update count o n Policy B choice set chose_b 1 ;update choice to Policy B ] ] end to run_high_value ;subroutine to choose and execute policy with highest estimated value ifelse value_estimate_a >= value_estimate_b [ let success_a random 100 / 100 ;if P olicy A chosen, determine if successful ifelse success_a < true_policy_a_reward [set reward_a 1 ;if successful receive reward set task_successes (task_succes ses + 1) ;update counts on task success set post_training_successes (post_tr a ining_successes + 1) ;update counts on task success set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice set value_estimate_a (valu e_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a) ) ) ] ;update value estimate for Policy A [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_a (value_estimat e_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A if value_estimate_a < 0 [set value_estimate_a 0] 311 set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice s et chose_b 0 ;update choice to Policy A ] ] [let success_b random 100 / 1 00 ;if Policy B chosen, determine if successful ifelse success_b < true_policy_b_reward [set reward_b 1 ;if successful receive reward set task_successes (tas k_successes + 1) ;update counts on task success set post_training_successes ( post_training_successes + 1) ;update counts on task success set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice set value_estimate _b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_est i mate_b))) ] ;update value estimate for Policy B [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate set value_estimate_b (value _estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value 
estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of goal or not?
  exploration_rate ;each trainee has its own exploration rate
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A initially set to 0
    set attempts_policy_b 0 ;number of times applied Policy B initially set to 0
    set task_successes 0 ;number of task successes initially set to 0
    set pretraining_success_rate 0 ;success rate pretraining only initially set to 0
    set post_training_successes 0 ;number of successes for post training initially set to 0
    set posttraining_success_rate 0 ;success rate in posttraining environment initially set to 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initially set to 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;setup success rate of most successful other trainee
    set other_chose_b 0 ;setup choice made by other most successful trainee
    set exploration_rate exploration_rate_0 ;set initial exploration rate
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in task during burn in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
    [set reward_a 1
     set task_successes (task_successes + 1)]
    [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive task and update decision variables and trackers
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
  ask trainees [set goal_difference (perform_goal - (task_successes / ticks))]
  ask trainees [
    ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)]
  ]
  ask trainees [set exploration_rate (exploration_rate_0 + (.5 - goal_difference))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    let system_choose (random 100 / 100)
    if system_choose < system2_activation_liklihood [system2_decision]
    if system_choose >= system2_activation_liklihood [system1_decision]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy except at some error rate
  ifelse num-trainees > 1
  [run-conform ;have trainee choose if it will conform or not if there are other trainees
   if conform_choice = 0
   [let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
    ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]]
  ]
  [let e-greedy random 100 / 100 ;run choice with some degree of error
   ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

Algorithm 19. NetLogo Code for Model 3B-2
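As a reading aid, the two quantities that drive each agent's choices in the listing above can be restated compactly. The symbols below are shorthand introduced here for exposition rather than notation taken from the listing, and the .000001 terms that only guard against division by zero are omitted:

\[ V_{k,n} = V_{k,n-1} + \frac{1}{n}\bigl(R_{k,n} - V_{k,n-1}\bigr), \qquad k \in \{A, B\}, \]

\[ \Pr(\text{System 1 selects Policy A}) \approx \frac{n_A}{\,n_A + n_B + \text{practice\_attempts}\,} - \text{implementation\_intention}. \]

Here \(n_k\) is the number of attempts with policy \(k\) and \(R_{k,n} \in \{0,1\}\) is the reward on the \(n\)th attempt, so the first line is simply the sample-average (incremental mean) value update implemented in the value_estimate lines. The second line restates the habit-strength term computed in system1_decision and compared against a random draw between 0 and 1, which is why stronger implementation intentions directly lower the probability of habitually reverting to Policy A.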
Appendix G: Study 3C Environment and Code

Figure 88. Snapshot of the modeling environment for Model 3C in NetLogo.

breed [trainees trainee] ;types of agents allowed in environment

trainees-own [
  value_estimate_a ;estimated value of Policy A
  value_estimate_b ;estimated value of Policy B
  system1_choose_a ;likelihood of choosing Policy A as habitual response
  attempts_policy_a ;number of times applied Policy A
  attempts_policy_b ;number of times applied Policy B
  reward_a ;reward received on most recent attempt with Policy A
  reward_b ;reward received on most recent attempt with Policy B
  task_successes ;number of times successful at task overall
  post_training_successes ;number of times successful only post-training
  pretraining_success_rate ;success rate pretraining only
  posttraining_success_rate ;percentage of times successful in post-training environment
  behavioral_transfer_rate ;rate of choosing Policy B in transfer environment
  transfer_time_count ;ticks into transfer time
  chose_b ;track behavioral choice of last task attempt, 0 = chose A, 1 = chose B
  other_success_rate ;success rate of most successful other trainee
  conform_choice ;track decision to conform on each time step
  other_chose_b ;behavioral choice of most successful other trainee
  goal_difference ;difference between performance goal and actual performance
  j_goal_check ;is the agent short of goal or not?
  exploration_rate ;each trainee has its own exploration rate
]

globals [
  mean_value_estimate_a ;mean of agent value estimates for Policy A
  mean_value_estimate_b ;mean of agent value estimates for Policy B
  mean_overall_task_success ;task rate of success for full simulation
  mean_pretraining_success_rate ;success rate pretraining only, all agents
  mean_posttraining_success_rate ;success rate posttraining only, all agents
  mean_behavioral_transfer_rate ;rate of choosing Policy B in transfer environment, all agents
  true_policy_b_reward ;reward for Policy B after adjusting for policy value change
]

to setup
  clear-all ;clears environment from previous simulation
  create-trainees num-trainees [
    setxy random-xcor random-ycor ;place specified number of agents at random coordinates
    set value_estimate_a initial_policy_a_estimate ;set initial value estimate for Policy A for each trainee
    set value_estimate_b initial_policy_b_estimate ;set initial value estimate for Policy B for each trainee
    set attempts_policy_a 0 ;number of times applied Policy A initially set to 0
    set attempts_policy_b 0 ;number of times applied Policy B initially set to 0
    set task_successes 0 ;number of task successes initially set to 0
    set pretraining_success_rate 0 ;success rate pretraining only initially set to 0
    set post_training_successes 0 ;number of successes for post training initially set to 0
    set posttraining_success_rate 0 ;success rate in posttraining environment initially set to 0
    set behavioral_transfer_rate 0 ;percentage of time choosing trained policy initially set to 0
    set chose_b 0 ;set choice tracker to default of Policy A
    set other_success_rate 0 ;setup success rate of most successful other trainee
    set other_chose_b 0 ;setup choice made by other most successful trainee
    set exploration_rate exploration_rate_0 ;set initial exploration rate
  ]
  layout-circle (sort turtles) max-pxcor - 3
  set true_policy_b_reward (true_policy_a_reward + change_in_value)
  if true_policy_b_reward > 1 [set true_policy_b_reward 1]
  if true_policy_b_reward < 0 [set true_policy_b_reward 0]
  reset-ticks ;reset time count to 0
end

to go ;primary subroutines activated
  if ticks = (burn_in + transfer_time) [save-post-training] ;call subroutine to save post training variables
  if ticks = (burn_in + transfer_time) [stop] ;control length of sim
  tick ;advance time
  if ticks <= burn_in [trainees-burn-in] ;call subroutine to have trainees engage in task during burn in period
  if ticks > burn_in [trainees-transfer] ;call subroutine for trainee decisions post training
  if ticks = burn_in [save-burn-in] ;call subroutine to save pretraining performance
  update-globals ;call subroutine to calculate all global variables used to track sim functioning
end

to trainees-burn-in ;agents engage in work task during burn in
  ask trainees [
    let success_a random 100 / 100
    ifelse success_a <= true_policy_a_reward
    [set reward_a 1
     set task_successes (task_successes + 1)]
    [set reward_a 0]
    set attempts_policy_a (attempts_policy_a + 1)
    set value_estimate_a (value_estimate_a + ((1 / attempts_policy_a) * (reward_a - value_estimate_a)))
  ]
end

to update-globals ;calculate all global variables used to track sim functioning
  set mean_value_estimate_a mean [value_estimate_a] of trainees
  set mean_value_estimate_b mean [value_estimate_b] of trainees
  set mean_overall_task_success mean [task_successes] of trainees / ticks
  set mean_pretraining_success_rate mean [pretraining_success_rate] of trainees
  set mean_posttraining_success_rate mean [posttraining_success_rate] of trainees
  set mean_behavioral_transfer_rate mean [behavioral_transfer_rate] of trainees
end

to trainees-transfer ;call routine to choose which system will drive task and update decision variables and trackers
  system-choose
  ask trainees [set transfer_time_count (ticks - burn_in)]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time_count + .000001))]
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time_count + .000001))]
  ask trainees [set goal_difference (perform_goal - (task_successes / ticks))]
  ask trainees [
    ifelse goal_difference > 0 [set j_goal_check (1)] [set j_goal_check (0)]
  ]
  ask trainees [set exploration_rate (exploration_rate_0 + (explore_change * j_goal_check))]
end

to system-choose ;decide if System 2 will intervene; if not, rely on System 1
  ask trainees [
    ifelse value_estimate_b < engagement_threshold
    [run_policy_a]
    [let system_choose (random 100 / 100)
     ifelse system_choose < system2_activation_liklihood [system2_decision] [system1_decision]
    ]
  ]
end

to system1_decision ;agent makes automatic decision about which policy to apply
  set system1_choose_a ((attempts_policy_a / (attempts_policy_a + attempts_policy_b + practice_attempts + .000001)) - implementation_intention) ;update habitual decision rate
  ;note: all additions of .000001 are to avoid divisions by 0; the number is small so as not to affect the simulation
  let choose_a random 100 / 100 ;generate random number to determine which policy to implement
  ifelse choose_a < system1_choose_a
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to system2_decision ;default to System 2 using the highest-value estimated policy except at some error rate
  ifelse num-trainees > 1
  [run-conform ;have trainee choose if it will conform or not if there are other trainees
   if conform_choice = 0
   [let e-greedy random 100 / 100 ;if not imitating, run e-greedy as normal
    ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]]
  ]
  [let e-greedy random 100 / 100 ;run choice with some degree of error
   ifelse e-greedy < exploration_rate [run_low_value] [run_high_value]
  ]
end

to save-burn-in ;save pretraining performance
  ask trainees [set pretraining_success_rate (task_successes / (burn_in + .000001))]
end

to run_low_value ;subroutine to choose and execute the policy with the lowest estimated value
  ifelse value_estimate_a <= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to run_high_value ;subroutine to choose and execute the policy with the highest estimated value
  ifelse value_estimate_a >= value_estimate_b
  [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
   ifelse success_a < true_policy_a_reward
   [set reward_a 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
   [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
    if value_estimate_a < 0 [set value_estimate_a 0]
    set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
    set chose_b 0 ;update choice to Policy A
    ]
  ]
  [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
   ifelse success_b < true_policy_b_reward
   [set reward_b 1 ;if successful receive reward
    set task_successes (task_successes + 1) ;update counts on task success
    set post_training_successes (post_training_successes + 1) ;update counts on task success
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
   [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
    set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
    if value_estimate_b < 0 [set value_estimate_b 0]
    set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
    set chose_b 1 ;update choice to Policy B
    ]
  ]
end

to save-post-training ;save post training performance variables
  ask trainees [set posttraining_success_rate (post_training_successes / (transfer_time + .000001))]
  ask trainees [set behavioral_transfer_rate (attempts_policy_b / (transfer_time + .000001))]
end

to run-conform ;make conform decision based on specified rate and execute
  let conform_yes random 100 / 100
  ifelse conform_yes <= conform [set conform_choice 1] [set conform_choice 0] ;choose if conforming or not
  set other_chose_b count other trainees with [chose_b = 1] ;count number of other trainees that applied B on last step
  let majority_rule other_chose_b / num-trainees
  if conform_choice = 1 [
    ifelse majority_rule < .50
    [let success_a random 100 / 100 ;if Policy A chosen, determine if successful
     ifelse success_a < true_policy_a_reward
     [set reward_a 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
     [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
      if value_estimate_a < 0 [set value_estimate_a 0]
      set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
      set chose_b 0 ;update choice to Policy A
      ]
    ]
    [let success_b random 100 / 100 ;if Policy B chosen, determine if successful
     ifelse success_b < true_policy_b_reward
     [set reward_b 1 ;if successful receive reward
      set task_successes (task_successes + 1) ;update counts on task success
      set post_training_successes (post_training_successes + 1) ;update counts on task success
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b)))] ;update value estimate for Policy B
     [set reward_b 0 ;if unsuccessful set reward to 0 and update policy value estimate
      set value_estimate_b (value_estimate_b + ((1 / (attempts_policy_b + .000001)) * (reward_b - value_estimate_b))) ;update value estimate for Policy B
      if value_estimate_b < 0 [set value_estimate_b 0]
      set attempts_policy_b attempts_policy_b + 1 ;update count on Policy B choice
      set chose_b 1 ;update choice to Policy B
      ]
    ]
  ]
end

to run_policy_a
  let success_a random 100 / 100 ;if Policy A chosen, determine if successful
  ifelse success_a < true_policy_a_reward
  [set reward_a 1 ;if successful receive reward
   set task_successes (task_successes + 1) ;update counts on task success
   set post_training_successes (post_training_successes + 1) ;update counts on task success
   set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
   set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a)))] ;update value estimate for Policy A
  [set reward_a 0 ;if unsuccessful set reward to 0 and update policy value estimate
   set value_estimate_a (value_estimate_a + ((1 / (attempts_policy_a + .000001)) * (reward_a - value_estimate_a))) ;update value estimate for Policy A
   if value_estimate_a < 0 [set value_estimate_a 0]
   set attempts_policy_a attempts_policy_a + 1 ;update count on Policy A choice
   set chose_b 0 ;update choice to Policy A
   ]
end

Algorithm 20. NetLogo Code for Model 3C
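Two points of comparison may help readers line up Model 3C with Model 3B-2. The symbols below are shorthand introduced here rather than notation from the listings: writing \(G\) for perform_goal, \(S_t\) for the running success rate task_successes / ticks, \(\varepsilon_0\) for exploration_rate_0, and \(c\) for explore_change, the exploration-rate updates in trainees-transfer are

\[ \varepsilon_t^{\,3B\text{-}2} = \varepsilon_0 + \bigl(0.5 - (G - S_t)\bigr), \qquad \varepsilon_t^{\,3C} = \varepsilon_0 + c \cdot \mathbf{1}\bigl[\,G - S_t > 0\,\bigr]. \]

That is, Model 3B-2 shifts exploration continuously with the size of the goal-performance discrepancy, whereas Model 3C adds a fixed increment only when the agent is short of its goal. Model 3C also inserts an engagement gate at the top of system-choose: an agent whose estimate of Policy B sits below engagement_threshold bypasses both systems and simply executes Policy A via run_policy_a. A minimal monitoring sketch is shown below; the reporter name is introduced here for illustration and does not appear in the dissertation's listings, and it assumes at least one trainee exists so the division is defined.

to-report share-disengaged ;fraction of trainees currently below the Model 3C engagement gate
  report (count trainees with [value_estimate_b < engagement_threshold]) / (count trainees)
end

Placed in the Model 3C code, such a reporter could be attached to an interface monitor or plot to track how quickly agents' Policy B estimates clear the threshold during the transfer period.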