COLLEGE ENGINEERING PERSISTENCE: THE DYNAMICS OF MOTIVATION AND CO-CURRICULAR SUPPORT

By

Emily Bovee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Educational Psychology and Educational Technology – Doctor of Philosophy

2019

ABSTRACT

This dissertation examined the engagement and motivation of 1,044 engineering students and how these constructs related to students' academic development and persistence in engineering. Engagement was assessed based on co-curricular participation (e.g., students' utilization of resources on campus) and motivation was assessed based on students' self-reported expectancies for success and value for the domain of engineering. I applied machine learning techniques to a rich dataset that includes self-reported indicators, registrar data, and many time points of engagement data from various campus activities (e.g., tutoring, advising). Differential predictors emerged as important in predicting motivation, co-curricular engagement, and persistence. Examination of model performance indicators revealed that models using second-year predictors of late-third-year engineering expectancy and task-value were more robust than models that included other years' data as predictors. In the prediction of co-curricular engagement, first-year predictors and predictors from throughout all three years yielded the strongest predictive capability of the models tested. Finally, in predicting persistence, models including second-year-only indicators, third-year-only indicators, or indicators from all three years were equally predictive of persistence. For all models, demographic variables contributed strongly to the prediction of the outcomes. Implications are discussed for educational psychology research and for higher education administration.
This dissertation is dedicated to my fiancé, Dan. I can't wait to marry you!

ACKNOWLEDGEMENTS

I was fortunate to be surrounded by incredible people who helped make this Ph.D. happen. I am enormously grateful to my fiancé for his everlasting support throughout this journey. Dan, you helped me find joy when I was overwhelmed by small problems, and I could not have written this dissertation without your constant support and love. To my family – Mom, Dad, Claire, Hannah, Maria, A.J., and Caroline – thank you for understanding when I had to work in the "downtime" of family events and when we couldn't see each other as much as we would have liked. I could always count on you to lift me up if I felt discouraged. My writing group also encouraged me to persevere when things got tough. Katie, Mary, Missy, and Molade – you are the best writing group I could have ever imagined, and graduate school would not have been the same without you! The Thrive PhD community provided companionship and community during what could have been an isolating dissertation year. I am especially grateful to Katy, Lindsey, Janet, Eleanor, and Jackie for their support and encouragement, and for helping me to understand the value of taking breaks to fuel the work and to recharge my spirit. Last but not least, I am thankful for my Michigan State friends, especially You-kyung, Chris, Sanjana, Krystal, and Brittany. I am so very thankful for all these people and more.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1: Introduction
Chapter 2: Literature Review
  Expectancy-Value Theory
    Task-value and persistence
    Expectancy and persistence
      The development of expectancies and values
  Co-Curricular Supports for Persistence
    Academic advising and persistence
    Major-specific residential life and persistence
    Career services and persistence
    Tutoring and persistence
    Predictors of engagement in co-curricular support activities
    Outcomes of engagement in co-curricular support activities
    Confounding factors in studying co-curricular support effects
  Conceptual Model
  Current Study
    A new approach to engineering persistence
    Research questions
Chapter 3: Method
  Participants
    Context
  Procedure
    Survey data collection
    Institutional data collection
    Co-curricular activity participation data collection
  Measures
    Survey measures
      Engineering expectancy
      Task-value
    Demographic variables
      Gender
      Race/ethnicity (URM)
      First-generation college student status
    Co-curricular activity participation measures
      Data source one: Residence in an engineering residence hall
      Data source two: Engineering-specific career services
      Data source three: math learning center
      Data source four: Engineering-specific academic advising
    Persistence Measure
  Data Analytic Plan
    Longitudinal measurement invariance
    Missing data analyses
    Multiple imputation
    Primary analyses: Random forest
      Research question 1
      Research question 2
      Research question 3
      Evaluating Models
Chapter 4: Results
  Missing Data
    Missing data analyses
      Differences in motivation survey completion based on demographic characteristics
      Differences in motivation survey completion based on initial motivation
      Differences in motivation survey completion based on incentive mechanism
      Differences in initial motivation based on ACT missingness
      Summary of Missing Data Analyses
    Multiple imputation
      Imputed Dataset Checking
  Reliability Overview
  Longitudinal Measurement Invariance
  Descriptive Statistics
  Random Forest Findings
    Research Question 1
    Research Question 2
    Research Question 3
  Robustness Checks
Chapter 5: Discussion
  Research Question 1
  Research Question 2
  Research Question 3
  Themes across Research Questions
  Limitations
  Implications for Future Research
  Implications for Practice
  Conclusion
APPENDIX
REFERENCES

LIST OF TABLES

Table 1. Research Question 1: Predictors of Year 3 Spring Expectancy and Value
Table 2. Research Question 2: Predictors of Year 3 Co-Curricular Activity Participation
Table 3. Model 3: Predictors of Post-Third-Year Persistence in Engineering
Table 4. Data Collection Timeline
Table 5. Incentive Mechanisms for Each Survey
Table 6. Reliabilities for Imputed and Raw Data
Table 7. Variable Processing Plan for Co-Curricular Activities
Table 8. Variable Creation for Co-Curricular Analyses
Table 9. Correlations and Descriptive Statistics
Table 10. Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 1, predicting Expectancy and Value at the end of Year 3
Table 11. Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 1, predicting Expectancy and Value at the end of Year 3
Table 12. Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 2, predicting Co-Curricular Activity Participation in Year 3
Table 13. Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 2, predicting Co-Curricular Activity Participation in Year 3
Table 14. Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 3, predicting Post-Third-Year Persistence
Table 15. Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 3, predicting Post-Third-Year Persistence
Table 16. Fit Statistics for Longitudinal Confirmatory Factor Analysis of Expectancy and Task-Value
Table 17. Model Performance: RMSE Values, predicting Expectancy and Value at the end of Year 3
Table 18. Variance Explained for Research Question 1
Table 19. Variable Importance: Research Question 1, predicting Expectancy and Value at the end of Year 3
Table 20. Model Performance: RMSE Values for Research Question 2, predicting Co-Curricular Activity Participation in Year 3
Table 21. Variance Explained for Research Question 2
Table 22. Variable Importance: Research Question 2, predicting Co-Curricular Activity Participation in Year 3
Table 23. Model Performance: AUC for Research Question 3, predicting Post-Third-Year Persistence in Engineering
Table 24. Variance Explained for Research Question 3
Table 25. Variable Importance: Research Question 3, predicting Post-Third-Year Persistence

LIST OF FIGURES

Figure 1. Conceptual Model
Figure 2. Missing Data Visualization
Figure 3. Partial Dependence Plots for Model 1c for the Outcome Engineering Expectancy
Figure 4. Partial Dependence Plots for Model 1c for the Outcome Task-Value
Figure 5. Partial Dependence Plots for Model 1e for the Outcome Engineering Expectancy
Figure 6. Partial Dependence Plots for Model 1e for the Outcome Task-Value
Figure 7. Partial Dependence Plots for Model 2a for the Outcome Advising
Figure 8. Partial Dependence Plots for Model 2a for the Outcome Career Event
Figure 9. Partial Dependence Plots for Model 2a for the Outcome Career Advising
Figure 10. Partial Dependence Plots for Model 2a for the Outcome Mock Interviews
Figure 11. Partial Dependence Plots for Model 2a for the Outcome Math Tutoring
Figure 12. Partial Dependence Plots for Model 2d for the Outcome Advising
Figure 13. Partial Dependence Plots for Model 2d for the Outcome Career Event
Figure 14. Partial Dependence Plots for Model 2d for the Outcome Career Advising
Figure 15. Partial Dependence Plots for Model 2d for the Outcome Mock Interviews
Figure 16. Partial Dependence Plots for Model 2d for the Outcome Math Tutoring
Figure 17. Partial Dependence Plots for Model 3a for the Outcome Persistence
Figure 18. Partial Dependence Plots for Model 3b for the Outcome Persistence
Figure 19. Partial Dependence Plots for Model 3c for the Outcome Persistence
Figure 20. Partial Dependence Plots for Model 3d for the Outcome Persistence
Figure 21. Partial Dependence Plots for Model 3e for the Outcome Persistence

Chapter 1: Introduction

Despite many years of research addressing the problem, lack of persistence¹ remains an issue in U.S. universities: students who begin college do not always persist to graduation (Reason, 2009). Failure to persist is a distinct problem in STEM majors (PCAST, 2012), and especially in engineering majors (Chen, Carroll, & NCES, 2005), as the demand for STEM graduates is increasing (Langdon, McKittrick, Beede, Khan, & Doms, 2011). It is important to note that persistence issues are not unique to engineering or to STEM. Indeed, student persistence in STEM fields is higher (72%) than persistence in health sciences (65%) or education (58%) (Chen & Soldner, 2013). However, the rate of persistence may not tell the whole story.
For instance, at some universities, there are higher barriers to gain entrance to the major, such as a higher GPA threshold or a secondary admissions process, and sometimes the secondary admissions process is gender-biased (Holloway, Reed, Imbrie & Reid, 2014). Therefore, one possible explanation for the disparate rates of persistence in humanities and STEM is that there is no such secondary admissions process for humanities majors. Thus, students in engineering majors might be, through selection effects, already a higher-achieving subset than students admitted to education or health science majors, since the criteria for admission are not as high for the latter fields.

¹ In this dissertation, I will use the word "persistence" rather than the word "retention" because, as Reason (2009) points out, "persistence" is more student-centric and calls to mind the behavior of the student, whereas "retention" is a word that has more to do with the administrative functionality of the university. Students care about persisting, whereas universities care about retaining those students (Reason, 2009).

Although there is extensive prior research on persistence, issues of low persistence prevail in engineering (Geisinger & Raman, 2013; Litzler & Young, 2012). Researchers and practitioners alike have made efforts for many years to increase migration into engineering majors and mitigate the attrition from engineering majors (Ngambeki, Evangelou, Long, Ohland, & Ricco, 2010; Ohland, Sheppard, Lichtenstein, Eris, Chachra, & Layton, 2008; Seymour & Hewitt, 1997), yet students continue to leave these majors at high rates. Two reasons have often been discussed as explanations for low rates of engineering persistence: (a) early attitudes toward the engineering major and profession and (b) the challenging nature of the engineering field (Besterfield-Sacre, Atman, & Shuman, 1997; Geisinger & Raman, 2013).
Early attitudes toward the field of engineering and toward the engineering major are related to students' decisions to depart from the major, even when their achievement in college is high (Besterfield-Sacre et al., 1997). The university at which this study takes place makes concerted efforts to ensure that students have a clear understanding of engineering from the beginning by engaging students in engineering-related activities outside the classroom that are focused on supporting their "academic, personal, and professional development" (Walton, Briedis, Urban-Lurain, Hinds, Davis-King, & Wolff, 2013, p. 23.262.6). Even so, students may be miscalibrated and may need to reassess their general interest and the degree to which they are suited for the discipline when they face the challenging courses in their major. Accordingly, some colleges and universities are designing their curricula so that students are exposed to engineering content from the start of college. The President's Council of Advisors on Science and Technology (PCAST, 2012) recommended that engineering design courses be introduced early in the first two years of college in order to foster student engagement and interest in engineering.

Another reality is that engineering is a challenging discipline. In a meta-analysis, Geisinger and Raman (2013) found that 23 of the studies they reviewed indicated that low academic achievement while in college and a lack of theoretical understanding of engineering principles informed students' decisions to depart from an engineering major.

It is particularly important to consider the psychological underpinnings of persistence because universities dedicate many resources to retention programs that endeavor to ensure that students do persist (e.g., Good, Halpin, & Halpin, 2002).
Beyond the programmatic considerations, though, another key reason to study engineering persistence is that students who are interested in engineering when they begin college should not be discouraged from completing an engineering degree by structural features of the college environment. Consequently, researchers remain committed to exploring the reasons that students persist or leave STEM fields in college (e.g., Hernandez, Schultz, Estrada, Woodcock, & Chance, 2013).

Researchers have approached the study of persistence in different ways based on the conventions of their discipline. Researchers from two primary disciplines focus on questions of persistence in higher education. First, higher education research is focused on the administrative and logistical reasons that might drive students to continue in their degree program (see Tinto, 1997). Second, psychological research on persistence examines constructs such as students' academic self-perceptions as potential mechanisms that drive students to persist (see Wang & Degol, 2013, for a review). Higher education researchers are interested in predicting persistence, whereas psychological researchers are generally more interested in explaining the phenomenon of persistence (see Yarkoni & Westfall, 2017). Below, I describe the varying approaches taken by higher education and psychological researchers to address undergraduates' persistence.

One feature of higher education research on persistence is that it tends to focus on institutional structures that either support or undermine persistence, in order to inform university efforts about which programs are valuable to continue to pursue. Higher education research suggests that co-curricular activities are important for persistence (Kuh, Cruce, Shoup, Kinzie, & Gonyea, 2008; Pascarella & Terenzini, 2005), but many universities do not use analytics to see the ways that different co-curricular activities affect students.
Rather, each program is evaluated independently. For example, some studies have examined connections between living on campus and persistence (Schudde, 2011), while others have examined the relation of tutoring and persistence (Grillo & Leist, 2013; Laskey & Hetzel, 2011), and still others have examined the relation of academic advising and persistence (Metzner, 1989). Higher education research thus focuses on the connections between universities' co-curricular programmatic efforts and outcomes.

Psychological research on persistence focuses on psychological mechanisms that inform persistence. While longitudinal studies with multiple indicators of motivation and multiple indicators of co-curricular engagement are rare, some research suggests that engineering persistence might be better understood by examining a combination of the activities that students engage with as well as their motivation over time. For example, Raelin, Bailey, Hamann, Pendleton, Reisberg, and Whitman (2014) found that cooperative education (co-op) participation was related to higher levels of expectancy, and that higher levels of expectancy were related to higher levels of persistence in engineering. One valuable theoretical perspective is Expectancy-Value Theory, which is often utilized to explore questions of the reasons that students leave or persist in STEM fields (Wang & Degol, 2013). Expectancy-Value Theory (Eccles et al., 1983) posits that students are motivated to achieve based on their expectancies for success and their value for the task at hand. Expectancies for success are related to the question "Can I do this?" while task-value is related to the question "Do I want to do this?" (Eccles et al., 1983; Wigfield & Eccles, 2000). Psychological research suggests that motivation is important for persistence (Perez, Cromley, & Kaplan, 2014; Wang & Degol, 2013).
One gap that this study will fill is to examine the possibility that motivation and co-curricular activities both relate to persistence; the study will also examine whether new insights could arise from an alternative modeling approach.

Higher education and psychological research each present various strengths and drawbacks in the study of persistence. For example, higher education research is strong in its focus on implications for practice and in its data-driven approach to understanding the usage and utility of campus resources, such as tutoring (e.g., Laskey & Hetzel, 2011; Rheinheimer & Mann, 2000). However, higher education research does not often address the underlying psychological mechanisms behind persistence. By contrast, psychological research identifies potential mechanisms that explain persistence, but does not always account for idiosyncrasies of the university context. Rather, psychological research focuses on generalizable psychological principles that are independent of context. Either body of literature alone is insufficient to fully explain the phenomenon of persistence in engineering, so I raise the possibility that by considering both bodies of literature in tandem, new insights might emerge. In this study, I combine the practical focus of higher education research with the theoretical explanation focus of psychological research in order to contribute in a more robust way to educational theory and practice.

Combining the considerations of two major bodies of work required me to identify the best methodological approach for examining focal variables from both perspectives. To fully explore the effects of co-curricular activities and motivational processes, a method was needed that could account for both longitudinal behavioral data about co-curricular activity participation and longitudinal survey data about motivation in predicting outcomes.
One promising approach is "machine learning," which refers to a host of analytic methods by which statistical modeling is used to predict future outcomes (Sammut & Webb, 2016). Machine learning is well suited for this work because it can handle large volumes of input variables and can explore the nonparametric connections between these variables and the outcome (James, Witten, Hastie, & Tibshirani, 2013). Machine learning uses techniques not traditionally employed in either higher education or psychological research on persistence. Indeed, machine learning can reveal new explanations about persistence by modeling the relations between existing constructs in a different way. Thus, machine learning provides a new analytical approach that could allow researchers to derive new insights from self-report survey data and from data that provide evidence of students' engagement with co-curricular support activities. It is possible that the modeling approaches that have historically been applied in research on college student persistence are insufficient to describe the full complexity of the context in which college students make the decision of whether to persist or not. Thus, I demonstrate the potential of machine learning techniques - specifically, random forest - to elucidate the complex relations among co-curricular activity participation, motivation, and persistence. In so doing, I lay the foundation for future researchers who are interested in motivation in higher education to model the complex underpinnings of students' co-curricular engagement, in order to inform universities' programmatic decisions about resources they provide to support student persistence.

In this dissertation, I examine the relations of both participation in co-curricular support activities and student motivation, conceptualized as expectancy for success and task-value, to undergraduates' persistence in engineering.
I contribute to both research and practice by investigating research questions framed in Expectancy-Value Theory while leveraging new modeling techniques to explain human behavior in a new way. I develop a series of machine learning models in order to explain the complex connections among motivation, co-curricular activity engagement, and persistence. The purpose of this work is to explore the possibility of differential impacts of both (a) levels of expectancies and values and (b) participation in co-curricular support activities throughout the early years of an engineering program in predicting engineering persistence. The findings of this study are intended to be applied to better support students in their journeys in engineering. In the following chapter, I will review the literature from psychological research on persistence from the Expectancy-Value Theory perspective. I will then examine the extant literature from the higher education perspective on institutional supports for persistence, including the co-curricular supports of academic advising, residing on campus, career counseling, and tutoring. I will introduce a conceptual model to explain my interpretation of the ways in which motivation and co-curricular support activities support persistence in engineering majors.

Chapter 2: Literature Review

Different theories have been presented to explain the ways in which humans are moved to act. In this study, I draw from Expectancy-Value Theory (Eccles et al., 1983) because this theory is useful for explaining the complex relations between motivation and persistence. Expectancy-Value Theory provides a robust perspective from which to understand how motivation manifests in college students, and how that motivation is linked with persistence.
Expectancy-Value Theory

Expectancy-Value Theory (EVT) posits that academic motivation is composed of students' evaluations of whether they believe they are capable of success on a task (i.e., expectancy) and whether they believe the task is worth pursuing (i.e., value) (Eccles et al., 1983; Wigfield & Eccles, 2000). In empirical research framed in EVT, two consistent themes emerge. First, expectancies for success are predictive of students' actual achievement on a task and of academic choices (Eccles, 2009; Wigfield & Eccles, 2002; Wigfield, Tonks, & Klauda, 2009). Second, task-value is predictive of students' future choices, such as their decisions about whether to persist in studying a subject or to abandon it in favor of something they perceive to be more interesting or important (Bong, 2001). Expectancy-Value Theory also includes cost, which is described as "what the individual has to give up to do a task (e.g., 'Do I do my math homework or call my friend?'), as well as the anticipated effort one will need to put into task completion" (Eccles, 2005, p. 113). However, much of the research in EVT examines expectancies and values but excludes cost. These studies find that expectancies and values alone predict student outcomes in both children and college students (e.g., Bong, 2001; Kosovich, Flake, & Hulleman, 2017; Wigfield et al., 1997), and thus I will focus on expectancies and values in the current work. Expectancy-Value Theory is valuable because it not only explains the predictors of motivation but also describes the connection between context and the development and outcomes of motivation. Expectancy-Value Theory has been applied to answer questions of STEM persistence specifically: researchers found that expectancies for success and task-value were related to career choice as well as major persistence decisions (Cromley, Perez, & Kaplan, 2016; Perez et al., 2014; Wang & Degol, 2013).

Task-value and persistence.
Task-value has been linked to students' intentions to persist and to their actual persistence in STEM (Jones, Paretti, Hein, & Knott, 2010; Perez et al., 2014; Robinson, Lee, Bovee, Perez, Walton, Briedis, & Linnenbrink-Garcia, 2018) as well as to their intentions to graduate from college more broadly (Ethington, 1990). Task-value can be further sub-categorized into three main types of value: interest value, utility value, and attainment value. Interest value is the level of interest that a student has in the topic; utility value is the degree to which the student perceives that the subject will be useful for the future; attainment value is the degree to which the student regards the subject as important to his or her identity (Eccles et al., 1983). Task-value is predictive of students' plans to persist in an engineering major (Jones et al., 2010; Robinson et al., 2018). The three types of task-value (interest value, utility value, and attainment value) are related to persistence in STEM in similar ways. Although some studies suggest that attainment value is a stronger predictor of career intentions than other types of value (e.g., Andersen & Ward, 2014), other work suggests that task-value as a whole is more explanatory of persistence in engineering than expectancy for success in engineering (Robinson et al., 2018). Thus, task-value may be a more salient predictor of college engineering persistence than expectancy. Task-value is operationalized sometimes as its component parts and sometimes as one monolithic construct. Since the three constructs of interest value, utility value, and attainment value are highly related and expected to relate in similar ways to outcomes, there could be problems in the estimation of the model if all three types of value are included as separate predictors.

Expectancy and persistence.
Students' expectancies for success are related to their achievement (Battle, 1966; Linnenbrink-Garcia et al., 2018; Perez et al., 2014; Wang & Degol, 2013) and persistence (Ball, Huang, Cotton, Rikard, & Coleman, 2016). Some studies suggest that self-concept of ability is more important in predicting achievement outcomes than actual competence (e.g., Battle, 1966). In middle school students, the grade that students expected to receive was more predictive than their intelligence of the grades they actually received (Battle, 1966). More specifically, students with high expectancy for success and low IQ performed better than students with low expectancy for success and high IQ (Battle, 1966). This finding suggests that expectations of future success (or failure) are even more powerful than IQ in predicting outcomes. Therefore, students are not only assessing their intelligence when they make an assessment about whether they are capable of future success. Rather, they are making a domain-specific, task-specific projection about the work they can accomplish in that moment. Prior studies provide evidence that students' beliefs about what they are capable of have effects on their career goals (Estrada, Woodcock, Hernandez, & Schultz, 2011; Mau, 2003) and on their academic achievement (Battle, 1966). These self-beliefs about expectancy for future success, established early, continue to be important when students reach college. Expectancy for success informs science course-taking in college (DeBoer, 1986) and achievement in science courses (Perez et al., 2014). Higher self-assessment of science ability is related to the decision to take more science courses in college (DeBoer, 1986). The same study also found that students' expectancy for success in science is supported by prior course-taking in science; taking more science courses in high school was related to higher expectancy in science.
Gender differences also emerge here: women take fewer science courses in college than men, perhaps because they took fewer science courses in high school and because they express lower expectancy for science than men (DeBoer, 1986). This is important because taking science courses is foundational to pursuit of a major or career in STEM. One reason that students may not pursue a career in STEM is that they feel low expectancy for success with regard to STEM.

The development of expectancies and values. Expectations about whether one will be good at tasks related to STEM are created and solidified early (Mau, 2003). Indeed, students' math expectancy beliefs in 8th grade were related to their aspirations to pursue a career in STEM two years after high school graduation (Mau, 2003). There are also gender differences in the duration of career aspirations in STEM: men who aspired to a career in science or engineering in 8th grade were more likely than women to hold onto that goal six years later (Mau, 2003). Task-value changes across development: across all racial/ethnic groups, interest in science and mathematics declined as students progressed from fourth grade to tenth grade (Peng, Wright, & Hill, 1995). Expectancy and value do not develop in a vacuum; rather, they are interrelated and function together to predict persistence. Task-value is related to expectancy, and both predict persistence and achievement. In college, task-value and expectancy both predicted course-level achievement in a chemistry course, controlling for prior levels of achievement (Zusho, Pintrich, & Coppola, 2003). There is some evidence that task-value and expectancy work together to shape outcomes, such that higher levels of both constructs are most adaptive for academic achievement (Nagengast et al., 2011; Trautwein et al., 2012).
Additionally, declining task-value across grades 1 through 12 is explained in large part by similar declines in students' academic expectancy for success across the same time period (Jacobs, Lanza, Osgood, Eccles, & Wigfield, 2002). These studies provide evidence that expectancy and value both explain unique variance in outcomes (e.g., expectancy is more predictive of achievement and value is more predictive of academic choices), while also suggesting that expectancy and value are highly interrelated and function jointly to predict key academic outcomes. Beyond the interrelations of expectancy and value, both constructs are also shaped by socializing forces. Students' peers (Jones, Audley-Piotrowski, & Kiefer, 2012; Ryan, 2000) and parents (Fredricks & Eccles, 2002; Frome & Eccles, 1998) shape students' motivational beliefs. School structures and teacher behaviors can also shape the development of both expectancies and values (Patrick et al., 2007; Wang & Degol, 2013). Indeed, teachers' supportive behaviors are related to student engagement in the elementary school classroom through a series of motivational constructs as mediators. More specifically, teachers' behavior shapes students' academic goals, their expectancy for success on future academic tasks (i.e., academic expectancy), and their social efficacy; those three constructs in turn are related to students' classroom engagement (Patrick et al., 2007). One gap in this literature is a lack of understanding of how school environment structures may affect students in college. For example, school structures in the K-12 years are fairly contained: students within a classroom have largely the same experiences. When students reach college, however, their experiences can vary widely within the same university as they take courses from different instructors and interact with different on-campus resources.
To summarize, Expectancy-Value Theory provides a robust theoretical perspective from which to consider the ways in which students are motivated to persist in college. Expectancy for success and task-value are affected not only by past experiences, but also by new experiences that occur during college. For instance, during the first year of college, expectancy for success in engineering is informed by comparison of one's own performance to that of one's peers (Hutchison-Green, Follman, & Bodner, 2008). Thus, the experiences that students have in college can serve either to enhance their efficacy beliefs or to diminish them. Many motivation theories acknowledge the role of context, but it is difficult to study this explicitly. Fortunately, self-report measures are not the only source of information about a student's educational context: behavioral log data can also help researchers to understand in more depth the way that students behave within the campus context. More specifically, universities leverage a number of co-curricular programmatic efforts to support student success; these programs often collect data about student attendance. Thus, co-curricular program data can provide one source of information about the educational context, beyond self-report indicators alone. In the following section, I will describe four co-curricular support resources and explain how these behavioral indicators could help explain persistence behaviors in engineering.

Co-Curricular Supports for Persistence

In addition to the factors that underlie individual motivation development, there are programs and structures at the level of the university that can support or undermine persistence in STEM (Cromley et al., 2016). Some research has attempted to address the problem of low persistence by examining co-curricular support activities and how these support or detract from student persistence.
Co-curricular support activities are defined here as resources provided by the university that are meant to bolster students' academic experiences and success but are not a required part of the curriculum. Students can opt in to participate in these co-curricular activities, such as tutoring or advising, but are not required to do so in order to graduate. Researchers have found that students' participation in co-curricular activities and engagement with their campus is highly associated with their persistence (see Pascarella & Terenzini, 2005, for a review). Much research has been dedicated to identifying the specific sources that are related to academic success and persistence. The current study will examine four sources that have been identified as particularly beneficial for students. Specifically, receiving academic advising (Metzner, 1989), living on campus (Schudde, 2011), utilizing career counseling services (Restubog, Florentino, & Garcia, 2010), and attending tutoring sessions (Grillo & Leist, 2013; Laskey & Hetzel, 2011) are all activities that are identified as correlated with, and potentially supportive of, persistence.

Academic advising and persistence. Academic advising is supportive of student persistence (Metzner, 1989; Swecker, Fifolt, & Searby, 2013). For first-generation college students, the frequency of advising appointments is related to a higher likelihood of persistence to graduation, with the likelihood of persistence increasing with each additional advising appointment (Swecker et al., 2013). Students who interact with advisors during a change of major are buffered against declines in semester GPA; they show increases in semester GPA instead (McKenzie, Tan, Fletcher, & Jackson-Williams, 2017).
Students are most satisfied with the advising process at their university when they are able to meet regularly with an academic advisor (Lowe & Toney, 2000), but the quality of advising, in addition to the frequency of advising meetings, matters for persistence. Students who perceive themselves as having received high-quality academic advising are more likely to persist than students who receive low-quality academic advising, but even the students who receive low-quality academic advising are more likely to persist than students who do not attend advising at all (Metzner, 1989). Thus, it seems that the act of attending advising, even if that advising is perceived as low-quality, is correlated with an elevated likelihood of persistence. There is evidence to suggest that the effect of academic advising on persistence is indirect. Academic advising was related to higher levels of student satisfaction, higher GPA, higher levels of perceived utility of the degree, and lower levels of intention to leave the university; these four factors in turn led to greater persistence (Metzner, 1989). However, there is also evidence that poor advising can be harmful for persistence. Engineering students in particular note that receiving incorrect information from academic advisors about program requirements is a negative feature of being enrolled in an engineering major (Haag, Hubele, Garcia, & McBeath, 2007). This dissatisfaction may lead students to be less likely to persist in engineering. Findings are mixed with respect to the role of advising in informing student persistence, and attending advising is not a required feature of all degree programs. Thus, academic advising may play a role in supporting student persistence, but few studies have sought to investigate whether there is a connection between student motivation, academic advising, and persistence.
The current study will unpack this connection in more detail, and will also consider the potential of differential effects of advising based on students' year in school (e.g., first-year students may have different outcomes than second-year students who attend advising).

Major-specific residential life and persistence. Prior research suggests that living in a residential hall is supportive of student persistence. Schudde's (2011) analysis of a national multi-university dataset found that living on campus is related to increased likelihood of persistence in the second year. It is recommended that universities strongly encourage or require students to live on campus for the first year (Braxton & McClendon, 2001), perhaps because students' social integration with peers is related to enhanced institutional commitment and intention to continue at the university during the following year (Berger, 1997). Above and beyond simply living on campus, many universities integrate programmatic elements to facilitate students' sense of connection and belonging. For example, some programs utilize a course-taking model in which students take courses together; participating in such a course-taking group is related to persistence (Tinto, 1997), even when it is not part of a residential life program. University learning communities in STEM are sometimes provided to support lower-achieving students as they transition to college. For students who were near the inclusion threshold for the learning community (i.e., students whose prior achievement was close to, but not above, the cutoff for participation in the program), a STEM learning community that included a shared course for students and regular meetings with a more advanced student in their major was related to higher grades in major-specific courses, higher major persistence, and perceived higher belongingness (Xu, Solanski, McPartlan, & Sato, 2018).
Fidler and Moore (1996) found that living on campus was positively related to persistence, and that this effect was amplified when students also participated in a first-year orientation program. In the context of engineering, participating in a first-year engineering program that included residential and academic components was also associated with higher graduation rates (Olds & Miller, 2004). In summary, living on campus may support persistence by a variety of mechanisms, including by supporting students' sense of belonging and by fostering academic and social connections with peers.

Career services and persistence. Engineering students sometimes seek career-related guidance from academic advisors, and express dissatisfaction with not being able to receive such guidance from their academic advisors (Sutton & Sankar, 2011). Beyond academic advising, universities provide additional resources, such as career counseling centers, to help students discern the career they will pursue after graduation. Students might feel high career self-efficacy, meaning that they might feel confident about gathering information about potential future careers, or their career self-efficacy may be low (Ireland & Lent, 2018). Colleges' support of such career self-efficacy conceptions through programming and advising about career pathways is important (Braxton & McClendon, 2001). Indeed, students who attend career counseling sessions experience higher levels of career self-efficacy. This career self-efficacy leads in turn to a higher degree of confidence in one's career path ("career decidedness"), which leads to lower levels of attrition from the major (Restubog et al., 2010).
While parental support is also linked with career decidedness (Restubog et al., 2010), it is essential to consider the role of university-provided career services so that students may be equitably supported in both their discernment of a future career and in their journey to attain their career goals.

Tutoring and persistence. Tutoring is one resource that universities provide to support student persistence. The amount of tutoring that students receive is correlated with their course grades: more tutoring was correlated with higher course grades across different genders and ethnic groups (Rheinheimer & Mann, 2000). For at-risk students, attending tutoring is related to elevated GPAs and persistence (Laskey & Hetzel, 2011). Thus, tutoring may help support persistence via its effects on students' GPAs. Indeed, one study found that GPA was a partial mediator of the effect of the total number of hours a student attended tutoring on the likelihood of graduating from the university (Grillo & Leist, 2013). However, the research about the efficacy of tutoring is mixed. Some research indicates no difference in GPA after tutoring between high-risk students who received tutoring and those who did not receive tutoring, although notably, in that study, overall frequency of tutoring attendance was low, not exceeding five times per student (Hodges & White, 2001). Some universities have tested programs in which tutoring is mandatory, to see whether marked gains emerge in student performance. One study examined the effects of type of math course enrollment, achievement in math, and whether or not tutoring attendance was compulsory (Halcrow & Iiams, 2011). Indeed, when math tutoring center attendance was compulsory, time spent at the math tutoring center was correlated with higher course grades.
However, for students who were informed about the tutoring center's existence, but were not compelled to attend, tutoring attendance was not correlated with higher course grades. Some students who were enrolled in lower level math courses expressed reluctance to ask a tutor for help, partially because they felt insecure about the questions they were asking and whether those questions would expose them as lacking knowledge (Halcrow & Iiams, 2011). To summarize, whereas some research suggests a positive correlation between tutoring and academic achievement, the precise nature of the relation between tutoring and persistence remains unclear, and some students who might benefit from tutoring do not seek it out.

Predictors of engagement in co-curricular support activities. Since academic advising, major-specific residential living, attending career services, and attending tutoring are all related to persistence, it is important to consider the factors that might lead students to engage with these services. Many students want academic support; however, struggling students are often reluctant to seek help (Karabenick & Knapp, 1988). The Expectancy-Value Theory perspective can help to explain students' initiation of help-seeking protocols and subsequent engagement in co-curricular support activities. Students' expectancy for success and their value for the task are related to their continued pursuit of a given domain (Wang & Degol, 2013). Somewhat paradoxically, high-achieving students are more likely to ask for help in a course (Kitsantas & Chow, 2007), a surprising finding because high-achieving students also have high expectancy. There are a number of reasons that low-achieving students might not seek academic support, including negative emotions and learned helplessness associated with the difficult task or difficult subject matter (Karabenick & Knapp, 1988). Low task-value might help explain struggling students' reluctance to seek help.
Indeed, experiencing failure within an academic field can, depending on students' attributions for the failure, cause them to feel less motivated and to feel higher levels of negative emotions related to that academic field (Perry, Stupnisky, Daniels, & Haynes, 2008). This decision to devalue a field in which one is not experiencing success may help protect students' sense of self-worth, but it could be damaging because it could lead students to prematurely give up on a task. Individual factors, such as personality, gender, and racial/ethnic group, are related to help-seeking behavior. Male-identifying students who conform strongly to American traditional masculinity norms avoid seeking help when they encounter academic challenges (Wimer & Levant, 2011). The aspects of American masculinity associated with reduced help-seeking were endorsements of the importance of independence, emotional restraint, and dominance. In a study of eleventh-grade students, researchers found that girls were more likely to seek academic help than boys, and that the psychological cost of help-seeking was lower for girls than for boys (Kessels & Steinmayr, 2013). Thus, some male students who struggle academically may be less likely to engage in support activities than female students. Students who are from underrepresented racial and ethnic groups perceive a stigma against participating in academic support resources when they feel stereotype threat and when they perceive the university environment as not welcoming (Winograd & Rust, 2014); thus, they may also be less likely to seek help when they encounter academic challenges. In this way, the university environment can lead to differential help-seeking behavior based on race, especially if students feel stereotype threat.
Additionally, there is some evidence for differential effects of academic and non-academic campus resources on Black students as compared with other students who are part of the racial majority (Mallinckrodt & Sedlacek, 1987). For example, eating in a campus dining hall and working as a student employee were predictors of persistence for a full sample of students, whereas these predictors were not significant in a discriminant analysis for Black students only. As such, it is possible that even when resources are sought out, they differentially affect students. Taken together, these studies suggest that a variety of factors are brought to bear when students are deciding whether to engage in help-seeking resources, and that it is important to consider (1) gender, (2) underrepresented status, (3) first-generation student status, and (4) prior achievement as covariates in a model predicting college persistence using co-curricular activity participation as a major predictor. These variables were all included in the statistical models tested for this study.

Outcomes of engagement in co-curricular support activities. It is possible that student engagement in co-curricular activities functions in an indirect, rather than a direct, way to have an impact on persistence. Indeed, students who engaged with campus resources in a deliberate (i.e., non-random) way during the first year of college were more likely to persist to the second year of college and to have a higher first-year college GPA (Kuh et al., 2008). That finding controlled for pre-college characteristics, which suggests that the experiences that students have outside their curricular learning are important. One student from another study, preparing to graduate, noted, "It is funny that we are talking about things outside the classroom because I feel like that is the place that I have done my most growing" (Kuh, 1995, p. 123).
Based on these findings, it is possible that motivation arises from the context that students find themselves within, and part of that context does include university support and co-curricular initiatives. This evidence for an indirect relation of contextual supports and persistence is quite promising for the current study because it suggests that it is meaningful to study the connection between students' environmental supports (e.g., social capital and academic resources) and their academic achievement. Moreover, studies like this one lend credence to the idea that psychological mechanisms such as motivation may provide the link between student support activities and student persistence in college. Whereas this study will not precisely test for a mediating effect of co-curricular support activities on persistence via motivation, it will set the stage for future research by investigating the direct effects of co-curricular activities on persistence.

Confounding factors in studying co-curricular support effects. One important consideration is that studying co-curricular support effects puts the onus on the student to engage with those activities. One reason it is problematic to make students responsible for activity participation is that they might not even know that the activities exist. Indeed, Engstrom and Tinto (2008) make the case that "access without support is not opportunity. Without support many students, especially those who are poor or academically under-prepared, are unlikely to succeed" (p. 50). Above and beyond issues of awareness and access, there is a fundamental problem with studying student success initiatives. Only some students participate in these, and this subset of students may be different from other students in important ways. In this dissertation, I aim to counter this limitation of past studies by including both students who do participate in co-curricular support activities and students who do not participate in them.
I hypothesized that there may be differences in the underlying motivational patterns of students who do engage with support activities as compared with students who do not. A second consideration when studying co-curricular support effects is that these resources might have differential effects depending on the point within the college trajectory at which students engage with them. One study used event history modeling to show that first-generation students' likelihood of dropping out of college fluctuates over time, with the risk being the highest in the first year of college (Ishitani, 2003). Thus, for first-generation students, interactions with co-curricular support activities in the first year may be more impactful than interactions with the same resources later in the college trajectory, when their likelihood of departure from the university is already lower. The first year is also a time in which students are adjusting to the social and academic climate of the campus (Friedlander, Reid, Cribbie, & Shupak, 2007). This experience of adjustment affects students' emotional well-being. Additionally, the period after the first year is a time when students are likely to leave college, more so than at other junctures (Tinto, 1999). Thus, the experience of utilizing co-curricular support resources during the first year of college might be fundamentally different from the experience of utilizing co-curricular support resources later in the college trajectory. One primary reason is that students might feel more academically and socially adjusted to the campus after the first year has concluded. As such, it is important to consider the timing of students' engagement with support resources.

Conceptual Model

Building on prior research and current conceptualizations of the complex pathways to college success, I propose the following conceptual model as an organizing framework for this study (see Figure 1).
Drawing from EVT, I hypothesized that students enter college with a foundation of motivational beliefs already established based on both their prior performance and their socialization (Figure 1, far left part of the model). These prior motivational beliefs may also be influenced by students' backgrounds; namely, by their race/ethnicity, their gender, and their parents' educational attainment. Indeed, past research indicates that there are differences in persistence and differential access to resources based on these factors. Stereotype threat affects women's performance in engineering (Bell, Spencer, Iserman, & Logel, 2003), and it also affects their likelihood of persistence in engineering (Beasley & Fischer, 2012). Students whose families do not have substantial fiscal resources could face unique challenges to persistence in college (Horn & Premo, 1995). Students who are the first in their families to go to college are less likely to complete a bachelor's degree; this effect is especially pronounced if they are from low-income families (McCarron & Inkelas, 2006). In engineering, Black, Hispanic, and Native American students are underrepresented racial/ethnic groups (National Science Foundation, 2007), and persistence rates in STEM majors are lower for Black students than for their peers (Elliott, Strenta, Adair, Matier, & Scott, 1996). In this study, I will thus include these demographic variables (race/ethnicity, gender, and parental educational attainment) as controls in my models. I will focus my analyses on students' motivation and co-curricular experiences while in college, rather than on the ways in which their backgrounds may inform their initial motivational beliefs. This study will be a first step toward modeling persistence in a novel and complex manner, and future research should examine ways in which these demographic variables might contribute to group differences in these processes.
These beginning-of-college motivational beliefs about engineering are then shaped by students' college experiences with success and failure. As in Eccles et al.'s (1983) model, if students encounter success, their expectancy for future success is likely to be enhanced. By contrast, if students encounter failure, they may experience declines in motivation because they become uncertain about their likelihood of future success. In alignment with Expectancy-Value Theory, I posit that the decision to persist in an engineering major is shaped by a variety of factors, including students' co-curricular engagement and motivation throughout college. The underpinnings of help-seeking behavior in college have been explored, but the relation between the motivational factors that drive students to engage in such behaviors and their motivational outcomes (i.e., what happens during and after the student seeks help) is very nuanced. It is possible that students would seek out tutoring or other related resources when they feel that they are struggling or behind their classmates. Indeed, the same experience (e.g., tutoring) could affect one individual in one way (e.g., the student may leave and think "I can do this now!") but another individual in a completely different way (e.g., the student may leave and think "This is tougher than I thought; it is hopeless"). This conceptual model illustrates the complexity of the factors that may relate to persistence. Importantly, it is possible that students might benefit differentially from co-curricular experiences. For example, low-achieving students might benefit more from a co-curricular activity than high-achieving students. There may also be selection effects; for example, students might not choose to attend tutoring if they are already doing well in math.
High-achieving students or highly motivated students might be more likely to attend career services, academic advising, and other co-curricular activities because such activities could provide validation for their fit within the engineering field. Conversely, struggling students who could benefit from attending co-curricular support activities might choose not to because of the negative emotions they might associate with engineering. While this study does not examine the reasons that students choose to attend co-curricular events, the voluntary nature of these activities could produce effects that must be accounted for in future research.

Current Study

A new approach to engineering persistence. Engineering students are not universally persisting in college, and the supports that lead some people to persist and others to leave are not well understood (Geisinger & Raman, 2013). Whereas prior research has explored precursors of persistence in terms of either motivation (Cromley, Perez, & Kaplan, 2016; Perez et al., 2014; Wang & Degol, 2013) or co-curricular support activities (Grillo & Leist, 2013; Laskey & Hetzel, 2011; Metzner, 1989; Restubog et al., 2010; Schudde, 2011), it is uncommon to include indicators of both self-reported motivation and actual co-curricular engagement. Additionally, prior research mostly focuses on students' experiences in the first year (Tinto, 1999), but it is likely that students' motivation and co-curricular participation vary greatly as they progress through their college years. One way to explore the nature of the relations between motivation and behavior is to statistically model engineering student motivation and behavior longitudinally, but the data and modeling approaches that allow for this type of inquiry are rare. These gaps in prior research present an opportunity for a new approach to studying engineering persistence.
In this study, I examine longitudinal relations among two types of motivation (expectancy and task-value), four types of co-curricular support activities (on-campus residence, math tutoring, career services engagement, and academic advising), and engineering persistence at the end of students' third year in college. The current study followed a sample of undergraduate engineering students for three years in order to examine the relations among co-curricular activities, motivation, and persistence. Broadly, I drew data from three main sources: (1) student self-report surveys, (2) institutional record data, and (3) co-curricular activity participation data, collected over the course of students' first three years of college. The primary variables under consideration in the current study are task-value, expectancy, on-campus residence, career services participation, math tutoring participation, academic advising participation, gender, race, first-generation student status, prior achievement (ACT score), and engineering persistence. The motivation constructs were measured via a self-report survey; the co-curricular activities were measured via behavioral log data. Demographic variables were gathered both by the self-report motivation survey and by university administrative records. Here, persistence was operationally defined as students' decisions to continue in their pursuit of an engineering major at the end of their third year. It is reasonable to consider this from the perspective of persistence because prior work has shown that intentions to leave college are a good proxy for actual attrition (Bean, 1982).
I also am confident in using this outcome rather than actual graduation because a major barrier has already been passed by the time that students reach the end of their third year: they have gained admission to the College of Engineering, and most retention problems emerge within the first two years at this institution (Briedis & Walton, personal communication, 2018). The population under consideration was students who declared an engineering major before beginning at the university, and they continued to be retained in the study even if they changed their major after that initial declaration.

The current study addresses several gaps in the existing body of literature. First, from the perspective of psychological research, this study fills a gap through its examination of behavioral indicators. Prior psychological research has focused mainly on self-report data, but this study combines motivation self-report data with behavioral indicators of whether students participated in co-curricular support activities. A second gap this study addresses is that much of the research on the effects of co-curricular support focuses on proximal effects, such as course grades (e.g., Laskey & Hetzel, 2011); longitudinal research on co-curricular support and persistence is limited (Grillo & Leist, 2013). Thus, I examined the psychological and behavioral factors that may relate to persistence across the first three years of college. In this way, I could uncover the potential differential effects of motivation and co-curricular activity participation based on year in school. This consideration of differential effects is challenging in traditional modeling frameworks. Thus, a third contribution of the current work is my use of complex modeling to analyze the data. Complex modeling is essential for the effective exploration of the complex relations between motivation and co-curricular engagement.
Machine learning offers one robust modeling possibility to allow for nonparametric relationships between predictors and outcomes (Beck et al., 2000). Machine learning is well suited to answering questions of engineering persistence because it allows the researcher to simultaneously investigate the relations among many predictors and outcomes. Machine learning models can "learn" the optimal combination of variables to predict the outcome with a high degree of accuracy while limiting overfitting. To summarize, this study adds to the engineering persistence research in several ways: first, it synthesizes both behavioral and self-report indicators, thus bridging the psychological and higher education literatures, and second, it employs a complex modeling approach to answer questions of differential effects of co-curricular activity participation and motivation based on year in school.

Research questions. I address the following research questions:

RQ 1: Do engineering students' co-curricular experiences (e.g., tutoring, advising), measured throughout the first five semesters (2.5 years) of college, predict their motivation in the spring of the third year of college in terms of their expectancies for success and their value for engineering? I modeled this research question by testing separate models for the first and second semester of the first year, and separate models for the second and third year as well (see Table 1). Finally, I tested a model that included all time points as predictors of third-year motivation. All five models tested for this research question had third-year motivation as the dependent variable. I tested this series of models because I hypothesized that co-curricular activity participation would predict motivation later in college.
Whereas I did not have specific hypotheses about whether attending co-curricular activities during different semesters would differentially impact motivation, I designed models including the first-, second-, and third-year experiences separately in order to see whether more proximal or more distal experiences were related to third-year motivation. It was important to explore these co-curricular events and their relations to motivation using the random forest approach, particularly because this analytic method allows for probing for surprising or unexpected relations among predictors and outcomes.

RQ 2: Do expectancies and value, measured on three occasions during the first two years of college, predict participation in co-curricular support experiences in the third year of college? I modeled this research question by testing separate models for the first and second semester of the first year and the second year (see Table 2). Finally, I tested a model that included all time points as predictors. All four models had the same multivariate outcomes: participation in co-curricular activities across students' third year of college (five variables from three sources: the engineering career center, the math learning center, and engineering academic advising). For Research Question 2, the primary goal was to test the potential role of motivation variables in predicting co-curricular activity engagement in the third year of college. Overall, I hypothesized that these models would show positive relations between motivation and co-curricular engagement for some activities, such as career center event attendance and academic advising. Prior research has not explored in detail the reasons that students engage with career services and academic advising, focusing instead on the outcomes of these services, but I expected that students would be more likely to seek these resources out if they already believed engineering to be valuable.
I also hypothesized that the models in Research Question 2 would reveal the motivation variables as negative predictors of the math tutoring outcome, since students' motivation for engineering would likely be correlated with their math achievement, and students who are already doing well in math may be less likely to seek tutoring. One strength of random forest models is that they allow for the exploration of the relations among predictors and outcomes by testing many combinations of relations. Since prior research had not clearly established the predictors of engagement in co-curricular support activities, I wanted to allow the model to explore many possible relations rather than only specific relations. Thus, I only had a few specific hypotheses, as stated above.

RQ 3: How do expectancies, task-value, and participation in co-curricular support resources throughout the first three years of college relate to persistence in an engineering major in the summer following the third year of college? I modeled this research question by testing separate models for the first and second semester of the first year, and separate models for the second and third year as well (see Table 3). Finally, I tested a model that included all time points as predictors. Overall, I hypothesized that the strongest predictor of persistence would be task-value. Task-value is a strong predictor of academic choice (Eccles, 2009; Perez et al., 2014), so I hypothesized that it would be more predictive of persistence than other types of motivation and the behavioral indicators. As with Research Question 2, I wanted to allow the model to explore many possible combinations of variables as predictive of persistence. Thus, while I hypothesized that task-value would be the most predictive of persistence, I kept an open perspective about the rest of the predictors. I wanted to allow for new insights about the role of the rest of the predictors in informing persistence behavior in engineering.
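The model series described for the three research questions can be summarized as a small design grid. The following is only an illustrative sketch in Python: the block labels, the `model_grid` helper, and the `DESIGN` dictionary are hypothetical names introduced here, and the sketch records which time blocks enter each model rather than reproducing the analyses themselves.

```python
# Hypothetical labels for the time blocks named in the research questions.
Y1_S1, Y1_S2, Y2, Y3 = "year1_sem1", "year1_sem2", "year2", "year3"


def model_grid(blocks):
    """Build one model per time block, plus one model pooling all blocks."""
    grid = {block: [block] for block in blocks}
    grid["all_time_points"] = list(blocks)
    return grid


# Each research question pairs a set of predictor blocks with one outcome.
# For RQ1, the third-year block covers only the semester before the spring survey.
DESIGN = {
    "RQ1": {
        "predictors": "co-curricular participation",
        "outcome": "third-year expectancy and task-value",
        "models": model_grid([Y1_S1, Y1_S2, Y2, Y3]),
    },
    "RQ2": {
        "predictors": "expectancy and task-value",
        "outcome": "third-year co-curricular participation",
        "models": model_grid([Y1_S1, Y1_S2, Y2]),
    },
    "RQ3": {
        "predictors": "expectancy, task-value, co-curricular participation",
        "outcome": "engineering persistence",
        "models": model_grid([Y1_S1, Y1_S2, Y2, Y3]),
    },
}

for rq, spec in DESIGN.items():
    print(rq, "->", spec["outcome"], "|", len(spec["models"]), "models")
```

Laid out this way, the grid makes the asymmetry visible: Research Questions 1 and 3 each involve five models, whereas Research Question 2 involves four because its predictors stop at the second year.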
Chapter 3: Method

Participants

The participants in this study were undergraduate engineering students at a large, Midwestern public research university. More specifically, participants qualified for the current study if they matriculated in an engineering major in Fall of 2015. Because the main aim of this study was to examine the motivation and behavior of students who initially matriculated in engineering, students who "migrated" into an engineering major (Donaldson & Sheppard, 2007), that is, those who began at the university in a different major but then switched into engineering after the first semester, were excluded from this study because they may differ in substantial ways from the students who originally matriculated in engineering. Four surveys were conducted, with the first survey taking place in August before students began their first year of college and subsequent surveys taking place in the spring of their first, second, and third years of college. Participants were students who had survey data from any time point (i.e., Fall 2015 - T1, Spring 2016 - T2, Spring 2017 - T3, or Spring 2018 - T4). The total number of participants in the dataset was 1,287. Many of those participants did not complete all four waves of the survey, but all 1,287 completed at least one survey. I computed a sum variable for the number of surveys completed to help inform my decision about the inclusion criteria for this study. A frequency analysis of this "total number of surveys" variable revealed that 348 students (27.0%) completed one survey, 367 students (28.5%) completed two surveys, 325 students (25.3%) completed three surveys, and 247 students (19.2%) completed all four surveys. Within this full sample (N = 1,287), 26.2% of students (337) were female and 19.1% of students (246) were first-generation college students.
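The "total number of surveys" sum variable and its frequency breakdown can be illustrated with a brief sketch. The Python below is not the procedure used in the study; the function names and the toy response records are hypothetical, and only the four reported counts come from the text.

```python
from collections import Counter


def surveys_completed(responded):
    """Sum variable: number of waves (T1-T4) with a response."""
    return sum(1 for flag in responded if flag)


def frequency_table(totals):
    """Map each total to (count, percent of sample rounded to one decimal)."""
    freq = Counter(totals)
    n = len(totals)
    return {k: (freq[k], round(100 * freq[k] / n, 1)) for k in sorted(freq)}


# Toy response flags for three hypothetical students (T1, T2, T3, T4).
toy = {
    "s01": (True, False, False, True),
    "s02": (True, True, True, True),
    "s03": (False, False, True, False),
}
toy_totals = [surveys_completed(flags) for flags in toy.values()]
print(frequency_table(toy_totals))

# Counts reported in the text, checked against the stated sample size.
REPORTED = {1: 348, 2: 367, 3: 325, 4: 247}
total = sum(REPORTED.values())
shares = {k: round(100 * v / total, 1) for k, v in REPORTED.items()}
print(total, shares)
```

The reported counts sum to the full sample of 1,287, and the recomputed shares agree with the reported percentages to one decimal place.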
With respect to the racial and ethnic background of the sample, 64.2% of students (826) were European American or White; 20.3% of students (261) were Asian or Asian American; 7.5% of students (96) were Black or African American; 4.9% of students (63) were Hispanic or Latino/a; 1.1% of students (14) were multiracial and not underrepresented minorities; 1% of students (13) were multiracial and underrepresented minorities; less than 1% of students (2) were Native American, American Indian, or Alaska Native; and less than 1% of students (1) were Native Hawaiian or other Pacific Islander. Less than 1% of students (11) did not have a self-reported or institutionally reported race/ethnicity. The random forest algorithm requires a complete dataset with no missing values. The final number of participants used in the random forest analyses was 1,044; 243 participants were removed due to the presence of missing data on variables other than the motivation variables (a more detailed discussion of missing data is provided in the results).

Context. Students in engineering disciplines at this university are treated as engineers from their admission to the university onward. Students are formally admitted to the College of Engineering after reaching a credit threshold (56 credits) and a minimum GPA for engineering and related coursework, but they are able to access all engineering support resources from the beginning of college. Thus, even though not all students will be admitted formally to the College of Engineering, all students can benefit from programmatic resources and social events from their first day of college. They also take engineering classes as early as the first year, before gaining admission to the College of Engineering. One specific resource that is provided on this campus is the residential education program.
All first-year students are required to live on campus, and the majority of engineering students are housed in a special engineering residential community.² I will account for whether students are housed in this residential community in my analyses in this dissertation. The residential neighborhood comprises many buildings located close to one another. There are a number of levels of additional support that are incorporated into the neighborhood structure (Walton, Briedis, Urban-Lurain, Hinds, Davis-King, & Wolff, 2013). This study focused on three co-curricular support services that have been shown to be connected with student persistence: academic advising, career services, and tutoring services. First-year academic advising takes place in a residence hall within the residential neighborhood. Academic advising is not required for engineering students at this campus. Even so, the service is available to all students who are affiliated with the College of Engineering, even before they gain official admission to the College after meeting the required credit threshold. Engineering-specific career services are available to help students refine their goals and obtain a job after graduation. In 2015, when the participants in the current study were first-year college students, the career center (which also has a physical location in the Engineering Building) held some activities in the residential neighborhood, such as corporate sponsor meetings and resume review sessions (Briedis & Walton, personal communication, 2018). In the 2017-2018 academic year, the engineering career center at this university conducted 39 workshops and 1,350 career advising appointments (Michigan State University, 2018). Last, tutoring services are available in a variety of forms at the university.
Students studying engineering are typically required to take many mathematics courses, and there is a mathematics-only tutoring center on campus. This mathematics tutoring center serves all introductory mathematics courses and is utilized by engineering students who are enrolled in a mathematics course and want to use the free tutoring resource.

² Not everyone is placed in a residential community, partly due to issues of capacity. Depending on the date of their registration, some engineering students might be placed in a residence hall with students from a variety of majors.

Procedure

The timeline for data collection, cleaning, and analysis appears in Table 4 and is described in greater detail below. This study was deemed exempt by the Michigan State University Institutional Review Board, Study ID #STUDY00001561.

Survey data collection. Students in the longitudinal study that collected the survey data took online surveys in which they answered questions about their motivation and engagement in engineering activities. While the survey was initially designed as a programmatic assessment, in 2015 motivation measures were added. Thus, the survey included some items geared at assessing the effectiveness of engineering support services such as introductory engineering design courses, peer mentors, and tutoring centers, but it also included traditional measures of academic motivation. Students were recruited initially at a large assembly for all incoming freshman students. For subsequent surveys, students were contacted either in their courses or via email. Students received follow-up surveys even if they had not responded to the baseline survey or any prior follow-up surveys. Students were retained in the sample even if they changed their major out of engineering at any point after the week prior to the first semester of college. In August of 2015, students took their first surveys. This survey was emailed to students before their first classes at the university.
Thus, this survey provided a baseline measure of students' initial levels of motivation and their expected feelings about the College of Engineering before they entered college. In subsequent spring semesters, surveys were incentivized through one of these mechanisms: (1) a drawing for gift cards or school sports memorabilia, (2) a lottery drawing and the provision of guaranteed print pages for the campus printers, (3) course credit or extra credit in engineering courses, or (4) a paid survey sent to those who were part of the cohort but were not reached in the engineering courses. Table 5 provides details of all incentives at all time points. To briefly summarize: drawings were used for the first two iterations of the survey (i.e., T1 in Fall 2015 and T2 in Spring 2016), and additional types of incentive were introduced after that. In spring 2016, although the incentive was a drawing, the survey was administered in class for a large portion of participants. Through agreements with instructors of first-year engineering classes, students were given class time to complete the survey. To thank them for their participation, they were entered in a drawing for gift cards or school sports memorabilia after completing the survey. Beginning in Spring 2017, our research team began working with engineering faculty to incorporate the follow-up surveys as part of their courses. Thus, for the T3 survey, which launched in Spring 2017, some students received the survey for extra credit or course credit, while others who were not reached in courses were entered in a drawing. Although the research team optimized the combination of participating courses to reach as many students as possible, some students who were freshmen in fall of 2015 were not enrolled in any of the participating courses in spring of 2017.
Thus, students who were not enrolled in a course that participated in the survey were emailed a survey link and rewarded with an entry in the drawing for Spartan Cash or sports memorabilia. One additional incentive type was introduced in the spring of 2018 (T4): payment. The research team obtained a grant to facilitate the longitudinal measurement of engineering students' motivation and achievement; thus, payment was offered to students starting in spring of 2018. Students were offered payment if they were not enrolled in any of the participating courses. Notably, for the T4 survey (spring 2018), no one was entered in a drawing: everyone who was not reached in a course received a $10 payment, in the form of an Amazon gift card, for completing the survey. Since the incentive approaches varied for each measurement occasion, I accounted for incentive mechanism in my missing data analyses. Further details are provided in the missing data section below and in the Results chapter.

Institutional data collection. As part of the larger study, students consented to have their transcript data and admissions information released to the research team. These data were delivered from the Office of Planning and Budgets to the research team, where the data were then de-identified and matched with survey data based on Study ID number. Data were gathered yearly each June.

Co-curricular activity participation data collection. Data were gathered from a variety of sources to provide a picture of students' co-curricular participation in engineering from Fall 2015 through Spring 2018 (see Table 4). Below I provide more details about how each of these sources of data was provided.

Measures

Self-reported and institutional variables were collected for this study. Table 4 provides an overview of the timing of each measurement occasion.

Survey measures. Measures of motivation came from each wave of the survey that was administered to students since they began college.
See Table 6 for reliabilities.

Engineering expectancy. A key part of expectancy-value theory is to measure students' expectancies that they will be successful on a task or in a given domain. Thus, five items assessed engineering expectancy using the self-efficacy scale developed by Mamaril, Usher, Li, Economy, and Kennedy (2016). A sample item is "I'm confident that I can learn the content taught in my engineering-related courses."

Task-value. Expectancy-value theory also posits that task-value is an important component of students' decisions about whether they persist in a given field. Thus, task-value was assessed in this study as a composite derived from three constructs: interest value, utility value, and attainment value. Interest value was measured with five items (e.g., "Engineering is exciting to me."). Attainment value was measured with four items (e.g., "Thinking like an engineer is an important part of who I am."). Finally, utility value was measured with three items (e.g., "Engineering is valuable because it will help me in the future."). All items were adapted for engineering from Conley (2012).

Demographic variables. Participation in co-curricular activities and demographic measures were collected by the university as part of its efforts to understand student support initiatives on campus.

Gender. On the admissions application for this university, students were asked to report their gender on a binary scale (i.e., female or male). This information was used for this study, since it is a measure that has been collected in a standardized way across the entire sample.

Race/ethnicity (URM). Students answered a question about their race/ethnicity on the survey. For the current study, if students self-reported their race/ethnicity on any survey, those data were used to compute a new race/ethnicity variable such that the value for the variable could have come from any of the four survey measurement occasions.
If participants did not report their race/ethnicity on any of the four surveys, institutional data from the university about students' race/ethnicity were used to populate the race/ethnicity variable for this study. The variable utilized for analyses in this study was a dichotomous indicator of underrepresented minority group membership. To understand which racial/ethnic groups are underrepresented in engineering, census data were examined in relation to engineering enrollment. According to the 2010 census, the United States population comprised members of the following races: 72.4% White; 16.3% Hispanic; 12.6% Black/African American; 6.2% Other; 4.8% Asian American; 2.9% two or more races; 0.9% American Indian/Alaska Native; and 0.2% Native Hawaiian and other Pacific Islander (Humes, Jones, & Ramirez, 2011). Enrollment in engineering bachelor's degrees in 2015 comprised members of the following races: 64.9% White; 13.4% Asian American; 10.7% Hispanic; 4.0% Black or African American; 3.9% Unknown; and 3.1% Other (Yoder, 2015). Whereas the representation of White students is roughly similar to their prevalence in the general U.S. population, and Asian students are overrepresented in engineering compared to the population, representation for other races is not parallel to the population. Thus, Native American, Native Hawaiian, Black, and Hispanic students are underrepresented minorities (URM) in engineering. For the current study, students could indicate membership in multiple racial/ethnic groups on the survey. The categories for the race/ethnicity variable were as follows: (a) Native American, American Indian, or Alaska Native; (b) Native Hawaiian or Other Pacific Islander; (c) Asian or Asian American; (d) Black or African American; (e) European American or White; (f) Hispanic or Latino/a; and (g) Other.
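The construction of the race/ethnicity variable and the dichotomous URM indicator can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the indicator is coded 1 for URM and 0 otherwise, and because the text does not say which survey wave takes priority when several contain a report, the sketch simply assumes the earliest available one.

```python
# Category labels taken from the survey options listed in the text.
URM_GROUPS = frozenset({
    "Native American, American Indian, or Alaska Native",
    "Native Hawaiian or Other Pacific Islander",
    "Black or African American",
    "Hispanic or Latino/a",
})


def resolve_race(survey_reports, institutional):
    """Prefer a self-report from any wave; otherwise use the institutional record.

    ASSUMPTION: when several waves contain a report, the earliest one is used.
    Returns a set of category labels, or None if nothing is available.
    """
    for report in survey_reports:
        if report:
            return set(report)
    return set(institutional) if institutional else None


def urm_indicator(groups):
    """Return 1 if any selected group is underrepresented, 0 if none is, None if unknown."""
    if groups is None:
        return None
    return 1 if groups & URM_GROUPS else 0


# Hypothetical students: self-reported at T2, institutional record only, no data.
a = resolve_race([None, {"Asian or Asian American", "Hispanic or Latino/a"}, None, None], None)
b = resolve_race([None, None, None, None], {"European American or White"})
c = resolve_race([None, None, None, None], None)
print(urm_indicator(a), urm_indicator(b), urm_indicator(c))
```

Because students could select multiple groups, the sketch treats a single underrepresented selection as sufficient for URM status, which matches the handling of multiracial students in the sample description.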
I coded students as URM (1 = URM, 0 = not URM) if one of the races they checked was Native American, American Indian, or Alaska Native; Native Hawaiian or Other Pacific Islander; Black or African American; or Hispanic or Latino/a.

First-generation college student status. On the admissions application for this university, students were asked to report the highest education level attained for both parents or guardians. These variables (Mother's Education and Father's Education) were combined to create a variable for "First-generation college student status" such that 1 indicated first-generation status and 0 indicated continuing-generation status. Students were considered first-generation college students if neither parent or guardian attended college.

Co-curricular activity participation measures. I obtained co-curricular activity participation data from four university sources: (1) residence in an engineering residence hall, (2) engineering-specific career services, (3) the math learning center, and (4) engineering-specific academic advising. Each of these sources provided unique information about the sorts of activities engineering students engaged in throughout the first three years of college. Below, I provide more details about each of these co-curricular activities.

Data source one: Residence in an engineering residence hall. On this campus, there is a co-curricular experience designed for engineering students in which there are academic and residential components (Walton et al., 2013). Many, but not all, first-year engineering students live on campus in a residence hall that is part of this program. The variable describing engineering residence hall occupancy in the first year is provided by the institution and indicates whether the student lived in an engineering residence hall during the first year of college (1 = yes, 0 = no; see Table 7).

Data source two: Engineering-specific career services.
The engineering-specific career services center provides engineering-specific career counseling and job search support to undergraduate students at this university. At the career services office, students can receive advice from a professional about the job search process. Additionally, the office provides a number of online resources through a platform that allows students to post a resume and to interact with potential employers. Thus, even if students do not come in person to seek career advice (an act that requires a certain amount of motivation to initiate in the first place), they can engage with the career center and take advantage of its resources to advance their job search. However, this dissertation focuses on attendance at in-person events rather than the submission of resumes online, because active engagement that involves interacting with others better characterizes a “co-curricular activity” than does simply uploading a resume from home.

The variables that I computed for this study were total counts by year for each of the following: (1) attended a career event, (2) met with a career advisor, and (3) attended an on-campus mock interview. Career events consisted of career development workshops provided by the career center or career fairs at which employers recruit students for internships or full-time work. Meetings with career advisors are scheduled on a one-on-one basis, at the request of the student. Last, on-campus mock interviews could be initiated by the student through the digital career services platform, or they could be suggested in a meeting with a career advisor. Students signed up for interview practice at mock interviews hosted on campus. In order to analyze these data, I first transformed these co-curricular variables into counts by student. Then, I aggregated these counts so that the counts within each academic year were summed together.
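A minimal sketch of this per-student, per-year aggregation is given here in Python (the study itself used R). It also applies the summer-term and cutoff rules that the text goes on to describe. The tuple layout of the log rows, the term labels, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

COHORT_START_YEAR = 2015     # fall of the first year of college, per the text
CUTOFF = date(2018, 3, 21)   # only activities before this date are kept


def academic_year(term, calendar_year):
    """Map a term to academic year 1, 2, 3, ... for the fall 2015 cohort.

    Summer terms are counted toward the following academic year, so
    summer 2016 belongs to academic year 2.
    """
    if term in ("fall", "summer"):
        start_year = calendar_year
    elif term == "spring":
        start_year = calendar_year - 1
    else:
        raise ValueError(f"unknown term: {term}")
    return start_year - COHORT_START_YEAR + 1


def yearly_counts(events):
    """Count events per (student_id, academic_year), dropping late events."""
    counts = Counter()
    for student_id, term, calendar_year, event_date in events:
        if event_date >= CUTOFF:
            continue  # would occur after the spring 2018 outcome survey
        counts[(student_id, academic_year(term, calendar_year))] += 1
    return counts


# Hypothetical log rows: (student_id, term, calendar_year, event_date)
events = [
    ("s1", "fall", 2016, date(2016, 10, 3)),
    ("s1", "summer", 2016, date(2016, 6, 15)),
    ("s1", "spring", 2018, date(2018, 2, 1)),
    ("s1", "spring", 2018, date(2018, 4, 2)),   # after the cutoff, dropped
    ("s2", "spring", 2017, date(2017, 1, 20)),
]
counts = yearly_counts(events)
```

Students who never appear in a log simply have no entry, which corresponds to a count of zero rather than a missing value.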
With regard to the summer semesters, I included them in the next academic year (e.g., summer 2016, which takes place after students' first full year in college, is part of academic year 2 rather than academic year 1). To ensure that the predictors did not occur after the outcome measures, co-curricular activity data for the spring 2018 semester were truncated such that only activities taking place before March 21, 2018 were included. The rationale for this decision is that the spring 2018 surveys began on March 22, so in the models predicting motivation, it was important not to include co-curricular indicators measured after the survey took place. For full details of how these variables were created across semesters, see Table 7. I followed parallel procedures for the career services variables, the math learning center variables (discussed below), and the academic advising variables (discussed below).

Importantly, data are only available for fall of 2016 and beyond, which means that data were missing from the first semester of college for the students in my study. The university went through a transition of reporting systems in 2016, and the data from the past were not preserved (College of Engineering Project/Event Coordinator, personal communication, 2018). In this study, I used measures of engagement with the career services center from Fall 2016 through and including Spring 2018, resulting in data for students’ second and third years of college. I created a different variable for each type of engagement with the engineering-specific career services, and I calculated these variables on a by-semester basis for the first year and on a yearly basis thereafter (see Table 8).

Data source three: Math learning center. The math learning center provides free tutoring to MSU students who are enrolled in math courses.
Engineering students use this resource when they take math courses throughout the first few years of their degree, because the tutoring center provides general support for all math courses. The data take the form of a log that records the student ID number of each student who attends and the date of attendance. I aggregated these log data on a by-semester basis. For the math learning center data, I created a count variable for the 2015-2016 academic year (first year), for the 2016-2017 academic year (second year), and for the 2017-2018 academic year (third year).

Data source four: Engineering-specific academic advising. The academic advising data provided information about the ways in which students interacted with advisors in the College of Engineering. Academic advisors were available to consult with students about their academic plans. Academic advising provides an opportunity to speak to an academic professional about one’s path to the degree, and discussions often include the sequence of classes to take and whether these classes need to be taken in a specific order. Academic advisors also often provide general career advice, but this advice is given in an unofficial capacity. For specific career inquiries, students are directed to the career services office. In the College of Engineering, students are not required to attend academic advising after they leave the orientation program that precedes their first semester of college. Thus, advising is conceptualized in this study as a co-curricular activity rather than as a requisite part of the engineering curriculum. Unfortunately, first-year engineering advising was not tracked at this university in the 2015-2016 year, when this study’s participants were in their first year of college (Assistant Dean for Undergraduate Student Affairs, personal communication, 2018). However, data were available from Fall 2016 and beyond.
I created a sum count for advising attendance in each academic year for the second and third years of college (see Table 8).

Persistence Measure. As an indicator of persistence, I used whether the student was still enrolled in an engineering major as of the end of the Spring 2018 semester (spring of students’ third year). This indicator came from the university registrar. Importantly, students in this study consented to have their academic records released even if they departed from an engineering major. Thus, data were available for students who consented and completed any survey, even if they left engineering sometime between the fall of 2015 and the spring of 2018.

Data Analytic Plan

I carried out a series of preliminary analyses to examine the normality of the data. For example, I computed descriptive statistics as well as bivariate correlations among all study variables (see Table 9). I used R for preliminary as well as central analyses in this study (R Core Team, 2018).

Longitudinal measurement invariance. In order to ensure that the motivation constructs utilized in this study were functioning in the same way over time, I conducted longitudinal confirmatory factor analyses to examine measurement invariance. Specifically, I included latent factors for all the measures of motivation in this study at all time points (T1, T2, T3, T4). I tested a series of configural, weak, and strong invariance models (Vandenberg & Lance, 2000). I inferred invariance if the change in RMSEA was below .015 and if the change in CFI was below .01 (Chen, 2007; Cheung & Rensvold, 2002). I conducted measurement invariance testing using the lavaan package in R (Rosseel et al., 2018).

Missing data analyses. I tested whether the data were missing in a systematic way.
More specifically, I used chi-square tests and MANOVA to explore whether people who were missing data differed significantly from those who were not in terms of demographic characteristics, survey incentive mechanism, or initial motivation. The purpose of these analyses was to investigate whether data were systematically missing based on any of these dimensions; if motivation survey data were missing systematically, multiple imputation should account for this (see below).

Multiple imputation. A prerequisite of the random forest model is that there are no missing data. There are two approaches to meeting this requirement: (1) delete all cases that have any variables missing, or (2) impute the data. Multiple imputation is a more robust statistical strategy than listwise deletion; thus, this approach was used in this dissertation study. With multiple imputation, values were imputed to supplement the incomplete motivation data such that participants who completed some, but not all, motivation surveys could be included in the main analysis.

To do this imputation, I used the R package “Amelia” (Honaker, King, & Blackwell, 2019). Amelia is designed to carry out the imputation algorithm for longitudinal data. For this study, the motivation variables were measured on four occasions throughout students’ first three years of college. From prior research and theory (Musu-Gillette, Wigfield, Harring, & Eccles, 2015), it is known that prior levels of motivation affect later levels of motivation. Thus, the imputation procedure should account for this “lag” effect. However, other imputation algorithms such as “mice” do not have the capability to account for repeated measures over time. Amelia does have that functionality, as it is designed for multiple measurement occasions and has specific options to specify the time variable and the variables that are measured at multiple occasions over time.
As such, all of the motivation variables (engineering expectancy, interest value, utility value, and attainment value) were specified in the imputation code as variables to be “lagged” such that, in the imputation of new values, prior states inform future states of students’ motivation. In this way, the imputed datasets can better represent the true nature of the relations between motivation variables.

I imputed using the raw data. Imputation is typically conducted at the level of the raw items, rather than at the level of the composite variables. Thus, I conducted imputation before computing the composite variables so that information would not be lost. The imputation was carried out only for the motivation variables because it is not standard practice to impute demographic variables. Indeed, it is difficult to know what the meaning would be if I attempted to statistically derive a likely value for someone’s race, gender, or first-generation status. It is therefore more meaningful to impute motivation variables than demographic or co-curricular activity participation variables. I imputed new values for all missing time points for the following variables: engineering expectancy, interest value, attainment value, and utility value.

Whereas there is a robust body of theory to support our assumptions about the ways in which students’ academic motivation develops over time, such a canon does not exist for co-curricular activity participation. One of the central purposes of this dissertation is to uncover the potential relations among motivation, group membership, and co-curricular activity participation. As outlined in the literature review above, not much is known about these relations. Additionally, there is a high likelihood of high inter-individual variance in co-curricular activity participation.
Finally, there were no missing data for co-curricular activities: if a student did not appear in one of the co-curricular activity datasets, it was not because the data were missing for that student, but because the student did not attend the co-curricular activity. Thus, to impute log data of activity participation would be not only atheoretical, but also wholly illogical.

Primary analyses: Random forest. In this dissertation, I conducted a series of multivariate random forest models to address Research Questions 1, 2, and 3. Random forest is a machine learning technique by which a series of decision trees are grown in order to decide the best method by which to predict an outcome (Breiman, 2001). To describe this method in detail, I will first explain decision trees. The benefit of decision trees is that they are easy to interpret. The tree begins with a certain grouping variable, and then people are classified into one of two directions based on their level of that variable. This process continues, and each split (also called a node) serves to further categorize people until the maximum variance explained is achieved. However, a single decision tree standing alone is rarely sufficient to categorize the data. First, decision trees are a data-driven approach, and the variable that is selected to split at each point, while optimal based on the data, is not necessarily a strong predictor of the desired outcome. Second, decision trees sometimes have difficulty accounting for all of the variance in the variables; in other words, they sometimes perform poorly in terms of model prediction.

The drawbacks of decision trees are compensated for in random forest, a robust machine learning technique that has been shown to have high predictive accuracy. Random forest methodology is less vulnerable to overfitting than other modeling techniques (Breiman, 2001).
The mechanism by which random forest works is that a random subset of the data is selected from which to grow each tree, and within each tree that is grown, random features are used to split at each node of the tree (Breiman, 2001). Random forest is different from stepwise regression because all the trees are estimated at once. Whereas in stepwise regression the next modeling decisions are made iteratively depending on the findings of the previous step (e.g., which predictor was most significant), in random forest, predictors are selected at random to occur in each tree, and random predictors are selected to split at each node of the tree. Rather than being estimated iteratively, the trees are all grown simultaneously. In this way, randomness is a key factor distinguishing random forest from stepwise regression.

One variant of random forest modeling is the multivariate random forest (Segal & Xiao, 2011). The prefix “multi-” in the multivariate random forest model refers to the model’s incorporation of multiple outcome variables. Whereas a random forest model traditionally has a host of predictor variables but only one outcome variable, a multivariate random forest model has both many predictors and many outcome variables. This technique is well suited to research questions that involve the prediction of many outcomes simultaneously. Multivariate random forest modeling allowed me to model expectancy and value simultaneously as outcomes to address Research Question 1 and to model the co-curricular activities simultaneously as outcomes to address Research Question 2. All predictors are entered into a multivariate random forest model just as they would be in a regression, such that all predictors are tested.
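The two sources of randomness described above (a random sample of cases for each tree and a random set of candidate predictors at each split) can be illustrated with a short Python sketch. It is not the multivariate random forest implementation used in the study: drawing cases with replacement is an assumption based on the usual bootstrap formulation, and the predictor names are hypothetical.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw row indices with replacement; one such sample feeds each tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(feature_names, n_candidates, rng):
    """Pick the random subset of predictors considered at a single split."""
    return rng.sample(feature_names, n_candidates)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical predictor names, for illustration only.
features = ["gender", "urm", "first_gen", "act", "expectancy_t1", "value_t1"]

rows_for_tree = bootstrap_sample(10, rng)             # cases used by one tree
split_candidates = candidate_features(features, 2, rng)  # predictors tried at one node
```

Because each tree draws its own cases and each node draws its own candidate predictors, no tree depends on the result of another, which is why the trees can be grown independently rather than step by step.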
As a brief overview, my procedure for each research question involved (1) developing the model using a cross-validation procedure, (2) evaluating model performance, (3) analyzing which variables were most important from each collection of variables, and (4) interpreting each final model in detail.

Research question 1. To test Research Question 1, I examined the relations among (a) motivational beliefs, (b) co-curricular activity participation, and (c) demographic characteristics as they predict motivation at the end of the third year of college. Table 1 depicts all predictors and outcomes for this model. I investigated whether there were differential timing effects by comparing the models that tested only one time point at a time (i.e., Models 1a–1d) to a model that included all predictors (i.e., Model 1e).

Research question 2. To test Research Question 2, I examined the relations among (a) motivational beliefs, (b) co-curricular activity participation, and (c) demographic characteristics as they predict co-curricular activity participation in the third year of college. Table 2 depicts all predictors and outcomes for this model.

Research question 3. To test Research Question 3, I examined the relations among (a) motivational beliefs, (b) co-curricular activity participation, and (c) demographic characteristics as they predict persistence in engineering. Table 3 depicts all predictors and outcomes for this model. In a parallel way to the models above, I tested a version of the model with only predictors from each time point (Models 3a–3d), and then I tested a model with all predictors (Model 3e). I have indicated which variables were included in which model in Table 3.

Evaluating Models. The performance of a random forest model is determined by the extent to which it is predictive of the outcome.
One benefit of the machine learning approach I employed here is that a single model was not tested; rather, a series of models was tested simultaneously. If the same variable came up as a good candidate for splitting across many of the trees, that would suggest that the variable is indeed important in predicting the outcome. Thus, the random forest models in this study were built using cross-validation and were evaluated in three ways: (1) variable importance, (2) Root Mean Squared Error (RMSE), and (3) Area Under the ROC Curve (AUC).

Cross-validation is a way to protect against overfitting. The purpose of the cross-validation testing was to test how well each model performs in terms of predictive validity when the model is only allowed to learn from a subset of the data and is then tested on the rest of the data. By randomly selecting only part of the data on which to develop the model at a time, the model is more robustly estimated. In other words, I looked at model performance on the parts of the data that were withheld.

In addition to cross-validation, which supports model development, it was necessary to examine indicators of model interpretation. Variable importance provides an indicator of the number of times that a particular variable was picked to split the tree into a branch. If a variable, such as gender, is a good candidate for classification (or regression, if predicting a continuous outcome), it shows up as important across many of the trees. Thus, variable importance measures provide an average count of the number of times each variable is picked as a branch node. A variable being indicated as “important” means that it helps the model to better predict the outcome. To summarize, variable importance and cross-validation are different ways of testing the hypothesized models.
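The hold-out logic of cross-validation, together with the two accuracy indicators named above, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's procedure: the number of folds, the outcome values, and the stand-in "model" (the mean of the training folds) are all hypothetical.

```python
import math
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean squared error for continuous outcomes (lower is better)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """Area under the ROC curve for 0/1 labels, via pairwise comparison.

    Equals the share of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome values; the "model" is just the training-fold mean.
y = [3.0, 4.0, 5.0, 4.0, 2.0, 5.0, 3.0, 4.0]
fold_errors = []
for held_out in k_fold_indices(len(y), k=4):
    train = [y[i] for i in range(len(y)) if i not in held_out]
    prediction = sum(train) / len(train)
    observed = [y[i] for i in held_out]
    fold_errors.append(rmse(observed, [prediction] * len(held_out)))
cv_rmse = sum(fold_errors) / len(fold_errors)
```

Each row is held out exactly once, so `cv_rmse` summarizes error only on cases the model did not see while it was being fit.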
Cross-validation answers the question: when the data are split randomly, does the model developed on one part of the data work when it sees the other part of the data? Variable importance answers the question: among the variables I am allowing the model to examine, which variables are consistently important? Thus, cross-validation is related to model building, and variable importance is related to understanding how models function.

In addition to variable importance as a test of model functionality, I tested for predictive accuracy using two indicators: RMSE and AUC. In this study, two types of random forest models were tested. For Research Questions 1 and 2, the random forests predicted continuous outcomes via a mechanism analogous to regression. Models with continuous outcomes are typically evaluated using the Root Mean Squared Error (RMSE); a lower RMSE indicates better performance. For Research Question 3, the outcome is persistence, which is a dichotomous outcome. Models aimed at classification are evaluated using the area under the “receiver operating characteristic” (ROC) curve. When evaluating the area under the curve, a score of .50 indicates chance-level performance, whereas higher scores indicate better performance. The area under the ROC curve is a more robust measure than accuracy.

Even after considering cross-validation, variable importance, RMSE, and AUC, though, the model can still seem like a “black box.” To address this important consideration, I introduce partial dependence plots below. Partial dependence plots show the relation between the predictor and the outcome in the random forest model. More specifically, partial dependence plots help clarify the directionality of the relations between predictors and outcomes “after integrating out the other variables” (James et al., 2013, p. 331).
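The idea of "integrating out" the other variables can be made concrete with a small Python sketch. It assumes only that a fitted model can be called as a function from a row of predictors to a prediction; `toy_model`, the predictor names, and the grid values are hypothetical stand-ins, not the study's models or data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction at each grid value of one feature.

    For every grid value, the feature is set to that value in every row,
    the model is asked for predictions, and the predictions are averaged,
    which averages over the observed values of the remaining features.
    """
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        predictions = [predict(row) for row in modified]
        curve.append(sum(predictions) / len(predictions))
    return curve


# Hypothetical stand-in for a fitted model: any function from row to prediction.
def toy_model(row):
    return 2.0 * row["expectancy"] + (1.0 if row["first_gen"] else 0.0)


rows = [
    {"expectancy": 3.0, "first_gen": 0},
    {"expectancy": 4.0, "first_gen": 1},
]
curve = partial_dependence(toy_model, rows, "expectancy", grid=[1.0, 2.0, 3.0])
```

Plotting `curve` against the grid gives the partial dependence plot for that predictor. Because the stand-in model here is linear in `expectancy`, the curve rises by the same amount at every step; a random forest could instead produce a curved or stepped line.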
In other words, partial dependence plots show the directionality of each variable on its own, without looking at the other variables at the same time. Because random forest explores the parameter space in a way that is substantially different from linear regression, it is possible that a partial dependence plot would reveal a complex non-linear relationship. However, partial dependence plots could also reveal linear relations. As Molnar explained in “Interpretable Machine Learning,” “When applied to a linear regression model, partial dependence plots always show a linear relationship” (Molnar, 2019, Section 5.1 [ebook, no pagination]). Partial dependence plots were utilized in the current study to explain the most predictive models for Research Questions 1 and 2, and to explain all models for Research Question 3.

Chapter 4: Results

Differential findings emerged in terms of the strength of distal and proximal relations between predictors and outcomes. In predicting motivation late in the third year (Research Question 1), the most predictive models included second-year predictors (Model 1c) and predictors from across the first three years of college (Model 1e). In predicting co-curricular activity participation in the third year (Research Question 2), the most predictive models included predictors from the first year of college (Model 2a) and across the first three years of college (Model 2d). In predicting persistence (Research Question 3), three models performed about equally well: the model including second-year predictors (Model 3c), the model including third-year predictors (Model 3d), and the model including all predictors (Model 3e).
Across all models, demographic predictors (gender, race, and first-generation student status) and prior achievement (ACT score) were important in predicting the outcomes of third-year motivation, third-year co-curricular activity participation, and end-of-third-year persistence. In the different models, different variables emerged as more or less important in predicting the outcomes, but the strength of demographic predictors remained consistent across models. Below, further details are provided about the findings of these analyses.

Missing Data

Missing data analyses. Missing data analyses were conducted to probe for differences based on demographic characteristics, on initial motivation, and on survey incentive mechanism. To create missing data variables for these analyses, I first created survey-level variables by marking students as missing if they did not complete any of the motivation items for a given wave. If a student took at least one motivation item, then that student was marked as "complete" for that survey. If a student took no motivation items, then they were "missing" that survey. I created survey completion variables for each time point separately (Time 1, Time 2, Time 3, and Time 4). I then aggregated these into a variable to indicate whether students were missing any survey. This variable summed the Time 1, Time 2, Time 3, and Time 4 completion variables such that a student would be categorized as "completed all surveys" or "missing at least one survey." Two hundred forty-seven students (19.2%) completed all surveys, whereas 1,040 students (80.8%) were missing at least one survey. This "missing any survey" variable was used for most of the missing data analyses, while the individual survey wave missingness variables were used for the incentive analyses because different incentives were used at each wave. Since random forest only allows a complete dataset, the complete dataset was created with imputation.
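The construction of the wave-level and overall missingness indicators can be sketched as follows. This Python illustration assumes that unanswered items are stored as `None`; the function names and the example responses are hypothetical.

```python
def completed_wave(item_responses):
    """A wave counts as complete if at least one motivation item was answered."""
    return any(response is not None for response in item_responses)


def missing_any_survey(waves):
    """True if the student is missing at least one of the survey waves."""
    return not all(completed_wave(items) for items in waves)


# Hypothetical student: item responses for T1..T4, with None for unanswered items.
student = [
    [4, 5, None],        # T1: answered some items, so the wave is complete
    [None, None, None],  # T2: answered nothing, so the wave is missing
    [3, 3, 4],           # T3
    [5, None, None],     # T4
]
flags = [completed_wave(items) for items in student]
```

The per-wave `flags` correspond to the individual wave indicators used for the incentive analyses, and `missing_any_survey` corresponds to the overall indicator used elsewhere.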
The missing data analyses reported below help to examine whether that approach was reasonable. For the incentive analysis, I conducted a more nuanced analysis and considered completion of each individual survey rather than completion of all surveys, because the incentive mechanism differed for each survey, such that no two surveys had the same array of incentive options.

Differences in motivation survey completion based on demographic characteristics. First, I checked whether there were differences in survey completion based on demographic characteristics. To do this, I ran a series of chi-square tests to probe for differences based on gender, race, and first-generation student status. The chi-square test indicated that there were statistically significant differences in likelihood of missing any survey based on race, χ²(8, N = 1287) = 42.272, p < .001, φ = .181. European American/White students were less likely to be missing a survey than statistically expected, and Asian students were more likely to be missing a survey than statistically expected. The chi-square test indicated that there were statistically significant differences in likelihood of missing any survey based on first-generation student status, χ²(1, N = 1287) = 6.10, p = .014, φ = .069. First-generation students were more likely to be missing at least one survey than expected. The chi-square test indicated that there were statistically significant differences in likelihood of missing any survey based on gender, χ²(1, N = 1287) = 20.07, p < .001, φ = -.125. Women were less likely to be missing at least one survey than expected.

Differences in motivation survey completion based on initial motivation. Second, I tested whether there were differences in survey completion based on initial motivation. For this analysis, I used the composite variables (before imputation) for task-value and engineering expectancy from the baseline (T1) survey and the "missing any survey" variable.
The chi-square test indicated that there were not statistically significant differences in likelihood of missing any survey based on initial task-value, χ²(44, N = 734) = 44.90, p = .434. Similarly, the chi-square test indicated that there were not statistically significant differences in likelihood of missing any survey based on initial engineering expectancy, χ²(21, N = 710) = 21.72, p = .416. These findings should be interpreted with the caveat that only participants who had completed a baseline measure were analyzed here, because only those who completed a baseline survey had values for initial motivation. Seven hundred thirty-four participants were included in the task-value missingness chi-square test reported above, and 710 participants were included in the engineering expectancy missingness chi-square test reported above.

Differences in motivation survey completion based on incentive mechanism. Third, I tested whether there were differences in motivation survey completion rates based on the survey incentive mechanism. In order to carry out these analyses, I examined each wave separately. For the first wave, the incentive was constant for everyone, so there were no differences in completion based on incentive. For the second wave, the two incentive options were (1) entry in a drawing and print pages for on-campus printing or (2) entry in a drawing and print pages, plus the survey being completed during class. The chi-square test indicated that there was a statistically significant difference in likelihood of completing the survey at Time 2 based on the incentive, χ²(1, N = 1163) = 537.56, p < .001, φ = .680. Students were more likely to complete the survey if it was given during class. For the third wave, there were three possible incentives: entry into a drawing, extra credit, or course credit.
The chi-square test indicated that there were statistically significant differences in the likelihood of completing a survey at Time 3 based on the incentive, χ²(2, N = 1191) = 421.19, p < .001, φ = .595. Students were more likely to complete the survey if given extra credit or course credit than if they were incentivized with an entry in a drawing. For the fourth wave, there were three types of incentive, but the drawing was no longer an option. Instead, students were incentivized with either extra credit, course credit, or a guaranteed payment of $10. The chi-square test indicated that there were statistically significant differences in likelihood of completing a survey at Time 4 based on the incentive, χ²(1, N = 744) = 10.96, p = .004, φ = .121. More specifically, students were more likely to take the survey if given extra credit or course credit and less likely to take the survey if offered payment.

Differences in initial motivation based on ACT missingness. I conducted missing data analyses to probe for differences between students missing the ACT and those not missing the ACT in terms of demographic characteristics and initial motivation. Women were less likely to be missing the ACT than men, χ²(1, N = 1287) = 4.65, p = .031, φ = -.06. European American or White students were less likely to be missing the ACT than other racial and ethnic groups, while Asian or Asian American students were more likely to be missing the ACT, χ²(8, N = 1287) = 296.59, p < .001, φ = .48. There was not a statistically significant difference in likelihood of missing the ACT based on first-generation student status, χ²(1, N = 1287) = 1.36, p = .244. There were also not statistically significant differences in initial motivation based on ACT missingness: neither for engineering expectancy (χ²(19, N = 1287) = 19.012, p = .456) nor for task-value (χ²(31, N = 1287) = 41.54, p = .098).
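For readers who want to see how such a test statistic and its effect size are obtained, the following Python sketch computes a Pearson chi-square statistic from a contingency table and the magnitude of the associated effect size as the square root of chi-square divided by N. The effect size reported after each p value above appears to follow this relation, which is an inference from the reported numbers rather than something the text states. The 2 x 2 counts are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def phi_magnitude(statistic, n):
    """Effect size magnitude sqrt(chi-square / N); the sign is not recovered."""
    return math.sqrt(statistic / n)


# Hypothetical 2 x 2 counts (rows: group A/B; columns: missing/complete).
toy = [[30, 10], [20, 40]]
toy_chi2 = chi_square(toy)
toy_phi = phi_magnitude(toy_chi2, n=100)
```

Applied to the reported first-generation result, `phi_magnitude(6.10, 1287)` is approximately .069, which matches the value given in the text.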
The absence of differences in initial motivation between students who are missing the ACT and those who are not supports my analytic plan because it suggests that I did not exclude students whose motivation substantially differed from those retained in my analyses.

Summary of Missing Data Analyses. To summarize, there were differences in survey response rates based on demographic characteristics and incentive mechanism. There were no differences in survey response rates based on initial motivation. The lack of significance in the test for differences in response rates based on initial motivation is promising for imputation because it means that it is reasonable to impute missing data for the baseline and subsequent measures. Similarly, the lack of a significant relation between initial motivation and likelihood of missing an ACT score is promising because it suggests that students missing the ACT were not missing it because they fundamentally differed in motivation. Finally, I confirmed with university administrators that the co-curricular data are complete (for semesters for which data were available) because all students who attended would have been tracked and their data recorded.

The demographic differences in response rates to the survey and in likelihood of missing an ACT score mean that it is important to include demographic variables in the imputation, such that they inform the way that motivation variables are imputed. In other words, since there are demographic differences in survey response rates, demographic variables should be included to generate values for the missing motivation survey items. The procedure by which I did this is explained in detail below.

Multiple imputation. There was a high prevalence of missing data on the surveys.
This missingness could have arisen from a variety of sources, but one consideration to note is that students were eligible for participation in this study even if they did not submit a survey at baseline. In other words, students who completed only one survey could have completed that survey at any of the time points: T1, T2, T3, or T4. In order to account for missing data, multiple imputation was carried out. Multiple imputation is a process by which observed variables in the dataset are used to impute values that are missing for other variables. The data that are not missing are preserved in their original form. It was important to carry out imputation for this dataset because the random forest algorithm uses list-wise deletion. Therefore, it is important that the full data are available so that list-wise deletion does not remove participants who are important for the analysis. Using list-wise deletion in this study would mean that only students who completed all four waves (n = 247) would be included. This sample is much too small for the random forest analyses that comprise the main analysis for this dissertation. A number of different variables mathematically informed this multiple imputation process. More specifically, each of the motivation variables was imputed based on the other motivation variables and other measurement occasions of the same motivation variable, race/ethnicity, first-generation status, gender, and prior achievement as measured by highest ACT composite score. The imputation algorithm automatically imputed new values for the demographic and achievement variables that were missing. However, these values were not carried forward into analyses, as the original race/ethnicity, gender, first-generation, and ACT composite variables were re-merged into each of the imputed datasets after imputation.
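The re-merging step described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the column names, the two example students, and the helper functions `restore_protected` and `complete_cases` are hypothetical, and the imputed values are made up rather than produced by an actual imputation routine.

```python
from copy import deepcopy

# Hypothetical column names for variables that should never keep imputed values.
PROTECTED = ("gender", "race", "first_gen", "act")

def restore_protected(original_rows, imputed_rows, protected=PROTECTED):
    """Copy the imputed rows, then reset protected columns to their original values."""
    restored = deepcopy(imputed_rows)
    for original, row in zip(original_rows, restored):
        for column in protected:
            row[column] = original[column]
    return restored

def complete_cases(rows):
    """List-wise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row.values())]

# None marks a missing value in the raw data.
original = [
    {"id": 1, "gender": "F", "race": "White", "first_gen": 0, "act": 29, "expectancy_t1": 4.2},
    {"id": 2, "gender": "M", "race": "Asian", "first_gen": 1, "act": None, "expectancy_t1": None},
]
# Toy stand-in for one imputed dataset in which every missing cell was filled.
imputed = [
    {"id": 1, "gender": "F", "race": "White", "first_gen": 0, "act": 29, "expectancy_t1": 4.2},
    {"id": 2, "gender": "M", "race": "Asian", "first_gen": 1, "act": 27, "expectancy_t1": 3.8},
]

restored = restore_protected(original, imputed)
analytic_sample = complete_cases(restored)
```

In this toy case the second student keeps the imputed motivation score but reverts to a missing ACT score, so list-wise deletion drops that student from the analytic sample.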
The rationale for retaining the original variables for demographics (e.g., first-generation student status, race, gender) and prior achievement was provided above: it is not logical to statistically infer the gender, race, prior achievement, or first-generation student status for a student who is missing those variables. The prevalence of missingness for each of the demographic variables was low enough that it was not likely to cause a major difference in findings: there was less than 1% missing data for race, no missing data for first-generation student status, and no missing data for gender. The prevalence of missingness for ACT score was higher: 1,083 students in the full dataset had an ACT score, whereas 204 students did not (for a total of 1,287). As stated above, the random forest algorithm does not allow for missing data, so in the final dataset, all students had complete data for all variables.

Imputed Dataset Checking. After imputing, I checked the distributions of the imputed datasets in comparison with the distributions of the raw data. I checked to make sure that there were not major differences in skewness or kurtosis, and to make sure that the imputed values were in fact possible values. The examination of the distributions of the imputed values as compared to the real values was satisfactory, and the imputed datasets were carried forward to the next step of the process. After multiple imputation is conducted in the social sciences, the resulting datasets are then used for the analysis. More specifically, researchers typically run the regression, or whichever model they are using, on all of the imputed datasets and then pool those results together using Rubin's rules. However, in the random forest implementation, this same paradigm cannot be applied because there are no beta coefficients as there are in a regression.
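For context, the pooling step used with regression-type models can be sketched as follows. This is a minimal illustration of Rubin's rules for a single coefficient, assuming each imputed dataset yields a point estimate and its squared standard error; the function name `pool_rubin` and the numbers are hypothetical and are not results from this study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total variance, where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least two datasets")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance of the estimates (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical coefficient estimates and squared standard errors from five imputed datasets.
toy_estimates = [0.30, 0.34, 0.32, 0.28, 0.36]
toy_variances = [0.010, 0.010, 0.010, 0.010, 0.010]
pooled_estimate, total_variance = pool_rubin(toy_estimates, toy_variances)
```

The pooling relies on each dataset producing a comparable coefficient and standard error, which is the ingredient that a random forest does not provide.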
There is not a way to combine estimates across different random forest models because the model estimation procedure is so radically different from the estimation of a linear model. Thus, for this study, after I imputed the data, I carried out analyses on one of those imputed datasets. I randomly selected the imputed dataset upon which to carry out these analyses. I used a random number generator to indicate which dataset I would use, and ended up using the second of five imputed datasets for the main analyses. Then, as a robustness check, I checked whether the analyses held up across two more of the other imputed datasets. The analyses that I ran on the imputed dataset are (1) computing the scales, (2) computing the reliabilities, (3) measurement invariance testing, and (4) random forest analyses. More specifically, I conducted the main analyses and reported my findings for one imputed dataset, and I also conducted the same main analyses on two additional imputed datasets. These analyses are reported in Tables 10-15 and in the Results section. Two hundred forty-three students were not included in the final imputed datasets because they were missing non-imputable data. More specifically, after the imputation was conducted on motivation variables, 243 students were missing at least one of these variables: (1) race, (2) ACT score, or (3) on-campus residence hall status in the first year. Since random forest analyses require complete data, these 243 students were not included in the final analyses, yielding a dataset of 1,044 students for final analyses. Within the imputed dataset upon which the main analyses were run (N = 1,044), 192 students were female (18.3%), and 285 students were first-generation college students (27.3%).
Compared to the original full dataset (N = 1,287), this is proportionately fewer female students, but more first-generation college students. This means that the 243 students who were not included in the final analysis (e.g., were dropped before imputation because of insufficient prior achievement or demographic data) differed slightly in terms of gender and first-generation student status. With respect to the racial and ethnic background of the imputed-motivation-data sample, 73.3% of students (765) were European American or White; 12.1% of students (126) were Asian or Asian American; 7.3% of students (76) were Black or African American; 5.0% of students (52) were Hispanic or Latino/a; 1.1% of students (12) were multiracial and not underrepresented minorities; 1% of students (10) were multiracial and underrepresented minorities; <1% of students (2) were Native American, American Indian, or Alaska Native; and <1% of students (1) was Native Hawaiian or other Pacific Islander. Within the imputed dataset, for the students who had complete demographic and prior achievement data and were maintained in the primary analyses (N = 1,044), response rates were as follows: Time 1, 588 respondents (56.3%); Time 2, 652 respondents (62.5%); Time 3, 506 respondents (48.5%); Time 4, 580 respondents (55.6%). One possible concern is that, because such a large proportion of the data needed to be imputed, the imputation could substantially change the interpretation of the findings. Thus, additional missing data analyses were conducted using a function of the R package "Amelia" called "missmap." This function allows for the visualization of missing and observed data for multiple variables at once. For the current study, I examined missingness for the demographic variables, the prior achievement measure (ACT), the motivation variables, and the outcome variable of persistence (see Figure 2).
Very little missing data was observed for race, and complete data were observed for first-generation student status and gender. Some missing data was observed for on-campus residence and ACT score. In terms of the motivation variables, it is clear that not many people completed all surveys: indeed, only 247 students fit that category, as discussed above. However, of the students missing the baseline survey, many had data for the motivation variables from the second time point, which took place in the second semester of their first year. Thus, the missing data for motivation variables does not cause large problems in interpretation in terms of what data was imputed for the first year because most students have motivation data from the first year of college. As an additional check on the robustness of the imputation, the main analyses for Research Question 3, Model 3e, were conducted on a smaller subset of the data that included only those students who took all four surveys (n = 247). The model performance for Model 3e with this smaller dataset was nearly identical to the findings for Model 3e for the imputed dataset. Whereas the model performance for Model 3e including imputed data had an AUC of .77, model performance for Model 3e with the complete data had an AUC of .78. Thus, this ancillary analysis suggests that including imputed motivation data for students who were missing some survey data does not substantially change the results as compared to the same analysis on the full sample. For the findings of this ancillary analysis with respect to variable importance for Model 3e for the completed-four-surveys sample, please see the Appendix.

Reliability Overview

Reliabilities for the imputed dataset and the raw dataset can be found in Table 6. Whereas reliabilities were higher for the raw data than for the imputed data, reliabilities for the imputed data are all sufficiently high, ranging from .730 to .901.
Prior research showed that imputing based on the assumption of a normal distribution is the safest approach even if the underlying data are skewed or non-normal in some way (von Hippel, 2012). Thus, even though the reliabilities declined from the raw to the imputed data, it is reasonable to assume that adopting another approach to imputation would not have yielded more robust results.

Longitudinal Measurement Invariance

In order to ensure that the same construct was being measured consistently across time, measurement invariance testing was utilized. Since only two psychological constructs were measured longitudinally, I conducted measurement invariance testing for task-value and engineering expectancy. Each of these constructs was measured with the same indicators across four time points, spanning three years. I modeled task-value as a three-factor model, with one factor each for utility value, attainment value, and interest value. I modeled engineering expectancy as a single-factor model. For both task-value and engineering expectancy, I iteratively tested configural, weak, and strong measurement invariance by adding constraints progressively across time. To test configural measurement invariance, I tested the factor structure. To test weak measurement invariance, I constrained the factor loadings for the same item across time to be the same. To test strong measurement invariance, I constrained the item-level intercepts to be the same across time. Table 16 shows the fit statistics for each iteration of measurement invariance testing for both expectancy and value. The change in fit statistics supported the assumption of invariance because the change in CFI was less than .01 for each progressive constraint, and the change in RMSEA was below the threshold of .015 for each progressive constraint (Chen, 2007; Cheung & Rensvold, 2002).
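The decision rule applied at each step of the invariance testing can be written out explicitly. The sketch below assumes the two criteria are a drop in CFI of less than .01 and a rise in RMSEA of less than .015 when constraints are added; the function name `supports_invariance` and all fit values are hypothetical and are not the values in Table 16.

```python
def supports_invariance(less_constrained, more_constrained,
                        max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Return True if adding constraints did not meaningfully worsen model fit."""
    cfi_drop = less_constrained["cfi"] - more_constrained["cfi"]
    rmsea_rise = more_constrained["rmsea"] - less_constrained["rmsea"]
    return cfi_drop < max_cfi_drop and rmsea_rise < max_rmsea_rise

# Hypothetical fit statistics for three nested models (illustration only).
configural = {"cfi": 0.960, "rmsea": 0.040}
weak = {"cfi": 0.957, "rmsea": 0.041}     # loadings constrained equal over time
strong = {"cfi": 0.951, "rmsea": 0.044}   # intercepts also constrained equal

weak_holds = supports_invariance(configural, weak)
strong_holds = supports_invariance(weak, strong)

# A counterexample in which the added constraints clearly worsen fit.
poor_fit = {"cfi": 0.930, "rmsea": 0.062}
poor_fit_holds = supports_invariance(strong, poor_fit)
```

Each comparison is made against the immediately preceding, less constrained model, which mirrors the progressive testing sequence described above.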
Descriptive Statistics

I computed the means, standard deviations, and correlations for all the focal variables in this study. These can be seen in Table 9. Broadly, the motivation constructs were correlated with one another and the co-curricular constructs were also correlated with one another. While some correlations between co-curricular activities and motivation variables were statistically significant, these correlations were small. Career event participation in the second and third years was correlated with expectancy at Time 4 and with task-value at Times 3 and 4. Persistence was positively correlated with second, third, and fourth year motivation. There were also positive correlations between persistence and some co-curricular activities: academic advising, career events, mock interviews, math tutoring in the first and second year, and career advising in the third year. First-generation student status was negatively correlated with persistence: first-generation students were less likely to persist. Having a high ACT score was positively correlated with persistence, and on-campus residence in the first year had a smaller, but still positive, correlation. These correlations were small but statistically significant.

Random Forest Findings

The primary analyses for this study used random forest modeling to examine the relations among engineering expectancy, task-value, co-curricular activity participation, and persistence. Tables 1, 2, and 3 show the predictors and outcomes included in all models.

Research Question 1. Research Question 1 examined the predictive power of co-curricular experiences (e.g., tutoring, advising) measured throughout the first six semesters of college in predicting mid-sixth-semester motivation (expectancy, task-value). A series of five random forest models addressed this research question. For all models, two outcomes were predicted: expectancy in spring 2018 and task-value in spring 2018.
All models included demographic measures of gender, race, and first-generation student status. Additionally, prior achievement, as measured by the maximum score a student earned on the ACT examination, was included as a predictor in all models. Model 1a examined fall 2015 predictors; Model 1b examined spring 2016 predictors; Model 1c examined summer/fall 2016 and spring 2017 predictors; Model 1d examined summer/fall 2017 and spring 2018 predictors (see Table 1). Models 1a and 1b both represented students' first year of college, whereas Model 1c represented the second year of college and Model 1d represented the junior year of college. Models 1a through 1d thus examined the potentially differing effects of co-curricular experiences throughout the first five semesters of college by examining each semester or year's predictors separately. Finally, Model 1e combined the predictors from Models 1a, 1b, 1c, and 1d into its own model. The purpose of aggregating all predictors into one model was to parse the differential importance of the different time points in predicting the outcome. This is one contribution of exploring the relations among these variables in a non-parametric way as compared to adopting a traditional regressive framework. Root mean squared error (RMSE) was examined as an indicator of model performance since the outcome indicators were continuous. RMSE is calculated by taking the square root of the residual variance; thus, it provides an indication of how well the model performs in predicting the outcome (Grace-Martin, 2019). A low RMSE value is an indicator of better model performance (Grace-Martin, 2019). The values for RMSE are on the same scale as the measurement scale for the variables. Thus, the scales for RMSE vary for each research question. For Research Question 1, the scale is 1-5 because the outcomes for that research question were the motivation items, which are on a 1-5 scale.
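The RMSE calculation described above can be sketched in a few lines. This is an illustration only: the function name `rmse` and the observed and predicted scores are hypothetical values on a 1-5 scale, not data or predictions from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_residuals = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_residuals) / len(squared_residuals))

# Hypothetical motivation scores on a 1-5 scale and model predictions for them.
toy_observed = [4.0, 3.5, 5.0, 2.5]
toy_predicted = [3.5, 3.5, 4.5, 3.0]
toy_rmse = rmse(toy_observed, toy_predicted)
```

Because the residuals are in the units of the outcome, the result is read on the same 1-5 scale as the motivation items.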
With respect to Research Question 1, the most predictive models were Model 1c, which included predictors from the second year (RMSE = .19), and Model 1e, which included all of the predictors (RMSE = .19; see Table 17). The model that performed the worst in terms of predictive accuracy was Model 1a, which had an RMSE value of .68. More specifically, the model with first-year motivation, on-campus residence, and demographic variables as predictors was the least predictive of third-year motivation. By contrast, the model that included second-year co-curricular and third-year motivation predictors (1c) and the model that included co-curricular and motivation predictors from all time points (1e) were equally predictive of the motivation outcomes (see Table 17). One possibility was that first-year distal predictors (e.g., first semester of the first year, as modeled in Model 1a) might have had a greater relation with third-year motivation than proximal predictors. Whereas the first year is often considered pivotal in informing students' trajectories over the next few years of college, the first year does not appear to be the most important in the current study. There also does not appear to be a clear pattern in the RMSE values, except that the models including first-year predictors only are the least predictive of the outcome. Perhaps the first-year predictors had the least bearing on the distal outcomes of motivation at the end of the third year. In addition to the RMSE values, I calculated the variance explained, as measured by the coefficient of determination R², for each model. To do this, I took the summed squared errors from the random forest predictions as well as the total summed squared error and applied the R² formula (1 - (Sum of Squared Errors of the Model / Sum of Squares Total)). The findings of these calculations can be found in Table 18.
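The variance-explained calculation can be sketched in the same way. The example assumes the formula is one minus the ratio of the model's summed squared prediction errors to the total sum of squares around the observed mean; the function name `r_squared` and the scores are hypothetical and are not results from this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (sum of squared errors / total sum of squares)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    observed_mean = sum(observed) / len(observed)
    ss_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - observed_mean) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variance")
    return 1 - ss_error / ss_total

# Hypothetical motivation scores on a 1-5 scale and model predictions for them.
toy_observed = [4.0, 3.5, 5.0, 2.5]
toy_predicted = [3.5, 3.5, 4.5, 3.0]
toy_r2 = r_squared(toy_observed, toy_predicted)

# A model that always predicts the observed mean explains no variance.
baseline_r2 = r_squared(toy_observed, [3.75] * 4)
```

The baseline case shows why values near zero indicate that a model does little better than predicting the average outcome for every student.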
Overall, variance explained for Models 1a, 1b, 1c, 1d, and 1e was quite low, ranging between 0.01 and 0.08. As would be expected based on the RMSE analyses, Model 1c explained the most variance in the motivation outcomes, whereas Model 1a explained the smallest amount of variance in the motivation outcomes. These low variance explained values mean that the sets of predictors were not strong predictors of motivation, despite hypotheses and prior research suggesting that they would be strong predictors. Whereas RMSE and R² provide useful information about the overall model performance, another important consideration is the variable performance within each model. Thus, variable importance was examined in order to better ascertain which variables emerged as important across different trees within each random forest model. Variable importance indicates the number of times each variable was picked as a branch node. Nodes signify different decision points for the decision tree; e.g., a gender node would mean that it is a place at which people are categorized into male or female. For each tree within each forest, there is a series of nodes at which the participants are "split," or classified. At each node within a decision tree, five variables were randomly selected as candidates for splitting. Thus, the number of variables considered within a given tree is potentially as high as the number of nodes in the decision tree multiplied by five. However, the same variable could show up across different nodes within a single tree, since the set of five variables available at each node is random. Of those five available variables at a given node, one variable is selected for splitting at that node. The number of times a variable is selected for splitting is summed to get its variable importance measure. For example, gender being indicated as "important" would mean that when gender was used as a splitting point, it resulted in a "purer" classification of the outcome.
Thus, higher values indicate more important variables. In terms of variable importance, one interesting trend is that the demographic predictors of gender and race/ethnicity were consistently high in terms of their variable importance for all five models for this research question (see Table 19). Additionally, it is interesting to consider whether the motivation or co-curricular predictors more robustly predicted the outcome of expectancy and value in the third year. In this model, including motivation variables as predictors made it possible to account for prior levels of motivation in predicting third-year motivation. In order to better understand the predictive power of the motivation variables in comparison with the co-curricular predictors, I looked at the variable importance measure for each model that addressed Research Question 1 (i.e., Models 1a, 1b, 1c, 1d, and 1e). I took the average of the variable importance scores for all variables in each category: Demographic Variables, Motivation Variables, and Co-Curricular Variables (see Table 19). I included the average values so that it would be possible to see, within a model, which set of predictors (demographic, motivation, co-curricular) was most important within that model. Whereas it is possible to compare the relative importance of a variable across models, the raw numbers for variable importance should not be compared across models. For example, if a variable is most important in two models, but the values differ for each model, it would be incorrect to conclude that the variable was more important in the model for which that variable has a higher number. It would be correct, however, to conclude that it is interesting that the same variable arose as most important in both models, relative to other predictors within each model itself. Variable importance is best understood within-model.
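The counting and averaging described above can be sketched with a toy forest. This is a simplified illustration, assuming importance is the number of times a variable is chosen as a split; each hypothetical tree is reduced to the list of variables selected at its nodes, and the variable names, category assignments, and helper functions are invented for the example.

```python
from collections import Counter
from statistics import mean

# Each toy "tree" is just the list of variables chosen at its split nodes.
toy_forest = [
    ["gender", "act", "expectancy_t3", "gender"],
    ["act", "tutoring_y2", "gender"],
    ["expectancy_t3", "act", "advising_y2"],
]

# Hypothetical grouping; prior achievement (ACT) is grouped with demographics here.
toy_categories = {
    "gender": "demographic",
    "act": "demographic",
    "expectancy_t3": "motivation",
    "tutoring_y2": "co-curricular",
    "advising_y2": "co-curricular",
}

def split_counts(forest):
    """Count how often each variable was selected for splitting across all trees."""
    return Counter(variable for tree in forest for variable in tree)

def category_means(counts, categories):
    """Average the split counts of the variables within each predictor category."""
    grouped = {}
    for variable, category in categories.items():
        grouped.setdefault(category, []).append(counts.get(variable, 0))
    return {category: mean(values) for category, values in grouped.items()}

importance = split_counts(toy_forest)
average_importance = category_means(importance, toy_categories)
```

Because the counts depend on how many trees and nodes a particular forest contains, the averages are meaningful only relative to other categories in the same forest.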
An examination of the variable importance values for Research Question 1 suggests that, on average, the most important variables were the demographic predictors and prior achievement, with average values for this category ranging from 24.7 (Model 1e) to 59.0 (Model 1a; see Table 19). Unsurprisingly, the motivation predictors were, on average, the second most predictive, ranging from 15.6 (Model 1e) to 45.6 (Model 1a). This means that they were shown to be important across all five models in predicting motivation at Time 4 (end of the third year). This finding is aligned with expectations because prior levels of motivation inform future levels of motivation. Finally, co-curricular activity participation was the least important on average across all models, with variable importance scores ranging from 13.3 (Model 1e) to 40.4 (Model 1b). However, co-curricular activity participation variables did arise as important as motivation in some of the models for Research Question 1. I focus my interpretation below on Models 1c and 1e, as these were the best performing models for this research question as measured by RMSE. An examination of the partial dependence plots for Models 1c and 1e, the most predictive models, reveals the specific directional relations among the predictors and the outcomes (see Figures 3-6). The x-axes for the partial dependence plots indicate input values for the model: the values of the predictor. The y-axes for the partial dependence plots indicate the predicted values of the model for a given dependent outcome. Each partial dependence plot is labeled with the name of the independent variable (predictor variable) at the top. For example, for Model 1c, in Figure 3, the first figure indicates the impact of the predictor variable "Female" on engineering expectancy. For Model 1c, the most predictive variables were gender, underrepresented minority status, and ACT score.
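The quantity plotted on the y-axis of a partial dependence plot can be computed by hand. The sketch below shows the general technique under stated assumptions: the predictor of interest is set to each grid value for every student in turn, and the model's predictions are averaged. The function `toy_model` is a hypothetical stand-in for a fitted random forest, and its coefficients and the example students are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is fixed to each grid value for all rows."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted model predicting a 1-5 motivation score."""
    return 3.0 + 0.02 * (row["act"] - 25) - 0.1 * row["female"]

# Hypothetical students; only the predictors used by the toy model are included.
toy_rows = [
    {"act": 22, "female": 0},
    {"act": 28, "female": 1},
    {"act": 31, "female": 0},
]

# x-axis values (0 = male, 1 = female) paired with averaged predictions (y-axis).
pd_female = partial_dependence(toy_model, toy_rows, "female", [0, 1])
```

A nearly flat curve from such a calculation corresponds to a predictor whose value changes the averaged prediction very little.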
Engineering expectancy at the end of the third year of college was higher if students were male or not a member of an underrepresented minority group; however, these differences were small (see Figure 3). Additionally, students with higher ACT scores were more likely to have higher engineering expectancy at the end of the third year. Task-value at the end of the third year of college shows similar patterns for gender and underrepresented minority status (see Figure 4). The relationship with ACT score is a bit more complicated: the partial dependence plot reveals that this relation is described well by a nearly horizontal line, though it seems that having a higher ACT score is marginally related to higher task-value. Taken together, these findings suggest that students who are not underrepresented in engineering (e.g., males and members of the ethnic/racial majority) may have slightly higher levels of engineering expectancy and value at the end of the third year, but these differences are not very pronounced. In terms of relations of co-curricular activities to outcomes, these patterns seemed largely linear and flat across both the expectancy and value outcomes (see Figures 3 and 4). Career advising in the second year appears to have a negative slope, but this slope is very slight. Unsurprisingly, the prediction of Time 4 expectancy using the predictors of expectancy and value at Time 3 shows a largely linear relation with a positive slope (see Figures 3 and 4). For Model 1e, the most predictive variables were gender, underrepresented minority status, first-generation student status, and ACT score. As with Model 1c, females, underrepresented minority students, and first-generation students were more likely to have lower motivation than their counterparts at the end of the third year, but the difference was quite small (see Figures 5 and 6).
The patterns of relations with co-curricular activities were very similar to those for Model 1c, all relatively flat and hovering between values of 3 and 4 for the outcome of Time 4 expectancy and value (see Figures 5 and 6). The flatness of these co-curricular activity lines for the outcomes of expectancy and value suggests that there is not much variability in the motivation outcome as a function of these co-curricular activity predictor variables. The motivation variables that showed the most pronounced relation with the Time 4 motivation outcomes were the Time 3 measurements of each type of motivation.

Research Question 2. Research Question 2 examined the predictive power of motivation on co-curricular experience participation. A series of four random forest models addressed this research question. For all models, five variables were outcomes: three indicators of career center participation (mock interviews, career advising, and career center event participation), one indicator of math tutoring participation, and one indicator of academic advising participation. All of these outcome variables were measured in the third year of college, i.e., summer 2017, fall 2017, and spring 2018. As with Research Question 1, I included demographic variables (gender, race, first-generation student status) and prior achievement (maximum ACT score) for all models. The distinguishing factor for each model was the time points included in that model. Model 2a included fall 2015 predictors, and Model 2b included spring 2016 predictors (see Table 2). Both Models 2a and 2b corresponded to students' first year in college. Model 2c included predictors from the second year of college, and Model 2d included predictors from both the first and second year. To evaluate the model performance for the models in Research Question 2, RMSE values were again utilized. Recall that low RMSE indicates that the model was more accurate in correctly predicting the outcomes.
For Research Question 2, the scale is larger than for Research Question 1 because the co-curricular outcome variables were calculated as a count. Thus, the range spans from the smallest count total to the largest: 1-128. In terms of RMSE values, the model that was most predictive of the outcome of co-curricular event participation was Model 2a with an RMSE of 0.43, but Model 2d had an almost identical RMSE of 0.48 (see Table 20). Models 2b and 2c were about equally non-predictive of co-curricular event participation, with RMSE values of 1.64 and 1.67, respectively. The low RMSE (high predictive value) of Model 2a suggests that first-semester motivation might play a small role in shaping co-curricular activity participation in year 3. One additional consideration in interpreting these RMSE values is that Model 2d (all predictors) included many co-curricular indicators as predictors: career center variables as well as math learning center and academic advising variables (see Table 2). By contrast, the only co-curricular variable entered as a predictor in Model 2a was on-campus residence in the first year (see Table 2). This means that living on campus may be a more influential co-curricular experience than other types of co-curricular engagement. The variance explained for Models 2a, 2b, 2c, and 2d can be found in Table 21. Overall, variance explained for all four models was quite low, ranging between 0.03 and 0.08. Model 2c explained the most variance in the co-curricular activity outcomes, whereas Model 2a explained the smallest amount of variance in the co-curricular activity outcomes. Just as for Research Question 1, these low variance explained values mean that the sets of predictors were not strong predictors of co-curricular activity participation.
Since there was not a robust body of literature about predictors of co-curricular activity participation, this finding adds to the literature by suggesting that other combinations of predictors should also be explored. It is interesting that whereas Model 2a was most predictive in terms of RMSE, the variance explained for this model is not similarly high. Indeed, Model 2c shows a higher variance explained than both Model 2a and Model 2d, whereas Model 2c was the worst performing model in terms of RMSE. One possible explanation for this apparently discrepant finding is that these two approaches to error measurement calculate the model's error in different ways. The two approaches converge in suggesting that none of the four models associated with Research Question 2 is very predictive of the outcome (i.e., both RMSE and R² indicate levels of error that are higher than desired). Whereas RMSE is commonly used for random forest models, variance explained as measured by R² is not as commonly calculated for random forest models. For the purpose of this study, I will focus on the models that emerged as most predictive as informed by the RMSE value, as that is the convention of the modeling approach, but it should be noted that even these most-predictive models are not very predictive of the co-curricular outcomes measured here. As with Research Question 1, a higher value for variable importance signifies a more important variable, as noted by the number of times it was selected as a branch node across the trees in the random forest. At each node, within each tree, within each forest, five variables were considered as possible splitting points. In terms of variable importance, I conducted the same averaging procedure as described for Research Question 1, again grouping the variables into Demographic variables, Motivation variables, and Co-Curricular variables. See Table 22 for the variable importance scores for this model.
As with Research Question 1, the demographic variables arose as the most important in their prediction of co-curricular activity participation in the third year, with an average variable importance ranging from 55.9 (Model 2d) to 73.4 (Model 2a). Interestingly, motivation variables were about as predictive of co-curricular activity participation in year 3 as were prior levels of co-curricular activity participation. Indeed, the motivation variables had an average variable importance ranging from 14.2 (Model 2d) to 40.3 (Model 2a), whereas the co-curricular variables had an average variable importance ranging from 19.7 (Model 2d) to 37.2 (Model 2b). Partial dependence plots were generated for Models 2a and 2d because they were most predictive as measured by RMSE (see Figures 7-16). Whereas two partial dependence plots were created for each model in Research Question 1, there are five partial dependence plots for each model in Research Question 2 because there are five outcomes (i.e., attendance at advising, career center events, career center advising, mock interviews, and the math tutoring center). For Model 2a, the most predictive variables were gender, underrepresented minority status, and ACT score. An examination of the partial dependence plots reveals that female students were more likely than men to attend advising, career center events, career center advising, mock interviews, and math tutoring. This finding is particularly interesting because it suggests that women utilize support programming resources at a higher rate than men do. The relation between underrepresented minority (URM) student status and utilization of programmatic resources differed by resource. URM students were more likely than their counterparts to utilize academic advising, career advising, and the math tutoring center, but less likely than their counterparts to utilize career center events and mock interviews.
An encouraging aspect of these findings is that underrepresented students appear to be aware of support resources on campus and to utilize such resources at a higher rate than their counterparts. One possible explanation for differences in resource participation by demographic characteristics is that these resources might be marketed in a way that appeals to underrepresented groups, such as women and underrepresented racial/ethnic group members.

For Model 2a, the findings with respect to prior achievement as measured by the ACT varied by resource. For academic advising, students who scored low on the ACT were least likely to attend, whereas students who scored high on the ACT were more likely to attend. The same general pattern was observed for students' attendance at career center events and for mock interview participation. The opposite pattern was observed for math tutoring: the students who scored the lowest on the ACT were most likely to attend tutoring, and the high-scoring students were the least likely to attend. For career advising, a U-shaped curve was observed, such that low-achieving and very high-achieving students were more likely to seek out career advising, with students who scored in the high 20s on the ACT being the least likely to seek it out. This is an interesting trend because it suggests that the relation between prior achievement and likelihood of seeking out career advising in the third year is complex. One possibility is that students who scored in the middle range on the ACT were not concerned about career advising but derived more utility from career event attendance.

With respect to motivation variables in Model 2a, low levels of task-value during the first year were related to higher levels of attendance at advising, career advising, and math tutoring (see Figures 7, 9, and 11).
One possible reason for the relation between the two types of advising and low task-value is that students who do not place a high value on engineering as a discipline may be seeking counsel as they explore other options for careers or try to learn more about what a job as an engineer would entail. The relation between math tutoring and low task-value is interesting because it suggests that even if students do not endorse statements that engineering is important to them (i.e., task-value for engineering), they might value math and thus seek support to succeed in that subject. Low expectancy for success in the first year was related to higher participation in mock interviews in the third year (see Figure 10).

For Model 2d, the most predictive variables were gender, underrepresented minority status, ACT score, and first-generation student status. Patterns of demographic variables' effects on co-curricular activity participation were largely similar to those observed for Model 2a. As in the interpretation for Model 2a, it seems that students from generally underrepresented groups (women, underrepresented racial/ethnic group members, and first-generation college students) were more likely than their counterparts to utilize co-curricular resources in the third year. The findings for prior achievement as measured by the ACT for Model 2d were also similar to those for Model 2a. Model 2a's results with respect to low levels of task-value being related to higher levels of participation in advising, career advising, and math tutoring were again observed in Model 2d. Interestingly, though, this relation became less pronounced over time and even began to go in the other direction by the third year for some outcomes (see Figures 12, 14, and 16). By the third year, both low and high task-value were associated with higher levels of participation in advising and career advising than moderate levels of task-value (see Figures 12 and 14).
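For readers less familiar with partial dependence plots, the quantity they display can be sketched as follows. This is an illustrative Python sketch, not the code used in this study. The function `toy_model` is a hypothetical stand-in for a fitted random forest, and the four student records and the 1-5 task-value scale are assumptions made for the example. The sketch shows how a U-shaped pattern such as the one described above would appear in the averaged predictions.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average model prediction when `feature` is forced to each grid value.

    For every grid value, the feature is overwritten in every row while all
    other predictors keep their observed values, and predictions are averaged.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)  # copy so the original data are not mutated
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model (NOT the study's random forest).

    Predicted advising visits are lowest at a moderate task-value of 3 and
    slightly higher for women.
    """
    return 1.0 + 0.5 * (row["task_value"] - 3) ** 2 + 0.2 * row["female"]


# Hypothetical student records for illustration only.
students = [
    {"task_value": 2, "female": 1},
    {"task_value": 4, "female": 0},
    {"task_value": 3, "female": 1},
    {"task_value": 5, "female": 0},
]

curve = partial_dependence(toy_model, students, "task_value", [1, 2, 3, 4, 5])
for value, avg_prediction in curve:
    print(f"task_value={value}: average prediction={avg_prediction:.2f}")
```

In this toy example the averaged prediction is higher at both ends of the task-value scale than in the middle, which is the shape a U-shaped partial dependence plot would trace.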
Whereas in Model 2a a negative relation emerged between first-year expectancy for success and participation in mock interviews, in Model 2d that relation became less pronounced over time, flattening out across all levels of expectancy in years 2 and 3 (see Figure 15). The partial dependence plots for Model 2d also allowed me to examine how prior levels of co-curricular activity participation related to third-year participation. Unsurprisingly, prior participation in a given activity was predictive of future participation in that same activity (see Figures 12, 13, 14, 15, and 16). This effect was most pronounced for math tutoring: participating in math tutoring in the first and second year was strongly positively related to continuing to attend math tutoring in the third year (see Figure 16).

In summary, for Research Question 2, partial dependence plots suggest that third-year participation in co-curricular activities was higher for students traditionally underrepresented in engineering. Prior achievement, as measured by the ACT, was positively related to participation in some co-curricular activities in the third year, such as career center events, and negatively related to math tutoring participation. Low task-value during the first year of college was related to higher levels of participation in some co-curricular activities, such as advising, career advising, and math tutoring. Taken together, these findings suggest that students' background characteristics play a large role in predicting their co-curricular involvement three years after entering the engineering program. Students' motivation and engagement in co-curricular activities are also somewhat predictive of their third-year involvement, though the nature of these relations changes over time.

Research Question 3.
Research Question 3 examined the predictive power of motivation and co-curricular experience participation on end-of-third-year engineering persistence (see Table 3). A series of five random forest models addressed this research question. All five random forest models had a single outcome: a dichotomous measure of whether the student was still enrolled in an engineering major at the end of the third year of college (late spring 2018). As with the prior research questions, I included measures of gender, race, first-generation student status, and prior achievement in all models. Models 3a and 3b represent the first year of college, with Model 3a including Fall 2015 predictors and Model 3b including Spring 2016 predictors (see Table 3). Model 3c included second-year predictors, whereas Model 3d included third-year predictors. Finally, Model 3e included predictors from all three years, with all indicators having been measured before the persistence measure.

For Research Question 3, the outcome was dichotomous rather than continuous, so a different indicator of model performance was needed. Rather than RMSE, the area under the receiver operating characteristic (ROC) curve was used. An area of .50, which corresponds to the diagonal of the ROC plot, indicates chance-level prediction; values above .50 indicate better performance. The area under the curve is abbreviated "AUC," and the AUC values for all models related to Research Question 3 can be found in Table 23. Model performance was roughly the same for Models 3a and 3b (AUC = .56 for Model 3a and AUC = .61 for Model 3b). Additionally, model performance was roughly the same for Models 3c, 3d, and 3e (AUCs as follows: 3c = .72, 3d = .75, 3e = .77). There is no absolute AUC threshold that is recommended for judging model fit.
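To make the interpretation of the AUC concrete, the sketch below computes it directly from its probabilistic definition: the proportion of (persister, non-persister) pairs in which the persister receives the higher predicted score, with ties counted as one half. This is an illustrative Python sketch with hypothetical labels and scores, not the code or data used in this study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g., persisted), 0 otherwise.
    scores: model-predicted scores, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical examples for illustration only.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])        # every persister ranked higher
chance = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])         # scores carry no information
intermediate = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2])   # three of four pairs ordered correctly

print(perfect, chance, intermediate)
```

Read this way, an AUC of .75 means that in about three of every four persister and non-persister pairs, the model assigns the higher score to the student who persisted.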
However, the pattern of AUC values suggests that more proximal predictors (used in Models 3c, 3d, and 3e) are stronger in predicting persistence than more distal predictors from the first year of college (Models 3a and 3b).

For Research Question 3, the highest values for R² were for Models 3c, 3d, and 3e (see Table 24). While the R² values are modest, comparing the R² values across all models in this study reveals that those three models predicting persistence explain the most variance of all the models across all research questions. Thus, the most compelling finding of all the models in the study is the robustness of the models predicting post-third-year persistence using second-year predictors (Model 3c), third-year predictors (Model 3d), and predictors from all three years (Model 3e). Even so, these variance explained values are relatively low. The most predictive model (Model 3e) explains about 21% of the variance in the outcome. Thus, the variables selected as predictors for this study may not thoroughly explain the variance in the outcome overall, and other predictor variables should be examined in future studies.

As with Research Questions 1 and 2, I examined the variable importance for the predictors entered into these models. All the variable importance scores can be found in Table 25. As with the previous research questions, I calculated average variable importance scores for the variables in each category. Following the same trend as for Research Questions 1 and 2, the demographic variables were on average the most important, with an average variable importance score ranging from 113.7 (Model 3e) to 162.8 (Model 3c). The co-curricular predictors were on average the next most predictive, with an average variable importance ranging from 12.2 (Model 3d) to 22.7 (Model 3e).
Finally, the motivation predictors seemed to be the least important in predicting third-year persistence, with an average variable importance score ranging from 4.7 (Model 3e) to 24.0 (Model 3a).

Model 3e included all predictors: motivation, demographic, and co-curricular. While the demographic predictors were the first, second, third, and fourth most important variables in the model, one interesting finding is that the fifth most predictive factor was on-campus residence in the first year, with a variable importance score of 61.0. Math tutoring in the first year was the sixth most predictive factor in this model. Taken together, these two findings suggest that first-year experiences may be particularly impactful on persistence in the third year, although demographic variables also certainly play a role in predicting persistence.

Partial dependence plots were generated for all five models related to Research Question 3 (Models 3a, 3b, 3c, 3d, and 3e). Since only one outcome was examined for Research Question 3, only one set of partial dependence plots was required for each model (see Figures 17-21). Across all five models, women, underrepresented minority students, and first-generation students were less likely to persist (see Figures 17-21). Unsurprisingly, prior achievement, as measured by the ACT, was positively related to persistence (see Figures 17-21). Living in a residence hall during the first year was also positively related to persistence, though this relation was small (see Figures 17 and 21).

The most predictive models were Models 3c, 3d, and 3e. For Model 3e, the partial dependence curves for motivation were relatively flat until the third year of college (see Figure 21). This means that across all possible values of motivation as measured across the first two years of college, there is not a large difference in persistence.
By the third year, however, a positive linear relation emerges between Time 4 expectancy and persistence as well as between Time 4 task-value and persistence.

In Model 3e, most of the co-curricular activity predictors did not show strong relations with the persistence outcome. For example, math tutoring, career advising attendance, and mock interview attendance all have nearly flat horizontal lines on their partial dependence plots. Some co-curricular activities, such as career event attendance, showed a nonlinear relation with persistence. In Model 3e, attending one or two career events was related to higher levels of persistence, but there was no additional increase in persistence at higher levels of career event attendance. Academic advising showed a similar pattern: an increase in persistence at low levels of advising, but a less pronounced effect after a break point. Similar patterns emerged across Models 3c, 3d, and 3e.

Robustness Checks

As a robustness check, I ran the same models (same code) on two of the other imputed datasets generated by the multiple imputation procedure. I report the findings of these robustness checks in Tables 10, 11, 12, 13, 14, and 15. The primary purpose of these robustness checks was to see whether the models exhibited patterns of variable importance in the other imputed datasets similar to those in the imputed dataset that I randomly selected (of five possible imputed datasets) for the main analyses. Thus, I re-ran all models for all research questions, first on a second imputed dataset (the first being the one used for the main analyses) and then on a third imputed dataset. In so doing, I could approximate the pooling that would have taken place had the model specification allowed it, because traditional multiple imputation pools results across the imputed datasets. Across all models, the findings are essentially the same for the other imputed datasets as for the main analyses.
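One possible way to quantify how similar the variable importance patterns are across imputed datasets is a Spearman rank correlation between the importance scores from two runs. The sketch below is an assumption about how such a comparison could be made; the source does not state that this statistic was used in the study, and the variable names and scores are hypothetical.

```python
import math


def ranks(values):
    """Ranks (1 = smallest); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical importance scores from two imputed datasets (illustration only).
main_run = {"gender": 70.1, "urm_status": 65.3, "act_score": 60.2,
            "task_value_t1": 20.5, "advising_y1": 18.0}
second_run = {"gender": 68.4, "urm_status": 66.0, "act_score": 57.9,
              "task_value_t1": 17.2, "advising_y1": 19.1}

variables = list(main_run)
rho = spearman([main_run[v] for v in variables],
               [second_run[v] for v in variables])
print(f"Spearman rank correlation across imputations: {rho:.2f}")
```

A correlation near 1 would indicate that the predictors are ordered almost identically in both runs, even if the raw importance scores differ slightly.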
While some small differences emerged across the imputed datasets, these differences did not represent a substantive change in the interpretation of the findings. If there had been a way to pool the imputed datasets together, I hypothesize that the main findings would have been consistent with the findings I reported above. In the upcoming chapter, I will explain the implications of these findings for future research and practice.

Chapter 5: Discussion

The findings of this study suggest that demographic variables contribute most strongly to models predicting motivation, co-curricular activity participation, and persistence. There were differential findings across different years of college for each research question. The strongest predictors of engineering expectancy and task-value were variables measured in the second year (see Table 17, Model 1c) and the predictors measured throughout the first three years (see Table 17, Model 1e). Experiences throughout the first semester of college were somewhat predictive of co-curricular activity participation (see Table 20, Model 2a). By contrast, in predicting persistence in college, the models including first-year indicators were least predictive of the persistence outcome (see Table 23). Models including second-year only indicators, third-year only indicators, or indicators from all three years were about equally predictive of the persistence outcome, suggesting that the second and third years of college are instrumental in informing students' junior-year persistence decisions in engineering (see Table 23). Across all models, the demographic predictors of gender, race/ethnicity, ACT score, and first-generation student status were shown to be on average most important in predicting the outcomes.

In terms of overall predictive power, the RMSE values for Research Question 1 were lower (and thus more predictive of the outcome) than the RMSE values for Research Question 2.
Because RMSE is on the scale of the outcome variable, there is no absolute threshold against which to compare it, but the values that emerged in this study are low, which indicates strong model performance (see Table 17). However, since the variance explained (R²) for the models related to Research Questions 1 and 2 was small, the findings with respect to those research questions are interpreted more cautiously below than are the findings from Research Question 3. Indeed, it is likely that motivation and co-curricular experiences are not as strongly related to one another as was originally proposed. This conclusion is supported by the lack of strong bivariate correlations among the motivation variables and the co-curricular variables (see Table 9).

For Research Question 3, the AUC served as the measure of predictive accuracy because the outcome, persistence, was dichotomous. With the AUC, a value of 0.5 indicates chance levels and a value of 1 indicates perfect predictive accuracy, so values closer to 1 are better. The AUC values of about .75 indicate that Models 3c, 3d, and 3e strongly predicted persistence (see Table 23).

Below, the findings specific to each research question are discussed, limitations of the study are presented, and implications are discussed.

Research Question 1

Research Question 1 was focused on the predictive power of co-curricular support experiences throughout the first three years of college in predicting engineering expectancy and task-value at the end of the third year of college. In order to address this research question, I considered both (1) the predictive power of each estimated model and (2) the relative importance of each variable in predicting the outcome.
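The caution above about reading RMSE alongside variance explained can be illustrated with a small numeric sketch. It assumes that R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares; the source does not state the exact formula used. The data are hypothetical. The example shows that a model can have a numerically low RMSE while explaining none of the variance when the outcome itself has little spread, and that rescaling the outcome changes RMSE but not R².

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical outcome with little spread (e.g., a count that is mostly 0 or 1).
outcome = [0, 1, 0, 1]
mean_only = [0.5, 0.5, 0.5, 0.5]  # a "model" that always predicts the mean

rmse_small_scale = rmse(outcome, mean_only)       # numerically small
r2_small_scale = r_squared(outcome, mean_only)    # yet no variance explained

# The same data expressed on a scale ten times larger.
outcome_x10 = [10 * v for v in outcome]
mean_only_x10 = [10 * p for p in mean_only]

rmse_large_scale = rmse(outcome_x10, mean_only_x10)     # RMSE grows with the scale
r2_large_scale = r_squared(outcome_x10, mean_only_x10)  # R-squared is unchanged

print(rmse_small_scale, r2_small_scale, rmse_large_scale, r2_large_scale)
```

For this reason, RMSE values are most informative when compared across models that predict the same outcome, whereas R² offers a scale-free complement.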
With respect to predictive power, the primary finding for Research Question 1 was that second-year indicators and indicators from across the first three years of college contributed to the models with the lowest RMSE (highest predictive power) in predicting engineering expectancy and task-value. This finding suggests that students' experiences and motivation in the second year may play a role in predicting their third-year motivation, but it also suggests that their experiences and motivation across the full first three years of college may have cumulative effects on their third-year motivation.

With respect to relative variable importance, the most predictive models (Models 1c and 1e) showed that demographic variables were most important in predicting outcomes, followed by motivation predictors, followed by co-curricular predictors, though motivation and co-curricular predictors were similar in their predictive power (see Table 19).

Prior research suggested that co-curricular activity participation supports STEM persistence (Grillo & Leist, 2013; Laskey & Hetzel, 2011; Metzner, 1989; Schudde, 2011; Pascarella & Terenzini, 2005; Restubog et al., 2010); however, the mechanism by which this relation may occur is unclear. One hypothesis explored in the current study was that co-curricular activity participation would lead to increased motivation by the end of the third year of college. However, for the most predictive models (Models 1c and 1e), co-curricular activities were not strong predictors of motivation at the end of the third year of college. Instead, gender, URM status, and ACT score most strongly predicted motivation at the end of the third year (see Table 19). The finding that males have higher expectancy aligns with prior research showing that expectancy in math was rated higher by males, even when males' and females' achievement was equivalent (Correll, 2001; Nagy et al., 2008).
The findings for this research question suggest that students' background characteristics may play a role in predicting their motivational outcomes in the third year of engineering. While this finding might seem discouraging because background characteristics are immutable, this evidence could be utilized to make a compelling argument to university administrators that additional supports should be provided to bolster the motivation of underrepresented groups. For example, the evidence that women and underrepresented racial/ethnic minority students had lower motivation at the end of the third year of college suggests that the college experience could be undermining motivation for these groups. This evidence could thus be used to support the provision of additional support programming and resources for underrepresented groups in engineering that might help buffer them against those declines in motivation.

It is perhaps unsurprising that prior achievement has predictive power over motivation outcomes at the end of the third year; indeed, that is part of the rationale for using such an exam as a criterion for entrance to the university. University administrators might consider the possibility that supporting motivation in engineering for lower-achieving students would be a worthy goal. Indeed, the positive feedback high-scoring students receive from doing well on the ACT might signal a higher level of preparedness for the rigors of the engineering curriculum. On the other hand, some have suggested that the ACT may not be the best predictor of college success. It is important to consider the ACT as just one assessment measure, and not to try to project students' future progress based on the one exam. Rather, for admissions and advising purposes, students' comprehensive records should be considered in order to protect against overlooking students whose potential might not be reflected by the test.
Overall, when considering the findings for Research Question 1 in the context of the broader literature, it is important to remember that these effects are small and that many other unmeasured variables likely inform student motivation in the third year of college.

Research Question 2

Research Question 2 was focused on the predictive power of engineering expectancy and task-value throughout the first two years of college in predicting participation in co-curricular support activities throughout the third year of college. As with Research Question 1, I examined both RMSE values and variable importance to investigate this research question. The RMSE values indicated that Model 2a was most predictive of co-curricular support engagement in the third year of college, and Model 2d was nearly as predictive (see Table 20). The variable importance measures indicated that, as with Research Question 1, the most predictive models (Models 2a and 2d) showed that the demographic variables were the most important predictors, followed by motivation predictors, followed by co-curricular predictors (see Table 22). Even though these were the most predictive models, their predictive power was small, so the following findings should be interpreted with that in mind.

Prior research suggested that men may be less likely than women to pursue co-curricular support resources, due to perceptions of masculinity norms (Wimer & Levant, 2011). Among high school students, help-seeking at school was more psychologically costly for males than for females (Kessels & Steinmayr, 2013). The current study provides some preliminary evidence that these gendered patterns in help-seeking persist into college, since women were more likely than men to participate in all the co-curricular activities: academic advising, career center events, career advising, mock interviews, and math tutoring (see Figures 7-16).
It is important for faculty and university administrators to be aware of this trend, so that they might encourage struggling male students to utilize support resources when needed.

The examination of the directionality of the relationships observed in Research Question 2 suggested that students from racial/ethnic groups that are traditionally underrepresented in engineering were more likely than their counterparts to participate in some co-curricular support activities in their third year. Prior research suggested that if students belong to an underrepresented minority group and they feel stereotype threat or perceive the university environment as unwelcoming, then they are less likely to utilize academic support resources (Winograd & Rust, 2014). The participation of underrepresented minority students in academic advising, career advising, and math tutoring at disproportionately high rates as compared to their counterparts could be explained if students at this university do not feel stereotype threat and perceive the university as a welcoming place. However, this study does not provide any evidence about the university climate or stereotype threat, so it is not possible to state with certainty whether the campus climate contributed to URM students' differential participation in some co-curricular activities as compared to others.

Another possibility is that these rates of participation differ across the years of school, such that underrepresented students are not always aware of these resources. It is possible that the advertising for co-curricular resources such as advising, the career center, and the math tutoring center increases participation over time. Additionally, some co-curricular resources that were not measured in this study are designed specifically to support persistence for underrepresented students.
Some students who were participants in the current study could have been participating in co-curricular resources that were more tailored to their needs, rather than the all-university co-curricular activities studied here. A future study could explore the longitudinal experiences of underrepresented students with on-campus resources.

The findings for prior achievement suggest that some resources were related to prior achievement, whereas others were not. Students with high scores on the ACT were more likely to attend academic advising, career center events, and mock interviews. This might be because students who are high achievers are more likely to engage in co-curricular activities that are academic in focus. The finding that students who scored lower on the ACT were more likely to seek math tutoring contradicts Halcrow and Iiams' (2011) finding that students in low-level college math courses were reluctant to ask a tutor for help for fear of sounding foolish. However, the willingness of this group of students to seek math tutoring is promising because it suggests that the university is providing a resource that struggling students find helpful.

The findings with respect to motivation support the idea that struggling students might utilize co-curricular support resources at a higher rate than their peers. Models 2a and 2d suggest that students with low levels of expectancy or value may seek out co-curricular support activities to help direct their efforts. For example, advising and career advising seem to be resources utilized most by students who have low expectancy for success or low task-value for engineering. The implication of this finding is that students turn to advisors in moments of low motivation, perhaps to find support for staying in engineering or to explore options outside engineering.
It would be interesting for a future study to explore the relations between motivation and frequency of advising attendance at a college or university where advising appointments are required rather than optional, in order to further unpack this relation.

The finding that prior levels of co-curricular activity participation were positively related to later levels of participation suggests that if the programs are beneficial, universities might consider marketing them to first-year students specifically. This study provides preliminary evidence that doing so might build momentum and foster engagement over the following years as well.

Overall, for Research Question 2, some findings aligned with prior research (i.e., findings with regard to gender differences in utilization of support resources), while other findings were surprising (i.e., findings with regard to URM students participating in co-curricular support activities at higher rates than their counterparts). However, the overall variance explained values for the models in this research question were quite low. Many other factors not studied here likely inform students' co-curricular activity participation.

Research Question 3

Research Question 3 was focused on the predictive power of engineering expectancy, task-value, and co-curricular activity engagement across the first three years of college in predicting engineering persistence at the end of the third year. While predictive power and variable importance were considered as they were for the prior two research questions, the approach differed because the outcome (persistence) was dichotomous. Specifically, I examined the area under the curve (AUC) for Models 3a through 3e rather than evaluating RMSE values. The AUC values indicated that Models 3c, 3d, and 3e performed about equally well in terms of predicting persistence.
These models corresponded to the second year, the third year, and all three years, respectively. Just as with the other models, variable importance values were also examined. The variable importance measures indicated that demographic variables were the strongest predictors of persistence, followed by co-curricular engagement, followed by motivation (see Table 25).

Whereas prior research linked task-value and expectancy for success with intentions to persist and actual persistence in STEM (Jones et al., 2010; Perez et al., 2014; Robinson et al., 2018; Ball et al., 2016), the current study found demographic variables to be more predictive of persistence than motivation predictors (see Table 25). One important distinction between the current study and prior research on task-value and expectancy for success as they relate to persistence is that in much of the prior research, demographic variables were included as control variables. The purpose of studies in this tradition is to understand the unique relations between the two types of motivation and persistence, without the contribution of demographic variables. Thus, prior research did not attempt to examine the contributions of demographic and motivation variables in informing persistence in the same way that the current study did. One benefit of the analytical approach of this study is that the random forest allowed for the consideration of demographic variables alongside motivation variables, to determine additional nuances of these relations. This fundamental modeling difference allowed the current study to provide additional context about the strong role of demographic variables in predicting persistence, which is not foregrounded by other studies.

Whereas co-curricular activities such as academic advising (Metzner, 1989), living on campus (Schudde, 2011), utilizing career counseling services (Restubog et al.
, 2010), and attending tutoring sessions (Grillo & Leist, 2013; Laskey & Hetzel, 2011) are related to persistence in prior literature, the current study does not provide robust evidence for co-curricular activities being strong predictors of persistence. Among the co-curricular predictors that did emerge as important for Model 3e, on-campus residence and math tutoring were most predictive of persistence. This finding aligns with prior literature that provides evidence for the importance of living on campus (Schudde, 2011) and attending tutoring (Laskey & Hetzel, 2011) in supporting persistence, especially for students who are academically at risk (Laskey & Hetzel, 2011).

The relative variable importance of the demographic variables across all the models of persistence highlights the key role that student characteristics play in predicting their persistence in engineering. Specifically, the current study provides evidence that members of groups that are historically underrepresented in engineering – women, first-generation students, and underrepresented minorities – are less likely to persist in engineering than their counterparts. However, one encouraging aspect of the study's results is that the partial dependence plots show that the magnitude of this effect is small (see Figures 19, 20, and 21). Thus, whereas underrepresented groups do show lower persistence, the gap is not so large as to be insurmountable.

One goal of this dissertation was to establish whether certain events were more or less predictive of persistence than others. An examination of the partial dependence plots and variable importance plots together suggests that motivation and co-curricular activity participation contribute in similar proportions to explain variation in the persistence outcome. The lack of a strong relation between motivation and persistence is surprising.
However, Model 3e's finding that motivation in the third year is most predictive of persistence at the end of the third year suggests that proximal indicators of motivation, rather than early levels, might be useful for predicting persistence.

Some co-curricular events arose as more important than others in predicting persistence: on-campus residence and math tutoring had particularly pronounced effects (see Table 25, Model 3e). The implication of this finding is that some first-year experiences, like living on campus, might be supportive of persistence. Interestingly, math tutoring in the first year is also most predictive of persistence, with its impact declining over time (see Table 25, Model 3e). This indicates that students' early college experiences might play a role in informing their persistence many years later, as they form habits of engaging with co-curricular activities. Future research should explore this in further detail.

Themes across Research Questions

One unique contribution of this dissertation is that it considers both motivation constructs and co-curricular constructs in concert to explore relative effects of both types of predictors on the outcome of persistence. Whereas prior literature did not explore both motivation and co-curricular variables simultaneously, there was evidence to suggest that motivation constructs and co-curricular activity participation each were related to persistence. For the overarching question of how both types of variables work to predict persistence, demographic variables emerged as the most predictive of the outcomes.

Across all the research questions, demographic variables emerged as a consistently strong predictor of outcomes. There are many potential reasons for this pattern.
First, women are underrepresented in engineering, so gender might be a particularly salient factor in the development of motivation, the decision to pursue various co-curricular activities, and the decision to persist or not in the engineering field. Second, prior research about women's degree completion in science and engineering suggests that differential rates are present by gender: of 376,825 STEM bachelor's degrees conferred in the 2016-2017 academic year, only 134,563 (35.7%) went to women (NCES, 2018). Even so, some studies find that women are more likely to persist in STEM than men are (Brainard & Carlin, 1998). It is possible that one way to address the dearth of women graduating with science and engineering majors is to focus more on recruitment and "migration" into STEM majors than on retention within STEM majors (Ohland et al., 2008). Students were least likely to migrate into an engineering major as compared with other majors (Ohland et al., 2008), although one possible reason for this is more stringent entry requirements for engineering majors. Third, the experiences of marginalized students on college campuses are different from those of their majority-group peers. In engineering, women are not the only underrepresented group: first-generation college students and members of racial/ethnic minority groups are also underrepresented. Whereas this study provides some preliminary evidence that URM, first-generation, and female students participate in co-curricular activities at higher rates, it remains to be explored whether they are engaging with these resources in order to buffer themselves against what they perceive to be an unwelcoming environment. Since the current study did not address students' perceptions of stereotypes or institutional climate, it is not possible to explain here the relation between students' cognitions and their decision to engage in co-curricular activities.
Future work should explore the potentially supportive role of participating in co-curricular activities in increasing students' feelings of belonging.

The emergence of gender, race, and first-generation student status as consistent predictors of motivation and co-curricular activity participation across models provides some preliminary evidence to suggest that two potential areas in which these groups may differ are in their engineering motivation and in their proclivity to participate in co-curricular activities surrounding engineering. Whereas the current study cannot provide an explanation of the causal mechanism behind differential persistence based on gender, race, or first-generation student status, it would be valuable for future research to consider this mechanism.

Limitations

This study was limited not only in the traditional ways (e.g., because it is not generalizable in all contexts) but also in ways that are unique to this work. First, causal inference was limited because no direct experimental manipulation is present in this study. While machine learning has much to offer the field of educational psychology, I could not utilize it to make definitive statements about whether co-curricular program involvement causes persistence. Even so, it is still useful to consider the relations among motivation, co-curricular programs, and persistence because a correlational relation would still be of interest.

A second limitation of this work is that it is impossible to know the true thoughts and rationale behind the decision not to attend co-curricular support activities. This is outside the scope of this study, but it is important for future work to explore, perhaps qualitatively, why some students choose not to engage in support services. Such knowledge might be quite helpful in terms of providing the university with information about how to better support students who might be at risk of dropping out from the university.
Thus, other risk factors might be identified that perhaps cause both the lack of participation in support activities and the risk of dropping out; future research should explore that possibility.

A third limitation is that not all educational experiences are included in my model. While one strength of my model is that it can explore nonparametric relations among more variables than can traditional educational psychology models, it is still true that it would be impossible to include every possible co-curricular experience. Thus, it is important to acknowledge this limitation and to consider that I could evaluate only the programs that are included in this study.

In a similar vein, the study is limited by the data that are available. More specifically, data were missing at particular intervals due to either the data not having been preserved when the university moved to a new system (as in the case of the engineering career services data) or due to the lack of data collection (as in the case of first-year advising). This is a limitation because it is possible that first-year advising and first-semester career service interaction might have a unique effect on students' motivation and persistence. By including in my models other measures from students' first year of college (e.g., residence on campus and participation in Math Learning Center tutoring), I still explored the possibility that first-year activities are differentially impactful as compared to later engagement in college. Future studies could explore the specific effects of first-year advising and first-year career center interactions on student motivation and persistence.

The bivariate correlations shown in Table 9 between motivation predictors and co-curricular predictors are relatively small and many are not statistically significant.
Thus, all findings with regard to the relations between motivation and co-curricular activities should be interpreted as preliminary, and further research should address these relations. The correlations between persistence and the predictor variables (demographics, motivation, and co-curricular activities) are more robust.

Another limitation of this work is in its inclusion of demographic variables such as race, gender, ethnicity, and family SES as variables within each model rather than considering whether the models built here function in a different way for students with different backgrounds. College persistence is affected by race/ethnicity (Elliott et al., 1996), gender (Bell et al., 2003; Beasley & Fischer, 2012), parental educational attainment (McCarron & Inkelas, 2006), and family SES (Horn & Premo, 1995). Model complexity is a factor that was relevant in this decision to include demographic variables as variables within each model rather than to model group differences based on demographic variables, as in a multi-group structural equation model. However, modeling based on group differences is a promising future direction for research, as these demographic variables emerged as strong predictors of the outcomes studied here.

A confounding factor in studying co-curricular activity participation is that at-risk students might benefit from support sources differently than students who are not at risk (Laskey & Hetzel, 2011); this could muddle the interpretation of findings. The implication is that colleges' provision of blanket resources could be more or less helpful for people depending on their circumstances and risk level. There are also implications if colleges were to provide tailored support only for at-risk students, because to do so might make them feel more stigmatized.
While I considered modeling group differences, the complexity of the models I developed here would have made modeling group differences quite difficult. Specifying a multivariate random forest model in the way that I did here is a complex endeavor that entails many relations. To truly model group differences, I would have wanted to examine whether there were differences, by demographic characteristics, on initial levels of motivation and engagement in co-curricular activities. However, I would have also wanted to examine whether the timing mattered in different ways for different groups: e.g., whether first-year tutoring is more beneficial for women than for men. A thorough investigation of group differences would also examine every level of every variable, which adds another layer of complexity. To check for group differences would involve examining a partial dependence plot for each interaction, which would quickly take the number of plots examined above 100. As such, the current work set the stage for future work to explore these group differences at a more nuanced level.

For this study, I delineated my research questions with relation to expectancy and value, but not cost. Now that I have begun to explore the nature of the connections among expectancy, value, co-curricular activities, and persistence, a future study could examine whether similar or distinct patterns arise for cost.

Last, whereas an original goal for this study was to be able to model motivation, co-curricular activity participation, and persistence as part of a dynamic system, the constraints of the modeling approach did not allow for true dynamic system modeling. It is important for future work to explore other methodological approaches that would allow for both differential effects for different people over time and differential effects of the same activity or type of motivation over time.
By characterizing the college student's experience as a dynamic system, multiple explanations may emerge about the differential effects of different experiences depending upon the intervals at which students engage in those activities and depending on individual differences in motivation.

Implications for Future Research

One strong contribution of this study to educational psychology research is that it offers a new methodological approach to the consideration of motivation and behavior in education. The concept of random forest analysis is new to many educational researchers, but it offers the potential to unlock new insights that would previously be unavailable to them. For example, by using random forest, we glean much more information from a single dataset than if we had only done the traditional analyses on the full sample. Allowing a random forest model to explore the parameter space reveals connections among variables that could not be detected in a traditional linear framework. Since a random forest model is an aggregation of a series of smaller models (decision trees), it is a robust estimation procedure that can be more effective at prediction than a single model alone. For this specific study, it was possible to test relations among motivation variables, co-curricular activity variables, and an indicator of persistence in a non-parametric fashion such that both linear and non-linear relations emerged. If a specific linear model had been hypothesized, it would not have been possible to discover the non-linear trends among motivation and co-curricular activities that emerged in Research Question 2.

The combination of machine learning and cross-validation is an additional contribution to education research. Of course, the issues of single-sample generalizability are ubiquitous.
However, by using cross-validation in conjunction with a machine learning framework, it is possible to see whether the model estimated is robust to out-of-sample testing. While some educational researchers employ cross-validation methods already, to do so in the context of machine learning might help strengthen researchers' confidence in a model which could appear "black box." If more educational researchers employed cross-validation methods (even if they do not do so with machine learning), their findings would be more robust and the insights they provide to other researchers and to the field in general would be more meaningful. Indeed, machine learning comprises a wide array of methods that vary in complexity, and other machine learning approaches could be utilized to study even more complex psychological models.

Implications for Practice

A primary aim of this study was to be able to provide concrete recommendations to institutional planners and administrators with respect to the utility of co-curricular activity participation in supporting engineering student motivation and persistence. For Research Questions 1 and 2, the predictive power (as measured by the RMSE) and the variance explained (as measured by R²) were quite low. Thus, implications for practice are discussed only for Research Question 3.

One finding of this study is that demographic variables play a large role in the prediction of persistence. Whereas demographic predictors might seem immovable, there are actions that university administrators can take to help support engineering motivation, engagement, and persistence across demographic groups. For example, one study involved an intervention to support women's feelings of belonging in engineering, and women who participated achieved higher engineering GPAs (Walton, Logel, Peach, Spencer, & Zanna, 2015). Interventions like this one can provide support for groups that may be less likely to persist in engineering.
However, since the current study does not address group differences in the major constructs examined, it does not explore the efficacy of intervention work. Future work should examine the ways that different motivational and co-curricular variables might work differently for different groups.

Additionally, with respect to motivation supports, expectancy and task-value appeared to be about equally predictive of persistence (see Table 25). This means that targeted interventions could support either or both constructs, rather than focusing on only one, in order to support persistence. These recommendations should be taken with caution, because the findings could be idiosyncratic to the university.

Conclusion

Prior research has sought to establish the precursors and outcomes of engineering student persistence (Geisinger & Raman, 2013; Litzler & Young, 2012), but the array of variables that contribute to this behavior is large and complex. Even so, a large amount of resources is dedicated to supporting student success (Kuh, Cruce, Shoup, Kinzie, & Gonyea, 2008; Pascarella & Terenzini, 2005). The current study provided evidence that co-curricular activities and motivation are not strongly related across the first three years of college and that both co-curricular activities and motivation may be supportive of persistence in an engineering major. In the prediction of persistence, the second and third years appear to be more important than the first year. Taken together, these findings suggest that students' motivation and co-curricular engagement behaviors differentially predict outcomes based on students' year in school. Since demographic predictors were more meaningful indicators than other variables considered in predicting persistence, administrative efforts may be most productive if they center on supporting persistence for underrepresented students in particular.

Table 1.
Research Question 1: Predictors of Year 3 Spring Expectancy and Value

| Variable | Model 1a: Fall '15 predictors (Year 1) | Model 1b: Spr. '16 predictors (Year 1) | Model 1c: Year 2 predictors (Year 2) | Model 1d: Year 3 predictors (Year 3) | Model 1e: Years 1, 2, 3 predictors (Years 1-3) |
|---|---|---|---|---|---|
| Control Variables | | | | | |
| Gender | F15 | F15 | F15 | F15 | F15 |
| Race | F15 | F15 | F15 | F15 | F15 |
| First-Generation Student Status | F15 | F15 | F15 | F15 | F15 |
| Prior Achievement - ACT | F15 | F15 | F15 | F15 | F15 |
| Expectancy | F15 | S16 | S17 | S17 | F15, S16, S17 |
| Value | F15 | S16 | S17 | S17 | F15, S16, S17 |
| Co-curricular Experiences (Predictors) | | | | | |
| Engineering Residence | Yr1 | | | | Yr1 |
| Career center (3 variables) | | | Yr2 | Yr3 | Yr2, Yr3 |
| Math Learning Center | | Yr1 | Yr2 | Yr3 | Yr1, Yr2, Yr3 |
| Advising | | | Yr2 | Yr3 | Yr2, Yr3 |
| Dependent Variables | | | | | |
| Expectancy | S18 | S18 | S18 | S18 | S18 |
| Value | S18 | S18 | S18 | S18 | S18 |

Note. The values in the cells correspond to the measurement occasion: F15 = Fall 2015; S16 = Spring 2016; F16 = Fall 2016; S17 = Spring 2017; F17 = Fall 2017; S18 = Spring 2018. Yr1 = first year of college: Fall 2015 and Spring 2016. Yr2 = second year of college: Summer 2016, Fall 2016, and Spring 2017. Yr3 = third year of college: Summer 2017, Fall 2017, and Spring 2018. For year 3 co-curricular experiences (included as predictors in Model 1d and 1e), "Yr3" indicates experiences that took place in the summer 2017, fall 2017, and spring 2018 semesters only up through and including the week before the survey launched, so as not to have a future event predicting the S18 survey responses for expectancy and value. Demographic predictors (gender, race, first-generation student status, and socioeconomic status: financial need) are collected only at a single time point (Fall 2015) and are used in all models. Please see Table 3 for the full list of co-curricular activities (e.g., for all the three variables that are used to characterize engagement with the career center).

Table 2.
Research Question 2: Predictors of Year 3 Co-Curricular Activity Participation

| Variable | Model 2a: F15 predictors (Year 1) | Model 2b: Spring 2016 predictors (Year 1) | Model 2c: Year 2 predictors (Year 2) | Model 2d: Fall Year 1, 2, and 3 predictors (Years 1 and 2) |
|---|---|---|---|---|
| Control Variables | | | | |
| Gender | F15 | F15 | F15 | F15 |
| Race | F15 | F15 | F15 | F15 |
| First-Generation | F15 | F15 | F15 | F15 |
| Prior Achievement - ACT | F15 | F15 | F15 | F15 |
| Eng. Residence | Yr1 | | | Yr1 |
| Career Center (3 variables) | | | Yr2 | Yr2 |
| Math Learning Center | | Yr1 | Yr2 | Yr1, Yr2 |
| Advising | | | Yr2 | Yr2 |
| Motivation (Predictors) | | | | |
| Expectancy | F15 | S16 | S17 | F15, S16, S17 |
| Value | F15 | S16 | S17 | F15, S16, S17 |
| Dependent Variables | | | | |
| Career Center (3 variables) | Yr3 | Yr3 | Yr3 | Yr3 |
| Math Learning Center | Yr3 | Yr3 | Yr3 | Yr3 |
| Advising | Yr3 | Yr3 | Yr3 | Yr3 |

Note. The values in the cells correspond to the measurement occasion: F15 = Fall 2015; S16 = Spring 2016; F16 = Fall 2016; S17 = Spring 2017; F17 = Fall 2017; S18 = Spring 2018. Yr1 = first year of college: Fall 2015 and Spring 2016. Yr2 = second year of college: Summer 2016, Fall 2016, and Spring 2017. Yr3 = third year of college: Summer 2017, Fall 2017, and Spring 2018. Demographic predictors (gender, race, first-generation student status, and socioeconomic status: financial need) are collected only at a single time point (Fall 2015) and are used in all models. Please see Table 3 for the full list of co-curricular activities (e.g., for all the three variables that are used to characterize engagement with the career center).

Table 3.
Model 3: Predictors of Post-Third-Year Persistence in Engineering

| Variable | Model 3a: F15 predictors (Year 1) | Model 3b: S16 predictors (Year 1) | Model 3c: Year 2 predictors (Year 2) | Model 3d: Year 3 predictors (Year 3) | Model 3e: Years 1, 2, 3 predictors (Years 1-3) |
|---|---|---|---|---|---|
| Control Variables | | | | | |
| Gender | F15 | F15 | F15 | F15 | F15 |
| Race | F15 | F15 | F15 | F15 | F15 |
| First-Generation | F15 | F15 | F15 | F15 | F15 |
| Prior Achievement - ACT | F15 | F15 | F15 | F15 | F15 |
| Predictors (Motivation and Co-Curricular Experiences) | | | | | |
| Motivation | | | | | |
| Expectancy | F15 | S16 | S17 | S18 | F15, S16, S17, S18 |
| Value | F15 | S16 | S17 | S18 | F15, S16, S17, S18 |
| Co-Curricular Experiences | | | | | |
| Eng. Residence | Yr1 | | | | Yr1 |
| Career Center (3 variables) | | | Yr2 | Yr3 | Yr2, Yr3 |
| Math Learning Center | | Yr1 | Yr2 | Yr3 | Yr1, Yr2, Yr3 |
| Advising | | | Yr2 | Yr3 | Yr2, Yr3 |
| Dependent Variable | | | | | |
| Engineering major | S18^ | S18^ | S18^ | S18^ | S18^ |

Note. The values in the cells correspond to the measurement occasion: F15 = Fall 2015; S16 = Spring 2016; F16 = Fall 2016; S17 = Spring 2017; F17 = Fall 2017; S18 = Spring 2018; S18^ = end of Spring 2018 semester. Yr1 = first year of college: Fall 2015 and Spring 2016. Yr2 = second year of college: Summer 2016, Fall 2016, and Spring 2017. Yr3 = third year of college: Summer 2017, Fall 2017, and Spring 2018. Demographic predictors (gender, race, first-generation student status, and socioeconomic status: financial need) are collected only at a single time point (Fall 2015) and are used in all models. Please see Table 3 for the full list of co-curricular activities (e.g., for all the three variables that are used to characterize engagement with the career center). The dependent variable of persistence is operationalized as enrollment in an engineering major at the end of the spring 2018 semester.

Table 4. Data Collection Timeline

| Variable | First Year (2015-2016): Fall | First Year (2015-2016): Spring | Second Year (2016-2017): Fall | Second Year (2016-2017): Spring | Third Year (2017-2018): Fall | Third Year (2017-2018): Spring |
|---|---|---|---|---|---|---|
| Self-reported Motivation | | | | | | |
| Expectancy | T1 | T2 | | T3 | | T4 |
| Value | T1 | T2 | | T3 | | T4 |
| Co-curricular Experiences | | | | | | |
| Residence in Eng. | X | | | | | |
| Career Center | | | X | X | X | X |
| Math Learning Center | X | X | X | X | X | X |
| Advising | | | X | X | X | X |
| Demographics | | | | | | |
| Gender | X | | | | | |
| Race | X | | | | | |
| First-Generation | X | | | | | |
| Prior achvt: ACT *** | X | | | | | |
| Persistence | | | | | | |
| Enrollment in an engineering major at the end of the semester | | | | | | X |

Note: X = the variable was measured at the given time point. T1 = Time 1 of the survey; T2 = Time 2, etc. Prior achvt: ACT *** the prior achievement variable was collected from university records, but students took the ACT before the fall 2015 semester, as part of their college applications.

Table 5. Incentive Mechanisms for Each Survey

Drawing | Drawing + print pages | Drawing + print pages + survey in class | Extra Credit | Course Credit | Payment
T1 Survey X
T2 Survey X X
T3 Survey X X X
T4 Survey X X X

Note. T1 Survey = Fall 2015; T2 Survey = Spring 2016; T3 Survey = Spring 2017; T4 Survey = Spring 2018

Table 6. Reliabilities for Imputed and Raw Data

| Variable | T1 Imp. | T1 Raw | T2 Imp. | T2 Raw | T3 Imp. | T3 Raw | T4 Imp. | T4 Raw |
|---|---|---|---|---|---|---|---|---|
| Expectancy | .730 | .831 | .792 | .870 | .783 | .897 | .824 | .911 |
| Task-Value | .844 | .889 | .876 | .925 | .885 | .941 | .901 | .949 |

Table 7. Variable Processing Plan for Co-Curricular Activities

| Co-Curricular Activity Type | Raw Log Data Variable | Processing Step | Processed Variable Type |
|---|---|---|---|
| Residence in engineering residence hall | Fall 2015 Residence | Recoded into 1 if it was an engineering residence hall; otherwise coded as 0 | Dichotomous |
| Career Center | Attended career event | Sum total count by year | Continuous |
| | Met with career advisor | Sum total count by year | Continuous |
| | Attended on-campus mock interview | Sum total count by year | Continuous |
| Math Learning Center | Attended tutoring | Sum total count of math tutoring visits by year | Continuous |
| Academic Advising | Attended an appointment | Sum total count of advising visits by year | Continuous |

Note. Each variable is calculated separately by semester. Log data are available with times and dates attended for data from the career center, the Math Learning Center, and academic advising.
Raw data from residence in the engineering residence halls is provided by the registrar and includes a variable indicating hall of residence.

Table 8. Variable Creation for Co-Curricular Analyses

Semesters contained in each variable:

| Type of Data | First Year | Second Year | Third Year |
|---|---|---|---|
| Advising | n/a (data not available) | U16, F16, S17 | U17, F17, S18 |
| Career Center | n/a (data not available) | U16, F16, S17 | U17, F17, S18 |
| Math Tutoring Center | F15, S16 | U16, F16, S17 | U17, F17, S18 |

Note. F15 = Fall 2015 semester; S16 = Spring 2016 semester; U16 = Summer 2016 semester. Note data were not available for the entire first year for academic advising or for the career center. For the Math Tutoring Center variables for the first year, two separate variables were created: one for each semester. For all the rest of the variables in the table, including math tutoring in years two and three, the variables were collapsed by year.

Table 9. Correlations and Descriptive Statistics

| Variable | M | SD | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Ex1 | 3.92 | 0.65 | | | | | | | | | | | | | | | | | | |
| 2. Ex2 | 3.81 | 0.69 | .19*** | | | | | | | | | | | | | | | | | |
| 3. Ex3 | 3.78 | 0.76 | .11*** | .33*** | | | | | | | | | | | | | | | | |
| 4. Ex4 | 3.69 | 0.84 | .06* | .24*** | .37*** | | | | | | | | | | | | | | | |
| 5. Val1 | 4.12 | 0.52 | .56*** | .17*** | .11** | .09** | | | | | | | | | | | | | | |
| 6. Val2 | 4.01 | 0.57 | .16*** | .59*** | .23*** | .18*** | .26*** | | | | | | | | | | | | | |
| 7. Val3 | 4.02 | 0.64 | .07* | .24*** | .58*** | .29*** | .11*** | .41*** | | | | | | | | | | | | |
| 8. Val4 | 3.89 | 0.72 | .05 | .22*** | .28*** | .62*** | .14*** | .27*** | .37*** | | | | | | | | | | | |
| 9. Ad2 | 0.93 | 1.18 | .01 | .08** | .00 | .05 | .07* | .11*** | .06* | .10** | | | | | | | | | | |
| 10. Ad3 | 0.47 | 0.79 | .07* | -.01 | .05 | .10** | .05 | .07* | .13*** | .16*** | .33*** | | | | | | | | | |
| 11. CEv2 | 1.41 | 1.90 | .06* | .04 | .05 | .11** | .06* | .11*** | .11*** | .15*** | .11*** | .10** | | | | | | | | |
| 12. CEv3 | 1.66 | 2.23 | .03 | .02 | .09** | .08* | .00 | .11*** | .15*** | .15*** | .13*** | .13*** | .49*** | | | | | | | |
| 13. CAd2 | 0.23 | 0.67 | -.03 | .01 | .01 | -.05 | -.03 | .00 | -.01 | -.08* | .13*** | .10** | .19*** | .16*** | | | | | | |
| 14. CAd3 | 0.17 | 0.54 | .00 | .02 | .01 | -.01 | -.03 | .04 | .05 | -.02 | .08* | .08* | .11*** | .28*** | .23*** | | | | | |
| 15. Mo2 | 0.02 | 0.12 | .02 | .01 | .03 | .03 | .00 | .01 | .02 | .01 | .01 | -.02 | .13*** | .19*** | .17*** | .13*** | | | | |
| 16. Mo3 | 0.02 | 0.14 | -.01 | .03 | .05 | .06* | -.01 | .00 | .00 | .05 | -.01 | .03 | .15*** | .23*** | .11*** | .04 | .15*** | | | |
| 17. Mat1 | 6.33 | 12.17 | -.04 | -.04 | -.04 | .02 | -.03 | .00 | .03 | .08* | .09** | .04 | .15*** | .21*** | .03 | .08** | .07* | .05 | | |
| 18. Mat2 | 6.76 | 15.14 | .02 | -.02 | .03 | -.02 | .02 | .04 | .06* | .06* | .20*** | .13*** | .12*** | .14*** | .12*** | .09** | .04 | -.01 | .37*** | |
| 19. Mat3 | 1.16 | 6.76 | .00 | .00 | -.02 | -.04 | -.01 | .00 | -.02 | .00 | .12*** | .11*** | -.02 | .03 | .01 | .07* | -.01 | -.01 | .21*** | .41*** |

Note. M and SD are used to represent mean and standard deviation, respectively. These are descriptive statistics and correlations for only participants retained in the main analysis (n = 1,044), not including students who were dropped from the analysis because they had some missing data. The composite variables are reported for motivation, and these composites were calculated after imputation because that was the dataset the main analyses were computed on. *** p < .001, ** p < .01, * p < .05. Ex1 = Expectancy time 1; Val2 = Task-value time 2; Ad2 = Academic Advising year 2; CEv2 = Career Event year 2; CAd = Career Advising; Mo = Mock Interview; Mat = Math tutoring center; Res = on-campus residence; Gen = Gender (1 = Female); ACT = ACT score; FG = first-generation student status (1 = first-generation).

Table 9 (cont'd). Correlations and Descriptive Statistics for Motivation and Co-Curricular Variables with Demographic and Achievement Variables

| Variable | M | SD | 1. Ex1 | 2. Ex2 | 3. Ex3 | 4. Ex4 | 5. Val1 | 6. Val2 | 7. Val3 | 8. Val4 | 9. Ad2 | 10. Ad3 | 11. CEv2 | 12. CEv3 | 13. CAd2 | 14. CAd3 | 15. Mo2 | 16. Mo3 | 17. Mat1 | 18. Mat2 | 19. Mat3 | 20. Res | 21. Gen | 22. ACT | 23. FG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20. Res | 0.78 | 0.41 | .06 | .02 | -.00 | .01 | .08* | .00 | -.02 | .05 | .03 | .01 | .05 | .09** | .02 | -.01 | .05 | .01 | -.01 | .03 | .02 | | | | |
| 21. Gen | 0.27 | 0.45 | -.04 | -.08* | -.08* | -.09** | -.05 | -.01 | -.01 | -.05 | .03 | .05 | .13*** | .12*** | .03 | .02 | .05 | .05 | .15*** | .05 | .05 | -.02 | | | |
| 22. ACT | 27.05 | 3.75 | .03 | .04 | .06 | .13*** | -.01 | .01 | .04 | .09** | -.06 | .03 | .16*** | .14*** | .00 | .06 | .05 | .09** | -.10** | -.23*** | -.19*** | -.05 | -.06 | | |
| 23. FG | 0.18 | 0.39 | .03 | .00 | -.03 | -.07* | .05 | .01 | -.06 | -.08** | .02 | -.04 | -.06* | -.05 | -.02 | .02 | -.06 | -.01 | -.05 | .03 | .06 | -.04 | .01 | -.23*** | |
| 24. Pers | 0.61 | 0.49 | .00 | .11** | .14*** | .24*** | .04 | .17*** | .22*** | .31*** | .25*** | .33*** | .33*** | .39*** | .05 | .11** | .07* | .11** | .09** | .14*** | .01 | .07* | -.02 | .25*** | -.13*** |

Table 10. Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 1, predicting Expectancy and Value at the end of Year 3

| Predictor | Model 1a | Model 1b | Model 1c | Model 1d | Model 1e |
|---|---|---|---|---|---|
| Demographic Predictors | | | | | |
| Gender | 99.4 | 92.8 | 56.2 | 63.6 | 27.8 |
| URM | 51.8 | 52.4 | 46.4 | 46.4 | 25.0 |
| ACT Score | 51.2 | 45.8 | 39.0 | 42.6 | 23.2 |
| First-Generation | 33.6 | 30.8 | 34.4 | 36.4 | 23.4 |
| Demographic Average | 59.0 | 55.5 | 44.0 | 47.3 | 24.9 |
| Motivation Predictors | | | | | |
| Exp: Time 1 | 42.2 | -- | -- | -- | 15.8 |
| Exp: Time 2 | -- | 43.6 | -- | -- | 15.6 |
| Exp: Time 3 | -- | -- | 32.6 | 31.8 | 18.2 |
| Value: Time 1 | 42.8 | -- | -- | -- | 14.4 |
| Value: Time 2 | -- | 40.0 | -- | -- | 12.8 |
| Value: Time 3 | -- | -- | 31.4 | 33.2 | 16.8 |
| Motivation Average | 42.5 | 41.8 | 32.0 | 32.5 | 15.6 |
| Co-Curricular Predictors | | | | | |
| On-Campus Residence: Y1 | 29.4 | -- | -- | -- | 18.4 |
| Math Tutoring: Y1 | -- | 40.4 | -- | -- | 20.8 |
| Math Tutoring: Y2 | -- | -- | 23.8 | -- | 17.6 |
| Math Tutoring: Y3 | -- | -- | -- | 10.8 | 11.6 |
| Career Event: Y2 | -- | -- | 36.2 | -- | 16.6 |
| Career Event: Y3 | -- | -- | -- | 37.2 | 16.2 |
| Career Adv: Y2 | -- | -- | 17.8 | -- | 12.2 |
| Career Adv: Y3 | -- | -- | -- | 19.0 | 9.2 |
| Mock Int: Y2 | -- | -- | 11.8 | -- | 7.2 |
| Mock Int: Y3 | -- | -- | -- | 11.0 | 4.8 |
| Advising: Y2 | -- | -- | 24.4 | -- | 14.6 |
| Advising: Y3 | -- | -- | -- | 22.8 | 9.6 |
| Co-Curricular Average | 29.4 | 40.4 | 22.8 | 20.2 | 13.2 |

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of expectancy and value in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value.
Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 11. Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 1, predicting Expectancy and Value at the end of Year 3

| Predictor | Model 1a | Model 1b | Model 1c | Model 1d | Model 1e |
|---|---|---|---|---|---|
| Demographic Predictors | | | | | |
| Gender | 102.6 | 95.0 | 61.8 | 56.6 | 29.6 |
| URM | 50.2 | 51.6 | 44.4 | 49.6 | 25.6 |
| ACT Score | 50.4 | 45.2 | 39.6 | 39.6 | 21.0 |
| First-Generation | 31.0 | 32.6 | 30.2 | 37.8 | 21.4 |
| Demographic Average | 58.6 | 56.1 | 44.0 | 45.9 | 24.4 |
| Motivation Predictors | | | | | |
| Exp: Time 1 | 46.4 | -- | -- | -- | 16.4 |
| Exp: Time 2 | -- | 46.4 | -- | -- | 15.6 |
| Exp: Time 3 | -- | -- | 27.4 | 35.0 | 19.0 |
| Value: Time 1 | 48.4 | -- | -- | -- | 12.6 |
| Value: Time 2 | -- | 39.2 | -- | -- | 14.6 |
| Value: Time 3 | -- | -- | 31.2 | 28.6 | 17.6 |
| Motivation Average | 47.4 | 42.8 | 29.3 | 31.8 | 16.0 |
| Co-Curricular Predictors | | | | | |
| On-Campus Residence: Y1 | 30.6 | -- | -- | -- | 17.6 |
| Math Tutoring: Y1 | -- | 40.4 | -- | -- | 20.2 |
| Math Tutoring: Y2 | -- | -- | 26.4 | -- | 16.4 |
| Math Tutoring: Y3 | -- | -- | -- | 15.0 | 13.6 |
| Career Event: Y2 | -- | -- | 35.2 | -- | 17.2 |
| Career Event: Y3 | -- | -- | -- | 34.8 | 17.0 |
| Career Adv: Y2 | -- | -- | 22.6 | -- | 12.0 |
| Career Adv: Y3 | -- | -- | -- | 18.6 | 10.0 |
| Mock Int: Y2 | -- | -- | 9.4 | -- | 7.2 |
| Mock Int: Y3 | -- | -- | -- | 11.4 | 5.2 |
| Advising: Y2 | -- | -- | 23.2 | -- | 15.0 |
| Advising: Y3 | -- | -- | -- | 19.8 | 10.4 |
| Co-Curricular Average | 30.6 | 40.4 | 23.4 | 19.9 | 13.5 |

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of expectancy and value in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 12.
Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 2, predicting Co-Curricular Activity Participation in Year 3

                            Model 2a  Model 2b  Model 2c  Model 2d
Demographic Predictors
  Gender                    146.8     153.2     114.8     74.4
  URM                       65.6      59.2      72.2      56.4
  ACT Score                 52.2      47.8      42.8      37.8
  First-Generation          32.6      34.4      47.2      43.8
  Demographic Average       74.3      73.7      69.3      53.1
Motivation Predictors
  Exp: Time 1               42.8      --        --        19.4
  Exp: Time 2               --        38.2      --        15.6
  Exp: Time 3               --        --        23.0      13.8
  Value: Time 1             41.2      --        --        16.2
  Value: Time 2             --        37.0      --        15.4
  Value: Time 3             --        --        22.6      14.6
  Motivation Average        42.0      37.6      22.8      15.8
Co-Curricular Predictors
  On-Campus Residence: Y1   26.8      --        --        24.4
  Math Tutoring: Y1         --        32.4      --        25.4
  Math Tutoring: Y2         --        --        25.4      24.6
  Career Event: Y2          --        --        36.4      20.8
  Career Adv: Y2            --        --        21.2      12.0
  Mock Int: Y2              --        --        9.6       8.4
  Advising: Y2              --        --        21.2      23.8
  Co-Curricular Average     26.8      32.4      22.8      19.9

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of co-curricular activity engagement in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 13.
Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 2, predicting Co-Curricular Activity Participation in Year 3

                            Model 2a  Model 2b  Model 2c  Model 2d
Demographic Predictors
  Gender                    142.2     142.4     114.2     75.4
  URM                       61.6      66.0      69.6      58.6
  ACT Score                 47.4      43.4      44.4      40.6
  First-Generation          35.2      35.4      47.2      43.6
  Demographic Average       71.6      71.8      68.9      54.6
Motivation Predictors
  Exp: Time 1               43.0      --        --        17.2
  Exp: Time 2               --        40.4      --        15.2
  Exp: Time 3               --        --        24.8      15.6
  Value: Time 1             44.4      --        --        14.8
  Value: Time 2             --        41.4      --        11.6
  Value: Time 3             --        --        25.2      13.8
  Motivation Average        43.7      40.9      25.0      14.7
Co-Curricular Predictors
  On-Campus Residence: Y1   27.4      --        --        28.2
  Math Tutoring: Y1         --        31.8      --        28.4
  Math Tutoring: Y2         --        --        23.8      23.6
  Career Event: Y2          --        --        35.0      19.8
  Career Adv: Y2            --        --        18.8      13.0
  Mock Int: Y2              --        --        8.2       8.0
  Advising: Y2              --        --        19.4      19.8
  Co-Curricular Average     27.4      31.8      21.0      20.1

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of co-curricular activity engagement in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 14.
Robustness Check for Variable Importance on additional imputed dataset #1: Research Question 3, predicting Post-Third-Year Persistence

                            Model 3a  Model 3b  Model 3c  Model 3d  Model 3e
Demographic Predictors
  Gender                    359.4     396.8     313.0     302.6     145.2
  URM                       122.0     135.4     177.2     172.4     123.6
  ACT Score                 27.0      23.6      60.0      57.8      84.0
  First-Generation          36.4      44.4      97.4      99.4      95.8
  Demographic Average       136.2     150.1     161.9     158.1     112.2
Motivation Predictors
  Exp: Time 1               24.0      --        --        --        3.8
  Exp: Time 2               --        17.6      --        --        4.8
  Exp: Time 3               --        --        10.0      --        4.6
  Exp: Time 4               --        --        --        14.4      4.6
  Value: Time 1             23.6      --        --        --        3.2
  Value: Time 2             --        19.2      --        --        3.2
  Value: Time 3             --        --        11.2      --        5.0
  Value: Time 4             --        --        --        11.6      5.8
  Motivation Average        23.8      18.4      10.6      13.0      4.4
Co-Curricular Predictors
  On-Campus Residence: Y1   14.8      --        --        --        63.2
  Math Tutoring: Y1         --        19.2      --        --        56.0
  Math Tutoring: Y2         --        --        33.4      --        38.0
  Math Tutoring: Y3         --        --        --        5.8       29.0
  Career Event: Y2          --        --        13.0      --        26.4
  Career Event: Y3          --        --        --        32.8      13.6
  Career Adv: Y2            --        --        6.2       --        20.8
  Career Adv: Y3            --        --        --        12.8      8.8
  Mock Int: Y2              --        --        1.4       --        14.0
  Mock Int: Y3              --        --        --        4.0       5.4
  Advising: Y2              --        --        13.6      --        5.4
  Advising: Y3              --        --        --        8.6       4.4
  Co-Curricular Average     14.8      19.2      13.5      12.8      23.8

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of persistence at the end of the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 15.
Robustness Check for Variable Importance on additional imputed dataset #2: Research Question 3, predicting Post-Third-Year Persistence

                            Model 3a  Model 3b  Model 3c  Model 3d  Model 3e
Demographic Predictors
  Gender                    374.2     401.6     303.8     307.8     146.2
  URM                       127.6     134.0     175.2     171.8     124.4
  ACT Score                 24.0      23.4      57.8      55.2      85.4
  First-Generation          37.6      41.0      101.4     98.8      97.0
  Demographic Average       140.9     150.0     159.6     158.4     113.3
Motivation Predictors
  Exp: Time 1               22.2      --        --        --        4.0
  Exp: Time 2               --        19.6      --        --        3.8
  Exp: Time 3               --        --        12.4      --        4.6
  Exp: Time 4               --        --        --        13.8      4.0
  Value: Time 1             21.0      --        --        --        5.2
  Value: Time 2             --        19.0      --        --        5.6
  Value: Time 3             --        --        10.0      --        4.4
  Value: Time 4             --        --        --        13.2      7.0
  Motivation Average        21.6      19.3      11.2      13.5      4.8
Co-Curricular Predictors
  On-Campus Residence: Y1   14.2      --        --        --        63.0
  Math Tutoring: Y1         --        19.8      --        --        52.4
  Math Tutoring: Y2         --        --        30.6      --        38.4
  Math Tutoring: Y3         --        --        --        6.0       28.2
  Career Event: Y2          --        --        11.2      --        23.8
  Career Event: Y3          --        --        --        34.8      13.0
  Career Adv: Y2            --        --        5.8       --        19.8
  Career Adv: Y3            --        --        --        13.2      8.0
  Mock Int: Y2              --        --        2.6       --        13.8
  Mock Int: Y3              --        --        --        4.2       5.2
  Advising: Y2              --        --        15.0      --        7.2
  Advising: Y3              --        --        --        5.0       3.6
  Co-Curricular Average     14.2      19.8      13.0      12.6      23.0

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of persistence at the end of the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 16.
Fit Statistics for Longitudinal Confirmatory Factor Analysis of Expectancy and Task-Value Measurement Invariance

                χ²            df      CFI     ΔCFI    RMSEA   ΔRMSEA
Expectancy
  Configural    213.70***     134     .988    –       .021    –
  Weak          232.37***     146     .987    -.001   .021    <.001
  Strong        244.59***     158     .987    <.001   .021    <.001
Task-Value
  Configural    2619.02***    999     .923    –       .035    –
  Weak          2681.80***    1032    .922    -.001   .035    <.001
  Strong        2779.32***    1068    .919    -.003   .035    <.001

Note. Invariance was inferred if the change in CFI was < .01 and the change in RMSEA was < .015; CFI = comparative fit index; RMSEA = root-mean-square error of approximation. ***p < .001

Table 17. Model Performance: RMSE Values, predicting Expectancy and Value at the end of Year 3

Model   RMSE    Δ (difference between this model and model 1e)
1a      .68     .49
1b      .47     .28
1c      .19     .00
1d      .32     .13
1e      .19     --

Note. The third column shows the difference in RMSE values compared with model 1e. Lower RMSE values are better, so a positive number here shows that a model performs more poorly than model 1e.

Table 18. Variance Explained for Research Question 1

Model   R²
1a      0.01
1b      0.05
1c      0.08
1d      0.06
1e      0.06

Table 19.
Variable Importance: Research Question 1, predicting Expectancy and Value at the end of Year 3

                            Model 1a  Model 1b  Model 1c  Model 1d  Model 1e
Demographic Predictors
  Gender                    98.6      92.0      58.6      68.0      29.2
  URM                       51.4      51.2      44.0      45.2      24.6
  ACT Score                 51.8      47.4      42.6      42.0      22.0
  First-Generation          34.2      36.6      33.4      36.4      22.8
  Demographic Average       59.0      56.8      44.7      47.9      24.7
Motivation Predictors
  Exp: Time 1               45.2      --        --        --        15.2
  Exp: Time 2               --        42.8      --        --        14.8
  Exp: Time 3               --        --        30.0      35.8      18.4
  Value: Time 1             46.0      --        --        --        17.2
  Value: Time 2             --        41.2      --        --        12.8
  Value: Time 3             --        --        33.8      27.2      14.8
  Motivation Average        45.6      42.0      31.9      31.5      15.5
Co-Curricular Predictors
  On-Campus Residence: Y1   29.2      --        --        --        20.6
  Math Tutoring: Y1         --        40.4      --        --        18.0
  Math Tutoring: Y2         --        --        23.4      --        17.0
  Math Tutoring: Y3         --        --        --        14.0      11.4
  Career Event: Y2          --        --        30.6      --        17.8
  Career Event: Y3          --        --        --        36.8      14.4
  Career Adv: Y2            --        --        22.0      --        11.6
  Career Adv: Y3            --        --        --        21.0      10.4
  Mock Int: Y2              --        --        9.4       --        7.2
  Mock Int: Y3              --        --        --        10.2      5.4
  Advising: Y2              --        --        28.8      --        11.8
  Advising: Y3              --        --        --        17.0      13.6
  Co-Curricular Average     29.2      40.4      22.8      19.8      13.3

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of expectancy and value in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 20. Model Performance: RMSE Values for Research Question 2, predicting Co-Curricular Activity Participation in Year 3

Model   RMSE    Δ (difference between this model and model 2d)
2a      0.43    -0.05
2b      1.64    1.16
2c      1.67    1.19
2d      0.48    --

Note.
The third column shows the difference in RMSE values compared with model 2d. Lower RMSE values are better, so a negative number here shows that a model performs better than model 2d.

Table 21. Variance Explained for Research Question 2

Model   R²
2a      0.03
2b      0.03
2c      0.08
2d      0.07

Table 22. Variable Importance: Research Question 2, predicting Co-Curricular Activity Participation in Year 3

                            Model 2a  Model 2b  Model 2c  Model 2d
Demographic Predictors
  Gender                    147.2     143.4     117.8     76.4
  URM                       61.8      60.8      71.4      60.6
  ACT Score                 49.0      44.8      46.0      43.8
  First-Generation          35.4      36.2      49.0      42.6
  Demographic Average       73.4      71.3      71.1      55.9
Motivation Predictors
  Exp: Time 1               38.6      --        --        15.0
  Exp: Time 2               --        40.2      --        15.8
  Exp: Time 3               --        --        25.0      12.4
  Value: Time 1             42.0      --        --        14.0
  Value: Time 2             --        38.8      --        15.6
  Value: Time 3             --        --        23.4      12.6
  Motivation Average        40.3      39.5      24.2      14.2
Co-Curricular Predictors
  On-Campus Residence: Y1   23.4      --        --        28.0
  Math Tutoring: Y1         --        37.2      --        22.4
  Math Tutoring: Y2         --        --        24.4      23.8
  Career Event: Y2          --        --        32.0      20.0
  Career Adv: Y2            --        --        20.6      15.2
  Mock Int: Y2              --        --        8.4       8.4
  Advising: Y2              --        --        19.8      19.8
  Co-Curricular Average     23.4      37.2      21.0      19.7

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of co-curricular activity engagement in the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Table 23. Model Performance: AUC for Research Question 3, predicting Post-Third-Year Persistence in Engineering

Model   Persistence   Δ AUC
3a      .56           --
3b      .61           -.21
3c      .72           -.16
3d      .75           -.05
3e      .77           -.02

Note. AUC is the area under the ROC curve; it ranges from 0 to 1.
A value of .5 indicates chance performance, and values closer to 1 are better. Values below .5 indicate performance worse than chance. The third column, Δ AUC, indicates the relative increase or decrease in AUC over the prior model.

Table 24. Variance Explained for Research Question 3

Model   R²
3a      0.01
3b      0.04
3c      0.14
3d      0.18
3e      0.21

Table 25. Variable Importance: Research Question 3, predicting Post-Third-Year Persistence

                            Model 3a  Model 3b  Model 3c  Model 3d  Model 3e
Demographic Predictors
  Gender                    362.0     394.4     307.0     302.0     148.8
  URM                       123.4     138.8     182.0     178.6     121.4
  ACT Score                 24.0      22.8      59.2      55.0      85.8
  First-Generation          42.4      39.6      103.0     94.4      98.6
  Demographic Average       138.0     148.9     162.8     157.5     113.7
Motivation Predictors
  Exp: Time 1               24.0      --        --        --        4.2
  Exp: Time 2               --        18.6      --        --        4.2
  Exp: Time 3               --        --        12.8      --        3.4
  Exp: Time 4               --        --        --        12.2      5.8
  Value: Time 1             24.0      --        --        --        6.0
  Value: Time 2             --        18.8      --        --        5.2
  Value: Time 3             --        --        10.6      --        4.2
  Value: Time 4             --        --        --        14.2      4.8
  Motivation Average        24.0      18.7      11.7      13.2      4.7
Co-Curricular Predictors
  On-Campus Residence: Y1   14.0      --        --        --        61.0
  Math Tutoring: Y1         --        17.0      --        --        55.0
  Math Tutoring: Y2         --        --        32.0      --        35.0
  Math Tutoring: Y3         --        --        --        5.6       28.6
  Career Event: Y2          --        --        11.6      --        24.2
  Career Event: Y3          --        --        --        34.0      12.6
  Career Adv: Y2            --        --        6.8       --        19.6
  Career Adv: Y3            --        --        --        11.4      6.4
  Mock Int: Y2              --        --        1.8       --        13.0
  Mock Int: Y3              --        --        --        4.2       5.2
  Advising: Y2              --        --        13.6      --        7.0
  Advising: Y3              --        --        --        5.8       5.2
  Co-Curricular Average     14.0      17.0      13.2      12.2      22.7

Note: The table shows the variable importance scores, with a high number indicating a more important variable. Thus, high numbers indicate more important variables in predicting the outcome of persistence at the end of the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value.
Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.

Figure 1. Conceptual Model

Figure 2. Missing Data Visualization

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, center_event_yr2, center_adv_yr2, mock_yr2, mlc_yr2, adv_yr2, V3_ESE, V3_Val]
Figure 3. Partial Dependence Plots for Model 1c for the Outcome Engineering Expectancy
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, center_event_yr2, center_adv_yr2, mock_yr2, mlc_yr2, adv_yr2, V3_ESE, V3_Val]
Figure 4. Partial Dependence Plots for Model 1c for the Outcome Task-Value
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, mlc_yr3, center_event_yr2, center_adv_yr2, mock_yr2, center_event_yr3, center_adv_yr3, mock_yr3, adv_yr2, adv_yr3, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 5. Partial Dependence Plots for Model 1e for the Outcome Engineering Expectancy
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value.
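
For readers unfamiliar with how a partial dependence curve such as those in Figures 3 through 5 is obtained, the following is a minimal sketch in Python. It is not the code used in this dissertation: the function `partial_dependence`, the stand-in model `toy_predict`, its coefficients, and the four example rows are all hypothetical and serve only to illustrate the idea. One predictor is fixed at each value on a grid for every student, the model's predictions are averaged, and the averages are plotted against the grid.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average the model's prediction with `feature` forced to each grid value.

    predict: callable taking one row (a dict) and returning a number.
    rows:    list of dicts, one per student (hypothetical data here).
    feature: name of the predictor to vary.
    grid:    values of the predictor to evaluate (the x-axis of the plot).
    """
    curve = []
    for value in grid:
        # Overwrite the chosen predictor for every row, keep the rest as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted model; the coefficients are made up."""
    return 2.0 + 0.5 * row["V3_ESE"] - 0.2 * row["FEMALE"]


# Hypothetical rows using two of the variable names from the figure notes.
rows = [
    {"FEMALE": 0, "V3_ESE": 3.0},
    {"FEMALE": 1, "V3_ESE": 4.0},
    {"FEMALE": 1, "V3_ESE": 2.0},
    {"FEMALE": 0, "V3_ESE": 5.0},
]

pd_curve = partial_dependence(toy_predict, rows, "V3_ESE", [1, 2, 3, 4, 5])
for x, y in pd_curve:
    print(f"V3_ESE = {x}: average predicted outcome = {y:.2f}")
```

Because every other predictor keeps its observed value, each point on the curve reflects the model's average prediction across the sample rather than the prediction for any single student.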
[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, mlc_yr3, center_event_yr2, center_adv_yr2, mock_yr2, center_event_yr3, center_adv_yr3, mock_yr3, adv_yr2, adv_yr3, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 6. Partial Dependence Plots for Model 1e for the Outcome Task-Value
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 7. Partial Dependence Plots for Model 2a for the Outcome Advising
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to advising, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 8. Partial Dependence Plots for Model 2a for the Outcome Career Event
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to a career event, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 9. Partial Dependence Plots for Model 2a for the Outcome Career Advising
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to career advising, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 10. Partial Dependence Plots for Model 2a for the Outcome Mock Interviews
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to a mock interview, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 11. Partial Dependence Plots for Model 2a for the Outcome Math Tutoring
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V1_ESE = Time 1 expectancy; Val = Value. The dependent variable is the number of times a student would go to math tutoring, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 12. Partial Dependence Plots for Model 2d for the Outcome Advising
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to advising, according to the model.
[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 13. Partial Dependence Plots for Model 2d for the Outcome Career Event
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to a career event, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 14.
Partial Dependence Plots for Model 2d for the Outcome Career Advising
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to career advising, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 15. Partial Dependence Plots for Model 2d for the Outcome Mock Interviews
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to mock interviews, according to the model.

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val]
Figure 16. Partial Dependence Plots for Model 2d for the Outcome Math Tutoring
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is the number of times a student would go to math tutoring, according to the model.
[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, V1_ESE, V1_Val]
Figure 17. Partial Dependence Plots for Model 3a for the Outcome Persistence
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; V1_ESE = Time 1 Expectancy; Val = Value. The dependent variable is whether the model predicts the student would persist (1) or leave (0).

[Plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, mlc_yr1, V2_ESE, V2_Val]
Figure 18. Partial Dependence Plots for Model 3b for the Outcome Persistence
Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; mlc_yr1 = math tutoring center in year 1; V2_ESE = Time 2 Expectancy; Val = Value. The dependent variable is whether the model predicts the student would persist (1) or leave (0).
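
The persistence models plotted here predict a binary outcome (persist = 1, leave = 0), and Table 23 compares them by AUC. As an illustration only, the sketch below computes AUC as the proportion of (persister, leaver) pairs in which the persister receives the higher predicted score, counting ties as one half. The function `auc` and the labels and scores are hypothetical; this is not the dissertation's data or its software.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = persisted, 0 = left (hypothetical values below).
    scores: predicted probability of persisting for each student.
    Returns the share of (persister, leaver) pairs ranked correctly,
    with tied scores counted as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one student in each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: three persisters and two leavers.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.7]
example_auc = auc(labels, scores)
print(f"AUC = {example_auc:.3f}")
```

A model that ranks students at random would score near .5 on this measure, which is why .5 is treated as the chance level.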
[Figure 19 plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, mlc_yr2, adv_yr2, center_event_yr2, center_adv_yr2, mock_yr2, V3_ESE, V3_Val]

Figure 19. Partial Dependence Plots for Model 3c for the Outcome Persistence

Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot. FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is whether the model predicts the student would persist (1) or leave (0).

[Figure 20 plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, center_event_yr3, center_adv_yr3, mock_yr3, mlc_yr3, adv_yr3, V4_ESE, V4_Val]

Figure 20. Partial Dependence Plots for Model 3d for the Outcome Persistence

Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V4_ESE = Time 4 Expectancy; Val = Value. The dependent variable is whether the model predicts the student would persist (1) or leave (0).

[Figure 21 plot panels: FEMALE, URM, FIRSTGEN, Max_ACT_Composite, SouthcomplexBinary_2015, mlc_yr1, mlc_yr2, mlc_yr3, center_event_yr2, center_adv_yr2, mock_yr2, center_event_yr3, center_adv_yr3, mock_yr3, adv_yr2, adv_yr3, V1_ESE, V1_Val, V2_ESE, V2_Val, V3_ESE, V3_Val, V4_ESE, V4_Val]

Figure 21. Partial Dependence Plots for Model 3e for the Outcome Persistence

Note. The independent variable (predictor) is on the x-axis, and the dependent variable (outcome) is on the y-axis. Each plot's label corresponds to the independent variable investigated in that plot.
FEMALE = gender; URM = underrepresented minority student; FIRSTGEN = first-generation student; Max_ACT_Composite = ACT score; SouthcomplexBinary_2015 = residence in an engineering residence hall in the first year; center_event_yr2 = career center event in year 2; center_adv = career center advising; mock = mock interview; mlc = math tutoring center; adv = academic advising; V3_ESE = Time 3 Expectancy; Val = Value. The dependent variable is whether the model predicts the student would persist (1) or leave (0).

APPENDIX

Variable Importance for Only the Students with Complete Survey Data: Model 3e.

                                   Model 3e
Demographic Predictors
    Gender                            188.6
    URM                               159.4
    ACT Score                         108.2
    First-Generation                  132.8
    Demographic Average               147.3
Motivation Predictors
    Exp: Time 1                         6.2
    Exp: Time 2                         4.0
    Exp: Time 3                         5.0
    Exp: Time 4                         4.2
    Value: Time 1                       4.0
    Value: Time 2                       6.6
    Value: Time 3                       5.4
    Value: Time 4                       5.6
    Motivation Average                  5.1
Co-Curricular Predictors
    On-Campus Residence: Y1            87.0
    Math Tutoring: Y1                  70.4
    Math Tutoring: Y2                  48.6
    Math Tutoring: Y3                  35.8
    Career Event: Y2                   31.6
    Career Event: Y3                   14.2
    Career Adv: Y2                     25.2
    Career Adv: Y3                     11.2
    Mock Int: Y2                       18.4
    Mock Int: Y3                        6.2
    Advising: Y2                        6.8
    Advising: Y3                        6.2
    Co-Curricular Average              30.1

Note: This table is analogous to Table 23, but it only contains data from N = 247 students who completed all four surveys. The table shows the variable importance scores; higher numbers indicate more important variables in predicting the outcome of persistence at the end of the third year. The lowest number indicates the variable selected as a branching point for maximum classification the fewest times. Double dashes (--) indicate that a given variable was not included in that model. Y1 = Year 1. Exp = Expectancy; Value = Task value. Career Adv = Career advising; Mock Int = mock interview; Advising = academic advising.