DISCOVERING THE LANGUAGE OF MEANINGFUL WORK

By

Michael Aubrey Morrison

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Psychology — Master of Arts

2018

ABSTRACT

DISCOVERING THE LANGUAGE OF MEANINGFUL WORK

By Michael Morrison

This study introduces a series of language signals that indicate whether a person finds their work meaningful (or meaningless). These signals are then integrated into a new, natural language measure of work meaningfulness. This algorithm can analyze a worker’s written description of their work and, using features of their writing, determine whether they find their work meaningful with an average classification accuracy of 85%. As an additional, theoretical contribution, this study tests the relationship between work meaningfulness and construal level theory. Results indicate that personal pronouns and action verbs are most related to creating an impression of meaningfulness, but that identity statements and positive sentiment are more related to actual, self-reported meaningfulness. Additionally, construal level showed a significant, positive relationship with several measures of work meaningfulness.

ACKNOWLEDGMENTS

A number of people contributed advice, expertise, and support to this project. First, I’d like to thank my advisor, Rick DeShon, for proposing that we scrap the initial idea for my thesis (which was going to be yet another traditional, Likert-style measure of meaningfulness) and do something not boring and lame instead. This machine learning approach was way more fun and interesting. I would also like to thank Kevin Ford for helping me resolve my “Which prompt do I give them?!” crisis by recommending that, at some point, I just pick one and run with it. Otherwise, I would probably still be pilot testing prompts. I would also like to thank Ruth Kanfer at Georgia Tech for suggesting that I add a definition of meaningfulness to my survey prompt to help keep replies consistent.
That seemed to help. Finally, I would like to thank my girlfriend Kelsey for telling me to finish my thesis every time I started getting excited about some new project that wasn’t my thesis. And for being supportive and loving me and stuff.

TABLE OF CONTENTS

LIST OF TABLES
INTRODUCTION
    Meaningfulness: A ‘Holy Grail’ Variable in Organizations
    What Does ‘Meaningfulness’ Mean?
        Common themes.
        Universal definitions.
        Unifying with Construal Level Theory.
    Goal #1: Using Language to Inform Theory
        Linguistic signals of meaningfulness.
    Goal #2: Create a New, Natural Language-Based Measure of Work Meaningfulness
        A quick introduction to natural language processors.
        An NLP measure of meaningfulness.
    Goal #3: Test the Role of Construal Level as a Potential Unifier
        The trouble with testing construal level.
        Testing construal level with language.
        Convergent validity check.
METHODS
    Participants
        Compensation.
    Collected Data
        Work story.
        Explicit meaningfulness story.
        Single-item meaningfulness catchall.
        Self-report measures of meaningfulness.
        Affective commitment.
        Binary meaningfulness question.
    Human Ratings
        Other-rated meaningfulness.
        Other-rated construal level.
    Machine Ratings
        Abstract words.
        Sentiment.
        Parts of speech.
    Developing the Algorithm
        Choosing the optimization parameter.
        Creating a training set.
        Version 1: Bag of words.
        Version 2: A theory-driven model.
        Creating a search function.
        Searching for features.
        Naive Bayes classifier.
        Cross-validation.
    Construal Level and Meaningfulness
RESULTS
    Goal 1: Discovering the Language of Meaningful Work
        First-person pronouns and action verbs.
        Abstract language.
        Positive sentiment.
        All-new linguistic signals of meaningfulness.
        Summary of new language features.
    Goal 2: Create a Natural Language Measure of Work Meaningfulness
        Cross-validation.
        Relationship with collected measures.
    Goal 3: Testing Construal Level & Meaningfulness
        On the validity of other-rated construal level.
    Joint Relationships
        What best predicts whether a story sounds meaningful?
        What best predicts self-reported meaningfulness?
DISCUSSION
    Contribution 1: Language Reveals Meaningfulness
        Validation of Podolny et al.’s (2004) theory.
        Theoretical implications.
        Future directions.
        Limitations.
    Contribution 2: A Natural Language Measure of Meaningfulness
        Distributing the measure.
        Future directions.
    Contribution 3: Construal Level is Related to Meaningfulness
        Future directions.
    Contribution 4: The Work Stories Corpus
PRACTICAL IMPLICATIONS
    Watch for Language Cues of Meaningfulness
    A New Tool for Practitioners
    Watch for Construal Level Fluctuations When People Talk About Their Work
APPENDIX
REFERENCES

LIST OF TABLES

Table 1. Pilot work story prompts
Table 2. Correlations between potential optimization parameters and collected measures of meaningfulness and commitment. All correlations are significant with p < .001
Table 3. Correlations of all variables
Table 4. The relationship between positive sentiment and meaningfulness
Table 5. The relationship between “I am” language and meaningfulness
Table 6. High meaningfulness features. Importance of language features in predicting high self-reported overall work meaningfulness
Table 7. Low meaningfulness features. Importance of identified language features in predicting low self-reported overall work meaningfulness
Table 8. Correlations between algorithm-predicted probability of meaningfulness and collected measures of meaningfulness
Table 9. Correlations between other-rated construal level and meaningfulness measures

INTRODUCTION

“So, what do you do for a living?” we ask at nearly every social event we attend with new people. Sometimes a short answer to this question is sufficient. Other times, when we’re interested (or bored), we encourage people to expand further, to tell us all about their work. When you listen to these longer answers, it’s often easy to get a sense of how your conversation partner feels about their work: whether they like it, hate it, think of it as temporary, or find it meaningful or meaningless. But what cues in their language lead you to these conclusions? What if we could analyze and measure those cues in language directly? To determine, in particular, whether a person finds their work meaningful just by how they talk about it?

The potential upsides to being able to measure work meaningfulness in language extend far beyond a simple party trick. People are constantly talking about their work in contexts where it would be useful to know how much meaning they find in it. Imagine all the job interviews being conducted right now, where some job candidate is recounting their past jobs. How meaningful did they find those past jobs?
Imagine consultants, sitting at conference tables, asking employees to “tell me a little bit about what you do here.” What verbal cues might alert those consultants to workers with high or low meaningfulness? Finally, there is a wealth of text data being generated every day: emails, internet chats, telephone transcripts, etc., where workers discuss their job roles with leaders, customers, and fellow coworkers. Natural language descriptions of work that potentially contain rich information about work meaningfulness are happening everywhere all around us, and we’re not measuring any of it!

The only way we can measure meaningfulness right now is by asking people to bubble in their agreement on a series of Likert survey items. Putting aside the inconveniences of issuing surveys, what might we be missing about the construct of meaningfulness by only measuring it via traditional Likert surveys? What could we learn about what meaningfulness is — how to define the experience of meaningful work — from measuring how it shows up in people’s language?

In this study, I asked n = 194 full-time workers to tell me all about what they do for a living, and then to reveal whether they find their work meaningful. Using machine learning (ML), I analyzed each person’s ‘work story’ to discover linguistic signals associated with feelings of high or low meaningfulness. I then integrated these signals into a new algorithm — called a Natural Language Processor (NLP). This NLP algorithm can read in any work story and output a probabilistic conclusion about whether the author of that text finds their work meaningful. As a third contribution of this study, I also used language analysis to test the relationship between a sense of meaningfulness in work and a cognitive-psychological construct called construal level, which has the potential to help push meaningful work theory towards a consensus about what meaningfulness is.
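The general shape of such an algorithm (convert a work story into word features, then score it probabilistically) can be illustrated with a minimal sketch. To be clear, this is not the model developed later in this thesis; it is a toy bag-of-words Naive Bayes classifier, written in Python with invented example stories, meant only to show how a text can be turned into a probability of meaningfulness:

```python
import math
from collections import Counter

def train_nb(stories, labels):
    """Count word frequencies per class ('high'/'low' meaningfulness)."""
    word_counts = {"high": Counter(), "low": Counter()}
    class_counts = Counter(labels)
    for text, label in zip(stories, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["high"]) | set(word_counts["low"])
    return word_counts, class_counts, vocab

def prob_meaningful(story, word_counts, class_counts, vocab):
    """Return P(high meaningfulness | story) under Naive Bayes,
    with Laplace (add-one) smoothing for unseen words."""
    log_post = {}
    n = sum(class_counts.values())
    for label in ("high", "low"):
        total = sum(word_counts[label].values())
        logp = math.log(class_counts[label] / n)  # class prior
        for word in story.lower().split():
            logp += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        log_post[label] = logp
    # normalize the two log-posteriors into a single probability
    m = max(log_post.values())
    odds = {k: math.exp(v - m) for k, v in log_post.items()}
    return odds["high"] / (odds["high"] + odds["low"])

# Invented toy 'work stories', for illustration only
stories = ["i help people and love my purpose",
           "i fill out forms all day",
           "my work helps families",
           "just boring tasks every day"]
labels = ["high", "low", "high", "low"]
wc, cc, vocab = train_nb(stories, labels)
print(round(prob_meaningful("i help families", wc, cc, vocab), 2))  # prints 0.8
```

A real version of this idea would be trained on hundreds of labeled work stories and use richer features than raw word counts, but the input-to-probability pipeline is the same.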
Meaningfulness: A ‘Holy Grail’ Variable in Organizations

Work meaningfulness is like a super-variable for organizations. If an organization can lead its members to a strong sense of work meaningfulness, that organization can enjoy workers who are more committed, more motivated, who go above and beyond, manage their stress, and perform well at work (Bunderson & Thompson, 2009; Champoux, 1992; Glazer, Kozusznik, Meyers, & Ganai, 2014; Grant, 2008; Seibert, Wang, & Courtright, 2011). Meaningfulness, in effect, leads to nearly all the positive employee outcomes that organizations care about achieving.

Understandably, there is a great deal of research dedicated to figuring out how to foster a sense of meaningfulness at work. And these efforts have found that meaningfulness is related to some of the most central concepts in organizational science, including job design, transformational leadership, work engagement, and job fit (Arnold et al., 2007; Britt, Adler, & Bartone, 2001; Hackman & Oldham, 1976; May, Gilson, & Harter, 2004; Rothmann & Hamukangandu, 2013). Put simply, meaningfulness is a very desirable phenomenon in organizations, and that is reflected in its frequent use as a criterion of interest in the organizational science literature.

While much of the literature in organizational science seems to focus on the causes and consequences of meaningful work for organizations, usually as a secondary focus to some other construct (e.g., meaningfulness as an outcome in a study about job design), more focused articles on meaningfulness can be found in the wider psychology literature that extends to other sub-disciplines such as social and vocational psychology. This meaning-centric literature focuses primarily on defining exactly what meaningfulness is and how it is constructed from a person-centric perspective (Weiss & Rupp, 2011).

What Does ‘Meaningfulness’ Mean?

When a person says that they find their work ‘meaningful’ or ‘meaningless’, what do they mean?
Philosophers and psychologists have been trying to capture and define the notion of meaning in work for some time. In a recent review, Bailey, Yeoman, Madden, Thompson, and Kerridge (2016) listed over 30 different definitions of the term “work meaningfulness.” And although the last decade has seen some popular, highly integrative reviews of the meaningfulness space (e.g., Bailey et al., 2016; Lepisto & Pratt, 2016; Rosso et al., 2010), which have included their own ‘universal’ definitions of meaningfulness, there is still no dominant, generally accepted definition or model of work meaningfulness.

Common themes. While the construct proliferation currently challenging the work meaningfulness literature may hinder consensus, it is a great boon to comprehensiveness: that is, towards identifying all the potential factors in meaningfulness, and all the potential routes to achieving it. And while there is not yet a consensus on what work meaningfulness is or what creates a sense of meaningfulness in work, there are certainly some common themes that are mentioned often when attempting to define, discuss, or test experiences of meaningfulness. In the following sections, I will list these common themes. For each theme, I will provide a theoretical background and an example of a definition of work meaningfulness that relies on the theme.

Higher purpose. Seminal psychologists Viktor Frankl (1962) and Abraham Maslow (1943) both emphasized the notion of self-transcendence as a path to creating meaning in life. Although there are a few different conceptualizations of self-transcendence in the literature, the essential idea is that self-transcendence is the experience of seeing your actions as serving a goal that is beyond yourself (Morrison, 2016). Or as legendary psychologist William James (1985, p.
266) put it, “a feeling of being wider in life than the world’s selfish little interests.” The notion of pursuing a higher, self-transcendent purpose was central to Frankl (1962) and Maslow’s (1969) definitions of meaningfulness (note that just before he died suddenly while jogging, Maslow published a revised, six-level version of his famous Hierarchy of Needs that included self-transcendence above self-actualization). Self-transcendence is also reflected in modern definitions of meaningful work (Bailey et al., 2016; Rosso et al., 2010). For example, Arnold et al. (2007, p. 195) define meaningful work as “finding a purpose in work that is greater than the extrinsic outcomes of the work.”

Prosocial impact. Although Frankl (1962) and Maslow’s (1943) theoretical arguments do not mandate that the self-transcendent ‘higher purpose’ of your work needs to be ‘helping other people’, that theme (helping others, or ‘prosocial impact’ in modern terms) is featured frequently in their writings and in the examples they employed to illustrate what ‘finding a higher purpose’ might look like. In modern meaningfulness literature, the theme of prosocial impact features prominently. Hackman and Oldham (1976, p. 257) proposed task significance (“the degree to which the job has a substantial impact on the lives or work of other people”) as one of the three job design features that lead to experiencing work as meaningful. Duffy, Allan, Autin, and Bott (2013) included “serving others in some capacity” in their definition of meaningful work. By way of testing these propositions, Allan, Duffy, and Collisson’s (2018) study concluded that feeling like you’re improving the lives of others with your work leads to a sense of meaningfulness. Further, Grant’s (2008) study showed that adding a sense of prosocial impact to a job can increase performance by increasing a sense of task significance.

Return on investment. Kahn (1990, p.
704) defined meaningfulness as “feeling that one is receiving a return on investments of one’s self.” We need to see our time spent working as fruitful, either simply or profoundly. The notion of needing a return on investment from work is particularly reflected in theories and studies of meaningless work. Ariely, Kamenica, and Prelec (2008) argued that the “Sisyphus experience” — referring to the mythical Greek figure who was forced to roll a boulder up a hill only to see it roll back down again (for all eternity) — is a common route to draining meaning from work. Ariely et al. (2008) illustrated this phenomenon in an experiment where participants were instructed to build figures out of LEGO bricks. In the low-meaning condition, experimenters immediately picked up each LEGO figure upon completion, disassembled the figure, and handed the bricks back to the participant. Participants in this disassembly condition experienced markedly lower meaningfulness than those in the ‘high meaning’ condition, where completed figures were placed on display. As a final example, Hackman and Oldham’s (1976) notion of task identity also echoes the ‘return on investment’ theme. They proposed that being able to see a completed whole produced by one’s work is a key component in the experienced meaningfulness of work.

Self-growth. The idea of meaningful self-growth is perhaps best captured by Maslow’s (1943) notion of self-actualization. Maslow (1943) placed the achievement of one’s full potential at the top of his original Hierarchy of Needs, arguing that self-actualizing was the highest aim in life. Self-growth is also discussed frequently in modern meaningfulness literature. For example, Rosso et al. (2010) and Lips-Wiersma and Wright (2012) both feature analogs of self-growth in their models of meaningful work. And Fairlie (2011) defines work meaningfulness explicitly as “work that facilitates self-actualizing.”

Self-expression.
Self-expression is typically discussed in terms of opportunities that an organization provides for its members to bring their ‘whole self’ to work (Chalofsky, 2010). For example, a nurse who can express her passion for painting by decorating patients’ rooms may find her work more meaningful than if this expression were prohibited or blocked. Many modern theories of meaningful work support the notion that self-expression is related to meaningfulness: Hackman and Oldham (1976) listed skill variety (the degree to which a job “involves the use of a number of different skills and talents of the person”) as one of their three pillars of work meaningfulness. Finally, major integrative reviews of the meaningfulness space by Rosso et al. (2010) and Lepisto and Pratt (2016) both included a form of self-expression in their components of meaningfulness.

Sensemaking. In the meaningfulness literature, the words ‘meaning’ and ‘meaningfulness’ are highly distinct (Rosso et al., 2010). Work meaning is what work represents or symbolizes. It is how you ‘justify’ what you’re doing in your work (Lepisto & Pratt, 2016). Through this lens, meaning is the outcome of the process of making sense of your job and work role (Martela & Steger, 2016; Rosso et al., 2010). The meaning you learn from your work is thought to spill over into the rest of your life, helping you understand the world and your place in it. Schnell, Hoge, and Pollet (2013) defined work meaningfulness in terms of this kind of sensemaking — as your work providing you with a broader understanding. Steger et al. (2012) also featured a sensemaking-themed item in their highly cited measure of work meaningfulness (the item text reads “My work helps me make sense of the world around me”).

Work centrality and job involvement. How important is your work in your life? Your work centrality is determined by how important you find your work, relative to other aspects of your life (Rosso et al., 2010).
Workers who get a lot of meaning from their work often see their work as an extremely important part of their lives (i.e., have high work centrality; Bunderson & Thompson, 2009). Similarly, job involvement is a measure of how ‘wrapped up’ your sense of self is in your work. While having high work centrality and/or job involvement is thought to be associated with a greater sense of meaningfulness in work, work centrality and job involvement can also ‘cut both ways.’ That is, highly involved workers are more emotionally sensitive to developments in their work life, both positive and negative (Douglas & Carless, 2009).

Identity. More than work being important to you, work can define you and help you understand who you are, perhaps especially when it is meaningful. Some scholars have suggested that highly meaningful work is work where an individual “connects [their] identity to his or her work” (Britt et al., 2007, p. 36) or “integrates their personal identity with their work role” (Cohen-Meitar, Carmeli, & Waldman, 2009). Rosso et al.’s (2010) integrative review also features “identity affirmation” as a pathway to meaningfulness, and they noted that identity development seemed to be the most prominent mechanism of meaningfulness presented in research focusing on the organization as a source of meaning (as in, organizations can foster meaningfulness by helping workers find a sense of identity through their work).

Universal definitions. Ultimately, we need to arrive at a single, unifying definition of what meaningfulness is. Chiefly, we need a consensus definition so that we can test and measure meaningfulness in a consistent fashion. Right now, there are so many different definitions and measures of meaningfulness that the few empirical studies that have been conducted on meaningfulness often define, measure, and operationalize meaningfulness completely differently from each other (Bailey et al., 2016).
This limits our ability to build a deep understanding of meaningfulness by discovering all the instantiations, contexts, effects, and boundary conditions that stem from a single theoretical root. Meaningful work researchers are well aware of this need for a universal definition, and there have been many attempts to create one.

Combinatorial definitions. Combinatorial definitions attempt to create a unified definition of meaningfulness by incorporating several of the most common themes. That is, they define meaningfulness as the occurrence of one or more specific meaningfulness themes (Bailey et al., 2016). For example, McCrea, Boreham, and Ferguson (2011) defined meaningfulness as “perceived creativity, autonomy, responsibility and contribution to society.” The trouble with combinatorial definitions such as this one is that none of them cover the full variety of the identified themes commonly associated with meaningful work. Rosso et al. (2010) alone identified 13 different pathways to meaningfulness. To comprehensively define meaningfulness in terms of all the narrow pathways to meaningfulness, it would likely take a long definition.

Broad language definitions. In an attempt to succinctly define meaningful work in a universal manner that accounts for all identified and yet-to-be-discovered pathways to meaningfulness, ‘broad language’ definitions of work meaningfulness employ more abstract language and umbrella terms that, semantically, can accommodate any specific instantiation of meaningfulness within them. This includes defining meaningful work as “important” or “valuable” and/or “significant” (Bailey et al., 2016; Rosso et al., 2010). For example, Hackman and Oldham (1975) defined meaningfulness as work that is “important, valuable, and worthwhile.” This broad language approach to defining meaningfulness is useful because it helps capture how people feel about their work when it is meaningful, regardless of why they feel it is meaningful.
For example, your work can feel important and valuable because it contributes to your self-growth, or because it has a higher purpose or a prosocial impact. In this way, these succinct, broad language definitions are a step towards a universal definition, but not all the way there yet. The trouble with many of these broad language definitions is that they often rely on synonyms for meaningfulness. A synonym, even a very good synonym, is not the same thing as a definition. A person’s answer to the question “Is your work important?” may be different than their answer to the question “Is your work meaningful?” Try this thought experiment: think of the most meaningful work you can imagine for yourself. Got it? OK, now imagine that instead of doing that thing, you are assigned to ladle soup to orphans. You might agree that your soup-ladling is extremely important work, even significant work, but it may not feel as meaningful to you as that first thing you imagined.

Unifying with Construal Level Theory. Morrison, Walker, and DeShon (2016) took a different approach to creating an all-inclusive, unifying definition of meaningfulness. Rather than seeking to achieve all-inclusiveness and conciseness through semantic manipulations alone, they attempted to define meaningfulness through a common mechanism in the experience of work meaningfulness (instead of common descriptors). Drawing from cognitive psychological theory, they proposed that Trope and Liberman’s (2010) Construal Level Theory (CLT; detailed below) could be a common cognitive-psychological mechanism underlying all experiences of meaningful work, and that this could be used to create a universal definition of meaningfulness that was both as precise as the combinatorial definitions and as comprehensive and concise as the broad language definitions.

Construal Level Theory.
Briefly, construal level is a sensemaking mechanism through which people think about any given experience in terms of its abstract or concrete qualities (Trope & Liberman, 2010). Put another way, your current construal level is whether you are seeing the forest or the trees. When you are thinking about something at a high construal level, you are thinking abstractly — about the ‘why’ of the thing. At a low construal, you are thinking in terms of concrete details — about the ‘how’ of the thing. At a high construal, screwing in a light bulb is “bringing light to your daughter’s room so she can read” or even “contributing to global warming.” At a low construal, screwing in a light bulb is merely “rotating a sphere of glass.”

It is possible to think about our work in this way. At a low construal, you may describe your work in terms of tasks: a banker fills out forms, a programmer types code on a screen. At a high construal, you may talk about your work in terms of its broader purpose: a banker helps families get homes, a programmer invents tools to improve the lives of his customers. At a very high construal, work is described not only in terms of its purpose, but in terms of how that purpose is personally significant to the worker (e.g., “I help families get homes, and I believe that everybody deserves a place they can call home, because I didn’t have one when I was little.”). In this example, the worker is perceiving their work as connected to helping others, to their own life history, and to their beliefs about what an ideal world looks like.

In their 2016 paper, Morrison et al. proposed this high-level construal of the work experience as a crucial element in both the evaluation of work as meaningful and the in-the-moment experience of work as meaningful.
That is, if you find your work meaningful, by any definition or subjective interpretation of meaningfulness, you are construing your work at a high construal level, because the process of relating experiences to your broader self and higher-order goals operates inherently at a high construal level. If you do not find your work meaningful, you will construe it at a low construal level, because you just don't think of your work in terms of broader meaning. Construal Level Theory (CLT) allows us to approach meaningfulness from a different perspective. Instead of taking a philosophical, "here's what work means to humans" approach, Morrison et al.'s (2016) CLT perspective on meaningfulness describes it in terms of the cognitive mechanics of how people interpret their work as meaningful. For this reason, it is potentially highly compatible with, and complementary to, all extant theories of meaningful work. But, like most other definitions of meaningfulness, it is still just a theory. It needs to be tested empirically. As part of this study, we aim to do that. Goal #1: Using Language to Inform Theory The first goal of this study is to discover how people talk about their work when they find it meaningful (or meaningless) and to draw implications from these language features about the nature of meaningfulness. Natural language conveys a great deal of unfiltered information about how people think and feel (Mairesse, Walker, Mehl, & Moore, 2007). Recently, researchers have begun to capitalize on the psychological data latent in natural language by using machine learning techniques to discover linguistic indicators of various psychological states and traits (Kahn et al., 2016; Mairesse et al., 2007; Tausczik & Pennebaker, 2010).
Researchers have found significant associations between language style and 'stable' individual differences like personality traits and basic values, and also with temporary psychological states like mood, emotion, and deception (Mairesse, Walker, Mehl, & Moore, 2007). A particularly notable study by Rosenberg and Hirschberg (2005) found linguistic signals of leader charisma (charismatic people use more personal, first-person pronouns). They note at the start of their introduction that charisma is "more difficult to define than identify" as a justification for pursuing a lexical approach to measuring charisma — a line of thinking that parallels our current struggle with meaningfulness. By studying charisma through the lens of language, Rosenberg and Hirschberg (2005) were able to discover that some degree of personal connection may be instrumental in creating an impression of charisma, which they note may not have been obvious to charisma researchers before their unique study. As this example shows, language analysis can help inform definitions of hard-to-define constructs. And as our present study will show, it proved to be a useful tool for understanding meaningfulness. Linguistic signals of meaningfulness. How might a person talk about their work when they find it meaningful? In searching for previous research on linguistic signals of meaningfulness, I was only able to find one paper that explicitly proposed a relationship between certain language features and a sense of meaningfulness in work. Podolny et al. (2004) proposed — but did not test — a series of linguistic indicators of meaningfulness in work, using examples from Terkel's (1974) popular book Working to illustrate their points. Personal Pronouns and Action Verbs. According to Podolny et al. (2004), meaningfulness appears not in what people say about their jobs, but in how they say it.
Virtually the same statement could be spoken two different ways — one implying high meaningfulness, one implying low meaningfulness. When people find their work meaningful, Podolny et al. (2004) argue, they tend to talk about it using language that brings them closer to the work and to the people they work with. When people find their work meaningless, they tend to use language that distances them from their work and their coworkers. As an example, consider the sentence "We're working hard to meet the deadline." versus "There's a lot of hard work going on here to meet the deadline." The first example contains the plural pronoun "we," which according to Podolny et al. (2004) indicates a close identification with both the author's work and their coworkers. The second sentence, by contrast, uses distancing language — "There's hard work going on here" — that doesn't even mention the author's self or others. The first sentence also contains verb phrases (i.e., "working hard"), whereas the second example uses more "noun-like" language (e.g., "hard work"). Podolny et al. (2004) suggest that this 'speaking in nouns' is also a form of self-distancing. At the basic level, Podolny et al. (2004) argued that highly meaningful work is discussed using first-person pronouns. Further, they suggest that first-person plural pronouns indicate that a person likely gets even more meaning from work, because plural pronouns represent the connection of self to others. Finally, Podolny et al. (2004) suggested that a work story containing multiple, close-together sentences discussing work with first-person plural pronouns (e.g., gushing about "we") likely indicates that the worker gets very large amounts of meaning from their work. As part of this study, I tested two of Podolny et al.'s (2004) proposed linguistic indicators of work meaningfulness: first-person pronouns and action verbs. Hypotheses 1-2 reflect these tests.
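To make the pronoun signal concrete, a first-person pronoun feature can be operationalized as a simple rate per token. The sketch below is illustrative only — the pronoun list, the crude tokenizer, and the function name are my own simplifying assumptions, not the feature extraction actually used in this study:

```python
import re

# Hypothetical operationalization of Podolny et al.'s (2004) first-person
# pronoun signal: the proportion of tokens in a work story that are
# first-person pronouns (singular or plural).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def first_person_rate(story: str) -> float:
    """Return first-person pronouns as a proportion of all tokens."""
    tokens = re.findall(r"[a-z]+", story.lower())  # naive tokenizer
    if not tokens:
        return 0.0
    return sum(t in FIRST_PERSON for t in tokens) / len(tokens)

# The two example sentences from the text: the "we" sentence scores
# above zero, the distancing sentence scores zero.
close = first_person_rate("We're working hard to meet the deadline.")
distant = first_person_rate("There's a lot of hard work going on here to meet the deadline.")
```

Under this sketch, `close` is positive while `distant` is exactly zero, mirroring the contrast Podolny et al. (2004) describe.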
H1: People will describe their work using more first-person pronouns when they find their work meaningful.

H2: People will describe their work using more action verbs when they find their work meaningful.

Note that Podolny et al.'s (2004) notion of self-distance as a signal of low meaningfulness has some tension with Morrison et al.'s (2016) proposition that workers will discuss work at a high construal when it is meaningful, and at a low construal when it is meaningless. Construal Level Theory tightly links psychological distance with construal level, where greater psychological distance always equals higher construal. It may appear, given this, that Podolny et al.'s (2004) notion of self-distance as a symptom of low meaning in work suggests that low meaningfulness will be associated with a high construal level. This may prove to be confounding in practice, but theoretically, it is not contradictory. Podolny et al.'s (2004) notion of self-distance, loosely interpreted as "picturing your self as distant from your meaningless job," is just one type of psychological distance that may inflate construal level. Looking at the tasks of your job from the perspective of your higher-order goals also involves psychological distance, between the concrete 'how' of the task and the broader 'why' of the task in which your meaningful state of mind is situated. In this way, Podolny et al. (2004) proposed a psychological distance from the job itself, while Morrison et al. (2016) proposed a psychological distance from the tasks and concrete processes of that job. Also, note that the act of "thinking about the self" as either near or far from the job is an act of high construal. This is evidenced in a brain imaging study by Van der Cruyssen et al. (2014) showing that the brain area associated with high construal states — the dorsal-medial prefrontal cortex — is often referred to as the area responsible for thinking about the self.
It is entirely possible to arrive at a conclusion about whether you feel far from or near to your work (as proposed by Podolny et al. [2004]) through a high-construal metacognition. You can speak abstractly about your work being meaningful or meaningless, about yourself feeling tightly bound to it or detached from it, and all the while you are thinking about your work at a high level. The key differentiator, for my purposes here, will be whether the person talks about the features of their work using high-construal, abstract language — not just their relation to their work. Abstract words. Given the relationship proposed by Morrison et al. (2016) between high construal level and perceptions of work as meaningful, I expected high construal to be a necessary and usually-sufficient indicator that a participant finds their work meaningful. I propose that talking about work using low-construal, concrete language will usually be associated with low meaningfulness, because it suggests that the worker does not perceive a connection between their work and deeper values or purpose as salient (i.e., there is no "why" to the work). Conversely, talking about work features with abstract (high-construal) language should indicate a connection to "why" and should signal higher meaningfulness. However, there are many ways one can operationalize 'high construal' in the context of language. For example, the word "fruit" is higher-construal (more abstract) than the word "apple" (which is more concrete). However, the phrase "I saved the orphan with an apple" is higher-construal than the phrase "I gave the orphan a fruit," because the former gets at the 'why' of the action, while the latter communicates only the concrete 'how' of the action (Vallacher & Wegner, 1989). A more thorough rating of construal level in language, designed to capture construal level at a wider resolution (phrases, paragraphs, contexts), was conducted as part of this study and is described in detail below.
However, I was also interested to see whether construal level could be detected based simply on the abstractness of individual words. To this end, Hypothesis 3 proposes that people will describe their work using more abstract individual words when they find their work more meaningful.

H3: People will describe their work using more abstract words when they find their work meaningful.

Positive Sentiment. There is a relative consensus in the theoretical literature on meaningful work that meaningfulness is positively-valenced (Lepisto & Pratt, 2016; Rosso et al., 2010). Generally, in a work context, "meaningfulness" is discussed and thought of as a positive thing. Although it is possible for something to be full of dark meaning (e.g., visiting a concentration camp), people do not generally use the word that way, and it is unlikely that a worker will rate a job as high in meaningfulness if it is meaningfully horrific to him/her. As such, I expect high-meaning stories to be written in a positive tone and low-meaning stories to convey a negative tone. There may be exceptions, as in the case of people who feel sadly "bound" to their highly meaningful work (see Bunderson and Thompson [2009]), but I expect these to be rare.

H4: People will describe their work using more positive sentiment when they find their work meaningful.

Summary. In summary, I expect that work stories rated as meaningful will include more frequent use of first-person (self-investing) pronouns, action verbs, abstract words, and positive sentiment. Goal #2: Create a New, Natural Language-Based Measure of Work Meaningfulness Simply discovering how people talk about their work when they find it meaningful would be useful for informing our understanding of what meaningfulness is.
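As a concrete illustration of the H4 feature, positive sentiment can be approximated as a count of positive words minus negative words. The tiny word lists below are invented for illustration only; a real implementation would use an established lexicon (e.g., the sentiment categories in a tool like LIWC) rather than these hand-picked examples:

```python
# A minimal, hypothetical sentiment feature: positive-word count minus
# negative-word count. The lexicons here are illustrative placeholders,
# not the dictionaries used in this study.
POSITIVE = {"love", "enjoy", "proud", "meaningful", "great", "helps"}
NEGATIVE = {"boring", "hate", "pointless", "tedious", "stress"}

def sentiment_score(story: str) -> int:
    """Net sentiment: (# positive words) - (# negative words)."""
    tokens = [t.strip(".,!?;:") for t in story.lower().split()]
    return (sum(t in POSITIVE for t in tokens)
            - sum(t in NEGATIVE for t in tokens))
```

Under H4, high-meaning work stories should tend toward positive scores (e.g., "I love my work, it helps families.") and low-meaning stories toward negative ones.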
However, it was an important goal of this project to go beyond increasing our understanding of meaningfulness and also provide a tool to help other scholars take advantage of the additional insight provided by natural language to measure meaningfulness in new ways. To this end, this study included as its second focus the creation of a new, language-based measure of work meaningfulness. This type of measure is called a Natural Language Processor. A quick introduction to natural language processors. A Natural Language Processor is a computer algorithm that scans a large amount of text and learns which features in that text (e.g., specific word usages, punctuation patterns, sentence length) are commonly associated with the presence of a targeted psychological construct. Once the language features for a construct are learned and built into the algorithm, the algorithm can then look for those features in new samples of language, and from that make an assessment of how much of the associated psychological construct is implied in the new text. Perhaps the most 'famous' NLP measure (in terms of popular press coverage) is IBM's Watson supercomputer, which can read in a small amount of text (500 words, or about 6 emails) and produce statistically significant estimates of the author's five-factor personality traits (with sub-facets) and a complete profile of the author's values — in less than one second (Mahmud, 2015; McCrae & Costa, 1987; Schwartz, 1994). And that's as of this writing. Watson is still learning. Within a year of Watson's launch, it became more accurate and improved its efficiency by a factor of seven (meaning it needs less input text to arrive at the same predictive accuracy; Arnoux, 2016). Watson is one example of a growing arsenal of such natural language measures being made available to researchers.
As another prominent example, the popular Natural Language Processor called "LIWC" (Linguistic Inquiry and Word Count; pronounced "luke") can measure a battery of psychological constructs from language features, including need for achievement, time orientation, and analytical thinking (Pennebaker, Francis, & Booth, 2001). An NLP measure of meaningfulness. As part of this study, I used machine learning techniques to create an NLP measure of meaningful work. The detailed construction of this algorithm is described below, but in essence: when run on a body of input text (specifically, a person describing their work), this algorithm will look for signals of work meaningfulness and output a probability (between 0 and 1) that the author of the text finds their work meaningful. I expect these ratings to be significantly related to other, more traditional measures of meaningfulness.

H5: Estimates of meaningfulness produced by the natural language algorithm will correlate significantly with other measures of meaningfulness.

Construct validity check. I also expect that the meaningfulness scores outputted by the algorithm will relate significantly to phenomena that are known to co-occur with work meaningfulness. In particular, affective commitment is an established outcome of work meaningfulness (Jiang & Johnson, 2012). This relationship is both theoretically and empirically sound. Theoretically, you should want to stay in a job that you find highly meaningful. And empirical research has demonstrated that this commonsensical assertion seems to hold true, with commitment being one of the strongest outcomes of work meaningfulness (Jiang & Johnson, 2012; Seibert, Wang, & Courtright, 2011).

H6: Ratings of meaningfulness produced by the natural language algorithm will correlate significantly with affective commitment.
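The output stage of such a measure can be sketched in miniature: language features are combined into a score and squashed through a logistic function so the result falls strictly between 0 and 1. The feature names and weights below are invented for illustration; the actual classifier learns its weights from the training texts rather than having them set by hand:

```python
import math

# Hypothetical sketch of the measure's output stage. WEIGHTS are
# illustrative placeholders, not values estimated in this study.
WEIGHTS = {"first_person_rate": 4.0, "sentiment": 0.8, "bias": -1.0}

def meaningfulness_probability(first_person_rate: float,
                               sentiment: float) -> float:
    """Combine two language features into a 0-1 probability that the
    author finds their work meaningful."""
    z = (WEIGHTS["first_person_rate"] * first_person_rate
         + WEIGHTS["sentiment"] * sentiment
         + WEIGHTS["bias"])
    return 1 / (1 + math.exp(-z))  # logistic squash into (0, 1)
```

A story rich in first-person pronouns and positive words would receive a higher probability than one with neither, but every output stays inside the (0, 1) interval the text describes.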
Ultimately, this algorithm is designed to empower researchers to measure meaningfulness from samples of textual data (i.e., without having to ask people about meaningfulness via surveys and the like). Hopefully, researchers will be able to use this new NLP measure to study meaningfulness within the many samples of natural language that occur organically in the modern workplace (e.g., cover letters, emails, open-ended survey responses, and interview transcripts). Goal #3: Test the Role of Construal Level as a Potential Unifier As discussed earlier in this paper, the theoretical space around meaningful work is scattered, siloed, and divergent. It is in dire need of unifying mechanisms. However, purely theoretical unifying mechanisms may not be enough, as the space is already fat on theory and thin on empiricism. Indeed, reviews by both Bailey et al. (2016) and Rosso et al. (2010) lamented the lack of empirical work in the meaningfulness space and urged scholars to continue forward to empirical testing, rather than creating more theory. Thus, although Morrison et al.'s (2016) union of meaningfulness with Trope and Liberman's (2010) Construal Level Theory presents a promising candidate for unifying the meaningfulness space, until it is tested empirically it is effectively just 'yet another meaningfulness theory' in a literature full of untested unifying theories. This study aims to test Morrison et al.'s (2016) proposed relationship between construal level and meaningfulness and to (hopefully) establish it firmly as a validated candidate for unifying meaningful work theory. However, there are some unique challenges with testing construal level, challenges which this study aims to circumvent by assessing construal level through language rather than, say, through traditional Likert measurements. The trouble with testing construal level.
Although Construal Level Theory (CLT) represents potentially one of the most promising unifiers for meaningfulness theory, it cannot be measured easily with traditional Likert approaches. Typically, when a new theory of work meaningfulness is introduced (e.g., Lips-Wiersma and Wright [2012]; Hackman and Oldham [1976]; Steger et al. [2012]), the researchers who introduce it create and test an accompanying Likert-scale measure for it, and then correlate responses on their Likert survey against some relevant outcome measures to demonstrate predictive and construct validity. These approaches work best to the extent that meaningfulness can be reduced to statements like "My work makes a positive impact on the lives of others" and "My work allows me to use many of my skills and talents" (as in Hackman and Oldham's [1976] Job Diagnostic Survey). A CLT perspective on meaningfulness would be incredibly difficult to test this way, because construal level is difficult to operationalize and detect through traditional, multiple-choice scales. We cannot ask a participant "Are you thinking about your work abstractly?" because the answer will always be "Well, now I am." This priming problem makes it difficult to design traditional, Likert-style scale prompts that get at construal level. The most popular measure of construal level is the scale created for Vallacher and Wegner's (1987) Action Identification Theory (AIT), which is an extension of CLT. Vallacher and Wegner's (1987) measure is a forced-choice scale that describes various situations (e.g., "You're screwing in a light bulb.") and then asks the participant to choose how they'd think of the action: with a concrete choice (e.g., "Rotating a sphere of glass.") or an abstract choice (e.g., "Bringing light to the darkness"). Vallacher and Wegner's (1987) Action Identification scale is possibly the best available approach to measuring construal level with traditional Likert scales, but there are still problems with it.
First, there is an issue of demand characteristics. It's easy for the test-taker to detect an obvious relationship between the answer choices (i.e., that some are clearly abstract and "higher-level" while others are specific and "low-level"). Second, the forced-choice format employed for Vallacher and Wegner's (1987) Action Identification scale limits the test-taker to 2-4 predefined interpretations of a given action. Presumably, across all human minds in the world, there are more than two different ways to interpret screwing in a light bulb. And even more options than that when we're talking about something more complex than screwing in a light bulb, like a person's interpretation of their own job role. Here, it might be helpful to let people share their own interpretations of their work freeform, and then assess construal level post hoc. Testing construal level with language. This study tests the relationship between construal level and meaningfulness using natural language. That is, rather than asking participants to self-report their own construal level, participants provided a free-form, essay-like description of their work, and a team of trained raters evaluated those essays and scored them for construal level. In line with Morrison et al.'s (2016) propositions, I expect to find a significant, positive relationship between these ratings of construal level and several measures of meaningfulness. In doing so, I hope to discover evidence that construal level could indeed be a common element in multiple theories of meaningfulness and to illustrate its power as a potential unifier.

H7: Construal level will have a significant, positive relationship with work meaningfulness.

Convergent validity check. If construal level is related to meaningfulness, then it should also be related to outcomes of meaningfulness.
As discussed previously, affective commitment is a well-established outcome of meaningfulness, and so I expect that if high construal level signals high meaningfulness, then it will also relate to higher affective commitment.

H8: Construal level will have a significant, positive relationship with affective commitment.

METHODS There were two overarching goals to this project: to discover and measure how a sense of meaningful work shows up in people's language, and to test the relationship between meaningfulness and construal level. To achieve these goals, I needed three components from people who work: a sample of language, a rating of meaningfulness, and a rating of construal level. In order to build the desired computer algorithm to predict meaningfulness from language (called a 'text classifier' because it classifies text as either meaningful or not meaningful), I needed to collect a sufficiently large sample of work stories to be able to 'train' the algorithm on some texts, and then test it on a different sample of texts that the algorithm had never seen before. Participants Participants were N = 194 Mechanical Turk users (n = 76 male, n = 118 female) who identified as being employed full-time. Participants were asked to complete a survey about their work. Generally, for text classification algorithms like the one developed in this study, a sample size of between 80 and 560 texts is recommended for satisfactory text classifier performance (Figueroa, Zeng-Treitler, Kandula, & Ngo, 2012). Given budget constraints, I aimed for the mid-to-lower end of this recommendation (194 stories). Compensation. Given that this survey asked about work attitudes (i.e., meaningfulness), there was some concern that the pay rate itself could shift attitudes.
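The train-then-test discipline described above can be sketched as a simple holdout split: shuffle the stories, train on one portion, and evaluate only on the portion the algorithm has never seen. The 80/20 split fraction and fixed seed below are illustrative assumptions, not the study's exact protocol:

```python
import random

# Illustrative holdout split for a corpus of work stories and their
# binary meaningfulness labels. The 80/20 fraction and seed are
# assumptions for demonstration purposes.
def train_test_split(stories, labels, test_fraction=0.2, seed=42):
    """Shuffle indices, then split into train and held-out test sets."""
    idx = list(range(len(stories)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([stories[i] for i in train], [labels[i] for i in train],
            [stories[i] for i in test], [labels[i] for i in test])
```

With N = 194 stories and an 80/20 split, the classifier would be trained on 155 stories and evaluated on the remaining 39 unseen ones.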
I aimed for a pay rate that fairly and precisely matched the task requirements: for one, to use the money efficiently, but also so as not to add contaminating positive/negative affect due to feelings of being under- or over-benefited. Pilot testing was used to arrive at an ideal pay of $7.50 for completing this study. Lower payments than this resulted in comments suggesting the pay was too low, with one participant commenting (at $6.00) that "The pay rate was okay for the time, with all that writing though, I honestly think $7 is a better price point." At the final pay rate of $7.50, comments like "pretty fair" or "pay rate was good" were common. Collected Data Work story. Each participant was prompted to write 500 words (about a page) about their work. These work stories are the key piece of the data and the primary focus of analysis. A goal of this project was to collect work stories that were reasonably naturalistic and generic; that is, to capture people speaking as they normally would about their job — as if discussing their work at a party, or in an interview, or in response to an open-ended question on an employee survey — as opposed to speaking about meaningfulness directly. It was important for the generalizability of the measure that the participants not be primed to speak about any specific aspect of their work. Story prompt. Recall that this project proposes that fluctuations in construal level may indicate meaningfulness. Priming a particular construal level would confound the endeavor of testing for natural fluctuations in construal level. Thus, the prompt for the story collection was designed to be natural and neutral: broad enough that participants were tacitly encouraged to make overall judgments like meaningfulness, without being asked directly to talk about the meaningfulness of their work or otherwise having their construal level artificially inflated (or deflated).
For example, a prompt like "Why do you do your work?" would likely have primed a high construal level, as asking 'why' has been used in other studies to prime high construal (Schwartz, Eyal, & Tamir, 2018). Conversely, a prompt like "Describe your job role and the tasks you need to complete" would likely have primed a low construal level, as attending to low-level details is a feature of low construal (Trope & Liberman, 2010). Several different prompts were piloted to see which seemed to elicit the best balance of abstraction in the story texts. Surprisingly, many initial prompts resulted in insufficient variance in meaningfulness ratings. That is, most people found their work meaningful. In one pilot, everyone found their work highly meaningful. This difficulty finding meaninglessness was surprising and seems to suggest that most people, upon reflection, find their work at least somewhat meaningful. This is in itself interesting and perhaps deserving of further study, as it is consistent with Frankl's (1962) proposition that humans have a "will to meaning." Or perhaps, consistent with the construal level relationship proposed here, simply asking people to reflect on their work at all — no matter how the question is phrased — prompts people to think of their work at a distance, abstractly, thus raising their construal level and making their work seem more meaningful. Whatever the explanation, this lack of variability in meaningfulness scores in the pilots presented a potential challenge for analyzing the data. It would be difficult to design an algorithm to separate high- and low-meaningfulness stories if they were all, to some extent, high in meaningfulness. Ultimately, after several pilots and consulting with several advisors, the prompt "In 500 words, tell me about your work," combined with more severely-worded prompts for the primary meaningfulness measures ("Think of the most meaningful work you can imagine for yourself.
Now, how meaningful do you find your current work?" and "Overall, I find my work very meaningful."), seemed to be the most neutral and capable of eliciting acceptable variability in meaningfulness scores. These final prompts were chosen by 'eyeballing' the variance in meaningfulness scores in each pilot and judging it as (at last) satisfactory (seeing noticeably more low-meaning scores in the final pilot). I realized after data collection that a better approach would have been to choose the best prompt by formally comparing the mean meaningfulness scores of each. I did perform this analysis retrospectively and found that (luckily) the final prompt used in the full data collection exhibited the second-best variability in meaningfulness scores in the pilots. The best prompt was also used in the smallest pilot, so it is not clear whether it was really the best prompt or whether chance led to its slightly superior variability in meaningfulness. Despite apparent progress in pilot testing (you can see the means approaching the actual scale midpoint of 3 as the prompts were iterated with each pilot), the means in the final study were still weighted heavily towards overall high meaning. It is possible that none of this prompt experimentation made a difference and I simply caught some extra low-meaning people coincidentally in some pilots. Table 1 below lists all prompts and means. Note that the continuous meaningfulness ratings were on a 1-7 scale, and the binary meaningfulness ratings were on a 0-1 scale.

Table 1. Pilot work story prompts, criterion wordings, and mean meaningfulness ratings, reported as Mean Continuous Meaningfulness (Binary Meaningfulness).

Pilot 1. Story prompt: "In 500 words, tell me about your work." Continuous criterion: "I find my work meaningful." Binary criterion: n/a. Means: 5.5 (n/a).

Pilot 2. Story prompt: "In 500 words, tell me about your work." Continuous criterion: "I find my work personally meaningful." Binary criterion: n/a. Means: 5.0 (n/a).

Pilot 3. Story prompt: "Imagine you are at a party, and somebody asks you 'What do you do for a living?' How would you respond?" Continuous criterion: "I find my work meaningful." Binary criterion: "Overall, do you find your work very meaningful?" Means: 5.7 (.86).

Pilot 4. Story prompt: "In 500 words, tell me about your work." (run on a Sunday). Continuous criterion: "I find my work meaningful." Binary criterion: "Overall, do you find your work very meaningful?" Means: 6.0 (1.00).

Pilot 5. Story prompt: "In 500 words, tell me all about your work." Continuous criterion: "I find my work very meaningful." Binary criterion: "Overall, do you find your work very meaningful?" Means: 5.6 (.82).

Pilot 6. Story prompt: "In 500 words, tell me all about your work." Continuous criterion: "How meaningful do you find your work?" [anchors from Meaningless and Not very meaningful through The most meaningful work I've ever done]. Binary criterion: "Overall, do you find your work very meaningful?" Means: 5.6 (1.00).

Pilot 7. Story prompt: "In 500 words, tell me all about your work." Continuous criterion: "How meaningful do you find your work?" (vs. "Most meaningful work I can imagine"). Binary criterion: "Overall, do you find your work very meaningful?" Means: 4.8 (.81).

Pilot 8. Story prompt: "In 500 words, tell me all about your work." Continuous criterion: added a definition of work meaningfulness to consider. Binary criterion: "Overall, do you find your work very meaningful?" Means: 3.2 (.55).

Pilot 9. Story prompt: "In 500 words, tell me all about your work." (unchanged). Continuous criterion: added "Think of the most meaningful work you can imagine for yourself. Now, how meaningful do you find your current work?" Binary criterion: "Overall, do you find your work very meaningful?" Means: 4.55 (.77).

Final. Story prompt: (unchanged). Continuous criterion: (unchanged). Binary criterion: (unchanged). Means: 5.1 (.85).

Explicit meaningfulness story. Concerned that a neutral story prompt might fail to pick up any meaningfulness in tone or language, I collected a second mini-essay from each participant asking them to explicitly talk about the meaningfulness of their work. Participants were asked to write 250 words in response to the prompt "Overall, do you find your work very meaningful? Why or why not?" Although I ultimately found sufficient patterns in the generic story, these explicit meaningfulness stories proved highly useful for coming up with potential text themes to look for. Single-item meaningfulness catchall. I included a single, straightforward Likert-scale item to capture meaningfulness broadly on a continuous scale.
Although the original intent was to provide a theory-agnostic catchall for capturing meaningfulness broadly, during the process of combatting the lack of variability in meaningfulness ratings (discussed above), I ultimately incorporated a definition of meaningfulness in this prompt. The full item starts with a definition of meaningfulness, drawn from Morrison, Walker, and DeShon (2016): "The idea of 'work meaningfulness' means different things to different people. For our purposes here, we say that work is meaningful when it feels connected to your deepest values, goals, and needs." Note that the phrase 'deepest values' was used in place of the original definition's "higher-order values" for interpretability. Before completing the scale, participants were asked the question "Think of the most meaningful work you can imagine for yourself. Now, how meaningful do you find your current work?" Responses to this item were on a 7-point Likert scale, with the following scale points:

● 0 - Meaningless
● 1 - Not very meaningful
● 2
● 3 - Mostly meaningful
● 4
● 5
● 6 - The most meaningful work I can imagine for myself.

Self-report measures of meaningfulness. After providing the work story and explicit meaningfulness story, participants completed several traditional, Likert measures of work meaningfulness. The scores from these traditional measures were used as criteria, together with human ratings of meaningfulness (other-ratings, described below), to inform the predictive algorithm. That is, the algorithm was developed to discover indicators in natural language that were associated with high and low scores on many different measures of work meaningfulness. My goal was for this NLP measure not simply to predict as well as any one existing meaningfulness measure — otherwise it would essentially be an NLP version of that measure — but instead to predict as well as multiple popular assessments of meaningfulness. The Work and Meaning Inventory.
Participants completed the currently most-cited Likert measure for assessing work meaningfulness: the Work and Meaning Inventory (Steger, Dik, & Duffy, 2012). This is a 10-item measure of experienced meaningfulness in work, divided into three factors: Greater-good motivations ("the degree to which people see that their effort at work makes a positive contribution and benefits others or society"), Meaning-making (i.e., seeing your work as something that helps you make sense of your life), and Positive meaning ("the degree to which people find their work to hold personal meaning, significance, or purpose").
The Comprehensive Meaningful Work Scale. Several prominent reviews of meaningfulness (e.g., Rosso et al., 2010; Lips-Wiersma & Wright, 2012; Barrick et al., 2012) have proposed that there may be many different pathways to finding a sense of meaning in work, and that there may even be individual differences in which pathways actually provide meaning for different people (e.g., a mathematician may find meaning in self-efficacy, a nun may find meaning in self-transcendence). Although Rosso et al.'s (2010) pathway model seems to be the most dominant, based on citation counts, at present there is no measure through which I could capture the pathways to meaningfulness they propose. Lips-Wiersma and Wright's (2012) Comprehensive Meaningful Work Scale, however, is based on a pathway model so similar to Rosso et al.'s (2010) model that Rosso et al. recently issued a corrigendum apologizing for not acknowledging it more seriously in their review (Rosso, Dekas, & Wrzesniewski, 2011). Including Lips-Wiersma and Wright's (2012) Comprehensive Meaningful Work Scale allowed me to collect information on where (in terms of existing theoretical categories) participants found meaning in their work.
Again, this helped inform the themes I looked for in the text to assess overall meaningfulness, and could allow future versions of the measure introduced here to use this criterion to assess both overall meaningfulness and which pathway(s) a particular person finds meaningfulness through.
Affective commitment. In addition to the battery of meaningfulness measures, participants also completed a short measure of affective commitment (Allen & Meyer, 1990). This served as a construct validity check, as affective commitment has been shown to correlate with meaningfulness (Jiang & Johnson, 2012).
Binary meaningfulness question. Suspecting that my relatively small sample size would limit the scale points my algorithm could predict (discussed in detail later), I included a simple, two-class binary meaningfulness question. Participants were asked the binary question "Overall, is your work very meaningful? Yes/No". This binary "yes/no" meaningfulness question served as a validation check on the human ratings of meaningfulness (discussed below), and as a backup target criterion for the algorithm.
Human Ratings
One of the concerns we discussed at the onset of this project was that asking for self-reports of meaningfulness could be tricky, because each participant might define 'meaningfulness' differently for themselves. For example, anecdotally it seems that many laypeople equate the notion of meaningful work with altruistic work, even though in the literature altruism is only one of many pathways to meaningfulness (Rosso et al., 2010). To address these concerns about definitional consistency, I trained a team of raters to rate each story for meaningfulness according to a consistent definition. I also trained these raters to assess construal level as part of this study's attempt to test the relationship between construal level and meaningfulness. In sum, raters produced judgments of both meaningfulness and construal level for each story.
The other-rated scores of meaningfulness were tested in concert with the self-report scores to serve as potential prediction targets for the natural language algorithm.
Other-rated meaningfulness. Each rater read the Morrison, Walker, and DeShon (2016) paper, which proposes an integrative definition of meaningfulness. Following this, they were asked to write two 500-word essays about work (similar to those collected from participants): one essay about the most meaningful work they'd ever done, and one about the most meaningless work they'd ever done. They were also asked to include a paragraph after each story introspecting about how they talked about each work experience. Finally, raters were walked through examples of high- and low-meaningfulness stories (from pilots). Raters rated each work story for meaningfulness on a 1-7 scale, and also made an overall (binary) classification decision for each story as either meaningful or not meaningful. Raters were also asked to provide notes explaining their ratings.
Other-rated construal level. In addition to rating for general meaningfulness, raters also rated stories for their level of construal: that is, the extent to which the participants talked about their work abstractly or concretely (i.e., spoke about the forest or got lost in the trees). Collecting a separate rating of construal level was not directly related to my goal of creating a natural language measure of meaningfulness (I could have created the meaningfulness measure without it). However, I did this opportunistically, as this project provided an opportunity to test the theoretical relationship between construal level and meaningfulness. I expected that level of construal would correlate with the amount of meaningfulness a person indicated getting from their work, and a numeric rating of construal level provided by raters would let me compare construal level against ratings of meaningfulness and test that relationship.
How construal level was rated.
Raters were educated on the notion of construal level with reading assignments and response papers. Each rater read the original Trope and Liberman (2010) paper on construal level and wrote a response paper on the concept. They were then walked through examples of spotting high/low construal language and themes in the work stories (drawn from pilots). Raters provided two ratings of construal level for each work story: a 1-7 rating (from concrete to abstract), and a binary overall classification of either high or low construal.
Machine Ratings
Several of the hypotheses required the calculation of endogenous statistics based on features of the collected story texts, to be used in hypothesis testing.
Abstract words. To test Hypothesis 3 ("People will describe their work using more abstract words when they find their work meaningful."), I used a publicly available dictionary by Brysbaert, Warriner, and Kuperman (2014) that includes 40,000 English words rated for concreteness of language. Using this dictionary, I was able to 'look up' the concreteness score of each word used in a particular work story (provided that the word was included in the dictionary). I then reverse-scored each concreteness score so that it could be treated as an abstractness score. In summary, the abstraction score for a given story was calculated by summing the abstractness (reverse-scored concreteness) scores for each word in the story and then dividing that sum by the total number of words in that story.
Sentiment. Hypothesis 4 ("People will describe their work using more positive sentiment when they find their work meaningful.") involved testing for positive sentiment. To accomplish this, I computed sentiment scores for each work story using the SentimentIntensityAnalyzer function included within Python's Natural Language Toolkit (NLTK) package.
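As a rough sketch of how the abstraction score above can be computed: the function below uses a tiny toy lexicon that stands in for the 40,000-word Brysbaert et al. (2014) concreteness norms (the toy words and values are illustrative only, not taken from the norms). The sentiment step is shown only as a comment, since NLTK's analyzer depends on a downloadable lexicon.

```python
# Sketch of the per-word abstraction score described above. TOY_CONCRETENESS
# stands in for the Brysbaert, Warriner, and Kuperman (2014) norms; the values
# (1 = abstract ... 5 = concrete) are illustrative only.
TOY_CONCRETENESS = {"desk": 4.9, "computer": 4.8, "purpose": 1.5, "meaning": 1.4}

def abstraction_score(story, lexicon=TOY_CONCRETENESS, scale_max=5.0):
    """Sum of reverse-scored concreteness for words found in the lexicon,
    divided by the total number of words in the story."""
    words = story.lower().split()
    if not words:
        return 0.0
    total = sum(scale_max - lexicon[w] for w in words if w in lexicon)
    return total / len(words)

# The sentiment scores would come from NLTK's analyzer, along the lines of:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   SentimentIntensityAnalyzer().polarity_scores(story)
```

A story built from abstract words (e.g., "purpose," "meaning") scores higher than one built from concrete words (e.g., "desk," "computer"), and words missing from the lexicon simply contribute nothing to the sum.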
As a side note: this SentimentIntensityAnalyzer tool was built using a process similar to the one I used here to construct my meaningfulness NLP measure. A massive sample of text was collected and scored by human raters for positive/negative sentiment, and these text and sentiment-score examples were then used to train a machine learning model, which could then be applied to new bodies of text (like these work stories) to evaluate their sentiment.
Sentiment scores were computed for each work story, and these sentiment scores were tested for their relationship with all collected measures of meaningfulness using a Pearson correlation test. The SentimentIntensityAnalyzer returns four scores for sentiment: an amount of positive sentiment, an amount of negative sentiment, an amount of neutral sentiment, and a compound sentiment intensity. Thus, I was able to test the correlation between meaning and positive sentiment directly (versus other packages, which would require testing against 'overall sentiment' on a continuum from negative to neutral to positive).
Parts of speech. To test Podolny et al.'s (2004) assertions that highly meaningful stories will involve more first-person possessive pronouns and action verbs (Hypotheses 1-2), I used Python's NLTK package to 'tag' every word in every work story with its part of speech, using the standard Penn Treebank part-of-speech tagging convention (Santorini, 1990). For example, the word 'work' became either 'work_VERB' or 'work_NOUN' depending on how it was used. The part-of-speech tagger included in Python's NLTK is context-aware in this way, attempting to determine a word's part of speech based on the word itself and where/how it appears in the text. Note that the actual part-of-speech tags are more precise than simply "VERB," and use special abbreviations to denote particular parts of speech (e.g., VBP = verb, non-3rd person singular present).
I counted the occurrences of each part of speech proposed by Podolny et al. (2004) in each story. In terms of the Penn Treebank's part-of-speech abbreviations, for action verbs I counted occurrences of VBPs (verb, non-3rd person singular present), and for first-person possessive pronouns I counted PRPs and PRP$s (personal pronouns and possessive pronouns). Thus, a count score (weighted by story word count) was computed for each of the targeted parts of speech for each story. Part-of-speech scores were tested for their relationship with all collected measures of meaningfulness using a Pearson correlation test.
Developing the Algorithm
Again, the second goal of this study was to develop an algorithm that looks for a set of language features that, when found in a description of work, reliably predict whether a person finds their work meaningful or not. NLP measure development involves a great deal of discovery and trial and error, but I will explain here the general process I went through to discover these features.
Choosing the optimization parameter. In machine learning, an optimization parameter gives the learning algorithm a goal to shoot for (or, more accurately, a 'success' criterion). The simplest and most common approach is to use a binary optimization parameter: a zero is failure, a one is success. I collected two of these binary parameters: a binary self-report ("Overall, do you find your work very meaningful?"), and a binary other-report created by raters (i.e., "Does this person find their work meaningful? Yes/No"). At the study's onset, there was interest in using the human ratings of meaningfulness as my optimization parameter. However, I discovered that these human ratings had poor correlations with established self-report measures of meaningfulness.
Thus, doubting their construct validity, I chose to use the binary self-report of meaningfulness instead, as it had satisfactory correlations with all self-reported measures of meaningfulness as well as with the construct validity check measure (affective commitment). See Table 2 for a summary of the considered optimization parameters.

Table 2. Correlations between potential optimization parameters and collected measures of meaningfulness and commitment. All correlations are significant at p < .001.

Optimization Parameter | Self-Report Continuous Meaningfulness¹ | WAMI² | Affective Commitment
Human-rated binary meaningfulness | .35 | .35 | .32
Self-report binary meaningfulness | .69 | .78 | .57

1. Self-reported Continuous Meaningfulness = "How meaningful do you find your current work?", from "Meaningless" to "The most meaningful work I can imagine for myself". 2. WAMI = Work and Meaning Inventory (Steger et al., 2012).

Not enough power for a continuous optimization parameter. Although I collected continuous measures of meaningfulness, training a text classification algorithm to predict each point on a 7-point scale would have required (according to best practices) at least 40-280 participants per scale point to achieve good accuracy, so a 7-point scale would have necessitated a sample size of n = 280-1960 (Figueroa et al., 2012). And that assumes the variance would be evenly distributed, which, as we've seen in this study, seems tricky to achieve when measuring meaningfulness. Even still, I did attempt to point the algorithm at the collected continuous parameters, just to see what would happen. As expected, its accuracy plummeted (though, interestingly, it remained better than chance).
Creating a training set. Following best practices in NLP measure development, I randomly assigned half of my work stories to a "training set" and the other half to a "testing set." I used the "training set" to develop the measure.
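The random assignment described above can be sketched in a few lines (a minimal stdlib illustration, not the study's actual code; the function name is mine):

```python
import random

def split_stories(stories, train_frac=0.5, seed=None):
    """Randomly shuffle the stories and split them into (training, testing) sets."""
    pool = list(stories)
    random.Random(seed).shuffle(pool)  # seed allows a reproducible split
    cut = int(len(pool) * train_frac)
    return pool[:cut], pool[cut:]
```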
Once a set of language features was discovered that predicted meaningfulness scores well on the training set, I tested these features on the "testing set." Developing the measure on one set of stories, and then testing it on a "fresh" set of stories in this way, helps ensure the measure's generalizability and guards against over-fitting the features to the data (Shan, Wang, & Chen, 2015).
Version 1: Bag of words. In any machine learning model, one must create a feature vector. The purpose of a feature vector is to quantify features of your sample that may originally be nominal or descriptive. For example, a simple feature vector may contain counts of the occurrences of the words "happy" and "sad" in a text (this approach of searching simply for word occurrences is called a "bag of words" approach). Initially, I used Python's scikit-learn package (a well-regarded set of machine learning functions for Python) to treat all words in all work stories as features (with the key being the word, and the value being the number of occurrences of that word in the document). I also added phrases of 2- and 3-word length to this feature vector. I then used an XGBoost classifier to determine the most predictive words and phrases in the text.
Prediction scores from this model were good. Generally, this model could predict self-reported meaningfulness with an accuracy of 82%. However, the discovered 'most predictive' features often did not seem to be theoretically meaningful. Although this model was statistically sensible, it did not seem to shed much theoretical light on the meaningfulness construct, as I'd hoped. Additionally, by this point I had anecdotally observed, while reading the collected work stories, several seemingly clear differentiating features between high-meaning and low-meaning stories that the simple bag-of-words model wasn't able to incorporate easily (because they were more complex patterns, like 'first words contain...').
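The feature vector behind this first version can be illustrated with a stdlib-only sketch of the 1-, 2-, and 3-word count features (the study itself used scikit-learn's vectorization plus an XGBoost classifier; this toy function only shows the counting idea):

```python
from collections import Counter

def ngram_features(story, max_n=3):
    """Count every word and every 2- and 3-word phrase in a story:
    a simple 'bag of words' feature vector."""
    words = story.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            features[" ".join(words[i:i + n])] += 1
    return features
```

Each story becomes a mapping from a word or phrase to its occurrence count; a classifier can then search those counts for the most predictive entries.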
Desiring a more defensible, more flexible, more theoretically driven predictive model than the 'black box' bag of words I had created, I switched approaches.
Version 2: A theory-driven model. My first algorithm employed generic machine learning tools to build a text classifier. In my second and ultimately more successful algorithm, I used tools designed specifically for text classification problems (versus machine learning in general). Rather than treating all words in all stories as potential predictive features, this new algorithm was built using only a handful of highly discriminating features (i.e., words and phrases that were conspicuously present in meaningful stories, and conspicuously absent from meaningless stories). A Naive Bayes machine learning classifier then tested the features I suggested to it, to determine how powerful they were in predicting ratings of meaningfulness.
Creating a search function. Abandoning automatic feature detection forced me to use a more manual search process for potential language features that could signal high or low meaningfulness. To aid in this search, I created a Python function to accept a regular expression (regex) as a pattern, in addition to simple words and phrases. Regular expressions are a near-universal programming syntax for defining advanced search patterns, and incorporating them allowed me to investigate more complex patterns as potential features (e.g., phrases in a certain position, and phrases with some wording variability; Thompson, 1968). This regex investigator function searched each story for whatever regular expression pattern I passed it, and returned the number of occurrences of that pattern in high-meaning stories relative to the number of occurrences in low-meaning stories. This ratio represented a difference delta of sorts, either positive or negative.
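A minimal sketch of such an investigator function (the name and details are mine, not the study's actual code) might look like this, including a per-word normalization so that longer stories cannot dominate the counts:

```python
import re

def discrimination_ratio(pattern, high_stories, low_stories):
    """Occurrence rate of a regex (matches per word) in high-meaning stories,
    alongside its rate in low-meaning stories."""
    def rate(stories):
        hits = sum(len(re.findall(pattern, s)) for s in stories)
        words = sum(len(s.split()) for s in stories)
        return hits / words if words else 0.0
    return rate(high_stories), rate(low_stories)
```

Comparing the two rates gives a quick sense of whether a candidate pattern discriminates between the two groups of stories.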
Individual story length was also included in this calculation, to ensure that the search didn't conclude a pattern was more frequent simply because a story was longer. This function was in some ways a manual version of what typical bag-of-words feature searches do automatically, but made more flexible by regular expressions, and more preliminary in its conclusions.
Searching for features. Armed with the ability to get a quick 'discrimination ratio' for any given pattern, I began reading the stories and searching for patterns inspired by meaningful work theory. When I discovered a highly discriminating pattern, I added it to a custom, from-scratch vector of features, and retested the classification algorithm for accuracy. Features that improved the predictive accuracy score were retained; features that did not were excluded. Note: the specific 'most predictive' new language features discovered through this process are discussed in detail in the Results section.
Naive Bayes classifier. While my first algorithm used XGBoost, a well-regarded 'boosted trees' classifier widely recommended for general machine learning tasks, this second algorithm used a Naive Bayes (NB) classifier, which seems to be used more commonly in text classification projects in particular (Naive Bayes Classifier, 2018). Additionally, because an NB classifier is 'baked in' to Python's NLTK package, it incorporates several features specifically designed to aid the development of text classification algorithms that are absent from more general machine learning packages like XGBoost.
Cross-validation. Once I had exhausted ideas for new, theoretically inspired patterns to look for in the text, and was also satisfied with the accuracy (predictive validity) of the algorithm, I performed a final validation of the algorithm to ensure that its accuracy was relatively stable.
As mentioned previously, I split my story data (according to standard practice) into training and testing sets, which comprised roughly 66% and 34% of the data, respectively. The catch here is that the split is (intentionally) random: each time the algorithm is run, different stories are used for training and testing. For this reason, classification algorithms vary in accuracy depending on which training and test data they use. To account for this variability and get a 'stable' estimate of how accurately an algorithm will predict when pointed at new data, standard practice in machine learning is to validate classification algorithms using a "K-fold" test (K-Fold Cross-Validation, 2018). In a K-fold test, the data is split into several sub-samples, and the model is then re-trained and re-tested on each sub-sample (Brownlee, 2018). The average accuracy across folds is then reported as the overall 'skill' of the classifier.
Construal Level and Meaningfulness
To test the relationship between other-rated construal level and meaningfulness, I performed a Pearson correlation test between the averaged other-rated construal level score for each work story and all collected measures of meaningfulness. Recall that testing construal against multiple measures was not an act of just "throwing it at the wall and seeing what sticks." This multi-criteria test was central to my argument that construal level is a universal component of meaningfulness: that is, it should correlate significantly with all validated measures of meaningfulness.
RESULTS
Table 3 displays the correlations between all study variables. In short, the language-related hypotheses were partially supported, the algorithm-related hypotheses were fully supported, and the construal level hypotheses were fully supported.
In other words, only about half of the language features I expected (in advance) to be signals of meaningfulness worked; but I discovered some brand-new language features that predict meaningfulness, and construal level and meaningfulness do seem to be related.

Table 3. Correlations of all variables.

                                          | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11    | 12
1  Other-rated meaningfulness             | --    |       |       |       |       |       |       |       |       |       |       |
2  WAMI¹                                  | .40** | --    |       |       |       |       |       |       |       |       |       |
3  CWMS²                                  | .34** | .81** | --    |       |       |       |       |       |       |       |       |
4  Self-reported meaningfulness           | .39** | .79** | .66** | --    |       |       |       |       |       |       |       |
5  Self-reported meaningfulness (binary)  | .26** | .78** | .62** | .70** | --    |       |       |       |       |       |       |
6  CL                                     | .74** | .21** | .16** | .20*  | .08   | --    |       |       |       |       |       |
7  Personal Pronouns                      | .31** | -.04  | -.02  | -.06  | .13   | .48** | --    |       |       |       |       |
8  Action verbs                           | .24** | -.02  | -.02  | .02   | -.05  | .26** | .63** | --    |       |       |       |
9  Abstract Language                      | .05   | .11   | .01   | .09   | -.03  | .09   | -.01  | -.02  | --    |       |       |
10 Positive Sentiment                     | .52** | .20*  | .17*  | .22** | .11   | .57** | .34** | .23** | .16*  | --    |       |
11 Affective Commitment                   | .36** | .74** | .79** | .62** | .57** | .21** | -.00  | -.02  | -.01  | .21** | --    |
12 "I am a…"                              | .12   | .19** | .20*  | .31** | .18*  | .18*  | .29** | .26** | .35** | .31** | .09   | --
13 Algorithm-predicted meaning            | .10   | -.09  | -.10  | .02   | .03   | -.02  | .23   | -.17* | -.03  | .01   | .24** | .24**

* p < .05, ** p < .01. 1. WAMI = Work And Meaning Inventory (Steger et al., 2012). 2. CWMS = Comprehensive Work as Meaning Inventory (Lips-Wiersma & Wright, 2012).

Goal 1: Discovering the Language of Meaningful Work
First-person pronouns and action verbs. Recall that Podolny et al. (2004) interpreted the apparent meaningfulness conveyed in work stories collected in Studs Terkel's (1972) book Working. That is, they made a judgment about how much meaning each story author seemed to get from their work, and those judgments were then used to inform their proposals about the language features associated with highly meaningful (or meaningless) work stories. They could not ask the authors of the stories how meaningful they found their work, so they attempted to judge it themselves.
Thus, effectively, Podolny et al.'s (2004) propositions could be taken as saying "these are the language features that make a person sound like they get a lot of meaning from their work." In this study, Hypothesis 1 and Hypothesis 2 tested two of the language features that Podolny et al. (2004) suggested would be related to a sense of meaningfulness in work: personal pronoun use and action verb use.
H1: People will describe their work using more first-person pronouns when they find their work meaningful.
H2: People will describe their work using more action verbs when they find their work meaningful.
I tested each of these hypotheses by performing a Pearson correlation test on the relationship between personal pronouns and (separately) action verbs with all collected measures of meaningfulness. Tellingly, the strongest (and only significant) correlations were found between Podolny et al.'s (2004) suggested language features and other-ratings of meaningfulness. Partially supporting H1 and H2, personal pronouns and action verbs were both significantly related to other-ratings of meaningfulness (personal pronouns: r = .31, p < .001; action verbs: r = .24, p < .001). None of the correlations between personal pronouns/action verbs and any self-report of meaningfulness were significant. These findings seem to support Podolny et al.'s (2004) notion that first-person pronouns and action verbs are related to a work story sounding meaningful to external human raters, but fail to support Podolny et al.'s (2004) suggestion that these features relate to 'actual', self-reported feelings of meaningfulness.
Abstract language. Consistent with the relationship proposed in this study between construal level and meaningfulness, Hypothesis 3 proposed that people would use more abstract words (at an individual word-level resolution) when discussing work that they found meaningful.
H3: People will describe their work using more abstract words when they find their work meaningful.
Using the concreteness dictionary provided by Brysbaert et al. (2014) to compute an abstract word score for each work story, I performed a Pearson correlation test to determine whether these abstraction scores related significantly to any of the collected other/self-report measures of meaningfulness. No relationships were significant with any measure of meaningfulness. Therefore, Hypothesis 3 was not supported: people do not appear to use more abstract individual word choices when they find their work meaningful. Given the support found for the related construal-level hypotheses (described below), it's possible that this finding reflects the appropriateness (or lack thereof) of the concreteness dictionary approach employed for this step, more than an underlying theoretical failure.
Positive sentiment. Hypothesis 4 tested the relationship between sentiment and meaningfulness.
H4: People will describe their work using more positive sentiment when they find their work meaningful.
To test this relationship, I used the sentiment analysis tool included in Python's Natural Language Toolkit, computed a sentiment score for each story, and tested those scores against all collected measures of meaningfulness using a Pearson correlation test. A significant relationship was found between positive sentiment and all collected measures of meaningfulness (see Table 4). Therefore, Hypothesis 4 was fully supported. However, it is notable that positive sentiment showed only small correlations with 'actual', self-reported meaningfulness, suggesting that positive sentiment may not be necessary for meaningfulness. Positive sentiment does, however, seem to have a large relationship (.52) with making work stories sound meaningful to external human raters.
Table 4. The relationship between positive sentiment and meaningfulness.
                   | WAMI¹ | Single-Item Meaningfulness Catchall | CWMS² | Other-rated Meaningfulness
Positive Sentiment | .20*  | .22**                               | .17*  | .52***

* p < .05, ** p < .01, *** p < .001. 1. WAMI = Work And Meaning Inventory (Steger et al., 2012). 2. CWMS = Comprehensive Work as Meaning Inventory (Lips-Wiersma & Wright, 2012).

All-new linguistic signals of meaningfulness. Recall that a primary goal of this study was not just to test existing ideas about how meaningfulness appears in language (i.e., those from Podolny et al., 2004), but to discover brand-new signals of meaningfulness in language with the aid of machine learning tools. As mentioned above, the process of discovering new linguistic signals of meaningfulness was largely one of trial and error. I tried adding over 100 different language patterns to the algorithm before arriving at a relatively simple predictive model with good accuracy. What follows is a description of the language features that worked: that is, the language features that, through trial and error, were found to have the largest positive effect on the algorithm's ability to predict meaningfulness ratings.
A note on communicating the significance of these features: in the case of one extremely strong and reliable feature, I was able to illustrate its relationship as a significant correlation. For the remaining new language features, however, although they improved the accuracy of the algorithm, they did not occur commonly enough overall to show even a weak correlation with meaningfulness measures on their own. Trying to illustrate the validity of these language features using correlation tests would be like trying to correlate the color "green" with a scale that ranged from "orange" to "apple." If the fruit is green, it's definitely an apple rather than an orange, so "color = green" is a good predictor of apple-ness; but enough apples are red to muffle the correlation between "green" and "apple" in most apple samples.
Thus, I have indicated the predictive power of each of the new language features discovered by listing their likelihood ratios as determined by the ML algorithm (explained in detail below). All that said, here are the new linguistic signals of meaningfulness uncovered by this study, along with theoretical support to help explain why they might predict meaningfulness as well as they do. Note that the predictive power ratios for these features (displayed as machine learning 'likelihood ratios') are listed in Table 6 and Table 7.
Identity statements. When introducing each of their work stories, participants used a variety of "I-statements" (for example, "I work at...", "I work as...", etc.). There was a manageably finite number of these intros (e.g., there were 34 different first two-word combinations across 194 stories). One of the most obvious and face-valid language differences between low-meaning and high-meaning stories was that stories with lower meaningfulness scores (everything less than the highest meaningfulness allowed by the continuous single-item scale) tended to begin with variations like "I work at..., I work for..., I work as...". Stories with extremely high self-reported meaningfulness (the maximum scale value, labeled "the most meaningful work I can imagine for myself"), however, were much more likely to begin with the words "I am...".
To understand the starkness of this difference: only 12% of stories with a self-reported meaningfulness of anything less than the highest possible score began with "I am a...". In comparison, 42% of the stories with the highest possible meaningfulness score began with "I am...". This could suggest that those who find their work extremely meaningful incorporate their work more deeply into their identity. Interestingly, the notion of identity investment being associated with meaningful work is central to Podolny et al.'s (2004) propositions.
This seems to suggest that Podolny et al.'s (2004) overall theory was highly prescient, but perhaps (based on the findings in this study) the particular language features they suggested to operationalize their theory need further refinement.
The specific pattern used to teach the algorithm how to look for these types of 'identity statements' at the beginning of the work story is below. As this pattern was consistent enough to show up in a correlation test, the correlation between the appearance of this pattern and self/other-reports of meaningfulness, as well as with affective commitment (the construct validity check), is shown in Table 5. To illustrate how especially powerful this "I am a..." language feature is for predicting extremes of meaning, I have also included (in Table 5) correlations between the identity statements pattern and meaningfulness in a polarized version of the Work Stories Corpus that included only extremely high ("The most meaningful work I can imagine for myself") and extremely low ("Meaningless" and "Not very meaningful") self-reported meaningfulness stories. In this polarized dataset, the correlation between "I am" language and meaningfulness doubles for most measures of meaningfulness.
Regex search pattern for 'identity words': ^(I am a |I'm a |I am an |I'm an )

Table 5. The relationship between "I am" language and meaningfulness.

                                                        | WAMI¹  | Single-Item Meaningfulness Catchall | CWMS²  | Other-rated Meaningfulness
Correlation with "I am a" language (full dataset)       | .19**  | .31***                              | .20**  | .12 (ns)
Correlation with "I am a" language (polarized dataset)  | .41*** | .43***                              | .40*** | .36**

** p < .01, *** p < .001. 1. WAMI = Work And Meaning Inventory (Steger et al., 2012). 2. CWMS = Comprehensive Work as Meaning Inventory (Lips-Wiersma & Wright, 2012).

Temporariness Words.
If you ask somebody what they do for a living and they reply "Well, currently...", it's probably safe to assume that they don't plan on staying in their current job, perhaps because they don't find it satisfying or meaningful. I noticed that some of the work stories started with statements like this, so I created a search pattern to look for signals of temporariness, or intention not to remain in the job. This feature pattern helped the algorithm identify low-meaningfulness stories.

Regex search pattern for 'temporariness words': ^((I am currently)\s[^(a)])|^(My current job)|I currently work\s[^(as)]

Work centrality language. Work centrality is a concept often discussed as being related to meaningfulness (Rosso et al., 2010). Workers with high work centrality consider their work a centrally important aspect of their life. Positing that workers with higher meaning might have higher work centrality, and thus greater preoccupation with their work, I created a pattern to look for potential indicators of work preoccupation. This pattern is small and likely does not cover the full range of words associated with work centrality (this could be developed further in a future project). Nonetheless, it consistently improved the accuracy of the algorithm, though only by a small amount (see Table 6 and Table 7).

Regex search pattern for 'work centrality words': worry|care

Pace. Noticing that some of the low-meaning stories seemed to lament the pace of their work, I decided to incorporate the word "pace" into the model. It turned out to be a good differentiator, showing roughly a 2:1 ratio of occurrences in low-meaning stories as in high-meaning stories (see Table 6). Although mere observation inspired this feature, it's possible that those who focus on talking about the pace of their work are experiencing job demands that exceed their resources, and thus job strain, which may be negatively associated with meaningfulness (May, Gilson, & Harter, 2004).
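To make the pattern-matching approach concrete, the sketch below applies a few of the documented patterns and computes a crude occurrence ratio between sets of high- and low-meaning stories. This is an illustration only, not the study's actual feature-extraction code; the feature names and the whole-word match for "pace" are my own assumptions.

```python
import re

# Patterns reproduced from the text; dictionary keys are my own labels.
PATTERNS = {
    "identity": re.compile(r"^(I am a |I'm a |I am an |I'm an )"),
    "work_centrality": re.compile(r"worry|care"),
    "pace": re.compile(r"\bpace\b"),  # assumption: whole-word match
}

def extract_features(story: str) -> dict:
    """Return a 0/1 indicator for each pattern's presence in a story."""
    return {name: int(bool(p.search(story))) for name, p in PATTERNS.items()}

def occurrence_ratio(high_stories, low_stories, name: str) -> float:
    """Crude ratio of a feature's occurrences in high- vs. low-meaning stories."""
    high = sum(extract_features(s)[name] for s in high_stories)
    low = sum(extract_features(s)[name] for s in low_stories)
    return high / max(low, 1)  # avoid division by zero
```

A feature that fires three times in high-meaning stories for every occurrence in low-meaning stories would yield a ratio of about 3.0, mirroring the "3:1"-style ratios reported in Tables 6 and 7.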
Atheoretical patterns. The following words were found to be good differentiators between high- and low-meaning stories; however, theoretical support for them is, at best, suggested post hoc.

Together. Surprisingly, the word "together" was much more associated with low-meaning stories than with high-meaning stories. I had originally investigated it figuring that it would signal belongingness, which has been suggested many times as a pathway to meaningful work (Lips-Wiersma & Wright, 2012; Rosso et al., 2010). But I quickly discovered that it helped predict low-meaning stories. Upon further investigation, it seems that phrases that refer to bringing people together (e.g., "brings people together," "brings them together," "brings us together") are indeed typically found in high-meaning stories. However, phrases that imply bringing pieces of something together (e.g., "several departments together") are associated with low-meaning stories. The most predictive instantiation of this (see Table 7) was a pattern that looked only for non-people-related uses of "together," and used those to signal low meaningfulness. If asked to posit a guess about this association, I suspect that by focusing on things that must be brought "together," one is in a way thinking of them as separate, which would be more consistent with a low-construal perspective, which Morrison et al. (2016) proposed would be related to low meaningfulness. Note that there is general acceptance in the construal level literature for the notion that this kind of focus on separate pieces (vs. the whole) is related to low construal level. An especially interesting and relevant example of this is Burgoon, Henderson, and Markman (2013), who showcase studies that have linked abstract (high-construal) thinking to better performance on Gestalt completion tasks (i.e., in a high-construal state people see 'the whole' more readily than the pieces).
Given the distinction between bringing pieces together and bringing people together, the most predictive patterns taught the algorithm to treat the two uses of 'together' as separate features.

Regex search pattern for 'people together': people together|them together
Regex search pattern for 'pieces together': [^people|them] together

"Have to" versus "Get to". An anecdotal observation I've made since beginning my studies in organizational psychology is that people who enjoy something, especially their jobs, tend to say things like "I get to do math all day," whereas people who dislike an activity feel burdened by it and say things like "I have to do math all day." I have no theoretical reason for this, although it did occur to me after reading Kahn's (1990) vignettes of work engagement/disengagement. In any case, I found in this study that the phrase "have to" is a predictive signal of low meaningfulness and "get to" is a (weaker) predictor of high meaningfulness. I will note, also anecdotally, that "get to" showed up frequently in people's separate descriptions of explicitly why they find their work meaningful, although these were not used for analysis.

Summary of new language features. Table 6 and Table 7 show each pattern used in the final algorithm, grouped by whether it predicts high or low meaningfulness, together with ratio scores that represent its ability to differentiate between high and low meaningfulness. A note on interpreting these ratios: in machine learning, these are called 'likelihood ratios.' A likelihood ratio of "5:1" means that a feature occurs 5 times more often in one case than in another. Note also that there are "negative features" grouped under "Text does NOT contain…" headings, indicating that the absence of a feature is predictive of a particular meaningfulness decision.

Table 6. High meaningfulness features.
Importance of language features in predicting high self-reported overall work meaningfulness.

Feature                      Importance Ratio
Does contain...
  Identity words             2.9:1
  Work centrality words      1.4:1
Does NOT contain...
  Pace                       1.4:1
  Have to                    1.1:0

Table 7. Low meaningfulness features. Importance of identified language features in predicting low self-reported overall work meaningfulness.

Feature                      Importance Ratio
Does contain...
  Pace                       3.9:1
  Temporariness              2.5:1
  Pieces together            1.2:1
Does NOT contain...
  Work centrality            1.2:1
  Identity words             1.2:0
  Have to                    1.1:0

Goal 2: Create a Natural Language Measure of Work Meaningfulness

As a second contribution, this study introduces a new, natural-language-based measure of meaningfulness. Using a predictive model composed of the language features outlined above, this algorithm is able to predict how meaningful people find their work from how they write about it. Specifically, it predicts self-reported work meaningfulness (assessed via the item "Overall, is your work very meaningful? Yes or no?") with 85% accuracy.

Cross-validation. A K-fold cross-validation test was performed, in which the text sample was divided randomly into sub-samples so that the algorithm could be re-tested on 'different' bodies of text. This K-fold test reported that the overall accuracy (or 'skill') of my meaningfulness algorithm was a 'stable' 85%. This means that when presented with three sets of work stories the algorithm had never seen before, it correctly predicted whether the author of those stories found their work meaningful (or not) with an average accuracy of 85%.

Relationship with collected measures. Hypothesis 5 proposed that ratings of meaningfulness produced by the natural language algorithm would correlate significantly with other measures of meaningfulness.
And the related Hypothesis 6 proposed that algorithm-rated meaningfulness would correlate significantly with a known outcome of meaningfulness (affective commitment). To test these hypotheses, I conducted Pearson correlation tests on the relationship between the 'probability of meaningfulness' score produced by the algorithm and the other measures of meaningfulness and affective commitment. Results show that, except for human ratings of meaningfulness, these hypotheses were fully supported (see Table 8).

Human-rated meaningfulness. The algorithm's probability-of-meaningfulness ratings failed to show a relationship with human ratings of meaningfulness, with a non-significant correlation of .10. Given the low correlations between human-rated meaningfulness and 'actual' self-reported meaningfulness, it is not particularly surprising that human ratings failed to correlate with algorithm-predicted meaningfulness (which was optimized to predict self-reported meaningfulness).

Self-reported meaningfulness (single item). The algorithm's ratings showed a significant, positive correlation (.35, p < .001) with self-reported meaningfulness assessed with the "How meaningful do you find your current work?" item.

Self-reported meaningfulness (binary). The algorithm's ratings showed a significant, positive correlation (.31, p < .001) with self-reported meaningfulness assessed with the "Overall, I find my work very meaningful" item.

Work and Meaning Inventory (WAMI). The algorithm's ratings showed a significant, positive correlation (.29, p < .001) with self-reported meaningfulness assessed with the currently most-cited Likert meaningfulness measure, Steger et al.'s (2012) Work and Meaning Inventory.

Comprehensive Meaningful Work Scale (CMWS). The algorithm's ratings showed a significant, positive correlation (.26, p < .001) with self-reported meaningfulness assessed with Lips-Wiersma and Wright's (2012) Comprehensive Meaningful Work Scale.

Affective commitment.
The algorithm's ratings showed a significant, positive correlation (.24, p < .001) with self-reported affective commitment as measured by Allen and Meyer's (1990) affective commitment scale.

Table 8. Correlations between algorithm-predicted probability of meaningfulness and collected measures of meaningfulness.

Hypothesis   Meaningfulness measure                   Correlation with algorithm-predicted
                                                      probability of meaningfulness
H5-A         Human ratings of meaningfulness          .10 (ns)
H5-B         Self-reported meaningfulness (1-7)       .35***
H5-C         Self-reported meaningfulness (binary)    .31***
H5-D         WAMI1                                    .29***
H5-E         CMWS2                                    .26***
H6           Affective Commitment                     .24***

*** p < .001. 1WAMI = Work And Meaning Inventory (Steger et al., 2012); 2CWMS = Comprehensive Work as Meaning Inventory (Lips-Wiersma & Wright, 2012).

Goal 3: Testing Construal Level & Meaningfulness

As an additional theoretical contribution of this paper, I tested the relationship between perceptions of work meaningfulness and construal level, which Morrison et al. (2016) proposed as a potential unifying mechanism for the somewhat fragmented theoretical literature surrounding work meaningfulness. As detailed above, trained raters rated each work story for the overall construal level of the language used in the story. In Hypotheses 8 and 9, these other-ratings of construal level were tested via Pearson's correlation test for a relationship with self-reported and other-rated meaningfulness.

H8: Construal level will have a significant, positive relationship with meaningfulness.
H9: Construal level will have a significant, positive relationship with affective commitment.

Both of these construal-related hypotheses were fully supported. By any measure, construal level seems to be positively related to perceptions of work as meaningful. When people find their work meaningful, they speak about it more in terms of its overall, zoomed-out qualities. When people find their work meaningless, they speak about it more in terms of its concrete details.

Table 9.
Correlations between other-rated construal level and meaningfulness measures.

Hypothesis   Meaningfulness measure       Other-rated Construal Level
H8-A         Other-rated meaningfulness   .74***
H8-B         Continuous single item       .20***
H8-C         WAMI1                        .21**
H8-D         CMWS2                        .16**
H9           Affective Commitment         .21**

** p < .005, *** p < .001. 1WAMI = Work And Meaning Inventory (Steger et al., 2012); 2CWMS = Comprehensive Work as Meaning Inventory (Lips-Wiersma & Wright, 2012).

On the validity of other-rated construal level. A note on the validity of other-rated construal level in light of the apparent lack of validity of other-rated meaningfulness: recall that part of the motivation for this study was the lack of definitional clarity in the meaningfulness literature. Without a clear consensus on what meaningfulness is, it was difficult to teach others to recognize it. The literature on construal level, by contrast, is in nearly the opposite state: the construct is clearly and consistently defined, and it is objective and easy to manipulate in accordance with its definition (see Wakslak, Liberman, and Trope [2007] for a review). Additionally, many instantiations of construal level have been identified in organizational research, making it easy to generate work-related examples (Wiesenfeld, Reyt, Brockner, & Trope, 2017). All of this makes construal level much easier to recognize and teach than meaningfulness. Note also that construal level correlated significantly (.21, p < .001) with the convergent validity check measure (affective commitment). This finding that construal level relates to all collected measures of work meaningfulness, as well as to affective commitment, has large implications for the meaningfulness literature. Construal level may indeed be ready for further investigation as a core 'mechanism of meaning,' as Morrison et al. (2016) proposed.
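For readers unfamiliar with the statistic, the tests reported throughout these sections are ordinary Pearson product-moment correlations. A minimal pure-Python sketch (an illustration, not the study's analysis code):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In practice one would use a statistics package that also reports the p-values quoted in the tables; the formula itself is what the coefficients in Tables 5, 8, and 9 represent.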
Joint Relationships

In addition to the bivariate correlational results provided thus far, I also tested the incremental validity of the significant variables from related hypotheses. These variables have shown that they are significantly related to our outcomes of interest alone, but do they offer additional predictive power when combined in a multiple regression (MR) model?

What best predicts whether a story sounds meaningful? The other-rated meaningfulness scores provided by human raters are, effectively, ratings of how meaningful each story sounded. Although these ratings proved too unrelated to 'actual' self-reported meaningfulness to be useful for algorithm development, it may still be useful to understand which variables contribute to making somebody sound (to others) like they find their work meaningful. Recall that the use of first-person pronouns (e.g., I, we), action verbs (e.g., help, save, work), and positive sentiment were positively correlated with ratings of meaningfulness provided by external raters. Multiple regression analysis was used to test which of the language variables predicted other-rated meaningfulness. Results of the regression indicated that although action verbs alone (β = 110.53, R2 = .05, p < .001) and personal pronouns alone (β = 79.33, R2 = .09, p < .001) provide some predictive power, the significance of action verbs falls away when the two are combined. Personal pronouns, however, remain significant in the combined model (β = 68.25, p < .01), which predicts only as well as personal pronouns alone (R2 = .09). When positive sentiment is added to the model, the significance of both action verbs and pronouns falls away, with positive sentiment showing the only significant slope (β = 16.58, p < .001). Ultimately, a model excluding action verbs and combining personal pronouns and positive sentiment accounted for the most variance.
In this model, both positive sentiment (β = 16.62, p < .001) and personal pronouns (β = 39.16, p < .05) show significant incremental validity, and the full model accounts for 29% of the variance in other-reported meaningfulness (R2 = .29).

Summary. In sum, the best predictor of whether a person sounds (to others) like they find their work meaningful is whether they talk about it with a positive tone of (written) voice and use lots of personal pronouns (I, me, you).

What best predicts self-reported meaningfulness? While positive sentiment and personal pronouns together accounted for 29% of the variance in other-rated meaningfulness, the same model accounted for only 5% of the variance in 'actual' self-reported meaningfulness (rated on a continuous scale from 1 to 7), and only positive sentiment remained significant (β = 9.99, p < .001). So then, which of the variables collected 'traditionally' (outside the machine learning algorithm) best predicts self-reported meaningfulness? I tested several combinations of language features (i.e., positive sentiment, abstract word score, construal level, identity words, personal pronouns, and action verbs). The best model explained 11% of the variance in self-reported meaningfulness (R2 = .11) by combining positive sentiment (β = 8.80, p < .001) with identity words (β = 0.21, p < .001). Based on this result, it seems that the best indicators of self-reported meaningfulness are whether somebody writes about work with a positive tone and also starts their work story with an identity statement (e.g., "I am a…").

Algorithm-predicted meaning. As a final consideration, note that a regression model consisting only of the "probability of meaning" score produced by this study's NLP algorithm explained more of the variance in self-reported meaningfulness than did the best model of independent language features (β = 4.29, p < .001, R2 = .15).
When positive sentiment was added to this model (β = 6.83, p < .01), variance explained improved to R2 = .18, suggesting that a future version of the algorithm might benefit from incorporating positive sentiment into its probability estimates.

DISCUSSION

Contribution 1: Language Reveals Meaningfulness

The core assertion of this study, that people's sense of meaningfulness comes through in how they talk about their work, seems to hold true. There are linguistic indicators of work meaningfulness, and it is possible to use such indicators to predict from language alone whether a person finds their work meaningful.

Validation of Podolny et al.'s (2004) theory. The study also served as a partial test of Podolny et al.'s (2004) study on linguistic indicators of meaningful work. And the results suggest that Podolny et al.'s (2004) conclusions have merit: first-person pronouns and action verbs (e.g., "I file TPS reports" vs. "TPS reports are filed") do seem to have a role in creating the impression of meaningful work. This finding itself may have implications about what people think the term 'meaningfulness' means for others.

Theoretical implications. Given the new language features found to aid in the prediction of self-reported meaningfulness, it seems that identity ("I am…") and work centrality ("worry/care") are worthy of further investigation for their relationship with feelings of meaningfulness in work. Likewise, notions of job strain ("pace") and a sense of burden ("have to") may deserve further investigation for their relationship with feelings of meaninglessness in work. The finding that words emphasizing separate pieces are associated with a sense of low meaningfulness seems to provide further support for the proposition in this study that a low construal of work (seeing it for its separate, concrete details) is associated with low meaningfulness.

Future directions. The language features identified in this study are just a beginning.
They are by no means comprehensive; there are likely myriad other linguistic signals of meaningfulness left to be discovered. In this section, I will suggest some promising avenues that could lead to new and perhaps even more predictive linguistic signals.

Further mine Podolny et al. (2004). Podolny et al. (2004) proposed other operationalizations of their notion that self-distance from work is associated with meaningfulness. For example, they proposed that "you-language" phrases like "You're expected to" signal a sense of self-distance from work. Although these were not included in my initial hypotheses, I did run some preliminary (word frequency) tests on a couple of them to test their ability to discriminate between high and low meaning. On the first attempt, they were not powerful enough on their own to significantly contribute to the algorithm's predictive accuracy. Anecdotally, however, I can attest from reading the work stories that some of the additional language signals of low/high meaning proposed by Podolny et al. (2004) do seem accurate for extreme examples (very high or low meaning on the continuous scales), but they may be less useful for differentiating middle-range scores, which may be why they failed to help predict overall in my preliminary tests. Additionally, Podolny et al. (2004) noted in their discussion that their proposed linguistic indicators of meaningfulness were likely to appear together. This could explain the high correlation found in this study between personal pronouns and action verbs. It also suggests that when testing any additional features suggested by Podolny et al. (2004), one should be rigorous about checking for incremental predictive validity over and above the other features.
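For the special case of two standardized predictors, this kind of incremental-validity check can be sketched directly from the bivariate correlations. The function below is an illustration under my own naming, not code from this study; it computes the R-squared gained by adding a second predictor to a one-predictor model.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(vx * vy)

def incremental_r2(x1, x2, y):
    """R-squared gained by adding predictor x2 to a model containing x1.
    Uses the two-predictor identity:
    R2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2)."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return full - r1 ** 2  # gain over the x1-only model
```

If the gain is near zero, the candidate feature adds nothing beyond the features already in the model, which is exactly the rigor the paragraph above calls for.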
Ultimately, I think there is much more to be mined from Podolny et al.'s (2004) paper than was tested in this study, and I would like to see future tests of their suggestions use even more complex text processing techniques (e.g., custom decision trees and sentence-level analysis) to fully test all of their propositions.

Limitations. In my introduction I gave a brief tour of some of the problems facing the literature on work meaningfulness. Namely, there are many different definitions of meaningfulness, and no consensus yet on which definition is 'best.' Another limitation I have not yet mentioned is that meaningfulness is discussed near-universally as an overall evaluation of one's work, even though the experience of finding one's work meaningful may also occur in the moment rather than retrospectively. It is possible that capturing meaningfulness as it occurs during the work day, rather than as an overall judgment, may increase the strength of the 'signal' and allow for more accurate assessment of the construct.

Contribution 2: A Natural Language Measure of Meaningfulness

There is no shortage of Likert-based measures of meaningful work; Bailey et al. (2016) list 25 different measures. To my knowledge, this study introduces the first measure of work meaningfulness that utilizes language patterns rather than Likert items. However, at the moment the NLP measure is validated only on samples of text in which people were talking about their work in response to one particular, generic prompt. It is unknown how far it could generalize to other samples of text, though there are many settings where similar text may be generated (e.g., job analysis, talking about past jobs in job interviews).

Distributing the measure. Likert-based measures are designed for paper-and-pencil testing, and thus can be distributed easily via PDF file or within an article.
However, there is little precedent for distributing natural-language-based measures of psychological constructs. In the hope that this measure could be helpful to other researchers and practitioners, the code for this measure will be uploaded (along with instructions for use) to GitHub as soon as a paper introducing it is published. The code will be available under a Creative Commons Attribution (CC-BY) license, which means it will be free for modification, distribution, and use with proper credit.

Future directions. Ultimately, with further refinement and testing on new samples of text, I hope that this measure could one day be used on open-ended survey responses and more passively collected natural language data, like emails and customer service phone call transcripts. It may also be interesting to translate the measure for use with other languages. In terms of accuracy improvement, I think that a hybrid approach combining the two approaches in this study (XGBoost/bag-of-words and Naive Bayes/binary discrimination), along with a custom decision tree classifier (which would allow me to say "if you see this pattern, it's always low meaningfulness"), could yield higher prediction accuracy, perhaps nearing 100%.

Limitations. It should be noted that machine learning approaches like those in this study involve fitting and comparing many candidate models and features to achieve their predictive results. Running so many tests can threaten generalizability by inflating Type 1 error; that is, there is a risk that so much trial and error can surface 'successful predictors' that are merely idiosyncratic to this data. The K-fold test performed in this study is the standard practice for addressing this concern. The K-fold method put the generalizability of the algorithm to the test by checking its predictive validity on multiple samples of data that it had not seen before, and it passed these tests by maintaining its initial accuracy of 85% on new data.
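The K-fold idea described above can be sketched in a few lines. This is an illustration only, not the study's actual validation code; here the classifier is a fixed rule evaluated on held-out folds (a full K-fold procedure would typically refit the model on the remaining folds each time).

```python
import random

def k_fold_accuracy(stories, labels, classify, k=3, seed=0):
    """Average hold-out accuracy of `classify` over k random folds.

    `classify` is any function mapping a story string to a 0/1 label.
    """
    idx = list(range(len(stories)))
    random.Random(seed).shuffle(idx)          # random assignment to folds
    folds = [idx[i::k] for i in range(k)]     # k disjoint folds
    accs = []
    for fold in folds:
        correct = sum(classify(stories[i]) == labels[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k
```

A stable accuracy across folds, as reported above, is evidence that the features are not merely idiosyncratic to one slice of the data.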
However, one large generalizability concern still remains: the data used to train and test the algorithm are highly structured, and may be too different from text data found organically in the 'real world.'

Contribution 3: Construal Level is Related to Meaningfulness

Construal level shows at least a medium-sized relationship with several different measures of work meaningfulness. I suspect that with further training on recognizing examples of high/low construal level, raters could capture more of the variance in construal level, and that this refined rating may relate even more strongly to meaningfulness than this initial attempt. This construal level relationship has many implications for research on meaningful work. First, it provides evidence to suggest that construal level may deserve a place as a central feature in definitions of meaningfulness, and further development along Morrison, Walker, and DeShon's (2016) theoretical line seems warranted. Second, it suggests that construal level may indeed be a much-needed unifying component in work meaningfulness theory and should be included in discussions of the nature of experienced work meaningfulness. A high-construal perspective may be a precursor to perceiving meaning through any specific pathway to meaningfulness, like those identified by Lips-Wiersma and Wright (2012). This construal level finding should be treated as the beginning (the "tip of the iceberg") of what should be a much larger investigation of construal level's relationship with work meaningfulness. The relationship between construal level and work meaningfulness may also have wide implications for the meaning-making literature. Meaning-making refers to the process through which individuals are able to create a sense of meaning in their work (Frankl, 1962; Rosso et al., 2010). It is possible that 'raising construal level', for instance by articulating a broad vision, is the route through which meaning-making travels.
Future directions. This study was designed primarily to test the relationship between meaningfulness and language; it included a test of construal level as a secondary, opportunistic contribution. However, a more direct test of construal level and meaningfulness could shed further light on the relationship. A future study might first manipulate construal level, and then ask about the meaningfulness of the participant's work. If construal level of work and meaningfulness of work are related as Morrison et al. (2016) proposed and the results of this study suggest, then a manipulated construal level could spill over onto perceptions of meaningfulness. Specifically, meaningfulness ratings should be higher in a high-construal condition than in a low-construal condition.

Contribution 4: The Work Stories Corpus

Studs Terkel's (1972) book Working and Bowe et al.'s (2000) book Gig are both 'just' collections of assorted people talking about their work. With 1754 citations for the former and 90 for the latter, each has been highly generative for researchers seeking to understand work experiences, and in some cases (i.e., Podolny et al. [2004]) to study natural language use. Two major limitations hinder the more widespread use of these books in research. First, it is hard to find digital copies of the books for easy analysis. Second, neither book (obviously) includes any explicit psychological data on the authors of its stories, which limits their value in looking for language patterns that correlate with various psychological phenomena (i.e., there are no criteria to compare to, requiring manual coding). The present study created a large, digital collection of work stories similar to those in Terkel's (1972) Working and Bowe et al.'s (2000) Gig, and paired them with a great deal of psychological data about the authors of each story.
My hope is that this collection of work stories and accompanying data will empower other researchers to discover new qualitative insights about work and to develop new work-related natural language measures. To maximize the impact and generativity of this work story dataset, I plan to post it free and Open Access on the Open Science Framework, following the publication of a 'data paper' that will explain the dataset and give researchers something to cite, so that this project can be credited when the data is used.

PRACTICAL IMPLICATIONS

Watch for Language Cues of Meaningfulness

If nothing else, I hope this paper has taught you a cool party trick: next time you're at a party and you ask someone "What do you do?", listen to the first words out of their mouth. If they start with an identity statement like "I am a..." instead of "I work at..." or "Currently, I…", and they say it with a positive tone of voice, there's a chance they find their work meaningful. Similarly, if you hear them lamenting the pace of their work, or all the separate pieces that "have to" be brought together, it may signal that they find their work low in meaning. Practitioners could be trained to look for these language signals of low and high meaning whenever people talk about their job (e.g., in interviews, job analyses, and performance feedback meetings).

A New Tool for Practitioners

Practitioners will be able to download and use the NLP meaningfulness algorithm created in this study freely as soon as it is published on GitHub. I recommend trying it especially in contexts where a person is describing their job, such as past jobs in job interview transcripts or their current job in open-ended survey responses. Modifications and improvements to the measure can also be made easily by 'forking' the GitHub repository.
Watch for Construal Level Fluctuations When People Talk About Their Work

The construal level findings in this study have much larger implications for theory than for practice. However, it could still be useful for practitioners and managers to know that if they hear a worker describing their job in too many concrete details (e.g., "TPS reports, calendar, bathroom, desk"), it may be a signal that they are experiencing some meaninglessness. If the worker describes the job in abstract terms and talks about the broader nature of the work (e.g., "keeping everybody up to speed" instead of "filing reports"), it may signal that they find their work highly meaningful.

APPENDIX

Training Procedure for Construal Level

In the following sections, I will outline the procedure used to train the undergraduate raters to recognize construal level in the work stories.

Reading assignment. Each member of the rating team read Trope and Liberman's (2010) seminal paper examining construal level in detail. This paper includes several examples of construal level's antecedents and consequences, and explains the definition of the construct at length.

Response paragraph. Each rater was asked to send a short response paragraph summarizing their impression of the definition of construal level. These paragraphs were all deemed to indicate satisfactory basic understanding.

Guided examples. As a group, raters were shown 3 example stories from pilot data: two examples of clearly high-construal stories (lots of abstract connections and language), and one example of a clearly low-construal story (which mostly discussed the concrete features of the work). Of the two high-construal examples, one was a high-meaning/high-construal story, and one was a low-meaning/high-construal story. In advance of the training session, I marked up the example stories, highlighting language signals that struck me (as a subject matter expert) as particularly high or low construal.
I discussed each of these highlighted sentences, and explained why I thought they represented high/low construal.

Training Procedure for Meaningfulness

In the following sections, I will outline the procedure used to train the undergraduate raters to recognize meaningfulness in the work stories.

Reading assignment. Each undergraduate rater read the Morrison, Walker, and DeShon (2016) definition of meaningfulness paper as an introduction to the construct of meaningfulness. It should be noted that this paper focuses on construal level as a potential unifying mechanism of meaningfulness, which could well have alerted the raters to the hypotheses being tested, and could explain the strong correlation between other-rated and self-reported meaningfulness.

Writing assignment. Each rater wrote two 500-word essays (the same length as those written by participants). In one essay, they were asked to write about the most meaningful work they ever participated in; in the second essay, they were asked to write about the most meaningless work they ever participated in. In both essays, they were asked to include an additional paragraph introspecting about how they talked about each work experience. The idea was to get them thinking about the language signals they used when talking about meaningful/meaningless work, to help them recognize such signals in the stories of others.

Guided examples. Using the same example stories used to train construal level, raters were walked through sections in each work story that indicated to me (as a subject matter expert) that the person found their work meaningful. I also explained the logic behind the signals I pointed out, noting how each represented a connection (or lack thereof) to the author’s values, and/or represented a common pathway to meaningfulness.

REFERENCES

Allan, B. A., Duffy, R. D., & Collisson, B. (2018). Helping others increases meaningful work: Evidence from three experiments.
Journal of Counseling Psychology, 65(2), 155-165.

Arnold, K., Turner, N., Barling, J., Kelloway, E. K., & McKee, M. C. (2007). Transformational leadership and psychological well-being: The mediating role of meaningful work. Journal of Occupational Health Psychology, 12, 193-203.

Bailey, K., Yeoman, R., Madden, A., Thompson, M., & Kerridge, G. (2016). A narrative evidence synthesis of meaningful work: Progress and research agenda. Paper presented at the meeting of the Academy of Management, Anaheim, CA.

Barrick, M. R., Mount, M. K., & Li, N. (2012). The theory of purposeful work behavior: The role of personality, higher-order goals, and job characteristics. Academy of Management Review, 38(1), 132–153. https://doi.org/10.5465/amr.2010.0479

Bowe, J., Bowe, M., & Streeter, S. C. (2000). Gig: Americans talk about their jobs at the turn of the millennium. New York: Crown Publishers.

Britt, T. W., Dickinson, J. M., Castro, C. A., & Adler, A. B. (2007). Correlates and consequences of morale versus depression under stressful conditions. Journal of Occupational Health Psychology, 12, 34-47.

Brownlee, J. (2018, May 23). A gentle introduction to k-fold cross-validation. Retrieved from https://machinelearningmastery.com/k-fold-cross-validation/

Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904-911.

Bunderson, J. S., & Thompson, J. A. (2009). The call of the wild: Zookeepers, callings, and the double-edged sword of deeply meaningful work. Administrative Science Quarterly, 54(1), 32-57.

Burgoon, E. M., Henderson, M. D., & Markman, A. B. (2013). There are many ways to see the forest for the trees: A tour guide for abstraction. Perspectives on Psychological Science, 8, 501–520. https://doi.org/10.1177/1745691613497964

Chalofsky, N. E. (2010). Meaningful workplaces: Reframing how and where we work. San Francisco, CA: John Wiley & Sons.

Champoux, J. E. (1992).
A multivariate analysis of curvilinear relationships among job scope, work context satisfactions, and affective outcomes. Human Relations, 45(1), 87-111.

Cohen-Meitar, R., Carmeli, A., & Waldman, D. A. (2009). Linking meaningfulness in the workplace to employee creativity: The intervening role of organizational identification and positive psychological experiences. Creativity Research Journal, 21, 361-375.

Diesner, J., Frantz, T. L., & Carley, K. M. (2005). Communication networks from the Enron email corpus “It's always about the people. Enron is no different”. Computational & Mathematical Organization Theory, 11(3), 201-228.

Douglas, K., & Carless, D. (2009). Abandoning the performance narrative: Two women's stories of transition from professional sport. Journal of Applied Sport Psychology, 21(2), 213-230.

Duffy, R. D., Allan, B. A., Autin, K. L., & Bott, E. M. (2013). Calling and life satisfaction: It’s not about having it, it’s about living it. Journal of Counseling Psychology, 60, 42-52.

Fairlie, P. (2011). Meaningful work, employee engagement, and other key outcomes: Implications for human resource development. Advances in Developing Human Resources, 13, 508-525.

Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., & Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(8), 1-10.

Förster, J., Friedman, R. S., & Liberman, N. (2004). Temporal construal effects on abstract and concrete thinking: Consequences for insight and creative cognition. Journal of Personality and Social Psychology, 87(2), 177-189.

Frankl, V. E. (1962). Man’s search for meaning. New York: Simon and Schuster.

Glazer, S., Kozusznik, M. W., Meyers, J. H., & Ganai, O. (2014). Meaningfulness as a resource to mitigate work stress. In S. Leka & R. R. Sinclair (Eds.), Contemporary occupational health psychology: Global perspectives on research and practice (Vol. 3, pp. 114-130). Chichester, UK: Wiley-Blackwell.

Grant, A. M. (2008).
The significance of task significance: Job performance effects, relational mechanisms, and boundary conditions. Journal of Applied Psychology, 93(1), 108-124.

Hackman, J. R., & Oldham, G. R. (1976). Motivation through the design of work: Test of a theory. Organizational Behavior and Human Performance, 16(2), 250-279.

James, W. (1985). The varieties of religious experience: A study in human nature. Retrieved from http://93beast.fea.st/files/section1/James%20-%20Varieties%20of%20Religious%20Experience.pdf

Jiang, L., & Johnson, M. J. (2018). Meaningful work and affective commitment: A moderated mediation model of positive work reflection and work centrality. Journal of Business and Psychology, 33(4), 545–558. https://doi.org/10.1007/s10869-017-9509-6

Kahn, J. H., Tobin, R. M., Massey, A. E., & Anderson, J. A. (2007). Measuring emotional expression with the Linguistic Inquiry and Word Count. The American Journal of Psychology, 120(2), 263–286.

Kahn, W. A. (1990). Psychological conditions of personal engagement and disengagement at work. Academy of Management Journal, 33(4), 692-724.

K-fold cross-validation. (2018). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation

Lepisto, D. A., & Pratt, M. G. (2016). Meaningful work as realization and justification: Toward a dual conceptualization. Organizational Psychology Review, 7, 99-121. https://doi.org/10.1177/2041386616630039

Lips-Wiersma, M., & Wright, S. (2012). Measuring the meaning of meaningful work: Development and validation of the Comprehensive Meaningful Work Scale (CMWS). Group & Organization Management, 37(5), 655–685. https://doi.org/10.1177/1059601112461578

Mahmud, J. (2015). IBM Watson Personality Insights: The science behind the service. Retrieved from https://developer.ibm.com/watson/blog/2015/03/23/ibm-watson-personality-insights-science-behind-service/

Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30, 457–500. https://doi.org/10.1613/jair.2349

Malsburg, T. V. (n.d.). How to correctly calculate worker compensation for Amazon Mechanical Turk. Retrieved March 26, 2017, from https://tmalsburg.github.io/blog/how-to-correctly-calculate-worker-compensation-for-amazon-mechanical-turk/

Martela, F., & Steger, M. F. (2016). The three meanings of meaning in life: Distinguishing coherence, purpose, and significance. The Journal of Positive Psychology, 11(5), 531-545. https://doi.org/10.1080/17439760.2015.1137623

Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370-396.

Maslow, A. (1969). The farther reaches of human nature. Journal of Transpersonal Psychology, 1(1), 1–9.

May, D. R., Gilson, R. L., & Harter, L. M. (2004). The psychological conditions of meaningfulness, safety and availability and the engagement of the human spirit at work. Journal of Occupational and Organizational Psychology, 77(1), 11-37.

McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52(1), 81-90.

Morrison, M. A. (2016). Increasing the meaningfulness of work with motivational self-transcendence. Paper presented at the meeting of the Academy of Management, Anaheim, CA.

Morrison, M. A., Walker, R., & DeShon, R. (2016). Toward a comprehensive definition of work meaningfulness. Paper presented at the meeting of the Society for Industrial-Organizational Psychology, Orlando, FL.

Naive Bayes classifier. (2018, August 2). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender differences in language use: An analysis of 14,000 text samples.
Discourse Processes, 45(3), 211–236. https://doi.org/10.1080/01638530802073712

Pennebaker, J. W., Booth, R. J., Boyd, R. L., & Francis, M. E. (2015). Linguistic Inquiry and Word Count: LIWC2015. Austin, TX: Pennebaker Conglomerates (www.LIWC.net).

Podolny, J. M., Khurana, R., & Hill-Popper, M. (2004). Revisiting the meaning of leadership. In B. M. Staw & R. M. Kramer (Eds.), Research in Organizational Behavior, 26, 1–37.

Rosenberg, A., & Hirschberg, J. (2005). Acoustic/prosodic and lexical correlates of charismatic speech. Paper presented at the Ninth European Conference on Speech Communication and Technology.

Rothmann, S., & Hamukang’andu, L. (2013). Callings, work role fit, psychological meaningfulness and work engagement among teachers in Zambia. South African Journal of Education, 33(2), 1–16.

Rosso, B. D., Dekas, K. H., & Wrzesniewski, A. (2010). On the meaning of work: A theoretical integration and review. In A. P. Brief & B. M. Staw (Eds.), Research in Organizational Behavior, 30, 91–127. https://doi.org/10.1016/j.riob.2010.09.001

Rosso, B. D., Dekas, K. H., & Wrzesniewski, A. (2011). Corrigendum to “On the meaning of work: A theoretical integration and review”. In A. P. Brief & B. M. Staw (Eds.), Research in Organizational Behavior, 31, 277.

Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Technical Reports (CIS), 570.

Schnell, T., Höge, T., & Pollet, E. (2013). Predicting meaning in work: Theory, data, implications. The Journal of Positive Psychology, 8, 543-554.

Schwartz, A., Eyal, T., & Tamir, M. (2018). Emotions and the big picture: The effects of construal level on emotional preferences. Journal of Experimental Social Psychology, 78, 55–65. https://doi.org/10.1016/j.jesp.2018.05.005

Schwartz, S. H. (1994). Are there universal aspects in the structure and contents of human values? Journal of Social Issues, 50(4), 19-45.

Steger, M. F., Dik, B. J., & Duffy, R. D. (2012).
Measuring meaningful work: The Work and Meaning Inventory (WAMI). Journal of Career Assessment, 20(3), 322-337.

Stillman, T. F., Baumeister, R. F., Lambert, N. M., Crescioni, A. W., DeWall, C. N., & Fincham, F. D. (2009). Alone and without purpose: Life loses meaning following social exclusion. Journal of Experimental Social Psychology, 45(4), 686–694. https://doi.org/10.1016/j.jesp.2009.03.007

Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927X09351676

Terkel, S. (1972). Working: People talk about what they do all day and how they feel about what they do. Bridgewater, NJ: The New Press.

Thompson, K. (1968). Programming techniques: Regular expression search algorithm. Communications of the ACM, 11(6), 419-422.

Trope, Y., & Liberman, N. (2010). Construal-level theory of psychological distance. Psychological Review, 117(2), 440-463.

Updegraff, J. A., Emanuel, A. S., Suh, E. M., & Gallagher, K. M. (2009). Sheltering the self from the storm: Self-construal abstractness and the stability of self-esteem. Personality and Social Psychology Bulletin, 36, 97–108.

Vallacher, R. R., & Wegner, D. M. (1989). Levels of personal agency: Individual variation in action identification. Journal of Personality and Social Psychology, 57(4), 660.

Van der Cruyssen, L., Heleven, E., Ma, N., Vandekerckhove, M., & Van Overwalle, F. (2014). Distinct neural correlates of social categories and personality traits. NeuroImage, 104, 336–346. https://doi.org/10.1016/j.neuroimage.2014.09.022

Vauclair, C. M., Hanke, K., Fischer, R., & Fontaine, J. (2011). The structure of human values at the culture level: A meta-analytical replication of Schwartz’s value orientations using the Rokeach Value Survey. Journal of Cross-Cultural Psychology, 42(2), 186-205.

Wakslak, C., Liberman, N., & Trope, Y. (2007).
Construal levels and psychological distance: Effects on representation, prediction, evaluation, and behavior. Journal of Consumer Psychology, 17(2), 83–95. https://doi.org/10.1016/S1057-7408(07)70013-X

Weiss, H. M., & Rupp, D. E. (2011). Experiencing work: An essay on a person-centric work psychology. Industrial and Organizational Psychology, 4(1), 83-97.

Wiesenfeld, B. M., Reyt, J.-N., Brockner, J., & Trope, Y. (2017). Construal level theory in organizational research. Annual Review of Organizational Psychology and Organizational Behavior, 4(1), 367–400. https://doi.org/10.1146/annurev-orgpsych-032516-113115