This is to certify that the dissertation entitled "Performance Appraisal in Context: Motivational Influences on Performance Ratings," presented by Margaret Youtz Padgett, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Management.

Major Professor
Date: May 13, 1988

PERFORMANCE APPRAISAL IN CONTEXT: MOTIVATIONAL INFLUENCES ON PERFORMANCE RATINGS

By
Margaret Youtz Padgett

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Management
1988

ABSTRACT

PERFORMANCE APPRAISAL IN CONTEXT: MOTIVATIONAL INFLUENCES ON PERFORMANCE RATINGS

By Margaret Youtz Padgett

The purpose of this study was to gain an understanding of some of the determinants of accuracy in performance ratings. Traditional explanations for inaccuracy have focused on rater ability, arguing that better rating formats or more effective rater training should make raters more able to evaluate performance accurately. The central thesis of this study was that an equally important, but generally ignored, determinant of accuracy is the motivation of raters to provide accurate ratings. A causal model detailing the relationship between some conditions likely to reduce the motivation to rate accurately was developed and submitted to empirical test using latent variable structural equation analysis.

One hundred and twenty-four managers completed a questionnaire assessing the hypothesized motivational influences and participated in a short interview during which they were asked to provide an honest appraisal of the performance of one employee. This private evaluation was then compared to the most recent public evaluation (obtained from organizational records) to obtain a behavioral index of the extent to which intentional distortion of performance appraisals occurs.

The hypothesized structural model was found to fit the data well (Goodness of Fit Index = .883). Results indicated that the perceived freedom of the rater to be honest and the expected reaction of the ratee to the appraisal were direct influences on the amount of difference between public and private ratings and thus, the accuracy of ratings. Other important motivational influences included the credibility of the rater to the ratee, the ability of the rater to document the evaluation, and the expected consequences of the appraisal for the ratee. Implications of these results and suggestions for future research are discussed.

Copyright by
MARGARET YOUTZ PADGETT
1988

In loving memory of Sunny, our family cat and faithful friend for eighteen years

ACKNOWLEDGMENTS

It is impossible to look back over the years I have spent working on my dissertation without recognizing the invaluable assistance of those people who helped make the completion of this project possible. First, I must express my thanks to Dan Ilgen, my chairperson, who provided an immeasurable amount of help and guidance throughout the entire dissertation process.
I especially appreciated his speed in providing feedback to me after reading each draft of the proposal and final write-up, particularly toward the end when time was of the essence. Let me also express my gratitude for his constant patience and for those gentle "prods" he provided to keep me working and on target. I also want to recognize Ken Wexley and John Hollenbeck, my other two committee members. Both Ken and John were always very encouraging and supportive and their substantial contribution helped improve the final product a great deal. Finally, to all three committee members I want to express my appreciation for their guidance and friendship throughout my (many!) years in graduate school. I have learned a great deal from all of them. And even though all memories of graduate school are not positive, mine of these three will always be among my most pleasant.

In addition to my committee, there are several other people who cannot go without recognition. Foremost among these is my husband, Bob, whose computer and data-analytic skills were absolutely invaluable to me. His frequent help while analyzing my data made this portion of the project go much more smoothly than it would otherwise. It would not be an exaggeration to say that I would probably still be trying to figure out LISREL VI and MTS, Wayne State's computer system, if I hadn't been so fortunate as to have his knowledge and experience at my disposal whenever I needed it. Beyond this, I want to thank him for his constant support, tolerance and love while I was working on my dissertation.

I would also like to recognize my parents, who instilled in me the desire for and value of a good education. They were always in the background cheering me on and encouraging me to keep working even when I was discouraged or it seemed as though little progress was being made. Their frequent encouragement made the long process more bearable.

Last, but by no means least, I want to recognize the 124 managers who contributed their time by participating in this study, for without their help I truly would have had no dissertation.

TABLE OF CONTENTS

List of Tables
List of Figures

CHAPTER 1: INTRODUCTION
    Statement of the Problem
    Factors Influencing Rater Ability
        The Rating Instrument
        The Roles of Rater and Ratee
        The Rating Process
        The Rating Context
        Conclusion
    Factors Influencing Rater Motivation
        Purpose of the Appraisal and Appraisal Consequences
        Trust in the Appraisal Process
        Conclusion

CHAPTER 2: MODEL AND HYPOTHESES
    Overview of Model
    Expected Consequences of the Appraisal for the Ratee
        Purpose of the Appraisal
    Reaction of the Ratee to the Appraisal
        Expected Consequences of the Appraisal for the Ratee
        Credibility of the Rater to the Ratee
        Appraisal Visibility
    Appraisal Visibility
        Task Interdependence Among Employees
    Perceived Freedom to be Honest
        Reaction of the Ratee to the Appraisal
        Rater's Desire to be Liked by the Ratee
        Ability to Document the Appraisal
        Appraisal Visibility
    Occurrence of Rendering Errors
        Perceived Freedom to be Honest
    Summary

CHAPTER 3: METHOD
    Overview of Methodology
    Participants
    Procedure
        The Pilot Study
        The Primary Study
    Variables
        Rendering Errors
        Motivational Influences
    Data Analysis
        Overview of Linear Structural Equation Analysis
        Assessment of Fit
        Assumptions Underlying the Use of Structural Equation Analysis
        Description of Diagram Depicting the Measurement and Structural Models

CHAPTER 4: RESULTS
    Assessment of the Measurement Model
    Assessment of the Structural Model
    Exploratory Analysis

CHAPTER 5: DISCUSSION
    Summary and Implications of Findings
        Informal Observations
        Formal Analyses: Supported Hypotheses
        Formal Analyses: Unsupported Hypotheses
    Limitations in the Study
    Suggestions for Future Research
    Conclusion

APPENDIX A: Evaluation Forms Used by the Organization
APPENDIX B: Questionnaire Completed by Study Participants
APPENDIX C: Procedures for Measuring Expected Consequences of the Performance Appraisal for the Ratee
APPENDIX D: Questionnaire Items Measuring Each Motivational Influence
Footnotes
List of References

LIST OF TABLES

Table 1: Means, Standard Deviations, Reliabilities and Correlations between Scales Measuring the Motivational Influences
Table 2: Factor Loadings for Confirmatory Factor Analysis - The Lambda Matrix
Table 3: Intercorrelations between Latent Variables - The Phi Matrix
Table 4: Structural Coefficients and T-Values for the Originally Hypothesized Model
Table 5: Structural Coefficients and T-Values for the Modified Model
Table 6: Goodness of Fit Indices for the Original Model and Sequential Modifications
Table 7: Structural Coefficients and T-Values for the Final Model

LIST OF FIGURES

Figure 1: Factors Influencing the Accuracy of Performance Appraisals
Figure 2: Landy and Farr's (1980) Component Model of Performance Rating
Figure 3: Performance Appraisal Behaviors and Possible Outcomes of those Behaviors for Raters and Ratees (from Mohrman and Lawler, 1983)
Figure 4: Model of the Factors Influencing Rater Motivation to Provide Accurate Performance Evaluations
Figure 5: Hypothesized Measurement and Structural Model of Rater Motivation
Figure 6: Structural Parameters for Hypothesized Model of Rater Motivation
Figure 7: Structural Parameters for Modified Model of Rater Motivation
Figure 8: Structural Parameters for Final Model of Rater Motivation

CHAPTER 1: INTRODUCTION

Statement of the Problem

The appraisal of human performance has been a concern of researchers for many years, as demonstrated by the large number of empirical studies on performance appraisals (cf. Landy & Farr, 1980, for a review of much of this literature). Yet, in spite of the vast amount of research conducted on the appraisal process, it is not clear that much progress has been made toward improving the quality of ratings that result from a typical appraisal system.

Understanding and improving the performance evaluation process is particularly important, given the extent to which appraisals are used in organizations. The results of a 1977 survey, for example, revealed that over 90% of those organizations sampled had an appraisal system (Locher & Teel, 1977). Furthermore, in most organizations that have appraisal systems, they are used for purposes that have important implications for employees (Ilgen & Feldman, 1983; Kane & Lawler, 1979). For example, performance appraisals may be used as a basis for promotion and placement decisions, as well as reward allocation and termination decisions. Evaluations may also serve as the criteria against which training and selection programs are validated and be used to provide developmental feedback to employees. Given the number, diversity and importance of the situations utilizing performance appraisal information, it is necessary that this information be as accurate as possible.

In order to understand the evaluation process and some of the factors that can affect appraisal accuracy, it is helpful to examine performance appraisal from a job behavior, or task, perspective. Researchers examining human performance have argued that effective performance on some task is a function of two factors: a person's ability to perform the task and his/her motivation to do so. The basis of this assumption is Lewin's (1935) interactive model of performance, which states that both ability and motivation must be present in order for a person to perform well on some task.
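This interactive assumption is often formalized multiplicatively, so that performance approaches zero whenever either component is absent. The expression below is an illustrative textbook rendering of that assumption rather than an equation taken from this dissertation:

\[ \text{Performance} = f(\text{Ability} \times \text{Motivation}) \]

Because the product, rather than the sum, of the two terms determines performance, high ability cannot compensate for absent motivation, and high motivation cannot compensate for absent ability.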
In a performance appraisal context, the central task is evaluating the performance of employees. The goal is to obtain ratings that reflect, to the extent possible, the actual behavior of the ratee (Borman, 1978; Bernardin & Pence, 1980). Adapting Lewin's general model of performance to the performance appraisal task suggests that performance rating accuracy is affected by two conditions: a rater's ability to provide accurate ratings of performance and his/her motivation to do so.

In order to better understand how rater ability and rater motivation influence the accuracy of performance ratings, it is necessary to recognize the existence of three potentially distinct views of ratee performance. These are: (1) the actual performance of the ratee, (2) the rater's private evaluation of ratee performance, and (3) the rater's public evaluation of ratee performance. Although the first of these requires little clarification, the distinction between private and public evaluations of ratee performance needs some explanation. According to Mohrman and Lawler (1983), private performance appraisal behaviors include any internal acts of cognition, judgment, perception, evaluation or attribution on the part of raters about some ratee. Private behaviors might also include the making and retention of private notes or other documents about the ratee. These private rating behaviors, therefore, reflect what raters actually think about the ratee's performance. On the other hand, public performance appraisal behaviors involve verbally communicating appraisals to other people, such as ratees, or recording the appraisal on a form that is seen and used by other people in the organization (Mohrman & Lawler, 1983). Public ratings of performance indicate what the rater wants other people to know about the ratee's performance. The public evaluation is what is typically referred to when the term "performance appraisal" is used.

Based on this distinction, it can be seen that the relationship between actual ratee performance and written ratings of performance (i.e., the extent of appraisal accuracy) really contains two linkages: (1) a linkage between actual ratee performance and rater private judgments about performance and (2) a linkage between rater private judgments about performance and his/her public ratings of performance (see Figure 1). The first linkage has been termed the judgment, or evaluation, process and the second linkage the rating, or rendering, process (Banks & Murphy, 1985). Clearly, both linkages must be strong if performance appraisals (i.e., public ratings) are to be accurate.

Figure 1: Factors that Influence the Accuracy of Performance Ratings. [Figure: actual ratee performance is linked to private ratings of performance through the judgment process, which is influenced by rater ability; private ratings are linked to public ratings of performance through the rendering process, which is influenced by rater motivation.]

The traditional explanation for inaccurate ratings has focused on rater ability. Inaccuracy in performance ratings due to low rater ability is likely to be reflected in a lack of correspondence between actual ratee performance and rater private judgments of performance (linkage 1). Underlying this explanation has been the implicit, but rarely stated, assumption that most inaccuracies in performance ratings occur unintentionally (i.e., without the awareness of the rater who, in fact, is trying to rate performance as accurately as possible). In other words, raters are believed to accidentally form inaccurate private judgments about ratee performance through various rating errors and biases (e.g.
selecting inappropriate performance information, interpreting this information incorrectly, forgetting relevant aspects of ratee performance, etc.). As a result, inaccuracies of this sort are likely to be unsystematic or random (i.e., sometimes resulting in evaluations that are higher than the ratee's actual performance and sometimes leading to lower ratings).

Attempts to improve the accuracy of performance appraisals by increasing the ability of raters to evaluate performance (e.g. developing new appraisal instruments, rater training and research on rater cognitive processes) are clearly important, since evaluations cannot be accurate if raters lack the requisite ability to appraise performance. However, the necessity of distinguishing between private and public performance ratings indicates that rater motivation is also an important determinant of appraisal accuracy. Motivation, in general, reflects the level, direction and persistence of behavior (Campbell & Pritchard, 1976). While in a performance appraisal context the level of effort exerted by raters toward actually doing performance evaluations is a relevant concern, even more important is the direction of that motivational force. Specifically, it is important that the motivation of raters be directed toward rating performance accurately rather than toward producing a rating at a particular level. If the rater's objective when doing the performance evaluation is to get the employee a large raise or to avoid an unpleasant confrontation, then the rater might be motivated to intentionally provide a public rating that he/she believes is inaccurate (i.e., that differs from his/her private evaluation). Thus, low motivation to rate accurately is reflected in intentional discrepancies between public and private ratings, and therefore, is likely to produce systematic biases in performance appraisals (i.e., appraisals that either consistently overstate or understate ratee performance).

Most previous performance appraisal researchers have failed to recognize the potential impact of rater motivation on performance appraisals (Banks & Murphy, 1985). Perhaps one reason for this is the dominant paradigm for research examining performance appraisal accuracy. Specifically, the majority of this research has been conducted in laboratory settings (where standards for determining the accuracy of performance ratings can be developed), particularly since the shift in the last few years toward studying the cognitive processes of raters. While laboratory studies are likely to be helpful in illuminating processes affecting rater ability, they may be less useful in understanding factors influencing rater motivation. This is because laboratory settings may reduce or eliminate the effects of motivational influences, such as personal or political agendas (Banks & Murphy, 1985), since raters have little to lose by rating accurately or to gain by rating inaccurately. Thus, the motivation to record public ratings that differ from private judgments is likely to be lower. When performance appraisals are conducted in an organizational setting, however, there is more likely to be a discrepancy between public and private ratings because of organizational pressures placed on raters to intentionally distort public evaluations of ratee performance.

Consider the following situation. Suppose that organizational policies and procedures require that employees be given developmental feedback based on performance appraisal data.
Assume also that a particular manager has to provide negative feedback to a poor performing employee whom he or she knows has a tendency to get very defensive and hostile, no matter how constructively the criticism is given. Finally, assume that the manager does not feel that she/he has adequate documentation (i.e., specific examples of ineffective job behavior) to support his/her evaluation. Several motivational influences are operating in this example. The ratee's anticipated defensiveness, the lack of performance documentation and the way in which appraisal information is used are all likely to affect the extent to which the rater is motivated to provide an accurate rendering of performance. Since these kinds of pressures only exist in real organizations, identification of conditions that affect the motivation to rate accurately requires research conducted in field settings.

The purpose of the present study was to gain an understanding of the rendering process and of some of the reasons for intentional discrepancies between private and public evaluations of performance, termed rendering errors. Although, in theory, low rater motivation could result in public evaluations that either consistently overstate or understate performance, in practice, the former are likely to be more common (see Dayal, 1969; Rowe, 1964; Thayer, 1981). This is because raters' personal goals (e.g. getting an employee a promotion or making themselves appear favorably to superiors) are more likely to be achieved by inflating, rather than deflating, ratings. In addition, managers have been found to express a great deal of reluctance to intentionally deflate ratings because of the high probability that such an action will lead to subsequent problems (Longenecker, Gioia & Sims, 1987). Therefore, the emphasis in this study was on identifying conditions likely to result in intentionally over-rating performance. Following the suggestion of Bartlett (1983), several motivational influences were identified and a model detailing their interrelationships was tested in a field setting.

Before discussing the motivational issue in greater detail, however, literature examining factors that influence the ability of raters to provide accurate ratings is briefly reviewed. The purpose of this discussion is not to provide an in-depth and critical review of the large volume of empirical research examining ability effects on performance appraisal. Such a review is beyond the scope of this paper and has been conducted by others (e.g. Landy & Farr, 1980; Wexley & Klimoski, 1984). Rather, this overview is intended to demonstrate the pervasiveness of the belief that a lack of rater ability accounts for most of the inaccuracy in performance ratings. In addition, the review introduces the major issues relating to rater ability in order to provide a point of contrast for the major focus of this study, which is the examination of motivational influences on performance ratings.

Factors Influencing Rater Ability

In describing the large body of research examining performance appraisal from the standpoint of rater ability, Landy and Farr (1980) suggest a model that includes several determinants of performance rating results (see Figure 2). These include the vehicle (the rating instrument), the roles (rater and ratee), the rating process and the rating context (e.g. the type of job or organization, the purpose for the appraisal). These components provide a useful structure for briefly reviewing the research dealing with rater ability.
The Rating Instrument

Much of the early performance appraisal research focused on the rating instrument used to record judgments about performance. The assumption behind this research was that how information about a person's performance was elicited (i.e., the design of the form) would influence the ability of raters to make accurate judgments of performance. The different rating formats that have been developed can be distinguished in terms of whether they measure people (i.e., traits), processes (i.e., activities or behaviors) or products (i.e., results) (Wexley & Klimoski, 1984).

The measurement of people typically involves assessing the personal characteristics or traits which they possess. The most pervasive format for measuring traits is the graphic rating scale, introduced by Paterson (1922). This format consists of several rating scales, each associated with a different trait label, a brief definition of the trait and an unbroken line with varying types and numbers of anchors on which the rating is marked.

Figure 2: Landy and Farr's (1980) Component Model of Performance Rating. [Figure: the roles, rating context, process and instrument combine to determine rating results.]

Research on graphic rating scales has involved varying the presence or absence of trait definitions, the number of divisions in the scale, and the number and type of anchors to see if this affected the quality of performance ratings (e.g. Barrett, Taylor, Parker & Martens, 1958; Madden & Bourdon, 1964).

A second group of rating formats are those which focus on measuring the observable behaviors or activities of employees. The first format of this type was the Behaviorally Anchored Rating Scale (Smith & Kendall, 1963). Behaviorally anchored rating scales (BARS) differ from graphic rating scales in that they utilize behaviorally-oriented anchors for each job dimension, rather than adjectives or numbers. For each dimension, raters indicate which of the behavioral anchors (scaled in terms of effectiveness) is most similar to how they would expect the ratee to behave. A variant of the BARS format is Behavioral Observation Scales (BOS), developed by Latham & Wexley (1977; 1981). Behavioral observation scales require raters to indicate the frequency with which they have observed each of several specific job behaviors relevant to a given performance dimension. Thus, multiple measures are taken of each dimension, rather than just one, as with BARS and graphic rating scales. Other examples of behavioral rating formats are Behavioral Discrimination Scales (Kane & Lawler, 1979), Behavior Summary Scales (Borman, Hough & Dunnette, 1976) and Behavioral Assessment Approaches (Komaki, 1981).

Behaviorally-oriented rating formats offer a number of potential advantages over the traditional graphic rating scale (see Latham & Wexley, 1981 for a more complete discussion of these advantages). For example, behavioral measures are less ambiguous and subjective than are trait measures since they involve actual observations of behavior rather than abstractions from behavior. In addition, activity measures are more directly related to what the employee actually does and they facilitate providing explicit performance feedback to ratees.

Product, or results, measures are the final type of rating format. The most common results-oriented rating system is Management by Objectives (Drucker, 1954). Management by Objectives involves joint participation by managers and subordinates in the setting of results-oriented goals.
Performance evaluation then consists of measuring the extent to which these goals are achieved. The presumed advantage of results-oriented rating systems is that they do not require as much judgment on the part of raters (and thus, bypass their cognitive processes), which should increase the ability of raters to make accurate judgments (Wexley & Klimoski, 1984).

Studies comparing graphic rating scales and BARS have measured rating quality in several ways, including the absence of rating errors, such as halo and leniency, reliability (interrater agreement), discriminability and rater satisfaction with the format. The results of this research are mixed, with some studies suggesting that the BARS format may be superior to the graphic rating scale (e.g. Borman & Dunnette, 1975; Burnaska & Hollmann, 1974), and other studies yielding the opposite conclusion (e.g. Bernardin, Alvares & Cranny, 1976). Little research has been conducted comparing results-oriented systems to the other rating formats. Although the practical utility of identifying rating formats that result in more accurate ratings would be substantial, it is not the case that developing better rating formats necessarily eliminates all bias and error in performance ratings, as early researchers had hoped (Landy & Farr, 1980). Rather, even when carefully developed rating systems are used (whether they are trait, behavior or results systems), some rating bias still seems to occur. Furthermore, Wexley and Klimoski (1984) suggest that the traits vs. behaviors vs. results controversy is not the real issue since each format may be effective in certain situations.

The Roles of Rater and Ratee

The Rater. Research on the rater has been of two types, both oriented toward improving rater ability. Some research has focused on rater personal characteristics, with the primary aim of identifying raters who are more able to provide accurate ratings. A variety of rater characteristics have been examined, including demographic, psychological and job-related attributes (e.g. Borman, 1979b; Taft, 1955; Wexley & Youtz, 1985). The most frequently examined rater characteristics have been the sex and race of the rater. While the results have been somewhat mixed, there is no consistent evidence that there are sex differences (e.g. Hamner, Kim, Baird & Bigoness, 1974; Jacobson & Effertz, 1974; Rosen & Jerdee, 1973) or race differences (Schmidt & Johnson, 1973) in the quality of evaluations. The primary race-related bias observed consistently is a tendency for raters to give higher performance ratings to ratees of the same race and to be more confident of ratings given to ratees of the same race (e.g. Cox & Krumboltz, 1958; Hamner et al., 1974; Schmidt & Lappin, 1980).

Although rater psychological characteristics would seem to be a fruitful avenue for identifying individuals who are more able to provide accurate performance ratings, psychological characteristics have been examined too infrequently to allow definite conclusions (see Taft, 1955 and Landy & Farr, 1980 for reviews). Nevertheless, tentative conclusions suggest that more accurate ratings may occur when raters are intelligent, have artistic interests, possess self insight and social skills, and are emotionally adjusted (Borman, 1979b; Taft, 1955). Furthermore, there is some evidence that raters who believe in the variability of people (i.e., who recognize the extent of individual differences) may rate more accurately (Wexley & Youtz, 1985).
The second major type of research on raters has been to examine the quality of ratings from various rater groups. Although the most common source for ratings is the immediate supervisor, other possibilities include peer, self or subordinate ratings. The results of research comparing the quality of ratings from different sources are mixed. While it is evident that ratings obtained from different sources are usually not the same (e.g. Borman, 1974; Kirchner, 1966; Klimoski & London, 1974; Lawler, 1967; Zedeck, Imparato, Krausz & Oleno, 1974), it is not clear that one source is more valid than another. Rather, each rater group appears to have a unique perspective that contributes valid information about performance (Landy & Farr, 1980). This view is consistent with research indicating that different dimensions of job performance are identified by peers and supervisors in the development of BARS for the same job (e.g. Borman, 1974; Landy, Farr, Saal & Freytag, 1976).

The Ratee. Research on the impact of ratee characteristics on performance ratings has been limited almost exclusively to the examination of ratee demographic characteristics, such as sex and race, on performance ratings (for reviews, see Ford, Kraiger & Schectman, 1986; Kraiger & Ford, 1985; Nieva & Gutek, 1980; and White, Crino & DeSanctis, 1981). Sex of the ratee has been found in some studies to interact with the sex stereotype of the job, such that females in typically male jobs receive lower performance ratings (e.g. Schmitt & Hill, 1977), or lower salaries and less challenging job assignments (e.g. Terborg & Ilgen, 1975). A meta-analysis of ratee race effects showed that black ratees typically receive lower performance ratings than whites, but only when evaluated by white raters (Kraiger & Ford, 1985). Several other studies have shown that ratee performance characteristics, such as performance level and performance consistency, may also affect the quality of performance ratings (e.g. DeNisi & Stevens, 1981; Padgett & Ilgen, 1988; Scott & Hamner, 1975).

Overall, research suggests that rater and ratee characteristics may influence the ability of raters to accurately evaluate performance. More research is needed, however, to clarify the mechanisms by which these effects occur. While, from a practical point of view, it is probably not possible to make major changes in the characteristics of raters and ratees which will improve the quality of evaluations, this perspective on rater ability does suggest which people might benefit more from rater training programs designed to eliminate rating errors (e.g. Bernardin, 1978; Bernardin & Walter, 1977; Latham, Wexley & Pursell, 1975) or improve accuracy (Pulakos, 1984).

The Rating Process

The newest emphasis for research on performance appraisal has been to examine the cognitive processes of raters when making performance evaluations (DeNisi, Cafferty & Meglino, 1984; Feldman, 1981; Ilgen & Feldman, 1983). This approach views the rater as an active information processor involved in selecting information about ratee performance, organizing and storing this information in memory and then, at some later time, recalling the information in order to complete the evaluation form. Although this approach focuses primarily on understanding the rating process, one outcome of this research, from the perspective of rater accuracy training, may be the identification of more effective strategies for gathering, organizing and retrieving information about ratee performance.
It may then be possible to teach raters these strategies so that they are more able to rate performance accurately.

Thus far, more theorizing on rater cognitive processes has occurred than actual research, and much of the theorizing has tended to emphasize the general relevance of findings in the area of social cognition for performance appraisal rather than specific applications of this literature to the performance appraisal process (DeNisi et al., 1984). An exception to this tendency is the large body of research examining attribution processes (e.g. Kelley, 1967; Weiner, Frieze, Kukla, Reed, Rest & Rosenbaum, 1971), the effect of attributions on performance evaluations (e.g. Knowlton & Mitchell, 1980; Mitchell & Wood, 1980; Nieva & Gutek, 1980) and the effect of attributions on the distribution of organizational rewards (e.g. Heilman & Guzzo, 1978). While attributional processes are important cognitive determinants of the ability of raters to provide accurate performance ratings, it has been argued that research on cognitive processes needs to go beyond attribution theory to examine how the selection, organization, storage and retrieval of performance information affects the accuracy of appraisals (Feldman, 1981; Ilgen & Feldman, 1983). DeNisi, Cafferty & Meglino (1984) provided a model and a number of specific propositions to guide research in this area. Among the more interesting examples of research from this perspective are studies examining (1) factors that influence the selection, organization and recall of performance information, such as appraisal purpose (e.g. Williams et al., 1985), affect (e.g. Bower, 1981; Cardy & Dobbins, 1986; Park, Sims & Motowidlo, 1986), and categorization (e.g. Favero & Ilgen, 1983; Lord, Foti & Phillips, 1982; Murphy & Balzer, 1986; Padgett & Ilgen, 1988), (2) how performance information is processed and its effect on recall (e.g. DeNisi, Williams, Cafferty & Meglino, 1985; Lance & Woehr, 1986; Murphy, Martin & Garcia, 1982; Nathan & Lord, 1983) and (3) the effect of rater cognitive processes on traditional measures of rating quality, such as rating accuracy and the occurrence of rating errors (e.g. Cafferty, DeNisi & Williams, 1984; Favero & Ilgen, 1983; Mount & Thompson, 1987).

Overall, a cognitive processing perspective on performance appraisal seems to offer a number of potential practical applications for training raters how to evaluate performance more accurately. However, as noted by DeNisi and his colleagues (DeNisi, Williams, Cafferty & Meglino, 1985), a great deal more research is needed before it can be concluded that this perspective is more useful than other approaches to studying performance appraisal and before specific applications can be developed.

The Rating Context

According to Landy & Farr (1980), the rating context consists of those factors not specifically related to the instrument, rater, ratee, or rating process that are still part of the situation surrounding the appraisal and thus, could affect its accuracy. The most frequently cited contextual factor affecting performance ratings is the purpose of the appraisal. It appears, however, that appraisal purpose can influence performance ratings both through its effect on rater ability (e.g. Crockett, Mahood & Press, 1975; Jeffrey & Mischel, 1979; Williams, DeNisi, Blencoe & Cafferty, 1985; Wyer, Srull, Gordon & Hartwick, 1982) and rater motivation (e.g. Bernardin, Orban & Carlyle, 1981; McIntyre, Smith & Hassett, 1984; Meyer, Kay & French, 1965; Sharon & Bartlett, 1969; Zedeck & Cascio, 1982).
Only research on how appraisal purpose influences rater ability (i.e., results in unintentional inaccuracies) is discussed here; that dealing with rater motivation and intentional rating distortions is described later.

Performance appraisals can be used for a variety of purposes in organizations, but the two most commonly mentioned purposes are control and coaching. Performance appraisals are used for control when they help to determine rewards and punishments for employees (e.g. salary, promotion, demotion, transfer and termination decisions). As noted by Ilgen & Feldman (1983), the control purpose of appraisals can either be explicit, as when appraisals are directly tied to rewards via a merit pay system, or implicit, such as when a superior determines job assignments for employees based on his or her impression of their performance. The coaching function involves providing employees with feedback on their performance in order to facilitate performance improvement and development.

DeNisi et al. (1984) suggested that appraisal purpose is most likely to influence rater ability to provide accurate ratings of performance through its effect on the amount and type of information sought by raters and the way that information is stored in memory. For example, some research suggests that raters seek out more information when appraisals are done for administrative decision-making than when they are done for employee development (Matte, 1982). In addition, raters have been found to select distinctiveness information when appraisals are used for salary decisions but to seek out consensus information when the appraisal would influence promotion or remedial training decisions (Williams, DeNisi, Blencoe & Cafferty, 1985) (see DeNisi, Cafferty & Meglino, 1984 for a more complete discussion of this issue).

A second contextual factor that could influence rater ability to provide accurate performance evaluations is characteristics of the employee's task and workgroup. For example, several studies have suggested that performance appraisals tend to be done on a relative, rather than absolute, basis (e.g. Grey & Kipnis, 1976; Knowlton & Mitchell, 1980; Mitchell & Liden, 1982). Furthermore, the amount of task interdependence between members of a workgroup could affect performance ratings. As noted by Kane & Lawler (1979), when the tasks performed by members of a workgroup are interdependent, it is more difficult to evaluate performance because of the difficulty in determining the contribution of any given individual. The result is likely to be less variance across the performance ratings for the members of the workgroup (as found by Liden & Mitchell, 1983) and possibly lower rating accuracy.

A final contextual factor that could affect the ability of raters to accurately evaluate performance is the opportunity of the rater to observe the performance of the employee (Kane & Lawler, 1979). The less opportunity the rater has to observe relevant job behaviors, the more difficult it is to develop an accurate picture of an employee's performance (e.g. Heneman & Wexley, 1983). To some extent, the opportunity to observe is determined by the nature of the employee's job, since some jobs (e.g. sales representatives) require employees to spend a significant amount of time in locations where their behavior cannot be observed by the rater. Overall, contextual factors represent an important but relatively unexplored influence on the ability of raters to provide accurate evaluations of performance.
More research is needed to further elaborate the effect of these and other contextual factors on performance ratings.

Conclusion

In the section above, research dealing with the effect of the rating instrument, the rater and ratee, the rating process and the rating context on the ability of raters to provide accurate performance evaluations was briefly reviewed. This review highlights the pervasiveness of the assumption that inaccuracies typically result from a lack of rater ability. It also demonstrates the enormous complexity of the rating process and the extreme difficulty of obtaining accurate ratings even when only factors influencing rater ability are considered. Unfortunately, even if the ideal rating instrument could be developed, the best raters selected and trained in the most effective strategies for selecting, organizing and retrieving performance information, and the most effective rating context achieved, it is doubtful that accurate performance ratings would result. Only when the issue of rater motivation is also considered is the goal of accurate performance ratings likely to be realized.

Factors Influencing Rater Motivation

The influence of rater motivation on performance ratings has received little attention, compared to the volumes of research examining issues related to rater ability. However, as disillusionment with typical methods of improving rating accuracy (e.g. developing new instruments or training raters to eliminate rating errors and bias) has increased, there has been a greater realization of the importance of rater motivation (e.g. Banks & Murphy, 1985). The most frequently mentioned motivational influences discussed in the literature are described below.

Purpose of the Appraisal and Appraisal Consequences

A number of researchers have recognized the potential influence of appraisal purpose on the motivation of raters to evaluate performance accurately (e.g. DeCotiis & Petit, 1978; Kane & Lawler, 1979; Sharon & Bartlett, 1969; Zedeck & Cascio, 1982), as distinct from its effect on rater ability (discussed above). The research on appraisal purpose comprises the majority of the empirical research done to date which is relevant to understanding rater motivation. Although research results are somewhat inconsistent, the most common finding is that ratings for research purposes are less lenient and more accurate than ratings for personnel decisions (Sharon & Bartlett, 1969). Within the category of personnel decisions, ratings are less lenient and more accurate when they are used for subordinate development than for salary, promotion or termination decisions (Bernardin et al., 1981; Meyer, Kay & French, 1965; Zedeck & Cascio, 1982).

Unfortunately, most of the research on appraisal purpose has been limited to examining its effect on either rating accuracy or the occurrence of rating errors. Few attempts have been made to understand how and why purpose influences performance ratings. An exception is the DeCotiis & Petit (1978) model of performance appraisal, which went beyond simply noting that purpose influenced performance ratings to describe why this relationship might occur. Specifically, they argued that appraisal purpose has an important motivational component because of its inextricable linkage to the consequences of the appraisal for the rater and ratee.

The importance of appraisal consequences for rater motivation can be derived from an expectancy theory framework (Mitchell, 1974; Porter & Lawler, 1968; Vroom, 1964). This perspective has been specifically applied to performance appraisals by Mohrman & Lawler (1983) in an attempt to understand the motivations of both raters and ratees in an appraisal situation. However, because the focus of this study was on understanding the actions of the rater, only this aspect of performance appraisal motivation is considered in the discussion below.
This perspective has been specifically applied to performance appraisals by Mohrman & Lawler (1983) in an attempt to understand the motivations of both raters and ratees in an appraisal situation. However, because the focus of this study was on understanding the actions of the rater, only this aspect of performance appraisal motivation is considered in the discussion below. 23 According to expectancy theory, an individual's motivation to exert effort toward some behavior is a function of three cognitions, the expectation that effort will result in the desired behavior, the perceived outcomes of those behaviors, and the attractiveness of those outcomes (Porter& Lawler, 1968). In a performance appraisal situation the relevant behavior is doing an accurate performance evaluation. Therefore, it follows that a rater's motivation to evaluate performance accurately should be influenced by the extent to which the rater believes he/she is able to evaluate performance accurately, the perceived consequences of doing an accurate appraisal and the attractiveness of those consequences. The latter two cognitions have the greatest relevance for this discussion. Important appraisal consequences for the rater include both what happens to the rater directly as a result of the evaluation and what happens to the ratee because of the evaluation. Ratee consequences (e.g. the size of salary increases, the likelihood of promotion, effects on self-esteem) represent important concerns for the rater because of the potential ramifications for his/her day-to-day interactions and future relationship with the subordinate (Dayal, 1969; Decotiis & Petit, 1978; McCall a DeVries, 1977). Some of the possible consequences of appraisals that might occur for raters and ratees are described in Figure 3. While some of the outcomes for the desired appraisal behaviors are positive, there is the potential for many negative outcomes to result from accurately recording performance evaluations as well. In fact, it could be argued that many more of the probable consequences for the rater will 24 Figure 3: Performance Appraisal Behaviors and Possible Outcomes of these Behaviors for Raters and Ratees (from Mohrman and Lawler, 1983) Performance Appraisal Behavior E"El' -biasing -doing PA at all -witholding information . -allowing participation \\ -attributing -gathering information -evaluating -giving feedback to others E"E]' -accept feedback from others -se1f appraisal -defend self -seek career guidance Outcomes -interpersonal reaction of ratee -reaction of others to the appraisal -pay action for ratee -ability to fire or promote ratee -own credibility -future performance of ratee -training chances for ratee -overall performance of unit -rewards for doing PA behaviors -self esteem -better understanding of role -interpersonal reaction of rater -pay action -promotion -validity of information from rater -ability to improve own performance -training opportunities -development of skills and abilities -rewards for doing prescribed PA behaviors 25 be negative (e.g. getting an undesirable pay or promotion action for the employee, having the employee react defensively to the appraisal, having superiors in the organization reject the appraisal, damaging his/her relationship with the employee etc.). Thus, it should not be surprising that, in many cases, the motivation to provide accurate appraisals is low. 
There is also some empirical evidence that the anticipated consequences of the appraisal for the rater and ratee are important influences on the public performance ratings given by raters. In the only empirical study specifically examining motivational issues in performance appraisal (Longenecker, Gioia & Sims, 1987), the evaluation process was viewed as a political process where actors (i.e., raters) were motivated to enhance or protect their own interests. This study involved in-depth semi-structured interviews with 60 executives employed in seven large organizations. The methodology employed was primarily inductive in that no hypotheses were tested (although some a priori "probes" were used during the interview). Rather, executives were encouraged to freely and subjectively describe how they perceived their performance evaluation processes. All interviews were tape-recorded and then transcribed onto notecards, with each card containing one directly quoted idea or thought from one executive. Notecards were then classified into categories representing the various political/motivational issues that emerged during the interviews. Only issues that were raised by 72% or more of the executives were reported.

Perhaps the most important finding from this study was the open recognition and admission by managers that performance appraisal was a political process and that it was not uncommon for them to intentionally modify their performance ratings of an employee (most typically by inflating the rating, but, in a few circumstances, by deflating it) if this resulted in more positive outcomes for either the employee or themselves. Some of the reasons given by the managers interviewed in the study for intentionally inflating performance ratings included: (1) a desire to maximize the merit increases an employee would be able to receive; (2) to protect or encourage an employee whose performance was suffering for personal reasons; (3) to avoid letting people outside the department know about problems within the department; (4) to avoid creating a written record of poor performance that would become a permanent part of the employee's personnel file; (5) to avoid confronting a problem employee; (6) to give an employee whose performance had improved a break; and (7) to promote out of the department an employee who was a trouble-maker or who didn't fit in. On the other side of the coin, consciously deflating performance ratings, while much less frequent, occurred when the manager wanted to: (1) shock an employee back to high performance; (2) teach a rebellious employee who was in charge; (3) let an employee know that they should consider leaving the organization; and (4) begin to build a documented case that would facilitate the process of terminating the employee.

The study by Longenecker and his colleagues discussed above represents a significant first step in identifying some of the important influences on rater motivation. The study employed real managers from a wide variety of different organizations (although currently employed in only seven different companies, collectively they had been involved in the appraisal processes of 197 organizations over the course of their careers), and thus, has a high degree of external validity. On the other hand, the study suffers from several limitations.
First, although having managers freely describe their own evaluation process reduces the potential for priming effects, where the questions asked during the interview create feelings or opinions not otherwise present (Salancik & Pfeffer, 1978), this methodology makes the data inherently more subjective and less rigorous since no a priori hypotheses could be tested. Second, the study contained no actual behavioral measure of rating inflation or deflation except the verbal reports of the managers who were interviewed. Thus, there is no direct evidence that rating distortion, such as that reported by the managers, actually occurred, nor is there information about the magnitude of the distortions. Finally, while the study does demonstrate the pervasiveness of motivational influences on performance appraisals, it consists primarily of a listing of potential motivational constructs. No attempt is made to develop these constructs into an integrated model of rater motivation. As a result, it is less helpful in directing future research on the motivation to rate performance accurately. The study described in this paper attempts to eliminate some of these deficiencies.

One specific potential negative consequence of the appraisal that has received some attention in the literature concerns the extent to which raters feel able to confront employees about their performance (Bernardin & Beatty, 1984; Bernardin & Buckley, 1981; Dayal, 1969). Research suggests that raters are more likely to provide lenient ratings when they expect to have to provide employees with feedback on their performance (Fisher, 1979; Sharon & Bartlett, 1969). While no research on the psychological processes mediating this relationship exists, a likely explanation is that having to openly discuss and justify their evaluations with employees is an unpleasant and difficult situation that many raters would prefer to avoid. Since inflating performance ratings is one way to avoid a confrontation of this nature, particularly with those employees not performing at the highest level where some negative feedback is required, this practice is not surprising. To deal with this problem, Bernardin and his colleagues (e.g. Bernardin & Buckley, 1981) suggest training raters on how to be critical. Utilizing a social learning perspective (Bandura, 1977), they argue that training should focus on increasing a rater's efficacy expectations, or the belief that he/she can successfully execute some behavior, in this case, meeting with the employee and discussing the performance evaluation.

Trust in the Appraisal Process

The final motivational issue that has been discussed in the literature is the rater's trust in the appraisal process (Bernardin & Beatty, 1984). Trust in the appraisal process is defined as "the extent to which both raters and ratees perceive that the appraisal data will be (or has been) rated accurately and the extent to which they perceive that the appraisal data will be (or has been) used fairly and objectively for pertinent personnel decisions" (Bernardin & Beatty, 1984, p. 268). Although to a large degree, trust in the appraisal process may be indicative of the overall organizational climate, it reflects more specifically the organizational climate with regard to performance evaluations. Trust in the appraisal process is important because it seems to correlate with the degree of leniency in ratings (Bernardin, Orban & Carlyle, 1981).
In the Bernardin, Orban and Carlyle (1981) study, performance appraisal systems were going to be developed for two agencies, neither of which had been doing performance evaluations for several years. In one agency, the appraisal was only to be used for employee feedback and development, while in the other agency it was to be used for both employee development and administrative decision-making (e.g. promotion and salary decisions). Before actually implementing the new appraisal systems, managers in both agencies completed a questionnaire designed to measure their expected trust in the appraisal process (called the TAPS questionnaire). Then confidential practice performance ratings were collected and their actual trust in the appraisal process was assessed. A week later performance ratings were collected again.

Several interesting findings emerged. First, from time 1 to time 2, trust in the appraisal process decreased while performance ratings increased (relative to the initial rating) in both agencies. It is interesting to note that both changes were greater in the agency that intended to use appraisals for administrative decision-making. Second, in both agencies a negative correlation was found between trust in the appraisal process and scores on the rating scale, indicating that as trust decreased, ratings became more lenient. Again, the relationship was stronger in the agency where appraisals were also used for administrative purposes, which suggests that appraisal purpose may moderate the relationship between trust in the appraisal process and rating level.

Although this study makes a contribution to our understanding of performance rating processes by suggesting that trust in the appraisal process may be an important determinant of rating level, it has several weaknesses which complicate interpretation of the results. First, the initial set of performance ratings collected were practice ratings that were kept confidential from everyone in the organization (i.e., they were for research purposes only), while the second set of ratings were not confidential. Previous research indicates that ratings for research purposes only are less lenient than ratings that are used by the organization in some way (e.g. Sharon & Bartlett, 1969). As a result, the increase in leniency observed could have resulted from the change in how ratings were used rather than from changes in the amount of trust in the appraisal process, as suggested by the authors. In addition, the study treats the relationship between trust and performance ratings as a "black box," in that it does not identify the psychological mediators of this relationship or how trust impacts rater motivation.

To some extent, issues similar to trust in the appraisal process appeared in the study by Longenecker and his colleagues (Longenecker, Gioia and Sims, 1987) described above, providing further support for the importance of this construct. For example, managers reported greater likelihood of political behavior (and therefore, probably less trust in the appraisal process) in situations where top management did not take the appraisal process seriously and only gave "lip-service" to its importance. It was also more likely when upper managers themselves used political tactics when appraising the performance of their subordinates.
Finally, political behavior occurred more frequently when there was a lack of openness and trust between managers and employees about performance appraisal and when raters did not personally value performance appraisal as a tool for helping employees grow and develop. 99351121121! While very little empirical research has been done examining motivational issues in performance appraisal, the small amount that has occurred suggests that motivational influences are likely to have a significant effect on performance ratings in actual organizational settings. Unfortunately, the research on rater motivation so far is fragmented and consists of little more than a listing of some of the organizational and individual level conditions that might reduce rater motivation to record accurate evaluations. What is needed is a more theoretical approach to examining rater motivation that will direct future research on and facilitate understanding of this important influence on performance ratings. One step toward achieving this goal would involve the development of a causal model detailing the way in which these motivational conditions are related to one another. In subsequent sections of this paper, such a model is described and submitted to empirical test. CHAPTER 2: MODEL AND HYPOTHESES In this section, a model detailing the relationships between a number of potential motivational influences is presented. Before describing the model, however, several introductory comments should be made. First, this model is not intended to be an exhaustive description of all the constructs relevant to understanding rater motivation. Such an undertaking is beyond the scope of a single study. As a result, a subset of potentially interrelated motivational influences was selected in order to explain a part of the complicated process by which raters are motivated to provide accurate or inaccurate ratings of performance. Secondly, the model presented is a cognitively based model, in that it relies primarily on a rater's cognitions or beliefs about the performance appraisal situation to explain his/her actions in that situation. This idea is based on the work of a number of theorists in psychology (e.g. Markus & Zajonc, 1985) and sociology (e.g. Ball, 1972; Berger & Luckman, 1966; Silverman, 1971; Thomas, 1928), who have argued that an individual's actions are determined by his/her definition of the situation. According to Ball (1972), an individual's definition of the situation may be seen as "the sum of all recognized information, from the point-of-view of the actor, which is relevant to his locating himself and others, so that he can engage in self-determined lines of action and interaction" (p. 63). The definition of the situation is important because it means that in order to understand social behavior one must look to the meanings of situations as they are experienced by the actors within those situations, rather than to "objective reality" (if, in fact, such a 32 33 thing really exists) since the former determines how an individual will behave. Within a performance appraisal context, this suggests that the rater's definition of the performance appraisal situation (i.e. his/her perceptions or cognitions), rather than objective reality, determines whether or not he/she will be motivated to provide an accurate evaluation of the performance of employees. 
For example, if a manager believes that an employee is likely to react defensively to a negative performance appraisal, this will have implications for his/her motivation to rate accurately and actual rating behaviors. Whether or not the employee would, in fact, react defensively is irrelevant in determining the subsequent actions of the manager. As Thomas (1928) put it, "if men define situations as real, then they are real in their consequences" (p. 572). Because the model presented here is from the perspective of the rater's definition of the performance appraisal situation (a perspective similar to that adopted by Mohrman & Lawler, 1983), all of the constructs in the model involve the IQ§QIL§ perceptions about various elements of the performance appraisal context. Matthew The model presented in Figure 4 is an attempt to incorporate some of the motivational influences described by previous researchers as well as some additional ones into an integrated picture of the processes affecting rater motivation. Before describing the rationale for specific linkages in the model, a brief summary of the entire model is presented. 34 3:22: 82mm ooamm .3 box: 3523852. on 9 98mm xmmh 3:36; .3653. 7:85: 22. 9: oumwnwmwmmm 3233200 239?. . mma an _ 2 Eonooi moamm noaooaxm _ . a < .mm_maaa< cowmfimwm comm .o E E a $5680 2 553 mcozmagm mocmantma 2882 332a 2 c2838: 85m 9.6522. 920m“. .2: do Enos. ”v 2:9... 35 One important exogeneous variable in the model is the purpose of the performance appraisal, conceptualized as the extent to which appraisals are used for employee development and/or for salary, promotion and termination decisions. Appraisal purpose is important because it influences raters' perceived consequences of the appraisal for the ratee (e.g. salary or promotion decisions, self esteem). The magnitude and direction of these consequences is hypothesized to mediate the relationship between appraisal purpose and the anticipated employee reaction to the appraisal (e.g. accepts the appraisal, becomes angry and defensive etc.). The more negative the expected consequences of the appraisal for the ratee, the more negatively the ratee is expected to react to the appraisal. The anticipated reaction of the ratee to the performance appraisal is also likely to be influenced by the extent to which raters believe they are credible to the ratee as a feedback source. The more credible raters feel they are to the ratee, the more likely they are to expect the ratee to accept the appraisal, regardless of its sign and consequences. Finally, the ratee's expected reaction should be affected by the perceived visibility of performance appraisals to coworkers. When raters believe that members of their workgroup will find out how coworkers were evaluated, then they are also likely to expect a more negative employee reaction to the appraisal due to the potential for comparisons and felt inequities. The model suggests that appraisal visibility is enhanced when task interdependence among ratees in the workgroup is greater because of the greater number of opportunities for interaction and discussion among members of the workgroup. 36 The anticipated reaction of the ratee to the appraisal is expected to influence raters' perceived freedom to be honest and objective when evaluating performance. The more negative the expected reaction, the less raters should feel free to be honest. The perceived freedom to be honest is also hypothesized to be affected by the visibility of performance appraisals to ratees. 
The more that raters expect employees to find out how coworkers were evaluated, the less freedom they are likely to feel to be honest when doing performance evaluations. Raters may feel unable to allow true performance differences to show up in performance ratings since this might result in conflict among employees in the workgroup. Freedom to be honest should also be lower when raters have a strong desire to be liked by the ratee and when they do not feel they have adequate documentation to support their evaluation. Finally, raters' perceived freedom to be honest and objective when evaluating ratee performance is hypothesized to be positively related to the occurrence of rendering errors (i.e., differences between public and private evaluations of performance). In the sections which follow, the specific motivational influences and the linkages between them will be described in greater detail. Each endogeneous (i.e., dependent) variable, and its hypothesized causes, is discussed separately. wwamwmmm mammal The results of research done so far on appraisal purpose (e.g. Bernardin, Orban & Carlyle, 1981; McIntyre, Smith & Hassett, 1984; 37 Meyer, Kay & French, 1965; Sharon & Bartlett, 1969; Williams, DeNisi, Blencoe & Cafferty, 1985; Zedeck & Cascio, 1982) are fairly clear in demonstrating that the way in which appraisal information is used does affect performance ratings, probably through its effect on rater ability and rater motivation. However, the mechanism by which appraisal purpose affects performance ratings has not been specified. The effect of appraisal purpose in the model described here is hypothesized to be a motivational one. One reason that appraisal purpose might be important from a motivational point of view is that it influences the expected consequences of the appraisal both for raters and ratees (Bartlett, 1983; DeCotiis & Petit, 1987). While both rater and ratee consequences are likely to be important influences on rater motivation, in this model, more emphasis is placed on the consequences of the appraisal that raters expect for ratees. When appraisals are used in the organization for either controlling or coaching purposes, this should influence the attractiveness of potential consequences that raters might expect to result to ratees from the performance appraisal. For example, when appraisals are done for research purposes or when they are completed but serve little purpose in the organization (i.e., they are filed and forgotten), they have few consequences of any significance for ratees. In these situations, raters are likely to perceive little need to distort public ratings of performance, which is likely to account for the lower leniency and greater accuracy typically found when ratings are done for research purposes (e.g. Sharon & Bartlett, 1969). On the other hand, performance ratings that are used in the 38 organization, either for coaching or controlling purposes, do have important consequences for employees. The most obvious and, perhaps, most significant, consequences of appraisals occur when they are used for purposes of control because then they affect ratees' salaries as well as their likelihood of being promoted, demoted or transfered. Appraisals may also affect whether employees are given opportunities for training. 
When performance appraisals are used for developmental purposes (i.e., to provide developmental feedback to employees), appraisal outcomes might include such things as development of skills and abilities, a better understanding of the job, the ability to improve their own performance and increased (or decreased, depending on the sign of the feedback) self esteem (Mohrman & Lawler, 1983). Thus, the way in which appraisals are used in an organization implies the existence of potential consequences, each of which will have some valence, either positive or negative, to the ratee. This suggests the following hypothesis: H1: The purpose that performance appraisals serve in the organization will be significantly related to the overall attractiveness of the consequences of the appraisal for the ratee. mammmmm mwmmwmmm The overall attractiveness of the appraisal consequences to the ratee is hypothesized to influence his/her reaction to the appraisal. Consistent with the cognitive and perceptual nature of the model, it is suggested that raters attempt to estimate the overall valence of the consequences likely to accrue to a ratee from the appraisal and use this information to predict how he/she will react to the 39 evaluation. The more negative the overall valence of these consequences, the more likely that raters will expect the ratee to reject the appraisal and become defensive and angry in reaction to it. Since many raters are likely to feel uncomfortable about confronting a potentially hostile employee and furthermore, may lack confidence in their ability to cope with this reaction (Bernardin & Buckley, 1981), they may try to avoid the situation by inflating public ratings of performance. In this way, the ratee is less likely to react defensively to the appraisal (a favorable consequence for the rater) and is likely to receive more positive personal and organizational outcomes (a positive consequence for the ratee that is likely to favorably impact raters' future interactions with the ratee). Appraisee reactions to an evaluation can occur both at the time they actually receive the evaluation and later. Clearly, both have important implications for rater motivation. A second negative, but more extreme, ratee reaction that would happen some time after they receive the appraisal might be complaining to the rater's superior or, if the organization is unionized, filing a formal grievance with the union. This might occur if the ratee decides that the appraisal was incorrect or unfair. Here again, raters are likely to try to avoid this potential negative reaction. At the opposite end of the continuum are situations in which raters expect employees to respond favorably to the appraisal (accept the criticism constructively, try to change in response to the feedback etc.). Favorable reactions might be anticipated because raters believe the overall valence of the appraisal consequences for 40 the ratee will be positive. This suggests the following hypothesis: H2: The more positive the expected consequences of the appraisal for the ratee the more positive the anticipated reaction of the ratee to the appraisal. mammmmm The extent to which raters believe they have a high degree of credibility to the ratee as a feedback source is also expected to influence the anticipated reaction of the ratee to the appraisal. This suggestion comes from previous research on performance feedback (Ilgen, Fisher & Taylor, 1979). Ilgen et a1. 
(1979) argued that perhaps the most important factor influencing the extent to which feedback recipients (i.e., ratees) will accept feedback is the degree of credibility that they attribute to the source of the feedback (i.e., the rater). The credibility of the source is a function of a number of factors. For example, the more expertise the source has the greater his/her credibility to the ratee and the more likely that feedback from them will be accepted (Klein, Kraut and Wolfson, 1971; Tuckman & Oliver, 1968). In addition, the more that recipients trust the source the more credible he/she is and the greater the probability that performance feedback will be accepted (Huse, 1967). Along these lines, source trustworthiness has been found relate positively to ratee perceptions of the atmosphere and helpfulness of feedback sessions and their satisfaction with the session (Ilgen, Peterson, Martin & Boeschen, 1981). Similarly, Wexley & Snell (1987) found that the extent to which managers were attributed with positive power (a composite consisting of French & Raven's (1959) reward power, expert power and referent power) was positively correlated with employee reactions to an appraisal, such as the perceived accuracy of 41 the feedback and the motivation to improve. Taken together, this research suggests that when rater's believe they are respected and trusted by ratees (i.e., have a good working relationship with ratees) they should worry less about ratees responding defensively to the performance appraisal and should expect acceptance of the feedback regardless of whether it is positive or negative. This leads to the third hypothesis: H3: The more credible raters believe they are to the ratee as a feedback source, the more positive the expected reaction of the ratee to the performance appraisal. mm The final variable hypothesized to influence the reaction of the ratee to the performance appraisal is the extent to which raters believe that performance appraisals have a high degree of visibility. Performance appraisals are visible to employees when they are able to find out how coworkers were evaluated. This might happen if employees directly discuss this information among themselves or if it gets passed through the grapevine. A high degree of visibility is expected to increase the probability that raters will anticipate negative ratee reactions to their performance appraisals, either during the actual evaluation interview or subsequently, depending on when employees obtain information about the evaluations of coworkers. To some extent, this relationship may depend on the sign of the performance feedback being received, in that if the evaluation is positive, a high degree of visibility might result in a more positive response to the evaluation (i.e., employees might like others to find out that they received a positive evaluation). 42 Receiving feedback that is basically positive, however, does not ensure that appraisal visibility will lead to a favorable response because of the likelihood that self appraisals and self-other equity comparisons will reduce the attractiveness of the feedback received. 
Specifically, since most of the research on self appraisals (Holzbach, 1978; Kirchner, 1966; Klimoski & London, 1975; Parker et a1., 1959; Thornton, 1968; 1980; Waldman & Thornton, 1978) suggests that they are more lenient than supervisory appraisals, even an employee who receives a fairly positive evaluation might be unhappy about it, and thus, react negatively, if they felt they should have received a more favorable rating. This possibility is further enhanced by the tendency for self-other equity comparisons to occur. According to Adams (1965), employees frequently compare their inputs (e.g. how hard they work, the amount of work they do) and outcomes (in this case, the evaluation they received) with the inputs and outcomes of others. One reason that self appraisals are generally higher than supervisory evaluations may be that employees tend to overestimate their inputs. If this is true (or if managers believe it to be true), it means that felt-negative inequity perceptions are likely to occur frequently (or at least be expected frequently by managers). The more visible that appraisals are to members of the workgroup, the more likely that raters will expect such comparisons to occur and to expect employees to react negatively to appraisals because they believe they should have received a higher evaluation. This suggests the fourth hypothesis to be tested in this study: H4: The more raters believe appraisals are visible to members of their workgroup, the more negative the anticipated reaction of the ratee to the appraisal. 43 Wills-111211131 leakIWrdee terminal-2mm Several researchers have noted the potentially important influence of task interdependence on the behavior of people in organizations (e.g. Cheng, 1983; Kane & Lawler, 1979; Kiggundu, 1983; Liden & Mitchell, 1983; Mitchell, 1983). Task interdependence among employees in a workgroup exists when the nature of the tasks performed requires them to work together and interact on a regular basis in order to achieve high performance. According to Cheng (1983), when tasks are highly interdependent, no one work role can be performed effectively unless all or most other work roles are carried out properly. Task interdependence is likely to have a direct and positive effect on appraisal visibility. Specifically, the more interdependent that tasks are in the rater's workgroup, the more opportunities there are for employees to discuss their evaluations with each other. When a manager's employees rarely see one another (an extreme example of task independenge) it is less likely that they will find out how each other were evaluated. Therefore, it is hypothesized that: H5: The greater the degree of task interdependence among members of a workgroup, the greater the degree of perceived appraisal visibility. 2m vdmgcmflhsfienast A rater's perceived freedom to be honest is seen as an important attitudinal indicator of whether or not raters are motivated to evaluate performance accurately. It is seen as playing a role in this model that is similar to that played by "turnover intentions" in models of 44 turnover (e.g. Mobley, Horner & Hollingsworth, 1978.). Perceived freedom to be honest has several hypothesized causes in the model of rater motivation that was presented in Figure 4. mummmmw The expected reaction of the ratee to his/her performance appraisal is hypothesized to have a direct effect on raters' perceived freedom to be honest and objective in rating performance. 
Specifically, the more 'negative the expected reaction the less free raters are likely to feel to be honest. Bernardin and his colleagues (Bernardin & Beatty, 1984; Bernardin & Buckley, 1981) have argued that the tendency of raters to be lenient in providing performance evaluations is probably a defensive behavior aimed at avoiding the potential negative reactions of employees to harsh ratings. This defensiveness may occur because many raters feel they lack the ability to cope effectively with the ratee's anger. This leads to the following hypothesis: H6: The more negative the anticipated ratee reaction to the performance appraisal, the lower the perceived freedom of raters to be honest when providing performance evaluations. mmmnmmmm A second hypothesized cause of the perceived freedom of the rater to be honest when evaluating performance is the extent to which the rater wants to be liked by the ratee. Raters having a strong desire to be liked by a particular ratee are likely to do things they believe will make the ratee like them (e.g. giving the ratee higher ratings than they feel are warranted) and to avoid behaviors which might threaten their relationship with the employee (e.g. giving them low ratings, even if they are deserved). 45 This notion is supported by research concerning the need for affiliation which suggests that individuals possessing a strong desire for companionship and friendly interpersonal relationships may dispense rewards (e.g. positive performance evaluations and thus, the potential for larger salary increases and/or likelihood of being promoted) as a way of winning or keeping friends (McClelland & Burnham, 1976). As a result, individuals with a strong desire to be liked by their employees may feel less freedom to be honest when evaluating performance since an honest, but at least partially negative, evaluation could create tension between them and the ratee. Therefore, it is hypothesized that: H7: The stronger a rater's desire to be liked by a ratee, the lower his/her perceived freedom to be honest when providing performance evaluations. mammw Another hypothesized cause of the freedom of raters to be honest is their ability to document the performance evaluation. When raters believe they are able to document and support their evaluation of a ratee with critical incidents of good and poor performance their perceived freedom to be honest when evaluating performance should be greater. This is based on the belief that feedback is more likely to be accepted when it is supported by specific documentation (Ilgen, Fisher & Taylor, 1978; Leskovec, 1967). In a similar vein, several researchers (e.g. Bernardin & Buckley, 1981) have advocated diary-keeping as way of increasing the gbiligy of raters to make accurate ratings of performance by improving their observation of behavior. For example, results of one study on diary- keeping (Bernardin & Walter, 1977) found that those raters who 46 regularly recorded critical incidents of employee performance had significantly lower leniency and halo and greater interrater agreement than those who did not keep such diaries. It is suggested here that diary-keeping may also positively impact the motivation to rate accurately. Raters who feel they have concrete information and examples to support their evaluations are likely to have greater confidence in their appraisals and thus, feel less apprenhensive about providing negative feedback to ratees. This should then increase their perceived freedom to be honest. 
Therefore, the eighth hypothesis to be tested is: H8: The more that raters feel they have adequate documentation to support their performance evaluations, the greater their perceived freedom to be honest when providing performance evaluations. Annalee]. JaihLleV t Finally, the perceived visibility of the appraisal is hypothesized to influence the freedom of raters to be honest when doing performance appraisals. Appraisal visibility is expected to have a direct and negative effect on the freedom of raters to be honest because it increases the potential for conflict among employees. If an employee finds out how coworkers were evaluated and, as a result, comes to believe he/she has been treated unfairly, not only may the employee react negatively during the appraisal, but he/she may also argue with coworkers. The possibility of conflict occurring among employees may cause raters to feel less free to differentially evaluate (and reward) all employees based on their actual performance level. This suggests the following hypothesis: 47 H9: The greater the degree of appraisal visibility, the lower the perceived freedom of raters to be honest when doing performance evaluations. mwawm As described earlier, rendering errors occur when there is a difference between private evaluations of performance and the actual ratings publically recorded on an appraisal form. This is considered to be a behavioral indicator of the extent of rater motivation since the motivation to rate accurately must be low if private and public evaluations of performance differ. Percgivgd Erggdom £2 E; Benefit The only hypothesized cause of the occurrence of rendering errors in the model is the perceived freedom of raters to be honest when evaluating performance. The occurrence of rendering errors represents an external and behavioral (i.e., nonattitudinal) measure of the outcome of the perceived freedom to be honest. In other words, it indicates the extent to which this attitude is translated into actual behavior. Thus, the role that the occurrence of rendering errors plays is comparable to that of measures of actual turnover in turnover models. The less free that raters feel to be honest, the more likely they are to record public evaluations that differ from their private evaluations. Therefore, it is hypothesized that: H10: The lower the perceived freedom of raters to be honest when doing performance evaluations the greater the difference between their public and private evaluations of performance. Samara: In the previous section a model describing the relationship between several motivational influences on performance appraisal was 48 described. Previous researchers have recognized appraisal purpose as an important motivational variable but thus far, have failed to describe the process by which purpose influences performance ratings. The model described above attempts to remedy this deficiency by elaborating some of the important motivational constructs intervening between the purpose that appraisals serve in organizations and actual performance ratings. Thus, the model represents a first step in understanding the process by which raters are motivated to provide accurate or inaccurate evaluations of performance. In the study described below, this model was submitted to empirical test. CHAPTER 3: METHOD Wfldfihsdelan Based on a review of relevant literature a model was developed describing the relationship between a number of constructs thought to influence rater motivation to evaluate performance accurately. 
A questionnaire was then designed to assess these motivational influences. The questionnaire was piloted on a small group of managers to assess its psychometric adequacy and was then completed by a group of full-time employed managers. In addition to completing the questionnaire, these managers participated in a short interview during which they were asked to provide researchers with their most accurate assessment of the performance of a subordinate in their workgroup. This private evaluation was then compared to the most recent public evaluation which the manager provided for that employee (collected from organizational records) to develop a measure of the occurrence of rendering errors. The extent to which the model of rater motivation fit the data was then assessed using latent variable structural equation analysis. Participgnts Two groups of people participated in the study. Forty-seven managers and 54 students participated in the three phases of the pilot study (described below) and 124 managers were involved in the primary data collection for the study. Recruitment of participants for both the pilot study and primary study took place in several stages. The personnel office of the organization was contacted about participation in the study. When consent was given, the researcher was provided with the names of personnel representatives in various units of the 49 50 university. The representatives were contacted and asked to provide the researcher with the names of managers in their units. These managers were then approached by the researcher and asked to participate in the study. It is worth noting that of all the managers called by the researcher, for either the pilot study or the primary data collection, only 3 were unwilling to participate in the study. As recommended by Cohen (1969), an a priori power analysis was conducted to determine the sample size needed to have adequate power to detect a significant effect. Based on an overall effect size of .20 (considered by Cohen to be a small effect size), 12 predictors, an alpha of .05, and power of .80, it was determined that 100 participants would provide an adequate amount of power. One hundred and twenty four managers agreed to participate in the primary study. The managers participating in the primary data collection were employed full-time by a large midwestern university. Although there may be external validity problems with collecting data from a single organization, it was considered desirable that the subject population be drawn from the same organization to ensure consistency in the performance evaluation forms used across participants. This eliminated the need to standardize performance ratings before doing statistical analyses. Furthermore, in many ways, a large university setting offered a partial solution to the generalizability problem. This is because a large university consists of many relatively autonomous units that are involved in very different types of work activities. Therefore, it allowed, in a single setting, the collection of data from both skilled and unskilled, and white collar and blue collar workers, as well as 51 greater variance on educational level. The managers participating in the primary study were employed in a wide variety of areas of the university. These included, but were not limited to: grounds maintenance, security, housing and food service, clerical services, library, administration (e.g. accounting, payroll, recruitment etc.). 
personnel services, university health center, university computers and information services, and public relations and funds development. The primary sample consisted of 55 males and 69 females, ranging in age from 26 to 65. Ninety-four percent were white, 5% were black and 1% were Asian. It should be noted that all managers involved in the study had been employed by the participating organization for at least a year, had been employed in their current position for at least 6 months, and had held a supervisory position for at least a year. This is important because it demonstrates that all participants in the study had some exposure to the performance evaluation process as implemented in this organization (and their specific department) and had actually conducted performance evaluations on several occasions. Although the number of employees that participants evaluated varied to some extent depending on the area in which they worked, 95% of the participants evaluated between 2 and 15 employees. Precedent; 11:221me Three separate sets of people participated in the pilot study: two groups of managers and one group of students. All managers were employed by the same organization as used in the primary data collection but were not members of the sample for the main study. The S2 purpose of the pilot study was twofold: (l) to make a preliminary assessment of the technical adequacy of the questionnaire and (2) to determine the extent of variance in the measure of rendering errors. The former was important because all of the constructs measured with the questionnaire were exploratory in nature and, thus, not measured with previously used and tested scales. The latter was necessary because the potentially sensitive nature of the information being assessed (i.e., the extent to which raters intentionally provide inaccurate performance ratings) made it possible that managers would hesitate to respond honestly and thus, reduce the variance on this measure. The first phase of the pilot study assessed the technical adequacy of the questionnaire by checking the clarity of the items on the questionnaire and making preliminary assessments of scale reliabilities. _Five managers completed the questionnaire and then met with the researcher to discuss it. They provided input which was used to edit unclear or ambiguous items, eliminate irrelevant items and reduce possible misinterpretations. Thirty managers completed the revised questionnaire so that initial assessments of scale reliability could be made. Although Nunnally (1978) suggests that reliabilities above .60 are adequate for exploratory work, it was desired that these preliminary reliabilities be above .70 if possible. Results from these managers revealed that all of the scales had reliabilities above .70 with the exception of the termination scale (alpha - .63) and the interdependence scale (alpha - .54). In order to improve these scales, the items in them were re-written. In addition, two of the other scales were each 53 modified by eliminating a single item that substantially improved the reliability of the overall scale. The four scales which had been modified were cross-validated by administering them to a group of 54 part-time employeed undergraduate students in two organizational behavior classes. These students were asked to fill out the questionnaire by responding in terms of the organization where they currently worked. 
Reliabilities for the two scales which had been rewritten were found to be above .70 while those for the two scales which had been modified were similar to the reliabilities found in the original pilot sample (after eliminating the bad item). The results of these analyses suggested that the questionnaire scales were adequate for the primary data collection. The actual items and scales included on the questionnaire are described in detail later. Twelve managers participated in the piloting of the dependent measure. These managers completed the final version of the questionnaire and then participated in the interview during which performance evaluation information was collected. The actual procedures followed were the same as those for the primary data collection and are described in more detail below. Results from this pilot demonstrated enough variance on the dependent measure to proceed with the primary study. 1119mm All managers were personally contacted by telephone and asked to participate in the study. During the initial contact, the study was described to participants as one with the purpose of learning how 54 managers conduct performance appraisals and some of the factors affecting this process. Participants were told that the study would consist of filling out a questionnaire and meeting with the researcher for a short interview and that the total time involved would be about one hour. After managers agreed to participate in the study, the researcher noted that portions of the questionnaire would require them to answer questions in relation to a particular employee in their workgroup (called the "focal ratee"). Managers were told that the focal ratee should be the individual on whom they had most recently completed a formal performance evaluation subject to the constraint that the evaluation had been done at least 2 weeks ago. This procedure for selecting the focal ratee approximates random selection since which individual was most recently evaluated depended only on the ratee's employment date (evaluations were completed annually during the month that employment in the organization commenced). It was important that specific procedures for selecting the focal ratee be given to managers to ensure that they did not use nonrandom selection criteria (e.g. selecting a subordinate whom they personally liked or who was a good performer). Participants were contacted about a week after mailing the questionnaire to ensure that it was received and to set a date for the interview portion of the study (which lasted approximately 30-40 minutes). Questionnaires were collected from participants at the beginning of the interview. It should be noted that because of the procedures used to collect data in this study, the response rate for the questionnaire was very high (over 95%). Only those managers who 55 were unable to continue their participation in the study due to unanticipated time constraints (5 managers), or illness (1 manager) did not return the questionnaire. The primary purpose of the interview was to collect from participants the information needed to determine the extent of difference between the private evaluation and the public evaluation (i.e., the measure of rendering errors, described in greater detail later). The public rating was the evaluation of the focal ratee most recently completed for the organization (obtained from the employee's personnel file) while the private rating was the rater's actual opinion of the ratee. 
In order for participants to feel free to provide the private evaluation it was necessary to create a climate in which they would feel comfortable providing an honest assessment of the focal ratee's performance. It was felt that this could best be accomplished by giving participants the opportunity to talk informally with the researcher for 25-30 minutes so that rapport could be developed. This should also have increased the willingness of managers to provide the researcher with a copy of the employee's most recent evaluation from organizational records. Thus, the interview portion of the study was considered to be crucial for obtaining the data. The interview proceeded by asking managers to talk about some of their experiences in doing performance evaluations. The interview followed a semi-structured format, in which there were both standard questions asked of all participants and questions which flowed naturally from the comments which participants madel. After the 56 informal discussion ended, participants were asked to provide the researcher with their private evaluation of the focal ratee's performance. The private rating was done on the same evaluation form normally used by the manager when providing performance evaluations for organizational purposes. Although during the interview participants completed their private rating prior to obtaining the public rating from the employee's file, the public rating actually occurred in time before the private rating (i.e., it took place before managers were asked to participate in the study). It was important, however, that the private rating be collected from participants first so that the public evaluation would not be particularly salient to the manager when completing the private evaluation. Thus, priming and consistency effects (Salancik & Pfeffer, 1978) should not have caused managers to intentionally or unintentionally reduce the difference between public and private ratings. This was further reinforced by making sure that there was at least three weeks between completing the public evaluation and the date of the interview, when the private evaluation was done. To increase the likelihood that participants would be honest when providing this evaluation, the researcher informed them that the evaluation was for research purposes only and thus, would not be seen by anyone in the organization. This is because previous research (e.g. Sharon & Bartlett, 1969) suggests that evaluations are less lenient, and therefore, may be more accurate, when completed for research purposes only. The instructions for providing this evaluation were as follows: 57 "Now what I'd like you to do is take a few minutes to think about the performance of the focal ratee and provide me with the most accurate evaluation of the performance of this employee that you can. In doing this evaluation, think 2311 of how well the employee does his/her job. The reason I say this is because sometimes when managers evaluate an employee's performance they may think about things other than just how the employee performs on the job. If this is the case for you, in ghig situation, please ignore any of the other factors that might affect your evaluation when you do it for the organization and think only of the employee's performance. Keep in mind that this evaluation is being done only for purposes of this research and will not be seen by anybody in the organization. 
Also, do not write the name of the employee on the evaluation form so that there will be no way of identifying whose performance is being evaluated.” It was important that it be clear to participants that the evaluation they were providing here (i.e., the private evaluation) need not be the same as the last evaluation they completed for the employee while in no way suggesting that the two evaluations 93gb; to be different. Informal observations of managers during these instructions indicated that they often seemed confused about what they were to do until the researcher noted that sometimes managers considered factors other than performance when doing evaluations for organizational purposes. After this point, most managers had no difficulty in doing the task, and, in fact, often volunteered the information that their evaluations for the organization did not always reflect their true opinion of an employee's performance. Next managers were asked to provide the researcher with a copy of the most recent evaluation that they had done for the focal ratee. In making this copy, it was stressed that the manager should black out the employee's name, the signatures at the bottom of the form, and any comments written about the employee that might identify him/her. In this way, the employee's identity was protected. 58 To ensure that the manager did not believe the performance of the focal ratee had changed significantly since the date of the public evaluation (and thus, that differences between the two ratings were not due to true performance differences) several safeguards were taken. First, all focal ratees included had been in their current position for at least six months so that the majority of the initial learning would have taken place (and 91% of the ratees had been in their job for a year or more). In addition, it was important that the time between the two evaluations was not great enough that the ratee's performance was likely to have changed. Thus, for 90% of the ratees, the time between the two evaluations was between 3 weeks and 6.5 months. Finally, after obtaining a copy of the public evaluation, managers were directly asked if they believed the focal ratee's performance had changed significantly since the date of this evaluation. Three managers gave an affirmative response to this question, and thus, were eliminated from the study. This resulted in a final sample of 115 managers. Variables Rendering E11915 The primary criterion in this study was the extent to which raters commit rendering errors. Following the ideas of several researchers (Banks & Murphy, 1985; Mohrman & Lawler, 1983), rendering errors occur when there is a discrepancy between a rater's actual opinion of the ratee's performance (i.e., the private rating) and what she/he marks on an evaluation form (i.e., the public rating). It is suggested that the extent to which there is a difference between these 59 two evaluations is a behavioral indication of a rater's motivation to rate accurately. Specifically, the relationship between rendering errors and rater motivation should be negative, so that the greater the difference between private and public evaluations of performance the lower a rater's motivation to rate accurately. This seems appropriate since motivation to rate accurately must be low when raters intentionally evaluate subordinates differently than they feel they really ought to be rated. The occurrence of rendering errors was measured using a difference score. 
The algebraic difference (rather than absolute difference) between the public and private evaluations was used in this measure. While both intentional under-evaluations (i.e., deflated ratings) and over-evaluations (i.e., inflated ratings) are equally indicative of low motivation to rate accurately, only differences in the positive direction (where public evaluations are higher than private evaluations) were included in the measure of rendering errors. This is because inflated ratings have been found to occur more frequently than deflated ratings (Longenecker, Gioia & Sims, 1987) and thus, only motivational influences likely to result in positive rendering errors were incorporated into the model. Therefore, the dependent measure was really a measure of rating inflation. The participating organization utilized two primary evaluation forms, one for employees in administrative and/or management positions and one for nonmanagerial employeesz. The forms are presented in . Appendix A. These evaluation forms were the basis for determining the extent to which raters made rendering errors. The forms were very 60 similar in that both consisted of a list of general job traits or dimensions (e.g. Job Knowledge, Dependability, Attitude and Cooperation etc.) that were evaluated on a 5-point rating scale, with anchors ranging from "outstanding" to "unsatisfactory.” The primary difference between the forms was in the number of job dimensions on the form and what the actual dimensions were. The nonmanagerial form consisted of seven performance dimensions while the administrative/managerial form contained nine performance dimensions. There was some overlap in the dimensions included on the two forms but the administrative/managerial form contained two dimensions that were only appropriate for managers (Ability to Develop subordinates and Supervision) as well as two dimensions that were only relevant for employees in certain types of administrative/managerial positions (Cost Control and Affirmative Action). Because there was some variability in the number of performance dimensions that were actually used by participants in evaluating their subordinates, it was necessary to take this into account when computing the measure of rating inflation. The actual measure of rating inflation was calculated by subtracting the private evaluation from the public evaluation on each performance dimension. Positive differences (i.e., where the public evaluation was higher than the private evaluation) on any performance dimension were then summed and divided by the maximum difference possible given the number of dimensions used on the evaluation form. For example, with 5-point rating scales, the maximum difference between public and private ratings possible on any performance 61 dimension would be 4 (5 - 1 - 4). If evaluations were provided on seven performance dimensions, then the maximum difference possible would be 28 (4 x 7 - 28). Calculated in this way, the measure of rating inflation indicates the proportion of all differences possible that were in the positive direction. It is worth noting that 35% of the instances of differences between public and private evaluations were deflations (i.e., the public evaluation on some performance dimension was lower than the private evaluation for that dimension) and, as indicated earlier, deflations were ignored when calculating the measure of rating inflation for each participant. 
mm The motivational influences were measured using a questionnaire developed by the researcher for this purpose. With the exception of Performance Appraisal Consequences, they were all measured using 5- point Likert scales (ranging from "strongly agree" to “strongly disagree") where respondents were asked to indicate the extent to which they agreed with each statement. All of the motivational influences reflected the Iggg;;§ perceptions of various aspects of the performance appraisal situation. This stems from the underlying assumption behind the model that whether or not raters are motivated to rate accurately is determined by their definition of the appraisal situation. However, it creates a potential percept-percept (or common method variance) measurement problem (Campbell & Fiske, 1959). The problem with having all measures provided by one source is that indicators of relationships between variables may be inflated. While this problem cannot be eliminated in the present study, several factors combine to lessen the negative effect of the problem. 62 First, although all measures were provided by the rater, the two components of the measure of rating inflation (i.e., the public and private evaluations) were obtained at different times and both were obtained at a different time than the questionnaire measures of the motivational influences. As noted above, the public evaluation was collected from organizational records while the private evaluation was obtained about two weeks after the questionnaire measures during the interview with the participant. This temporal separation reduces the potential for inflated relationships resulting from obtaining measures of variables from the same source. Secondly, when testing the model of rater motivation, the pattern of relationships between variables is more important than the actual magnitude of the relationships in determining the fit of the data to the model. Thus, although the magnitudes could be inflated due to percept-percept bias, this should not affect the pattern of relationships between the variables and hence, should not bias a test of the overall fit of the model. Each of the motivational influences is described below. Two types of motivational influences were measured. Some of the motivational influences deal with a particular employee. Items assessing these motivational influences were answered in relation to the ”focal ratee” (described above). The other motivational influences deal with either the organization in general or characteristics of the participant's department or workgroup. Thus, these items were answered independently of any particular employee. Motivational influences of the former type were: (1) expected 63 consequences of the appraisal for the ratee; (2) credibility of the rater to the ratee; (3) rater's desire to be liked by the ratee; and (4) reaction of the ratee to the appraisal. Motivational influences of the latter type were: (1) appraisal purpose; (2) task interdependence among employees; (3) ability to document the appraisal; (4) appraisal visibility; and (5) perceived freedom to be honest. Examples of items to measure each of the variables are provided below. A complete copy of the questionnaire is included in Appendix B. The procedures for measuring the expected consequences of the appraisal for the ratee are provided in Appendix C while a list of the questionnaire items included in each of the other scales is included in Appendix D. Ceneegeeneee e; Ehfi Appreieel £2 ehe Beeee. 
This scale was used to determine the overall perceived attractiveness of the appraisal consequences to the focal ratee. Participants were provided with a list of eleven potential outcomes that a subordinate could obtain from a performance appraisal (e.g. a large salary increase, a promotion, a transfer, development of skills and abilities etc.) and asked to indicate two things for each outcome. First, participants indicated the likelihood that each of the outcomes would occur given the subordinate's actual performance level. This was measured as a probability, ranging from ”0" (will definitely not occur) to ”1" (will definitely occur). Second, managers were asked to indicate how attractive they thought each outcome was to the subordinate. This was measured with a 5-point scale ranging from "would like receiving this outcome very much" to "would dislike receiving this outcome very much." The perceived instrumentality of each outcome was multiplied 64 by the valence of that outcome and summed across all outcomes to yield an overall indication of the valence of the appraisal consequences for the ratee. High scores on this variable indicated that the rater believed that, given the ratee's true performance level, the consequences of the appraisal for the ratee would be positive. gregihiliey ef Shfi Beee; 59 Lbs Beeee. This scale measured the extent to which raters believed they were trusted and respected by the focal ratee. Sample items included: ”This individual trusts my judgment on work-related matters" and “This employee does not think very highly of me as a supervisor” (reverse scored). A high score on this scale indicated that raters believed they had a high degree of credibility to the ratee. There were six items in this scale. Deeire £9 he Likfié by £h£ Regee. This scale assessed the degree to which the rater wanted to be liked by the focal ratee and to have a good relationship with him/her. Sample items included, ”In order to be satisfied with my work, I need to have a good working relationship with this employee” and "I would not go out of my way to try to get this person to like me” (reverse scored). High scores on this scale indicated that it was important to the rater to be liked by the focal ratee. Five items were included in this scale. BEQQELQD 2f the Regee 5e ehe Appxeieel. This scale measured the extent to which raters felt that the focal ratee was likely to react defensively to performance feedback. Sample items included, "This person is able to respond constructively to feedback on his/her performance" and "It is not uncommon for this individual to feel that I am attacking him/her personally if he/she receives less than the 65 highest performance ratings” (reverse scored). High scores on this scale indicated that the rater believed the ratee would respond positively to the performance evaluation. There were seven items included in this scale. Purpose of the appraisal consisted for four subscales, employee development, salary decisions, promotion decisions, and termination decisions, each of which were measured separately. Each subscale was included in the causal model tested in this study as a separate exogenous variable. Each subscale is described below. Perpeee efi ehe Appreisal; Empleyee Develepmene. This scale indicated the extent to which the rater believed performance appraisal information was used to help employees grow and develop on the job. 
Sample items included, "Formal performance appraisals provide a means for me to get together with each of the individuals in my department to discuss how to help them become better employees" and ”In this organization, performance appraisals are rarely used to show individuals areas of their performance where improvement is needed" (reverse scored). A high score on this scale indicated that raters believed subordinate development was an important purpose for performance appraisals. Five items were included in this scale. Perpeee efi £h2.822121§§li fielezy Deeieiene. This scale measured the extent to which managers believed performance appraisal information was used in making salary decisions. Sample items included: ”In this organization the best way to ensure receiving a large wage/salary increase is to receive a good performance appraisal rating" and "Most of the raises that the people in my unit receive are based very little upon merit" (reverse scored). High scores on this 66 scale meant that raters believed performance appraisals had a great deal of impact on salary decisions. There were six items included in this scale. Wefthsaw mm. This scale assessed whether or not managers believed performance appraisal information was used in making promotion decisions. Sample items included: ”Only people who receive high performance evaluations will be promoted in this organization" and "Promotions are based on who you know rather than how well you perform" (reverse scored). High scores indicated that managers perceived promotion decisions to be based upon performance appraisal information. There were four items in this scale. Matthew mm. This setof items indicated the extent to which managers thought termination decisions depended upon performance appraisal data. Sample items included, "Termination decisions are made only after consulting an employee's performance appraisal records" and "A person's performance on the job is not a major factor considered by those who make termination decisions" (reverse scored). High scores indicated that managers believed performance appraisal information was used in making termination decisions. Five items were included in this scale. Ability £9 Deeeheng the Exeleeeieh. Items in this scale measured the extent to which raters felt they were able to support their performance evaluations of employees with specific behavioral examples. Sample items in this scale included, ”I am generally able to support my evaluations of individuals working in my unit with 67 specific incidents of good and poor performance" and ”I should keep better records on the performance of people in my department than I do” (reverse scored). A high score on this scale meant that raters believed they were typically able to document their performance evaluations of subordinates. There were four items in this scale. Teak Ingezeeeeheenee Amehg Empleyeee. This scale measured the extent to which the jobs supervised by the rater required a great deal of interaction among employees in order to be completed effectively. Sample items included, ”The people that I supervise often need to coordinate their work activities with each other” and "The jobs which I supervise don't require much interaction among employees” (reverse scored). High scores on this scale were indicative of a high degree of task interdependence among subordinates. Six items were included in this scale. 
Appraisal Visibility. This scale assessed whether raters believed that members of their workgroup would find out how each other were evaluated. Sample items in this scale included, "People in my workgroup often compare their performance ratings" and "It would be very unusual for individuals in my unit to mention their performance appraisal ratings to each other" (reverse scored). High scores on this scale meant that managers believed performance appraisals had a high degree of visibility. Four items were included in this scale.

Perceived Freedom to be Honest. This scale measured the extent to which raters felt free to rate employee performance honestly and openly. Sample items included, "I would rarely hesitate to tell an employee my true assessment of his/her performance" and "If there was some way I could avoid having to approach my employees about a problem with their performance I would do it" (reverse scored). High scores on this scale indicated that raters felt free to be honest when evaluating an employee's performance. There were four items in this scale.

Data Analysis: The Use of Linear Structural Equation Analysis

The primary data analytic strategy used in this study was the analysis of linear structural equations, accomplished using LISREL VI (Joreskog & Sorbom, 1984), a procedure that derives parameter estimates for the unknown coefficients in a set of linear structural equations. Parameter estimates can be derived using either a maximum likelihood or an unweighted least squares solution.

In this study, a latent variable structural model with multiple manifest indicators was utilized. A latent variable is an unobserved variable presumed to exist within a structural model but which cannot be measured directly; in other words, it is a hypothetical or theoretical construct (James, Mulaik & Brett, 1982). The primary reason for using latent variable models with multiple indicators, rather than manifest variable models in which each latent construct is represented by only one manifest variable, is that they offer a solution to the problem of working with variables that are not measured with perfect reliability (Bentler, 1980). Unreliability of variables is a problem because it results in biased estimates of the structural parameters and path coefficients linking latent variables (James et al., 1982). In addition, the use of latent variable models allows testing the a priori measurement model to determine whether the manifest indicators are, in fact, related to the latent variables in the hypothesized structure. Essentially, this is a test of the construct validity of the measurement instrument (James et al., 1982).

The testing of latent variable structural models with LISREL proceeds through a two-step process. The first step involves testing the measurement model, which details the relationships between each latent variable (or cause) and the manifest, or measured, variables (the effects) that serve as indicators of that cause. The measurement model is tested using confirmatory factor analysis to determine whether the items on the questionnaire form the clusters intended a priori. The second step involves an assessment of the adequacy of the hypothesized structural model, which specifies the causal relationships among the latent variables.
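To make the two-step setup concrete, the fragment below re-expresses a small piece of such a model in lavaan-style syntax (as used by, for example, the R package lavaan or the Python package semopy). This modern notation is an assumption for illustration only, not the LISREL VI input actually used, and the indicator names are placeholders; the two structural paths shown do correspond to hypothesized links in the model (documentation to honesty, honesty to rating inflation).

```python
# Hypothetical lavaan-style restatement of a fragment of the hypothesized model.
# The original analysis used LISREL VI; indicator names below are placeholders.

model_spec = """
# Step 1: measurement model (latent construct =~ its manifest indicators)
Documentation =~ docu1 + docu2 + docu3 + docu4
Honesty       =~ hon1  + hon2  + hon3  + hon4
Inflation     =~ infl1 + infl2 + infl3 + infl4

# Step 2: structural model (causal paths among the latent variables)
Honesty   ~ Documentation
Inflation ~ Honesty
"""
```

Written this way, the measurement and structural submodels appear together, but their adequacy is still evaluated in the two steps described above.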
The goodness of fit for both models is determined by the extent to which the observed correlation matrix is similar to the reproduced correlation matrix based on the parameter estimates derived from the hypothesized model. The more similar the reproduced and observed correlation matrices are, the better the degree of fit.

Indices of Fit

There are a number of ways to assess the degree of fit for a model (i.e., the extent to which the hypothesized model is consistent with the data). According to Joreskog and Sorbom (1984), unreasonable values for parameter estimates (e.g., correlations greater than 1.00), negative squared multiple correlations or coefficients of determination, or large standard errors are all indications that the model does not fit the data very well. In addition, there are several specific measures which indicate the overall goodness of fit for both the measurement and the structural model.

The chi-square (χ²) statistic and its associated degrees of freedom and probability level provide one overall measure of fit (for maximum likelihood estimation procedures only). Although the χ² measure can theoretically be shown to be the likelihood ratio test statistic for testing the hypothesized model against the alternative that the model is unconstrained (in which case perfect fit would result), Joreskog and Sorbom (1984) do not recommend using it in this way, since the assumptions underlying this usage are rarely met in practice. Rather, they suggest that the χ² be used as an overall index of fit, where large values correspond to poor fit and small values correspond to good fit. The degrees of freedom in the model serve as the standard for determining whether the χ² is large or small. According to the authors, a ratio of χ² to degrees of freedom of 3:1 or less reflects a good degree of fit.

A second overall way to assess fit, the goodness of fit index (GFI), is a measure of the relative amount of variances and covariances jointly accounted for by the model (Joreskog & Sorbom, 1984). This index can also be adjusted for the number of degrees of freedom in the model (the adjusted goodness of fit index, or AGFI). Both of these measures should be between zero and one, with values closer to one indicating better fit. Finally, the root mean square residual (RMSR) can be used to assess overall fit. The RMSR is a measure of the average of the residual variances and covariances. The smaller the value of the RMSR, the less the difference between the observed and reproduced matrices and thus, the better the degree of fit. Formulas for all fit indices are provided in Joreskog and Sorbom (1984).
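As a rough illustration of two of these summaries, the sketch below computes the χ²-to-degrees-of-freedom ratio and a root mean square residual from an observed and a reproduced (model-implied) matrix. These are generic textbook formulas, not necessarily LISREL VI's exact computations, and the example value uses figures reported later for the hypothesized structural model.

```python
import numpy as np

# Generic illustrations of the fit summaries described above; not LISREL VI code.

def chi_square_df_ratio(chi_square, df):
    """Ratio of chi-square to degrees of freedom; 3:1 or less is read as good fit."""
    return chi_square / df

def root_mean_square_residual(observed, reproduced):
    """Root mean square of the residuals between the observed and reproduced
    matrices, taken over the lower triangle including the diagonal."""
    residual = np.asarray(observed) - np.asarray(reproduced)
    rows, cols = np.tril_indices_from(residual)
    return float(np.sqrt(np.mean(residual[rows, cols] ** 2)))

# Chi-square and df reported later for the hypothesized structural model:
print(round(chi_square_df_ratio(110.16, 37), 2))  # 2.98, just under the 3:1 guideline
```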
Conditions and Assumptions Underlying the Use of Causal Analysis

The use of structural equation analysis involves a number of assumptions about the data and the model which should hold, when data are collected at one point in time, in order to make strong causal statements about the relationships among the variables in the model (see Bentler, 1980, and James et al., 1982, for a more complete discussion of the conditions and assumptions underlying the use of causal analysis). For example, it is assumed that causal effects have occurred rapidly, that the system of relationships among the variables has reached an "equilibrium-type condition" at the time of data collection (i.e., the relationships are relatively stable and constant), and that the structural model, as originally hypothesized, is specified correctly. The latter condition implies (1) that the paths hypothesized to have nonzero structural parameters are actually significantly different from zero and (2) that the unspecified paths, whose structural parameters are fixed at zero, in fact have parameters that do not differ from zero. Additionally, it is assumed that all variables are measured on interval scales and with a high degree of reliability, and that relationships among variables within the model are linear.

A further assumption inherent in all forms of causal analysis is that the causes for a dependent endogenous variable are uncorrelated with the residual (or error term) of the causal equation for that endogenous variable and with the residual for any endogenous variable occurring later in the causal ordering of the model (James, 1980). This assumption also implies that the error terms for the path equations of each endogenous variable are uncorrelated (Duncan, 1975; James, 1980; James et al., 1982). To the extent that this assumption is violated, it indicates that there are relevant unmeasured causes in the model. An unmeasured cause is considered to be relevant (or important) when it is stable, has a nontrivial direct influence on an effect, is related to at least one other cause in the model, and makes a unique contribution to the model (James et al., 1982). Relevant unmeasured variables are a problem because they result in biased estimates of path coefficients for the variables that are included in the model.

When using structural equation analysis, an important concern is the extent to which the hypothesized model is identified. Identification concerns whether or not enough information is available to obtain unique mathematical solutions for the structural parameters (James et al., 1982). In order for a model to be tested it must be overidentified, which means, loosely speaking, that there are more data points (correlations) than there are parameters to estimate. Models which are underidentified or just identified cannot be tested because the one-to-one correspondence between data and parameters means that these models cannot be rejected (Bentler, 1980). Although determination of the identification status of any model is extremely complex, some guidelines are available. Specifically, James, Mulaik and Brett (1982) suggest that each latent variable should be represented by at least four manifest indicators. When this is the case, the measurement submodels relating a set of manifest indicators to their respective latent variable will be overidentified and it should be possible to test the underlying measurement model. In the present study, all latent variables had at least four indicators. Because the identification issue is complex, it should be noted that the LISREL VI procedures check the identification status of any model before computing parameter estimates; this check has been found to be nearly 100% reliable (Joreskog & Sorbom, 1984). For a more thorough discussion of identification see James et al. (1982) or Kenny (1975).
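The "more data points than parameters" idea can be made concrete with a simple count. The sketch below compares the number of non-redundant variances and covariances among the observed variables to a count of free parameters; the example count for a single four-indicator construct is an assumption for illustration (it takes the latent variance as fixed at one, leaving four loadings and four error variances free) rather than the count for this study's full model.

```python
# Rough identification check described above: a model is testable only if the
# non-redundant variances/covariances outnumber the free parameters.

def nonredundant_moments(num_observed_vars):
    p = num_observed_vars
    return p * (p + 1) // 2

def identification_status(num_observed_vars, num_free_parameters):
    df = nonredundant_moments(num_observed_vars) - num_free_parameters
    if df > 0:
        return "overidentified (testable)", df
    if df == 0:
        return "just identified (cannot be rejected)", df
    return "underidentified (cannot be tested)", df

# One latent variable, four indicators, latent variance fixed at 1:
# free parameters = 4 loadings + 4 error variances = 8; moments = 10.
print(identification_status(4, 8))  # ('overidentified (testable)', 2)
```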
Hypothesized Measurement and Structural Model

A pictorial representation of the measurement model and structural model tested in this study is provided in Figure 5.

[Figure 5: Hypothesized Measurement and Structural Model of Rater Motivation]

Footnote for Figure 5

Variable Names
1. Docu - Ability to Document the Appraisal
2. Cred - Credibility of the Rater to the Ratee
3. Devel - Purpose of the Appraisal: Employee Development
4. Salary - Purpose of the Appraisal: Salary Decisions
5. Term - Purpose of the Appraisal: Termination Decisions
6. Promo - Purpose of the Appraisal: Promotion Decisions
7. Interdep - Task Interdependence Among Employees
8. Desire - Rater's Desire to be Liked by the Ratee
9. Paconseq - Expected Consequences of the Appraisal for the Ratee
10. Visibility - Appraisal Visibility
11. Reaction - Reaction of the Ratee to the Appraisal
12. Honesty - Perceived Freedom to be Honest
13. Inflation - Rating Inflation

Symbols
1. ξi - latent exogenous variables (enclosed in a circle)
2. ηi - latent endogenous variables (enclosed in a circle)
3. xi, yi - manifest indicators (i.e., questionnaire items) for each latent variable (enclosed in a square)
4. λxij - path from ξj to xi
5. λyij - path from ηj to yi
6. γij - path from ξj to ηi
7. βij - path from ηj to ηi

Following the conventions in the structural modeling literature, observed variables (i.e., the manifest indicators) are enclosed in squares and denoted with Roman letters ("x" for the manifest indicators of the exogenous variables and "y" for the indicators of the endogenous variables). Due to space limitations in the figure, the x-variables are numbered consecutively from "1" to "41" (rather than "x1" to "x41"), beginning with "docu" (ability to document the appraisal) and ending with "desire" (desire to be liked by the ratee). The y-variables are numbered consecutively from "1" to "17", beginning with "paconseq" (performance appraisal consequences) and ending with "inflation" (rating inflation). The actual questionnaire items corresponding to each of the manifest indicators in Figure 5 are presented in Appendix D.

Latent variables are enclosed in circles and labeled with Greek letters (ksi, ξ, for the eight exogenous variables and eta, η, for the five endogenous variables). Parameters to be estimated are labeled with the appropriate Greek letters. Paths from each of the ξ-variables to the appropriate x-variables are denoted lambda-x (λx), while those from the η-variables to the appropriate y-variables are denoted lambda-y (λy). Even though there were multiple λx or λy paths for each exogenous and endogenous variable, respectively, space limitations required that only one of these paths be labeled in Figure 5 for each of the latent variables. Paths between an endogenous and an exogenous variable are labeled with gammas (γ) and those between two endogenous variables are labeled with betas (β). Again, following the conventions in causal analysis, each path coefficient has two subscripts, the first being the subscript of the variable that the arrow points to (i.e., the effect) and the second being the subscript of the variable that the arrow comes from (i.e., the cause). Thus, the paths from the latent variable "docu" (ξ1) to its four manifest indicators are, respectively, λx11, λx21, λx31, and λx41, while the path from "paconseq" (η1) to "reaction" (η3) is labeled β31.
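For reference, this notation corresponds to the standard LISREL-style system of equations below. This is the generic form of such a model, written out here for clarity; it is not reproduced from the original text.

```latex
% Generic LISREL-form equations matching the notation above:
% xi = latent exogenous, eta = latent endogenous, x and y = manifest indicators.
\begin{align}
x    &= \Lambda_x \xi + \delta        && \text{(measurement model for the exogenous constructs)} \\
y    &= \Lambda_y \eta + \varepsilon  && \text{(measurement model for the endogenous constructs)} \\
\eta &= B\eta + \Gamma\xi + \zeta     && \text{(structural relations among the latent variables)}
\end{align}
```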
CHAPTER 4: RESULTS

The results of this study are discussed in three sections. The first section describes the findings from the confirmatory factor analysis used to assess the construct validity of the proposed measurement model. Second, the results from the confirmatory analysis of linear structural equations for the hypothesized structural model are discussed. The final section presents the findings from the subsequent exploratory analysis done on the data.

Assessment of the Measurement Model

The measurement model was assessed by confirmatory factor analysis using LISREL VI (Joreskog & Sorbom, 1984). As described previously, the measurement model examined in this study is depicted in Figure 5. The initial confirmatory factor analysis was done using all of the items on the questionnaire. The GFI for this model was .803, the AGFI was .788, and the RMSR was .096. These indices reflect a moderate degree of fit between the data and the model. However, examination of the factor loadings obtained for each item on the relevant latent construct revealed that some of the items were not good indicators of the latent construct they were intended to measure. Items with factor loadings below .30 were dropped from the measurement model. Eliminating these items served both to improve the fit of the model and to increase the stability of the parameter estimates by increasing the ratio of people to items. In most cases, dropping the items with low factor loadings from the scales also resulted in an increase in the reliability estimate (assessed with coefficient alpha) for the scale. The confirmatory factor analysis was then repeated with the smaller set of items. It should be noted that the final set of parameter estimates for the measurement model could be due to capitalization on chance and thus, should be replicated to increase confidence in their validity.

The means, standard deviations and reliabilities for each of the motivational influences (based on the final set of items used to measure each latent construct) are presented in Table 1. This table also shows the intercorrelations (based on the raw data) between the scales assessing the motivational influences. Table 2 contains the final set of factor loadings for each latent construct (i.e., the lambda matrix in LISREL terminology). The indices of fit for the final measurement model showed a sizeable improvement. Although it is clear that the imposed structure did not account for all of the covariance between the items, the GFI (.880), the AGFI (.864) and the RMSR (.087), taken together, suggest that the measurement model does adequately account for the observed data. Table 3 presents the correlations between the latent constructs (the phi matrix). The phi matrix was used as the input into LISREL for the assessment of the structural model.

[Table 1: Means, Standard Deviations, Reliabilities and Intercorrelations for the Scales Measuring the Motivational Influences]

[Table 2: Factor Loadings for Confirmatory Factor Analysis (Lambda Matrix)]

[Table 3: Intercorrelations Among the Latent Variables (Phi Matrix)]

Assessment of the Structural Model

Figure 5 depicted the structural model (in combination with the measurement model) that was tested in this study, while Figure 6 shows the structural model with the obtained structural parameters.

[Figure 6: Structural Parameters for the Hypothesized Model of Rater Motivation]

T-values are calculated to assess the significance of the structural parameters. The t-value for a parameter estimate is calculated by dividing the parameter estimate by its standard error. T-values larger than two are judged to be significantly different from zero at the alpha = .05 level.
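A minimal sketch of that significance rule follows; the standard error shown is back-calculated from a coefficient and t-value reported in Table 4 below, purely for illustration, and is not a figure taken from the dissertation's tables.

```python
# Minimal sketch of the rule described above: t = estimate / standard error,
# and |t| > 2 is treated as significant at roughly the alpha = .05 level.

def t_value(estimate, standard_error):
    return estimate / standard_error

def is_significant(estimate, standard_error, cutoff=2.0):
    return abs(t_value(estimate, standard_error)) > cutoff

# Illustrative only: a coefficient of .273 with a standard error of about .0875
# reproduces the t-value of roughly 3.1 reported for beta-31 in Table 4 below.
print(round(t_value(0.273, 0.0875), 2), is_significant(0.273, 0.0875))  # 3.12 True
```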
The structural coefficients for the original model and their associated t-values are presented in Table 4.

Table 4: Structural Coefficients and T-Values for the Originally Hypothesized Model

Structural Parameter    Coefficient    T-Value
β31                       .273           3.119**
β32                      -.150          -1.716*
β42                      -.126          -1.464
β43                       .119           1.330
β54                      -.289          -3.052**
γ13                       .363           3.694**
γ14                       .047            .413
γ15                      -.089           -.781
γ16                       .009            .087
γ27                       .092            .951
γ32                       .181           2.068**
γ41                       .353           4.144**
γ48                       .171           2.000**

** p < .05   * p < .10

The χ² for this model was 110.16 with 37 degrees of freedom (p < .01), indicating that the observed and reproduced matrices differed significantly from one another. While this is typically considered disconfirming evidence, Joreskog (1978) and others (e.g., Bentler & Bonett, 1980; Tucker & Lewis, 1973) have noted that the χ² test is very powerful, particularly with large samples, and thus tends to reject the model even when differences between the observed and reproduced matrices are small. Thus, as noted earlier, the preferred use of the χ² is in comparison to the degrees of freedom, with the recommended ratio being 3:1 or less. In this case, the ratio of χ² to degrees of freedom was slightly under 3:1, which suggests a reasonably good fit of the model to the data. Further support for this conclusion comes from examining the other indices of fit. Specifically, the GFI (.883), the AGFI (.711) and the RMSR (.118) all indicate that the hypothesized model accounted for the observed data reasonably well. The specific hypotheses concerning the relationships between the motivational influences are discussed below.

Hypothesis 1

The first hypothesis stated that the purpose of the performance appraisal would be significantly related to the overall magnitude and attractiveness of the consequences of the appraisal for the ratee. Four potential purposes for performance appraisals were included in the model: subordinate development, salary decisions, promotion decisions and termination decisions. This hypothesis was only supported for subordinate development.
Subordinate development had a significant and positive structural coefficient with performance appraisal consequences (γ13 = .363; t = 3.694), indicating that the more performance appraisals are used for purposes of subordinate development, the more positive the consequences of the appraisal for subordinates. The structural coefficients for salary decisions (γ14 = .047; t = .413), termination decisions (γ15 = -.089; t = -.781) and promotion decisions (γ16 = .009; t = .087) were not significant.

Hypothesis 2

The second hypothesis was that the more attractive the expected consequences of the appraisal for the ratee, the more positive the anticipated subordinate reaction to the appraisal. This hypothesis was supported, as indicated by the significant and positive structural coefficient between performance appraisal consequences and subordinate reaction to the appraisal (β31 = .273; t = 3.119).

Hypothesis 3

This hypothesis stated that the more credible raters believe they are to subordinates as feedback sources, the more they will expect subordinates to respond positively to the evaluation. The significant positive structural coefficient between rater credibility and anticipated subordinate reaction (γ32 = .181; t = 2.068) demonstrates that this hypothesis was supported.

Hypothesis 4

The fourth hypothesis was that appraisal visibility would be negatively related to the anticipated reaction of the subordinate to the appraisal. There was partial support for this hypothesis. Although the structural coefficient (β32 = -.150) indicated that the relationship between appraisal visibility and anticipated subordinate reaction was in the hypothesized direction, the coefficient was only marginally significant (t = -1.716; p < .10).

Hypothesis 5

The fifth hypothesis was that the greater the degree of task interdependence between subordinates in a workgroup, the greater the degree of appraisal visibility. The structural coefficient between task interdependence and appraisal visibility was not significant (γ27 = .092; t = .951), indicating that this hypothesis was not supported.

Hypothesis 6

This hypothesis suggested that the expected reaction of the subordinate to the appraisal would be positively related to the perceived freedom of the rater to be honest when evaluating performance. This hypothesis was not supported. While the direction of the relationship was as hypothesized (β43 = .119), the path coefficient was not significantly different from zero (t = 1.330).

Hypothesis 7

The seventh hypothesis stated that the stronger a rater's desire to be liked by the ratee, the lower his/her perceived freedom to be honest when evaluating performance. Examination of the structural coefficient between desire to be liked and freedom to be honest (γ48 = .171) indicates that although the coefficient was significant (t = 2.000), the direction of the relationship was the opposite of that hypothesized. Specifically, the stronger a rater's desire to be liked, the greater his/her perceived freedom to be honest.

Hypothesis 8

The next hypothesis posited a positive relationship between the ability of the rater to document the performance evaluation and his/her perceived freedom to be honest.
The structural coefficient for this relationship (γ41 = .353; t = 4.144) was significantly different from zero and in the direction hypothesized, indicating that the more raters believe they are able to document their evaluations, the more free they feel to be honest when evaluating performance.

Hypothesis 9

The final cause hypothesized for perceived freedom to be honest was appraisal visibility. Specifically, it was hypothesized that the greater the degree of appraisal visibility, the lower the perceived freedom of the rater to be honest. This hypothesis was not supported. Although the structural coefficient (β42 = -.126) was in the hypothesized direction, the coefficient was not significant (t = -1.464).

Hypothesis 10

The last hypothesis was that the lower the perceived freedom of the rater to be honest, the greater the occurrence of rendering errors (i.e., differences between public and private evaluations). The significant negative structural coefficient between freedom to be honest and rating inflation (β54 = -.289; t = -3.052) indicates that this hypothesis was supported.

Exploratory Analysis

The last analyses to be described are the results from an exploratory analysis designed to improve the fit of the data to the model and thus, to provide suggestions for future research. As noted above, although the originally hypothesized model fit the data reasonably well, examination of the results revealed several possible changes in the model that might improve its fit to the data. The modification indices provided by LISREL for each fixed parameter are useful in identifying possible ways to change the model by relaxing parameters previously fixed to zero. While the specific computation of the modification indices is complicated, it can be shown that the modification index for a given path equals the expected decrease in χ² if this particular constraint is relaxed and all other estimated parameters are held fixed at their estimated values (Joreskog & Sorbom, 1984). Clearly, using the modification indices to suggest changes in the model to improve fit constitutes an exploratory analysis of the data and could result in capitalizing on chance relationships that might be present in the data. Thus, it is important to note that any significant structural coefficient resulting from changes made in the model would need to be cross-validated with another sample.

Joreskog and Sorbom (1984) provide several guidelines for using the modification indices to make changes in the model. First, they recommend making changes sequentially, which means that one parameter should be relaxed at a time. Specifically, they suggest relaxing the fixed parameter with the largest modification index, as long as it is greater than 5.00, and then reassessing the fit of the model. A reduction in χ² that is large relative to the change in the degrees of freedom represents a real improvement in the model. In contrast, a drop in χ² equal to or smaller than the change in the degrees of freedom probably indicates that the improvement in fit was obtained by capitalizing on chance. The second recommendation made by Joreskog and Sorbom (1984) is to make only those changes that have substantive meaning and that result in parameters that can be interpreted.

Following these procedures, several changes were made in the originally hypothesized model. The structural model summarizing these changes and the resulting structural coefficients are presented in Figure 7. Table 5 presents the structural coefficients and their associated t-values for this modified model.
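A schematic sketch of the two decision rules just described follows. The function names are illustrative rather than LISREL syntax, and the strict inequality in the second rule is a simplification of the "large relative to" language above.

```python
# Schematic sketch of the sequential modification guidelines described above
# (Joreskog & Sorbom, 1984). Function names are illustrative, not LISREL syntax.

def worth_relaxing(modification_index, threshold=5.00):
    """Relax only the single fixed parameter with the largest modification
    index, and only if that index exceeds the threshold."""
    return modification_index > threshold

def looks_like_real_improvement(chi_square_drop, df_change):
    """A chi-square drop that is large relative to the change in degrees of
    freedom is read as a real improvement; a drop at or below the df change
    suggests capitalization on chance. The strict inequality is a simplification."""
    return chi_square_drop > df_change

# First change reported below: modification index 13.342, and a chi-square drop
# of 14.7 against a one-degree-of-freedom change.
print(worth_relaxing(13.342), looks_like_real_improvement(14.7, 1))  # True True
```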
The indices of fit obtained after each sequential change in the model are contained in Table 6. The specific changes made in the model are discussed below.

The first change involved the addition of a path from task interdependence to perceived freedom to be honest (based on a modification index of 13.342). The resulting structural coefficient (γ47 = .323) was significantly different from zero (t = 3.882), indicating that a high degree of task interdependence resulted in greater perceived freedom to be honest. Furthermore, the decrease in χ² of 14.7, compared to a decrease of 1 in the degrees of freedom, was large enough to suggest that the change probably represented a real improvement in the model.

A second change in the model was adding a path from anticipated subordinate reaction to the appraisal to rating inflation (based on a modification index of 9.939). The structural coefficient for this path was negative (β53 = -.290) and significantly different from zero (t = -3.124), indicating that when subordinates were expected to react negatively to the appraisal, the amount of rating inflation increased. Again, the ratio of the decrease in χ² to the decrease in degrees of freedom (10:1) suggests a substantial improvement in the model.

[Figure 7: Structural Parameters for the Modified Model of Rater Motivation]

Table 5: Structural Coefficients and T-Values for the Modified Model

Structural Parameter    Coefficient    T-Value
β31                       .273           3.119**
β32                      -.150          -1.716*
β42                      -.157          -1.997**
β43                       .044            .542
β54                      -.216          -2.362**
β53                      -.290          -3.135**
γ13                       .363           3.694**
γ14                       .047            .413
γ15                      -.089           -.781
γ16                       .009            .087
γ27                       .092            .951
γ32                       .181           2.068**
γ41                       .260           3.230**
γ48                       .165           2.074**
γ45                       .200           2.453**
γ47                       .272           3.251**

** p < .05   * p < .10

[Table 6: Indices of Fit After Each Sequential Change in the Model]

The third change in the model was the addition of a path from using appraisals for termination decisions to perceived freedom to be honest (based on a modification index of 5.695). The structural coefficient for this path was positive and significantly different from zero (γ45 = .200; t = 2.453). This indicates that the more appraisals are used for termination decisions, the greater the perceived freedom of the rater to be honest. Finally, the decrease in χ² (6.28) relative to the decrease in the degrees of freedom (1) again suggested a sizeable improvement in the model.

The final change in the model involved eliminating all paths that were not significantly different from zero. This model is presented in Figure 8, and the structural coefficients and t-values are presented in Table 7. The overall assessment of the degree of fit of the final model to the data indicated some improvement over the originally hypothesized model. Specifically, the χ² for the final model was 81.11, with 39 degrees of freedom, a ratio of about 2:1. Furthermore, the GFI (.908) and the AGFI (.786) were both higher, while the RMSR (.090) was lower, than in the initial model.
[Figure 8: Structural Parameters for the Final Model of Rater Motivation]

Table 7: Structural Coefficients and T-Values for the Final Model

Structural Parameter    Coefficient    T-Value
β31                       .273           3.119**
β32                      -.150          -1.717*
β42                      -.165          -2.127**
β54                      -.216          -2.397**
β53                      -.290          -3.151**
γ13                       .359           3.960**
γ32                       .181           2.067**
γ41                       .261           3.246**
γ48                       .179           2.265**
γ45                       .205           2.513**
γ47                       .277           3.320**

** p < .05   * p < .10

CHAPTER 5: DISCUSSION

Overview of the Study

The purpose of this study was to gain a better understanding of some of the factors that can influence the accuracy of performance ratings. It has been suggested here and by others (e.g., Banks & Murphy, 1985; Bernardin & Beatty, 1984; DeCotiis & Petit, 1978; Mohrman & Lawler, 1983) that performance rating accuracy has two primary determinants, rater ability and rater motivation. Given the large body of previous research that has examined influences on rater ability (cf. Landy & Farr, 1980), the focus of the present study was on rater motivation. More specifically, a cognitive process model depicting the relationships between a number of potential motivational influences was proposed and submitted to empirical test. Before discussing the results of the analysis of this model, however, several brief, informal observations about the data are presented.

Informal Observations

The first observation concerns the extent to which managers appear to intentionally provide public ratings of performance that are not accurate. Over 70% of the participants in the study provided public evaluations that were higher on one or more performance dimensions than their actual opinions about the employee's performance. At the same time, they indicated their belief that the employee's performance had not changed since the date of the public evaluation, suggesting that true performance change did not account for the difference between the two evaluations. While this was an indirect and unobtrusive measure of rater motivation, many managers frankly admitted during the interview with the researcher that they intentionally distorted evaluations of employees when they felt there was a "good" reason for doing so. This is consistent with the findings of Longenecker, Gioia and Sims (1987), who also found widespread admission by managers that political considerations and intentional rating distortions frequently entered into performance evaluation processes.

One interesting contrast between the present study and the Longenecker et al. study concerns which aspects of the evaluation form were most subject to distortion. Longenecker and his colleagues reported that managers were more likely to distort the overall evaluation of performance than their evaluations of any of the specific dimensions on the form. This apparently occurred because the overall rating was believed to be the most important to employees and because this was the evaluation used for administrative decision-making. In the present study, the opposite was found, in that distortions appeared to be more likely on individual dimensions than in the overall evaluation. Nearly 70% of managers did not distort the overall evaluation even though many of them manipulated ratings on specific dimensions. A possible reason for this difference is that the organization from which the data were collected in this study tended not to use performance evaluations for administrative decision-making, so there was less reason to distort the overall evaluation (which would probably be the basis for these decisions).
Thus, managers could make an employee feel better by inflating some of the dimension ratings while at the same time maintaining a reasonably accurate overall evaluation. Another potential explanation for this finding was the fact that in the present organization, giving an employee an overall evaluation of "outstanding" (the highest score on the rating form) required attaching a separate written explanation supporting the evaluation. During the interview, many managers stated that they were reluctant to give overall "outstanding" ratings for this reason. On the other hand, "outstanding" on one or more individual dimensions did not require any documentation, making inflation of dimension evaluations less "costly" and thus more likely.

Another interesting informal observation was that the particular subordinate being evaluated seemed to influence whether or not such distortion took place; distortion was not a general phenomenon that occurred for all the employees evaluated by managers. One assumption underlying the model tested in this study was that managers make decisions about how accurately to rate performance based, at least partially, on characteristics of the specific person being evaluated and their relationship with this person. In conversations with the researcher, a number of managers noted that they were more likely to distort the evaluations of some subordinates than of others. While the reasons for this varied, it is significant that managers considered a number of factors relevant to the specific recipient of the evaluation before determining the extent of distortion of the public rating.

Supported Hypotheses

More sophisticated examination of the hypothesized model using linear structural equation analysis showed that the data were generally consistent with the overall model depicting rater cognitive processes, as well as with a number of the specific linkages hypothesized to exist. Results relating to specific linkages in the model are discussed next.

The finding that using appraisals for employee development positively affected the attractiveness of the consequences of the appraisal to the ratee is consistent with the discussions of previous authors (e.g., DeCotiis & Petit, 1978; Mohrman & Lawler, 1983; Sharon & Bartlett, 1969) on the effect of appraisal purpose on performance ratings. These researchers have noted that evaluations done for purposes of development are typically less lenient and more accurate than evaluations used for administrative decision-making. To a large extent, this may be because evaluations used for developmental purposes are more likely to have positive consequences for ratees, as found in this study. When suggestions for employee performance improvement are made from the perspective of helping the person develop into a more competent and valuable employee, they are less likely to have a negative effect on his/her self-esteem and, in fact, may even increase self-esteem. Furthermore, employees who are concerned about doing a good job are likely to value the opportunity for training and for the development of job skills and abilities, as well as the chance to gain a better understanding of their job and role in the organization.

As hypothesized, the attractiveness of the consequences of the appraisal seemed to be important at least partially because of its relationship with the expected reaction of the ratee to the appraisal.
When raters expected the consequences of the appraisal for the ratee to be negative, they were more likely to expect the ratee to react defensively and nonconstructively to the performance evaluation. A negative reaction was also expected when managers believed that employees would find out how their coworkers were evaluated (i.e., appraisal visibility was high) and when they did not believe they were credible to employees as a feedback source. Furthermore, results from the exploratory analysis revealed a significant negative relationship between expected ratee reaction and the occurrence of rendering errors. Specifically, an anticipated negative reaction resulted in greater positive differences between public and private evaluations (i.e., inflated public evaluations).

The anticipated reaction of the ratee to the appraisal seems to be a pivotal influence on rater motivation to evaluate performance accurately. There are several possible reasons for this. First, how the ratee reacts to the appraisal has long-term implications for future interactions between the rater and the ratee (DeCotiis & Petit, 1978; McCall & DeVries, 1977). Raters may (justifiably, perhaps) hesitate to provide negative (but honest) feedback to ratees if they believe the ratee won't accept the feedback, will become hostile or defensive, or may even file a grievance, particularly when they know that in the future they will have to work with the employee and keep him/her motivated to do the job.

Another possible reason for the importance of the anticipated reaction of the ratee to the appraisal stems from the perceived ability of the rater to handle the feedback situation effectively (Bernardin & Beatty, 1984; Bernardin & Buckley, 1981). Intentionally inflating performance ratings may be a defensive strategy for raters, designed to avoid having to cope with the anticipated negative reaction of an employee to the evaluation. Bernardin and his colleagues discuss this tendency within the context of Bandura's (1977) social learning theory. Social learning theory suggests two critical cognitions that could influence a rater's motivation to evaluate performance accurately: (1) an efficacy expectation, which is the conviction that one can successfully execute a behavior in order to produce a particular outcome, and (2) an outcome expectation, which concerns the extent to which a person believes that some outcome will result from the behavior. Even if the manager believes that something positive will result from confronting the employee about problems with his/her performance (e.g., the employee will be motivated to improve), if the manager does not believe that he/she would be successful in dealing with the situation (an example of low efficacy expectations), then he/she would probably not be very motivated to rate the employee accurately. As noted by Bandura, Adams, and Beyer (1977):

Strength of convictions in one's own effectiveness determines whether coping behavior will be attempted in the first place. People fear and avoid threatening situations they believe exceed their coping abilities, whereas they behave assuredly when they judge themselves capable of managing situations that otherwise intimidate them (p. 126).

This description of managerial behavior is consistent with the expectancy theory perspective on rater motivation discussed earlier. Low efficacy expectations in social learning theory would be comparable to a low effort-performance expectancy in expectancy theory (Porter & Lawler, 1968).
From a practical point of view, the importance of the anticipated reaction of the employee to the appraisal suggests the need to increase the rater's expectation of personal efficacy for dealing with this reaction (Bernardin & Beatty, 1984). Only when raters believe they are capable of effectively handling this difficult situation will a potential negative reaction by the ratee not result in lower motivation to rate accurately. To this end, Bandura (1977) suggests several sources of information that should serve to increase efficacy expectations. These include performance accomplishments, vicarious experience, verbal persuasion and emotional arousal. Performance accomplishments are considered to be the most effective way to increase personal efficacy since they are based on experiences of personal mastery. However, vicarious experience can also be useful. In order to increase rater motivation, these two sources of information could be utilized within a typical behavioral modeling training program (e.g., Goldstein & Sorcher, 1974; Latham, Wexley & Pursell, 1975; Spool, 1978). Such a training program might involve having managers view videotapes of people successfully dealing with a difficult ratee during a performance appraisal session, along with a discussion of several key learning points that would help them to execute the appropriate behaviors themselves. Managers would then be given the opportunity to actually practice these behaviors and receive feedback on their effectiveness. This form of training has been found to be successful in teaching managers interpersonal skills (e.g., Carroll, Paine & Ivancevich, 1972), so it also appears to have potential for reducing the occurrence of inflated performance ratings resulting from a rater's low efficacy expectations.

Another practical strategy that might have utility for eliminating rendering errors resulting from an anticipated defensive or hostile employee reaction might be oriented toward teaching ratees how to receive and deal with negative performance feedback (Bernardin & Beatty, 1984). A behavior modeling program similar to that described above might be helpful in this regard. In addition, developing an appraisal system that employees trust and believe is fair and useful should also be effective in reducing the potential for a negative employee reaction, since employees should have more confidence in the evaluations they receive (Bernardin & Beatty, 1984).

The perceived freedom of the rater to be honest when evaluating performance was found to be an important attitudinal precursor of the likelihood of rendering errors occurring. The less that raters felt they could be honest when evaluating an employee's performance, the less honest they actually were, as exemplified by the difference between their public and private evaluations. A rater's perceptions concerning how honest they were able to be were influenced by both the visibility of performance appraisals and the ability of the rater to document and support his/her evaluations with concrete behavioral examples. The former indicates that the more employees in the workgroup find out how each other were evaluated, the lower the perceived freedom of the rater to provide honest evaluations. A likely explanation for this is that managers believe comparisons among members of the workgroup concerning their evaluations will lead to dissatisfaction, anger and/or perceptions of inequity if employees find out another coworker received a higher evaluation than they did.
In other words, managers appear to believe that employees are not able to distinguish good and poor performance and thus, if an honest (and at least partially negative) appraisal is given, employees will believe it is unfair. This belief is consistent with research examining self-appraisals, which indicates that they tend to be more lenient than ratings from supervisors (e.g., Kirchner, 1965; Parker, Taylor, Barrett & Martens, 1959). Nevertheless, one practical implication of this concern is that there is a need for explicit and unambiguous definitions of both the dimensions upon which performance will be evaluated and the standards that will be used in identifying various levels of performance effectiveness. The more that raters and ratees share a common understanding of what constitutes effective performance, and the more that raters feel they can apply the standards consistently, the less likely raters will be to worry about employees feeling they have been evaluated unfairly (even if they find out a coworker received a higher evaluation than they did).

It is interesting to note, in support of this suggestion, that the organization where the data for this study were collected used an evaluation form that consisted of seven to nine general performance dimensions. When asked by the researcher if they felt the form was adequate, most managers reported that it was not and that they disliked it because the dimensions were too vague and the standards (e.g., what constitutes "very good" performance on some dimension) unclear. If the managers using the form felt it was ambiguous, then it would be surprising if employees didn't also feel the same way, thereby opening the door for misunderstandings that most managers would probably prefer to avoid (and hence, the lower perceived freedom to be honest).

The ability of the rater to document and support his/her evaluations of performance was another determinant of the perceived freedom of the rater to be honest when evaluating performance. The more that raters felt they were able to provide concrete behavioral examples to back up their ratings, the more willing they were to provide an honest performance appraisal. A number of researchers (e.g., Bernardin & Buckley, 1981; Borman, 1979a) have recommended diary-keeping of critical incidents of work performance as a way of improving rater observational skills and thus, rating accuracy. People who have been trained to record critical incidents have been found to provide ratings with less leniency and halo and greater interrater agreement (e.g., Buckley & Bernardin, 1980; Bernardin & Walters, 1977). The implicit assumption of this research is that diary-keeping results in improved rater ability to evaluate performance accurately through better observational skills. The results of the present study, however, suggest an alternative explanation. Specifically, since diary-keeping is likely to increase the extent to which raters feel they are able to document their evaluations, it should give them greater confidence in their evaluations and thus, greater perceived freedom to provide an honest assessment of performance. Since perceived freedom to be honest was found to reduce the occurrence of rendering errors, it appears that being able to document evaluations has an indirect and positive effect on rater motivation and the accuracy of public ratings of performance.
Unsupported Hypotheses

In spite of the overall fit for the proposed model, there were several hypothesized linkages in the model that did not receive support from the data. Most notable, perhaps, given the large amount of research that has been done on appraisal purpose, was the lack of any relationship between the administrative purposes of performance appraisals (e.g., salary, promotion and termination decisions) and the attractiveness of performance appraisal consequences. One probable explanation for this is that in the organization from which data were collected, performance appraisals were only tangentially related to administrative decision-making in most units. For example, the organization is unionized at most levels and, thus, the union contract, rather than an employee's performance level, determines salary increases for most employees. Further, both promotion and termination decisions also bear little direct relationship to employee performance. Promotions in this organization occur through a somewhat unusual process, which differs substantially from that used in most organizations, because of a highly complicated job classification system. Except in a few units of the organization, it is fairly uncommon for there to be a standard career path in the department or administrative unit into which management selects individuals based on their performance. Rather, promotions in this organization typically occur through one of two processes: (1) the employee decides he/she wants a job at a higher classification level and applies for such a position in the organization when one is posted, or (2) the employee has his/her current job reclassified at a higher level by demonstrating that the duties involved in the position correspond more closely to those duties typically part of jobs at the higher level. Similarly, managers in the study reported that it was extremely rare for employees to be fired, regardless of their performance level. Given the strong probability of range restriction on appraisal purpose, it is likely that the hypotheses involving these variables did not receive an adequate test in this study.

Two other hypotheses, both involving the perceived freedom of the rater to be honest, were also not supported. First, it was hypothesized that the expected reaction of the ratee to the appraisal would have a direct positive effect on the perceived freedom of the rater to be honest. Although this relationship was in the hypothesized direction, it did not reach statistical significance. Furthermore, when the direct linkage between expected ratee reaction and the occurrence of rendering errors was added into the model (during the exploratory analysis), the relationship between ratee reaction and honesty became trivial in magnitude. This suggests that the relationship originally observed between reaction and honesty could have been spurious (i.e., it may have existed only because both reaction and honesty were correlated with the occurrence of rendering errors). In addition, the desire of the rater to be liked by the ratee was hypothesized to be negatively related to the rater's freedom to be honest. However, the exact opposite relationship was found.

At first glance these results seem surprising. However, an examination of the items contained in the original honesty scale suggests a possible explanation for these findings. Specifically, it appears as though two somewhat distinct subscales existed among the items.
Four of the items seemed to be related to the general feelings that raters have about doing performance appraisals (e.g. “I feel uncomfortable telling an employee he/she is not performing well") while three of the items appeared to measure whether or not raters believed it was important to tell the truth when evaluating performance (e.g. "When evaluating an employee's performance, I don't feel that complete honesty is always the best policy"). After doing the confirmatory factor analysis, the items remaining in the scale were primarily those of the former type that dealt with the manager's general feelings concerning performance appraisals. Given this, it is not surprising that the desire of the rater to be liked was positively related to perceived freedom to be honest - when raters want very much to be liked by subordinates they are more likely to feel uncomfortable about doing performance appraisals because of the fear that providing negative feedback will cause employees not to like them. This would also be consistent with the findings from the exploratory analysis that the extent to which appraisals were used for termination decisions and degree of task interdependence were both significantly and positively related to perceived freedom to be honest. Concerning the former relationship, several managers participating in the study had gone through the process of terminating an employee because of poor performance and all of them agreed that it was a very difficult and time-consuming process that generally created many negative feelings between the manager and his/her employees (due to the involvement of the union in the process). Given the difficulty involved in terminating an employee, it is not surprising that the potential for this kind of situation would make managers feel very 109 uncomfortable about providing honest performance appraisals. Similarly, when a high degree of task interdependence exists among employees in the workgroup managers appear to feel more uncomfortable about providing honest appraisals, perhaps because of the fact that such honesty might result in conflict among members of the workgroup, which could then lower the overall performance level of the group. While the tentative nature of these findings should be recognized (since they resulted from an exploratory examination of the data), they are consistent with other relationships observed. However, cross-validation with another sample is necessary to have greater confidence in the validity of these conclusions. On the other hand, the expected reaction of the ratee to the performance appraisal appears to be more strongly related to the importance managers place on being honest and on the extent to which they actually are honest than to their general feelings about doing performance appraisals. When honesty was assessed as the manager's general feelings about performance appraisals, the relationship between expected reaction and honesty was positive but not significant. However, when a separate structural equation analysis was done using only the items assessing the manager's belief that telling the truth is important (the items not used in the original analysis), the expected reaction of the ratee was found to be significantly related to the perceived freedom of the rater to be honest, as hypothesized. In addition, the exploratory analysis provided results consistent with this. 
Specifically, the addition of the direct linkage between reaction to the appraisal and the occurrence of rendering errors improved the overall fit of the model. Thus, the anticipated reaction to the appraisal appears to have a stronger effect on the actual behavior of the rater (or on measures that are more closely related to actual behavior) than it does on his/her general feelings about doing performance evaluations. Why this occurs is somewhat unclear, although it may be related to the amount of variability among raters in their general feelings toward doing performance appraisals. While most people probably feel uncomfortable doing appraisals and providing negative feedback to employees, there may be more variability across people in the extent to which these feelings actually translate into distorted performance ratings.

The final unsupported hypothesis concerned the relationship between task interdependence and appraisal visibility. It was expected that task interdependence would increase appraisal visibility due to more opportunities to discuss or hear about the evaluations of coworkers. While there was a slight tendency for this to be true, the relationship was not significant. It appears that other factors, besides just opportunity, influence whether or not employees find out how their coworkers were evaluated. For example, it is likely that there are informal norms in workgroups about the extent to which evaluations are discussed and become "public knowledge." In addition, whether or not coworkers have personal friendships with one another is likely to influence whether they talk to one another about evaluations. Thus, while opportunity may be a necessary condition for a high degree of appraisal visibility, it does not appear to be a sufficient condition.

Limitations in the Study

In spite of the reasonably strong degree of empirical support found for the model of rater motivation hypothesized in this study, it is necessary to recognize some of its limitations. The first limitation concerns the somewhat small sample size of the study (n = 115). Although this sample size was determined, a priori, to provide an adequate amount of power to detect significant effects, the difficulty with the sample size stems from the fact that parameter estimates are less stable when they are based on a smaller sample. The potential instability of the parameter estimates indicates a greater need for cross-validating the findings from this study with another sample of people.

Another limitation in the present study concerns the extent to which there are unmeasured causes for any of the endogenous variables in the model (James, 1980; James et al., 1982). This is typically referred to as the "unmeasured variables problem." Specifically, to the extent that there are relevant unmeasured causes for any of the endogenous variables included in the structural model, biased estimates of structural coefficients may result. James (1980) has argued that the unmeasured variables problem is unavoidable and thus, the relevant question is not whether or not there is an unmeasured variables problem but rather, to what extent the problem exists. James (1980) presents several decision rules for determining the seriousness of an unmeasured variables problem.
James's decision rules involve determining whether there are unmeasured causes for any of the endogenous variables in the model and then assessing the extent to which these causes are expected to make a unique and nontrivial contribution to one of the effects in the model. Unmeasured causes that are expected to have only small effects, and that are linearly dependent on other causes that are measured, are not relevant and thus are not likely to bias parameter estimates. Unfortunately, given the lack of previous empirical research on factors influencing rater motivation to provide accurate performance ratings, it is extremely difficult to determine the seriousness of the unmeasured variables problem in this study. While clearly only speculative, other potential causes of the anticipated reaction of the ratee to the appraisal might include (1) the extent to which the ratee believes the evaluation was a fair assessment of his/her performance or (2) the extent to which the ratee values personal growth and development. For perceived freedom to be honest, possible unmeasured causes might be (1) the extent to which raters believe their evaluations will be reviewed by superiors and (2) the extent to which raters want to create a favorable impression with superiors. For the first of these variables (the anticipated reaction of the ratee to the appraisal), the extent to which the ratee believes the evaluation is fair is likely to be at least moderately related to the credibility of the rater as a feedback source. For the other potential unmeasured causes of the two endogenous variables discussed above, the lack of previous research makes the assessment of linear dependence with other causes only speculative. The same would be true for potential unmeasured causes of the other endogenous variables in the model. Overall, the seriousness of the unmeasured variables problem in the model tested in this study is indeterminable. Thus, the magnitude of the structural coefficients found in this study should be accepted with some caution. Future research on other possible causes is clearly needed to resolve this issue.

The issue of the external validity of the findings from this study also deserves mention. The data were all obtained from a single organization that appeared to be unique in some respects (e.g., in the degree to which performance appraisals were used for making administrative decisions) and thus potentially unrepresentative of other organizations. On the other hand, as noted earlier, the organization selected was large enough and diverse enough to allow collecting data from many semi-autonomous units and from a wide variety of types of employees (e.g., skilled, unskilled, educated, uneducated). Since the findings appeared to be consistent across such divergent organizational settings and types of employees, it is likely that the phenomena observed in this study are fairly representative of the behavior of people in a variety of situations. Nevertheless, the extent to which the findings from this study actually do generalize to other types of settings with other groups of people is an important issue that remains for future research to demonstrate. The only way to be truly confident about the generalizability of the results of a study is to replicate it both empirically and conceptually (i.e., using different research procedures) (Cook & Campbell, 1976).
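Returning briefly to the sample-size limitation noted above, the short simulation below (again purely illustrative, and not a reanalysis of the dissertation data) shows how much a standardized coefficient can vary from sample to sample at n = 115 compared with a larger sample; the assumed population correlation of .30 is arbitrary.

import numpy as np

rng = np.random.default_rng(1)
true_r = 0.30   # assumed population correlation between a cause and an effect
reps = 2000     # number of simulated samples per sample size

def estimate_spread(n):
    # Standard deviation of the sample correlation across repeated samples of size n.
    estimates = []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_r * x + np.sqrt(1.0 - true_r ** 2) * rng.normal(size=n)
        estimates.append(np.corrcoef(x, y)[0, 1])
    return np.std(estimates)

for n in (115, 500):
    print(f"n = {n:4d}: spread (SD) of the estimated coefficient = {estimate_spread(n):.3f}")

The noticeably wider spread at n = 115 is the instability referred to above, and it is one reason cross-validation of the present findings in a second sample is recommended.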
Suggestions for Future Research

This study examined the relationships among a subset of motivational influences in order to begin to develop an understanding of the complicated process by which individuals are motivated to provide accurate or inaccurate ratings of performance. However, while this study has advanced our understanding of this important aspect of the performance appraisal process, it is clear that there is much that we still do not know. The component model of performance rating (Landy & Farr, 1980) used earlier to summarize the research examining rater ability suggests some areas for future research on rater motivation.

With respect to the rating instrument, it would be helpful to learn whether or not the appraisal form itself influences the motivation of raters to provide accurate ratings. DeCotiis and Petit (1978) suggested that when raters understand how to use the appraisal instrument, and when they perceive it as being adequate (e.g., it includes all relevant aspects of job performance and does not include irrelevant job dimensions) and appropriate for its purpose, they will be more motivated to use the appraisal form accurately. Informal comments by managers participating in the present study would appear to support this as a possible motivational influence, but it remains to be tested empirically. It might also be interesting to determine whether different types of evaluation forms are more subject to intentional distortion. For example, intentional distortions might be more likely to occur on a typical trait-oriented rating scale because the ambiguity of its dimensions makes such distortions difficult to detect. On the other hand, a behavior- or results-oriented rating system might be less subject to this type of distortion, since these formats are less ambiguous, require less interpretation, and are easier to verify.

Several rater characteristics might also influence motivation to rate accurately. For example, it is possible that some traditional individual difference variables, such as rater self-esteem, locus of control, or need for affiliation, might influence whether or not rating distortion occurs. Similarly, raters who generally have positive beliefs about the nature of other people (Wexley & Youtz, 1985) or who are very people-oriented might be more likely to inflate ratings because they don't want to make employees feel bad by giving them a negative evaluation, or because they feel sorry for employees who are having problems. On a more specific level, raters who personally value performance appraisals and believe they are worthwhile might be less likely to intentionally inflate the performance ratings of their employees (as suggested by Longenecker et al., 1987).

It is also plausible that ratee demographic characteristics might influence the extent to which raters are motivated to rate performance accurately. Intentional distortion of performance ratings might be more likely for females or blacks. Along similar lines, dyadic characteristics might also be relevant. For example, rating distortion might be more prevalent in mixed-sex or mixed-race dyads than in dyads where the manager is evaluating someone of the same race or sex. Research on these ratee characteristics could shed some light on the reasons for race and sex discrimination in performance evaluations.

There are also a number of potential contextual influences on the motivation to rate accurately. The culture of the organization is one such influence.
To the extent that top management in the organization believes in the appraisal process and values employee growth and development, the motivation of managers to rate accurately should be higher. This is similar to the notions of the "political culture" of the organization (Longenecker et al., 1987) and trust in the appraisal process (Bernardin & Beatty, 1984) discussed by others. Other contextual factors might include the extent to which superiors scrutinize and evaluate the performance appraisals of their employees (Kane & Lawler, 1979; Longenecker et al., 1987) and the amount of time pressure managers are under to complete evaluations. Future research is needed to examine these and other potential influences on rater motivation.

From a more practical point of view, research examining the effectiveness of training raters to increase confidence in their ability to deal with defensive or hostile employees during appraisal sessions is also needed. The utility of training ratees to respond more appropriately to negative feedback is also worthy of investigation (Bernardin & Beatty, 1984). As noted earlier, a behavioral modeling approach to training might be an effective method for strengthening rater and ratee interpersonal skills in this type of situation. Training in diary-keeping procedures (e.g., Bernardin & Buckley, 1981) could also be an effective way to increase the perceived ability of raters to document their performance evaluations.

Finally, future research should address reasons for intentional deflation of performance ratings. In this study, about 35% of the instances of differences between public and private ratings involved deflations, where public ratings were lower than private ratings, which indicates that deflation is a phenomenon worth some attention. While no attempt was made to identify correlates of deflation in this study, Longenecker et al. (1987) suggested several possible reasons for deflation. These included shocking an employee back to high performance or sending a message to an employee that he or she should consider quitting the job. However, these and other explanations for deflation need to be examined more systematically.

Conclusion

The present study represents a different focus for performance appraisal research. Until recently, most research on performance appraisal has ignored the impact of the social context in which ratings occur on the accuracy of those ratings. This study demonstrates that such an omission has resulted in a serious gap in our understanding of the performance appraisal process as it occurs in organizational settings. Clearly, rater motivation is an important influence on performance rating accuracy. While the present study is only a preliminary investigation of some of the possible motivational influences, it is a first step toward gaining an understanding of this important phenomenon. Future research needs to focus on the motivational determinants of performance ratings if the goal of accurate ratings is to be achieved.

APPENDICES

APPENDIX A

Evaluation Forms Used by the Organization

[First university evaluation form; portions of the scanned copy are illegible.]

NAME _____  PRESENT ASSIGNMENT _____  DEPARTMENT _____  DATE EMPLOYED _____  CLASSIFICATION _____  EVALUATION DATE _____

The supervisor's opinion of the employee's performance should be indicated on the scale as objectively as possible. The evaluation must be reviewed and discussed with the employee.

Rating Factors: Consider each factor separately and independently. Base your rating on observable and proven performance.

Outstanding (O): Indicates an extremely high level of job performance.
Very Good (V): Performance is beyond normal requirements and competence.

Satisfactory (S): Fulfills the normal job requirements, with some strong points.

Needs Improvement (N): Performance is below normal job requirements; improvement is anticipated.

Unsatisfactory (U): Low performance level; shows a significant limitation that must be improved substantially to be acceptable.

When appropriate, write "No opportunity to observe" in the comments section(s). READ ENTIRE REVERSE SIDE BEFORE USING THIS FORM.

QUANTITY OF WORK: Consider achievements resulting from personal effort. Also completion of assignments. (O V S N U; Comments)

QUALITY OF WORK: Consider accuracy, thoroughness, usability, and dependability of results. (O V S N U; Comments)

KNOWLEDGE: Understanding of objectives, duties and responsibilities gained through education and training. (O V S N U; Comments)

INITIATIVE: Ability to be self-starting, efficient, resourceful and creative toward job objectives, duties and responsibilities. (O V S N U; Comments)

COOPERATION: Ability and willingness to cooperate with supervisors, coworkers and others; follow directions and rules; accept constructive criticism; and exhibit good judgment. (O V S N U; Comments)

DEPENDABILITY: Consider regularity of attendance, punctuality, and attention to use of rest periods. Also meets deadlines. (O V S N U; Comments)

CAPACITY TO DEVELOP: Consider the potential to develop skills, improve job performance and assume additional responsibilities. (O V S N U; Comments)

OVERALL EVALUATION: An overall rating of "Outstanding," "Needs Improvement," or "Unsatisfactory" requires written documentation to be included with this evaluation. Consider the employee's total job performance; if a major factor not rated above is considered, please explain. (O V S N U)

A follow-up evaluation for employees rated "Needs Improvement" or "Unsatisfactory" is normally required within ___ days. The follow-up review should be scheduled for _____.

SUPERVISOR'S COMMENTS:  EMPLOYEE'S COMMENTS:

I certify that this evaluation was reviewed with me by my supervisor. My signature does not necessarily indicate my agreement.

Signatures and dates: Employee, Supervisor, Department Administrator, Personnel Administrator.

[Second university evaluation form; most of this page is illegible in the scanned copy. The form rates the employee on the same O/V/S/N/U scale across dimensions that include quantity of work, initiative, human relations, and supervision, followed by an overall evaluation of total job performance. A follow-up evaluation for employees rated "Needs Improvement" or "Unsatisfactory" is required, normally within a specified number of days, and the form closes with comment and signature blocks for the employee, the evaluator, and Personnel Administration.]
[Third evaluation form; each factor is rated on a scale from 1 to 6, with the verbal anchors shown.]

QUALITY OF WORK: Needs Improvement (errors are frequent); Meets Requirements (errors are few); Exceeds Requirements (errors are rare). (1 2 3 4 5 6)

QUANTITY OF WORK: Insufficient Work; Completes Required Volume of Work; Highly Productive. (1 2 3 4 5 6)

JOB KNOWLEDGE: Limited Knowledge; Understands Job Duties and Responsibilities; Excellent Comprehension. (1 2 3 4 5 6)

ADAPTABILITY: Resists Change; Adapts Well; Extremely Flexible. (1 2 3 4 5 6)

DEPENDABILITY: Unreliable (needs constant supervision); Consistent Performance (needs general supervision); Highly Reliable (needs minimum supervision). (1 2 3 4 5 6)

COOPERATION: Has Difficulty Working With Others; Generally Cooperative; Works Well With Others. (1 2 3 4 5 6)

SELF MOTIVATION: Indifferent, Little Effort to Achieve; Does Routine Work; Seeks Out Work Without Awaiting Directions. (1 2 3 4 5 6)

COMMUNICATION: Poor Communicating Abilities; Clearly Expresses Self and Understands Others; Excellent Communication Abilities. (1 2 3 4 5 6)

SAFETY: Not Safety Conscious; Generally Observes Safety Rules; Always Safety Minded. (1 2 3 4 5 6)

CARE OF EQUIPMENT: Neglects Care of Equipment; Alert to Condition of Equipment; Keeps Equipment Clean and In Good Operating Order. (1 2 3 4 5 6)

OVERALL EVALUATION: Unsatisfactory; Meets Expectations; Highly Productive. (1 2 3 4 5 6)

APPENDIX B

Questionnaire Completed by Study Participants

Thank you for your interest in participating in this study. The purpose of the study is to gain a better understanding of how managers such as yourself make performance appraisal ratings. The study is being conducted by Margaret Y. Padgett, a graduate student in the Department of Management, as part of her dissertation and is under the direction of Professor Daniel R. Ilgen, also from Michigan State University.

Your participation in the study will consist of two things: (1) completing a questionnaire (this should take approximately 20-30 minutes) and (2) meeting with the researcher for an interview (approximately 30 minutes). The questionnaire will ask you to provide some background information about yourself and about a randomly selected person working in your unit (the "focal ratee"). THE FOCAL RATEE SHOULD BE THE INDIVIDUAL ON WHOM YOU MOST RECENTLY COMPLETED A PERFORMANCE EVALUATION. Do not provide the full name of this individual. However, as a reminder to yourself, you might find it helpful to write his/her initials in the space provided on the questionnaire. In addition, you will be asked to respond to a number of opinion items about yourself, the focal ratee, and your perceptions of some characteristics of your organization. Keep in mind that we are only interested in your opinions; there are no right or wrong answers to the items.

The purpose of the interview will be to give you the opportunity to discuss in more detail some of your personal experiences when conducting performance appraisals. During the interview, you will also be asked to provide an evaluation of the focal ratee. The identity of the focal ratee will, of course, be protected by having the evaluations done anonymously. All of the information that you provide on the questionnaire and in the interview, including the performance evaluation of the focal ratee, will be kept in strict confidence and will only be seen by the researchers directly involved in the project. At the completion of the project a report will be prepared for Personnel and Employee Relations at Michigan State University.
All data in this report, as well as in the dissertation report, will be provided in ways that maintain the anonymity of respondents and focal ratees.

To participate in the study, please read the consent statement below and sign and date the form. Be sure to return this form with the questionnaire. Again, thank you in advance for your time and interest in the study.

Consent Statement

"I agree to participate in this project as described above. I understand that my participation will consist of completing a questionnaire and meeting with the researcher for an interview, with the total time commitment being approximately 60 minutes. I understand that the researchers agree to keep any data that I provide completely confidential. I further recognize that I am free to discontinue my participation in this study at any time without recrimination."

Signature of Participant _____  Date _____

PART I

Background Information About Yourself

1. Please indicate your approximate age using the following scale (circle one).
a. 20-25  b. 26-30  c. 31-35  d. 36-40  e. 41-45  f. 46-50  g. 51-55  h. 56-60  i. 61-65  j. 66-70

2. Sex (circle the appropriate response): Male  Female

3. Race (circle the appropriate response): a. Caucasian  b. Black  c. Indian  d. Asian  e. Other (please specify): _____

4. Length of employment with Michigan State University (in years): _____ years

5. Time in your current position (in years): _____ years

6. Length of time in a supervisory position (in years): _____ years

7. Number of individuals on whom you currently complete performance evaluations: _____

8. Type of work currently supervised (please circle all that are relevant): a. clerical  b. technical  c. administrative  d. supervisory  e. operating engineers  f. crafts  g. laborers  h. police  i. other (please specify): _____

The remainder of this part of the questionnaire consists of a number of opinion items concerning yourself and your organization. There are two things that you should keep in mind as you are working on the questionnaire. First, for all items on the questionnaire, we are interested in your opinion about what actually exists in the organization rather than how you think things ought to be or should be. Second, when the term "workgroup" is used, we are referring to those people that you supervise and on whom you regularly complete performance evaluations. Hence, when the term "coworkers" is used in relation to a person in your unit, it refers to the other people that are also directly supervised by you.

Please read each statement carefully and indicate whether or not you agree with it. There are no right or wrong answers, so please respond as honestly as possible. When responding to each item, please use the scale which follows. Place the number corresponding to your opinion for each item in the blank space to the left of each statement. For your convenience, the scale will be reprinted at the top of each page of the questionnaire.

5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

1. After filling out performance evaluations on employees in my department, I am expected to meet with them to discuss their evaluation.

2. The performance of individuals, as indicated by their performance appraisal, has little influence on the size of raise that they receive.

3. Individuals in my department rarely talk about their performance appraisals with each other.

4. In this organization, even individuals who receive low performance ratings are unlikely to be fired.

5. Individuals in my workgroup often ask me how they were evaluated compared to their coworkers.
6. In this organization, performance appraisals are rarely used to show individuals areas of their performance where improvement is needed.

7. Performance appraisal data is given a lot of weight in making promotion decisions.

Please continue to use the scale which follows when responding to each item:
5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

8. I generally can provide specific examples of things which individuals in my department did during the appraisal period if they ever question my evaluation of their performance.

9. Most individuals receive about the same pay increase regardless of their performance level.

10. Only people who receive high performance evaluations will be promoted in this organization.

11. The jobs which I supervise don't require much interaction among employees.

12. Employees in my workgroup typically find out how their coworkers were evaluated by me.

13. I have very little trouble being open to my subordinates about their performance.

14. Most raises that the people in my unit receive are based very little upon merit.

15. The people that I supervise often need to coordinate their work activities with each other.

16. Formal performance appraisals provide a means for me to get together with each of the individuals in my department to discuss how to help them become better employees.

17. In this organization, wage/salary decisions are based on seniority, such that employees with greater tenure receive higher raises.

18. When making decisions about who to terminate, performance appraisal information is rarely examined.

19. I often do not feel that I could explain to my employees why I evaluated them as I did.

20. I feel uncomfortable telling an employee that he/she is not performing well.

21. Individuals in my workgroup need to interact with one another a great deal in performing their jobs.

Please continue to use the scale which follows when responding to each item:
5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

22. People in my workgroup often compare their performance ratings.

23. Sometimes organizational "politics" is a more important factor in determining who gets fired than is a person's job performance.

24. It is rare for individuals to be terminated in this organization, regardless of how they perform.

25. Promotions are based on who you know rather than on how well you perform.

26. In this organization, performance appraisals are not used to provide feedback to employees.

27. Performance appraisals are one of the major means by which employees learn how to improve their performance on the job.

28. I should keep better records on the performance of people in my department than I do.

29. When evaluating an employee's performance, I don't feel that complete honesty is always the best policy.

30. When people are terminated in this organization, it is typically those who have been on the job less time, rather than those who perform less well.

31. In this organization, the best way to ensure receiving a large wage/salary increase is to receive a good performance appraisal rating.

32. Most of the people on whom I do performance appraisals are not very interested in learning how their coworkers were evaluated or rewarded.

33. I often base my evaluations of employees on general impressions of their performance rather than on concrete behaviors which I have observed.

34. The employees in my department are often not aware of when I do performance appraisals on their coworkers.
Please continue to use the scale which follows when responding to each item:
5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

35. One of the reasons that we do performance appraisals is to help employees develop their job-related skills and abilities.

36. The large amount of interaction needed between members of my department in doing their jobs requires that interpersonal conflicts be dealt with immediately.

37. I don't really think it is necessary to discuss my evaluation of an employee's performance with him/her.

38. When doing performance evaluations, I feel that it is better for people to know the truth, even if this is unpleasant for either the employee or myself.

39. If someone receives several low performance ratings, they are unlikely to ever get promoted to a better position.

40. A person's performance on the job is not a major factor considered by those who make termination decisions.

41. After completing a performance evaluation on an individual, I turn it in to the appropriate personnel and then forget about it.

42. I am generally able to support my evaluations of individuals working in my unit with specific incidents of good and poor performance.

43. If there was some way that I could avoid having to approach my employees about a problem with their performance, I would do it.

44. Individuals in my department are aware of the wage/salary increases that their coworkers receive.

45. My department often has assignments that require several members of the group to work together in order to complete the project.

46. Performance appraisal data is checked regularly by those who make decisions on salary increases.

47. It is not difficult for me to discuss the performance of my employees with them.

Please continue to use the scale which follows when responding to each item:
5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

48. I am always prepared to back up the performance appraisals of the individuals in my department.

49. When an individual in my department is out of the office for a day, his/her absence would make it difficult for others to complete their normal work assignments.

50. It would be very unusual for individuals in my unit to mention their performance appraisal ratings to each other.

51. I typically keep a file on what each person in my unit has done during the year to help me when I do his/her annual performance appraisal.

52. People do not get fired in this organization unless they receive a number of low performance ratings.

53. The people in my department do not require much information or assistance from coworkers in order to do their individual jobs effectively.

54. People who do not perform well cannot expect to be promoted in this organization.

55. Wage/salary decisions are made independently of information about a person's performance evaluations.

56. Performance appraisals are used to help employees perform better in the future.

57. I would rarely hesitate to tell an employee my true assessment of his/her performance.

58. The work areas of individuals in my department are located close together.

59. Termination decisions are made only after consulting an employee's performance appraisal records.

60. Individuals who receive favorable performance appraisal ratings are likely to be given larger salary increases than those who perform less well.

PART II

This section of the questionnaire deals primarily with the individual from your department selected as the "focal ratee."
All of the remaining items on the questionnaire should be answered in relation to this person. Recall that the focal ratee should be the individual on whom you most recently completed a performance evaluation. Be sure NOT to identify the individual by his/her name. However, as a reminder to yourself, you might find it helpful to write his/her initials in the space provided below.

First, I would like you to provide some background information about the focal ratee.

Initials of Focal Ratee: _____

1. Sex (circle the appropriate response): Male  Female

2. Race (circle the appropriate letter): a. Caucasian  b. Black  c. Indian  d. Asian  e. Other (please specify): _____

3. Please indicate the approximate age of the focal ratee using the following scale (circle one).
a. 20-25  b. 26-30  c. 31-35  d. 36-40  e. 41-45  f. 46-50  g. 51-55  h. 56-60  i. 61-65  j. 66-70

4. Length of this individual's employment at Michigan State University (in years): _____

5. Amount of time this individual has been in his/her current position (in years): _____ years

6. Date of his/her most recent performance evaluation: _____

Now I would like you to respond to several questions about the types of outcomes which you believe the focal ratee might receive as a result of your evaluation of his/her performance. Below is a list of several potential outcomes that might result for the focal ratee because of how you evaluated his/her performance. After each outcome are two blank spaces. Please use them to answer the following two questions about each outcome (see next page).

(1) GIVEN THIS INDIVIDUAL'S ACTUAL PERFORMANCE LEVEL, HOW LIKELY IS IT THAT EACH OF THESE POTENTIAL OUTCOMES WOULD OCCUR FOR THAT INDIVIDUAL? Your responses to this item should range from "0%" (will definitely not occur) to "100%" (will definitely occur). You may use any percentage between 0% and 100% in your response to this question.

(2) IN YOUR OPINION, HOW MUCH WOULD THIS INDIVIDUAL LIKE OR DISLIKE RECEIVING EACH OF THESE POTENTIAL OUTCOMES? IN OTHER WORDS, HOW ATTRACTIVE WOULD EACH OUTCOME BE TO THIS PERSON? Your responses to this item should be made using the following scale:

5 = Would like receiving this outcome very much; receiving it is necessary in order for this person to be satisfied with his/her job
4 = Would like receiving this outcome, but it is not necessary in order for this person to be satisfied with his/her job
3 = Would be neutral about receiving this outcome
2 = Would dislike receiving this outcome, but receiving it wouldn't make this person dissatisfied with his/her job
1 = Would dislike receiving this outcome very much; receiving it would make this person dissatisfied with his/her job

Example:
Outcome: 1. Promotion within the next three years
(1) Likelihood of Outcome Occurring: 75%
(2) Attractiveness of Outcome: 5

(1) If you believe that, given this person's performance, there is a 75% chance that he/she will be promoted to a higher job level within the next three years, then you would write "75%" in the first blank space to the right of the outcome "promotion within the next three years."

(2) If you believe that getting promoted is necessary in order for this person to be satisfied with his/her job, then you would place a "5" in the second blank space to the right of the outcome "promotion within the next three years."

Please respond to each of the outcomes in the list which follows in a similar manner.

Outcomes: (1) Likelihood of Outcome Occurring; (2) Attractiveness of Outcome

1. Salary increase

2. Promotion within the next three years
3. Termination of employment

4. Transfer to an equal but different position (i.e., lateral transfer)

5. Receive remedial training

6. Opportunities for training to prepare for potential advancement

7. Demotion

8. Opportunity to develop job-related skills and abilities

9. Improved self-esteem

10. Lowered self-esteem

11. Better understanding of how to do his/her job

The last section of the questionnaire also concerns your beliefs about the particular person in your workgroup selected to be the focal ratee, so your responses should be made with ONLY this person in mind. The items will ask you to indicate the extent to which you feel each statement is true for this individual. As before, the items ask for your opinion, so there are no right or wrong answers. Please respond to the following items as honestly as possible using the same scale as you used above. Place the number corresponding to your opinion in the blank space provided to the left of the statement. For your convenience, the rating scale is reprinted below and again at the top of each page.

5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

1. This individual trusts my judgment on work-related matters.

2. This person is able to respond constructively to feedback on his/her performance.

3. I don't worry about discussing this employee's performance evaluation with him/her because he/she is usually open to any suggestions that I make for improvement.

4. It is not important to me that I be liked by this employee.

5. This person rarely seeks my help in doing his/her job.

6. I really like being around and working with this employee.

7. In order to be satisfied with my work, I need to have a good working relationship with this employee.

8. In general, I think that this individual values my opinion on most subjects.

9. This individual is receptive to receiving feedback on his/her performance even if it is negative.

10. I sometimes feel that this individual does not have much respect for my ideas and opinions.

11. It is not uncommon for this individual to feel that I am attacking him/her personally if he/she receives less than the highest performance ratings.

Please continue to use the scale which follows when responding to each item:
5 = Strongly Agree  4 = Agree  3 = Undecided  2 = Disagree  1 = Strongly Disagree

12. I would be very surprised if this person ever complained to my superior about a performance appraisal received from me.

13. I would not go out of my way to try to get this person to like me.

14. This employee has asked for my advice on nonwork-related issues.

15. I value the admiration and respect of this person.

16. I would be very surprised if this person ever followed any advice that I gave him/her.

17. This employee usually does not have difficulty admitting that he/she has areas of performance on which improvement is needed.

18. This individual tends to react defensively to negative performance feedback.

19. This person relies on my advice when making decisions.

20. This individual is likely to file a grievance against me if unhappy with the performance appraisal received.

21. I would dislike work if I didn't get along well with this person.

22. It is not uncommon for this employee to ask my opinion about important work issues.

23. It wouldn't bother me if this individual didn't like me very much.

24. This employee values performance feedback as a means for becoming a better performer.

25. This person's opinion of me as a person or as a manager makes very little difference to me.
26. This employee does not think very highly of me as a supervisor.

27. This individual seems to feel threatened by criticism no matter how constructively it is given.

APPENDIX C

Procedures for Measuring Expected Consequences of the Performance Appraisal for the Ratee

Now I would like you to respond to several questions about the types of outcomes which you believe the focal ratee might receive as a result of your evaluation of his/her performance. Below is a list of several potential outcomes that might result for the focal ratee because of how you evaluated his/her performance. After each outcome are two blank spaces. Please use them to answer the following two questions about each outcome.

(1) GIVEN THIS INDIVIDUAL'S ACTUAL PERFORMANCE LEVEL, HOW LIKELY IS IT THAT EACH OF THESE POTENTIAL OUTCOMES WOULD OCCUR FOR THAT INDIVIDUAL? Your responses to this item should range from "0%" (will definitely not occur) to "100%" (will definitely occur). You may use any percentage between 0% and 100% in your response to this question.

(2) IN YOUR OPINION, HOW MUCH WOULD THIS INDIVIDUAL LIKE OR DISLIKE RECEIVING EACH OF THESE POTENTIAL OUTCOMES? IN OTHER WORDS, HOW ATTRACTIVE WOULD EACH OUTCOME BE TO THIS PERSON? Your responses to this item should be made using the following scale:

5 = Would like receiving this outcome very much; receiving it is necessary in order for this person to be satisfied with his/her job
4 = Would like receiving this outcome, but it is not necessary in order for this person to be satisfied with his/her job
3 = Would be neutral about receiving this outcome
2 = Would dislike receiving this outcome, but receiving it wouldn't make this person dissatisfied with his/her job
1 = Would dislike receiving this outcome very much; receiving it would make this person dissatisfied with his/her job

Outcomes: (1) Likelihood of Outcome Occurring; (2) Attractiveness of Outcome

1. Salary increase
2. Promotion within the next three years
3. Termination of employment
4. Transfer to an equal but different position (i.e., lateral transfer)
5. Receive remedial training
6. Opportunities for training to prepare for potential advancement
7. Demotion
8. Opportunity to develop job-related skills and abilities
9. Improved self-esteem
10. Lowered self-esteem
11. Better understanding of how to do his/her job
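The appendix reproduces the two judgments that were collected for each outcome but does not spell out how they were combined into the expected-consequences measure. The short Python sketch below shows one plausible, expectancy-style scoring rule as a point of reference only; the combination rule, the variable names, and the numbers are assumptions, not the dissertation's documented procedure.

# Illustrative scoring sketch; all values below are made up.
outcomes = {
    # outcome: (likelihood expressed as a proportion, attractiveness on the 1-5 scale)
    "salary increase": (0.80, 4),
    "promotion within three years": (0.25, 5),
    "termination of employment": (0.05, 1),
    "remedial training": (0.10, 2),
}

# Center attractiveness at the neutral point (3) so that liked outcomes raise the score
# and disliked outcomes lower it, then weight each outcome by its likelihood and sum.
expected_consequences = sum(p * (a - 3) for p, a in outcomes.values())
print(round(expected_consequences, 2))  # 1.1 for the illustrative numbers above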
APPENDIX D

Questionnaire Items Measuring Each Motivational Influence

Credibility of the Rater to the Ratee

1. This individual trusts my judgment on work-related matters. (Item E1)
*2. This person rarely seeks my help in doing his/her job. (Item E5)
3. In general, I think that this individual values my opinion on most subjects. (Item E8)
4. I sometimes feel that this individual does not have much respect for my ideas and opinions. (Item E10)
*5. This employee has asked for my advice on nonwork-related issues. (Item E14)
6. I would be very surprised if this person ever followed any advice that I gave him/her. (Item E16)
*7. This person relies on my advice when making decisions. (Item E19)
8. It is not uncommon for this employee to ask my opinion about important work issues. (Item E22)
9. This employee does not think very highly of me as a supervisor. (Item E26)

Desire to be Liked by the Ratee

*1. It is not important to me that I be liked by this employee. (Item E4)
2. I really like being around and working with this employee. (Item E6)
3. In order to be satisfied with my work, I need to have a good working relationship with this employee. (Item E7)
*4. I would not go out of my way to try to get this person to like me. (Item E13)
5. I value the admiration and respect of this person. (Item E15)
*6. I would dislike work if I didn't get along well with this person. (Item E21)
7. It wouldn't bother me if this individual didn't like me very much. (Item E23)
8. This person's opinion of me as a person or as a manager makes very little difference to me. (Item E25)

Reaction of the Ratee to the Appraisal

1. This person is able to respond constructively to feedback on his/her performance. (Item E2)
2. I don't worry about discussing this employee's performance evaluation with him/her because he/she is usually open to any suggestions that I make for improvement. (Item E3)
3. This individual is receptive to receiving feedback on his/her performance even if it is negative. (Item E9)
4. It is not uncommon for this individual to feel that I am attacking him/her personally if he/she receives less than the highest performance ratings. (Item E11)
*5. I would be very surprised if this person ever complained to my superior about a performance appraisal received from me. (Item E12)
6. This employee usually does not have difficulty admitting that he/she has areas of performance on which improvement is needed. (Item E17)
*7. This individual tends to react defensively to negative performance feedback. (Item E18)
*8. This individual is likely to file a grievance against me if unhappy with the performance appraisal received. (Item E20)
9. This employee values performance feedback as a means for becoming a better performer. (Item E24)
10. This individual seems to feel threatened by criticism no matter how constructively it is given. (Item E27)

Use of the Appraisal for Feedback and Employee Development

*1. After filling out performance evaluations on employees in my department, I am expected to meet with them to discuss their evaluation. (Item B1)
*2. In this organization, performance appraisals are rarely used to show individuals areas of their performance where improvement is needed. (Item B6)
3. Formal performance appraisals provide a means for me to get together with each of the individuals in my department to discuss how to help them become better employees. (Item B16)
4. In this organization, performance appraisals are not used to provide feedback to employees. (Item B26)
*5. Performance appraisals are one of the major means by which employees learn how to improve their performance on the job. (Item B27)
6. One of the reasons we do performance appraisals is to help employees develop their job-related skills and abilities. (Item B35)
*7. I don't really think it is necessary to discuss my evaluation of an employee's performance with him/her. (Item B37)
8. After completing a performance evaluation on an individual, I turn it in to the appropriate personnel and then forget about it. (Item B41)
9. Performance appraisals are used to help employees perform better in the future. (Item B56)

Use of the Appraisal for Wage/Salary Decisions

1. The performance of individuals, as indicated by their performance appraisal, has little influence on the size of raise that they receive. (Item B2)
2. Most individuals receive about the same pay increase regardless of their performance level. (Item B9)
3. Most raises that the people in my unit receive are based very little upon merit. (Item B14)
*4. In this organization, wage/salary decisions are based on seniority, such that employees with greater tenure receive higher raises. (Item B17)
5. In this organization, the best way to ensure receiving a large wage/salary increase is to receive a good performance appraisal rating. (Item B31)
*6. Performance appraisal data is checked regularly by those who make decisions on salary increases. (Item B46)
7. Wage/salary decisions are made independently of information about a person's performance evaluations. (Item B55)
8. Individuals who receive favorable performance appraisal ratings are likely to be given larger salary increases than those who perform less well. (Item B60)

Use of the Appraisal for Promotion Decisions

1. Performance appraisal data is given a lot of weight in making promotion decisions. (Item B7)
2. Only people who receive high performance evaluations will be promoted in this organization. (Item B10)
*3. Promotions are based on who you know rather than on how well you perform. (Item B25)
4. If someone receives several low performance ratings, they are unlikely to ever get promoted to a better position. (Item B39)
5. People who do not perform well cannot expect to be promoted in this organization. (Item B54)

Use of the Appraisal for Termination Decisions

1. In this organization, even individuals who receive low performance ratings are unlikely to be fired. (Item B4)
*2. When making decisions about who to terminate, performance appraisal information is rarely examined. (Item B18)
*3. Sometimes organizational "politics" is a more important factor in determining who gets fired than is a person's job performance. (Item B23)
4. It is rare for individuals to be terminated in this organization, regardless of how they perform. (Item B24)
5. When people are terminated in this organization, it is typically those who have been on the job less time, rather than those who perform less well. (Item B30)
6. A person's performance on the job is not a major factor considered by those who make termination decisions. (Item B40)
*7. People do not get fired in this organization unless they receive a number of low performance ratings. (Item B52)
8. Termination decisions are made only after consulting an employee's performance appraisal records. (Item B59)

Ability to Document the Evaluation

1. I generally can provide specific examples of things which individuals in my department did during the appraisal period if they ever question my evaluation of their performance. (Item B8)
2. I often do not feel that I could explain to my employees why I evaluated them as I did. (Item B19)
*3. I should keep better records on the performance of people in my department than I do. (Item B28)
*4. I often base my evaluations of employees on general impressions of their performance rather than on concrete behaviors which I have observed. (Item B33)
5. I am generally able to support my evaluations of individuals working in my unit with specific incidents of good and poor performance. (Item B42)
6. I am always prepared to back up the performance appraisals of the individuals in my department. (Item B48)
*7. I typically keep a file on what each person in my unit has done during the year to help me when I do his/her annual performance appraisal. (Item B51)

Task Interdependence Among Workgroup Members

1. The jobs which I supervise don't require much interaction among employees. (Item B11)
2. The people that I supervise often need to coordinate their work activities with each other. (Item B15)
3. Individuals in my workgroup need to interact with one another a great deal in performing their jobs. (Item B21)
4. The large amount of interaction needed between members of my department in doing their jobs requires that interpersonal conflicts be dealt with immediately. (Item B36)
5. My department often has assignments that require several members of the group to work together in order to complete the project. (Item B45)
6. When an individual in my department is out of the office for a day, his/her absence would make it difficult for others to complete their normal work assignments. (Item B49)
7. The people in my department do not require much information or assistance from coworkers in order to do their individual jobs effectively. (Item B53)
8. The work areas of individuals in my department are located close together. (Item B58)

Appraisal Visibility

1. Individuals in my department rarely talk about their performance appraisals with each other. (Item B3)
*2. Individuals in my workgroup often ask me how they were evaluated compared to their coworkers. (Item B5)
*3. Employees in my workgroup typically find out how their coworkers were evaluated by me. (Item B12)
4. People in my workgroup often compare their performance ratings. (Item B22)
5. Most of the people on whom I do performance appraisals are not very interested in learning how their coworkers were evaluated or rewarded. (Item B32)
*6. The employees in my department are often not aware of when I do performance appraisals on their coworkers. (Item B34)
*7. Individuals in my department are aware of the wage/salary increases that their coworkers receive. (Item B44)
8. It would be very unusual for individuals in my unit to mention their performance appraisal ratings to each other. (Item B50)

Perceived Freedom to be Honest

*1. I have very little trouble being open to my subordinates about their performance. (Item B13)
2. I feel uncomfortable telling an employee that he/she is not performing well. (Item B20)
3. When evaluating an employee's performance, I don't feel that complete honesty is always the best policy. (Item B29)
*4. When doing performance evaluations, I feel that it is better for people to know the truth, even if this is unpleasant for either the employee or myself. (Item B38)
5. If there was some way that I could avoid having to approach my employees about a problem with their performance I would do it. (Item B43)
6. It is not difficult for me to discuss the performance of my employees with them. (Item B47)
*7. I would rarely hesitate to tell an employee my true assessment of his/her performance. (Item B57)

*Indicates an item that was eliminated from the scale when used in the analyses.

FOOTNOTES

1. The standard interview questions asked of all participants are given below:
(1) How are performance appraisals done in this organization?
(2) What types of information do you look for or consider important when evaluating someone's performance?
(3) To what extent are the things you look for determined by the evaluation form used?
(4) What sorts of problems or difficulties have you had in doing performance evaluations?
(5) What kinds of reactions to the evaluation do you typically get from subordinates?
(6) How do you feel about doing performance evaluations? Do you like them, dislike them, or feel indifferent to them?
(7) Do you think the evaluation form used by this organization is adequate? Does it cover all the relevant aspects of an employee's performance? If you are unsatisfied with it, how would you change it?
(8) Do you think performance evaluations are worthwhile? Do you think your subordinates find them to be worthwhile?

2. Although most units of the university use the standard two university appraisal forms, a few units had developed their own forms. Four managers from one such unit participated in this study. The form developed by this unit was similar to the university form, except that it contained ten general dimensions (over half of which coincided with dimensions on the university form) evaluated on a 6-point scale. To make these ratings comparable in standard deviation to the university form, all ratings were converted to their equivalents on a 5-point scale before computing the measure of rendering errors.
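The conversion described in footnote 2 and the public-versus-private difference index discussed in the text can both be written out simply. The short Python sketch below is only an illustration: the linear mapping, the function names, and the example numbers are assumptions, since the dissertation does not report the exact conversion formula it used.

def to_five_point(rating, low=1, high=6):
    # Linearly map a rating from a [low, high] scale onto the 1-5 university scale.
    return 1 + (rating - low) * (5 - 1) / (high - low)

def rating_difference(public, private):
    # Difference between the public (on-record) rating and the private (interview) rating.
    # Positive values indicate inflation; negative values indicate deflation.
    return public - private

print(to_five_point(4))             # a 6-point rating of 4 corresponds to 3.4 on the 5-point scale
print(rating_difference(4.0, 3.0))  # 1.0, an inflated public rating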
3. To reduce confusion in presentation, the measurement model depicted in Figure 5 shows only the final number of manifest indicators for each latent construct (based on the results of the initial confirmatory factor analysis) rather than all of the items included on the questionnaire. For the same reason, the error terms for each manifest and latent variable are also excluded from the diagram.

4. It should be noted that the structural coefficients presented in Figure 7 and Table 5 will differ somewhat from those described in the text. This is because the table and figure present the coefficients for a model that includes all three of the additional paths (i.e., the overall modified model), while the text lists the coefficients for each path as it was sequentially added to the model. These two sets of structural coefficients will differ because each time a change is made in the model, other coefficients in the model may also be altered.

LIST OF REFERENCES

Adams, J. S. (1965). Inequity in social exchange. In L. Berkowitz (Ed.), Advances in experimental social psychology, Vol. 2. New York: Academic Press, 267-300.

Ball, W. J. (1972). The definition of situation: Some theoretical and methodological consequences of taking W. I. Thomas seriously. Journal for the Theory of Social Behaviour, 2, 61-82.

Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.

Bandura, A., Adams, N. E., and Beyer, J. (1977). Cognitive processes mediating behavioral change. Journal of Personality and Social Psychology, 35, 125-139.

Banks, C. G. and Murphy, K. R. (1985). Toward narrowing the research-practice gap in performance appraisal. Personnel Psychology, 38, 335-345.

Barrett, R. S., Taylor, E. K., Parker, J. W. and Martens, L. (1958). Rating scale content: I. Scale information and supervisory ratings. Personnel Psychology, 11, 333-346.

Bartlett, C. J. (1983). Would you know a properly motivated performance appraisal if you saw one? In F. Landy, S. Zedeck, and J. Cleveland (Eds.), Performance measurement and theory. Hillsdale, NJ: Lawrence Erlbaum Associates.

Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419-456.

Bentler, P. M. and Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Berger, P. and Luckmann, T. (1966). The social construction of reality. Garden City: Doubleday.

Bernardin, H. J. (1978). Effects of rater training on leniency and halo errors in student ratings of instructors. Journal of Applied Psychology, 63, 301-303.

Bernardin, H. J., Alvares, K. M., and Cranny, C. J. (1976). A recomparison of behavioral expectation scales to summated scales. Journal of Applied Psychology, 61, 564-570.

Bernardin, H. J. and Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston, MA: Kent Publishing Co.

Bernardin, H. J. and Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6, 205-212.

Bernardin, H. J., Orban, J. A., and Carlyle, J. J. (1981). Performance ratings as a function of trust in appraisal, purpose for appraisal, and rater individual differences. Proceedings of the Academy of Management, 311-315.
Bernardin, H. J. and Pence, E. C. (1980). Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, 65, 60-66.

Bernardin, H. J. and Walters, C. S. (1977). The effects of rater training and diary-keeping on psychometric error in ratings. Journal of Applied Psychology, 62, 64-69.

Borman, W. C. (1974). The rating of individuals in organizations: An alternate approach. Organizational Behavior and Human Performance, 12, 105-124.

Borman, W. C. (1978). Exploring upper limits of reliability and validity in job performance ratings. Journal of Applied Psychology, 63, 135-144.

Borman, W. C. (1979a). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-421.

Borman, W. C. (1979b). Individual differences correlates of accuracy in evaluating others' performance effectiveness. Applied Psychological Measurement, 3, 103-115.

Borman, W. C. and Dunnette, M. D. (1975). Behavior-based versus trait-oriented performance ratings: An empirical study. Journal of Applied Psychology, 60, 561-565.

Borman, W. C., Hough, L. M. and Dunnette, M. D. (1976). Development of behaviorally based rating scales for the performance of U.S. Navy recruiters. Navy Personnel Research and Development Center Technical Report TR-76-31.

Bower, G. H. (1981). Mood and memory. American Psychologist, 36, 129-148.

Buckley, M. R. and Bernardin, H. J. (1980). An assessment of the components of a rater training program. Paper presented at the annual meeting of the Southeastern Psychological Association, Washington, DC.

Burnaska, R. F. and Hollmann, T. D. (1974). An empirical comparison of the relative effects of rater response biases on three rating scale formats. Journal of Applied Psychology, 59, 307-312.

Cafferty, T. P., DeNisi, A. S. and Williams, K. J. (1984). Organization of recall and performance evaluation accuracy for multiple targets. Paper presented at the American Psychological Association meeting, Toronto.

Campbell, D. T. and Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Campbell, J. P. and Pritchard, R. D. (1976). Motivation theory in industrial and organizational psychology. In M. Dunnette (Ed.), Handbook of industrial and organizational psychology. New York: John Wiley, 63-130.

Cardy, R. L. and Dobbins, G. H. (1986). Affect and appraisal: Liking as an integral dimension in evaluating performance. Journal of Applied Psychology, 71, 672-678.

Carroll, S. J., Paine, F. P., and Ivancevich, J. M. (1972). The relative effectiveness of training methods: Expert opinion and research. Personnel Psychology, 25, 495-509.

Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.

Cook, T. D. and Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago, IL: Rand McNally.

Cox, J. A. and Krumboltz, J. D. (1958). Racial bias in peer ratings of basic airmen. Sociometry, 21, 292-299.

Crockett, W. H., Mahood, S. and Press, A. N. (1975). Impressions of a speaker as a function of variations in the cognitive characteristics of the perceiver and the message. Journal of Personality, 43, 168-178.

Dayal, I. (1969). Some issues in performance appraisal. Personnel Administration, 32, 27-30.

DeCotiis, T. and Petit, A. (1978). The performance appraisal process: A model and some testable propositions. Academy of Management Review, 3, 635-646.
DeNisi, A. S., Cafferty, T. P. and Meglino, B. M. (1984). A cognitive view of the performance appraisal process: A model and research propositions. Organizational Behavior and Human Performance, 33, 360-396.

DeNisi, A. S., Williams, K. J., Cafferty, T. P. and Meglino, B. M. (1985). Cognitive processes and performance appraisals: The role of information acquisition and organization. In R. Cardy (Chair), Information processing research in performance appraisal. Symposium presented at the meeting of the Academy of Management, San Diego, CA.

DeNisi, A. S. and Stevens, G. E. (1981). Profiles of performance, performance evaluations, and personnel decisions. Academy of Management Journal, 24, 592-602.

Drucker, P. F. (1954). The practice of management. New York: Harper & Row.

Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press.

Favero, J. L. and Ilgen, D. R. (1983). The effects of ratee characteristics on rater performance appraisal behavior. Office of Naval Research, Technical Report 83-5.

Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127-148.

Fisher, C. D. (1979). Transmission of positive and negative feedback to subordinates: A laboratory investigation. Journal of Applied Psychology, 64, 533-540.

Ford, J. K., Kraiger, K., and Schechtman, S. L. (1986). Study of race effects in objective indices and subjective evaluations of performance: A meta-analysis of performance criteria. Psychological Bulletin, 99, 330-337.

French, J. R. and Raven, B. (1959). The bases of social power. In D. Cartwright (Ed.), Studies in social power. Ann Arbor, MI: Institute for Social Research.

Goldstein, A. P. and Sorcher, M. (1974). Changing supervisory behavior. New York: Pergamon.

Grey, J. and Kipnis, D. (1976). Untangling the performance appraisal dilemma: The influence of perceived organizational context on evaluative processes. Journal of Applied Psychology, 61, 329-335.

Hamner, W. C., Kim, J. S., Baird, L., and Bigoness, W. J. (1974). Race and sex as determinants of ratings by potential employers in a simulated work sampling task. Journal of Applied Psychology, 59, 705-711.

Heilman, M. E. and Guzzo, R. A. (1978). The perceived cause of work success as a mediator of sex discrimination in organizations. Organizational Behavior and Human Performance, 21, 346-357.

Heneman, R. L. and Wexley, K. N. (1983). The effects of time delay in rating and amount of information observed on performance rating accuracy. Academy of Management Journal, 26, 677-686.

Holzbach, R. L. (1978). Rater bias in performance ratings: Supervisor, self and peer ratings. Journal of Applied Psychology, 63, 579-588.

Huse, E. F. (1967). Performance appraisal: A new look. Personnel Administration, 30, 3-5, 16-18.

Ilgen, D. R. and Feldman, J. M. (1983). Performance appraisal: A process approach. In L. L. Cummings and B. M. Staw (Eds.), Research in Organizational Behavior, Vol. 5, 141-197.

Ilgen, D. R., Fisher, C. D. and Taylor, M. S. (1979). Consequences of individual feedback on behavior in organizations. Journal of Applied Psychology, 64, 349-371.

Ilgen, D. R., Peterson, R. B., Martin, B. A. and Boeschen, D. A. (1981). Supervisor and subordinate reactions to performance appraisal feedback. Organizational Behavior and Human Decision Processes, 28, 311-330.

Jacobson, M. B. and Effertz, J. (1974). Sex roles and leadership: Perceptions of the leaders and the led. Organizational Behavior and Human Performance, 12, 383-396.
James, L. R. (1980). The unmeasured variables problem in path analysis. Journal of Applied Psychology, 65, 415-421.
James, L. R., Mulaik, S. A. and Brett, J. M. (1982). Causal analysis. Beverly Hills: Sage Publications.
Jeffery, K. M. and Mischel, W. (1979). Effects of purpose on the organization and recall of information in person perception. Journal of Personality, 47, 297-319.
Joreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Joreskog, K. G. and Sorbom, D. (1984). LISREL VI: Analysis of linear structural relationships by the method of maximum likelihood. Mooresville, IN: Scientific Software.
Kane, J. S. and Lawler, E. E. (1979). Performance appraisal effectiveness: Its assessment and determinants. In B. M. Staw (Ed.), Research in Organizational Behavior, Vol. 1, 425-478.
Kelley, H. H. (1967). Attribution theory in social psychology. In D. Levine (Ed.), Nebraska symposium on motivation. Lincoln: University of Nebraska Press, Vol. 15.
Kenny, D. A. (1979). Correlation and causality. New York: John Wiley & Sons.
Kirchner, W. K. (1965). Relationships between supervisory and subordinate ratings for technical personnel. Journal of Industrial Psychology, 1, 57-60.
Klein, S. M., Kraut, A. K. and Wolfson, A. (1971). Employee reactions to attitude survey feedback: A study of the impact of structure and process. Administrative Science Quarterly, 16, 497-514.
Klimoski, R. J. and London, M. (1974). Role of the rater in performance appraisal. Journal of Applied Psychology, 59, 445-451.
Knowlton, W. A. and Mitchell, T. R. (1980). Effects of causal attributions on a supervisor's evaluation of subordinate performance. Journal of Applied Psychology, 65, 459-466.
Komacki, J. (1981). Behavioral measurement: Toward solving the criterion problem. Paper presented at the American Psychological Association Convention, Los Angeles, August, 1981.
Kraiger, K. and Ford, J. K. (1985). A meta-analysis of ratee race effects in performance ratings. Journal of Applied Psychology, 70, 56-65.
Lance, C. E. and Woehr, D. J. (1986). Statistical control of halo: Clarification from two cognitive models of the performance appraisal process. Journal of Applied Psychology, 71, 679-685.
Landy, F. J. and Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
Landy, F. J., Farr, J. L., Saal, F. G. and Freytag, W. (1976). Behaviorally anchored scales for rating the performance of police officers. Journal of Applied Psychology, 61, 752-758.
Latham, G. P. and Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255-268.
Latham, G. P. and Wexley, K. N. (1981). Increasing productivity through performance appraisal. Reading, MA: Addison-Wesley Publishing Co.
Latham, G. P., Wexley, K. N. and Pursell, E. D. (1975). Training raters to minimize rating errors in the observation of behavior. Journal of Applied Psychology, 60, 550-555.
Lawler, E. E. (1967). The multitrait-multirater approach to measuring managerial job performance. Journal of Applied Psychology, 51, 369-381.
Leskovec, E. (1967). A guide for discussing the performance appraisal. Personnel Journal, 46, 150-152.
Lewin, K. (1935). A dynamic theory of personality. New York: McGraw-Hill.
Liden, R. C. and Mitchell, T. R. (1983). The effects of group interdependence on supervisor performance evaluations. Personnel Psychology, 36, 289-299.
Locher, A. H. and Teel, K. S. (1977). Performance appraisal: A survey of current practices. Personnel Journal, 56, 245-247, 254.
Longenecker, C. O., Gioia, D. A. and Sims, H. P. (1987). Behind the mask: The politics of employee appraisal. The Academy of Management Executive, 1, 183-193.
Lord, R. G., Foti, R. J. and Phillips, J. S. (1982). A theory of leadership categorization. In J. G. Hunt, U. Sekaran, & C. Schriesheim (Eds.), Leadership: Beyond establishment views. Carbondale, IL: Southern Illinois University Press, 104-121.
Madden, J. M. and Bourdon, R. D. (1964). Effects of variations in rating scale format on judgment. Journal of Applied Psychology, 48, 147-151.
Matte, W. E. (1982). An experimental investigation of information search in performance appraisal. In A. DeNisi (Chair), Cognitive approaches to the study of performance appraisal. Symposium presented at the meeting of the American Psychological Association, Washington, D.C.
McCall, M. W. and DeVries, D. L. (1977). Appraisal in context: Clashing with organizational realities. Technical Report #4. Center for Creative Leadership.
McClelland, D. C. and Burnham, D. H. (1976). Power is the great motivator. Harvard Business Review, 54, 100-110.
McIntyre, R. M., Smith, D. E. and Hassett, C. E. (1984). Accuracy of performance ratings as affected by rater training and perceived purpose of rating. Journal of Applied Psychology, 69, 147-156.
Meyer, H. H., Kay, E. and French, J. R. (1965). Split roles in performance appraisal. Harvard Business Review, 43, 123-129.
Mitchell, T. R. (1974). Expectancy models of job satisfaction, occupational preference and effort: A theoretical, methodological, and empirical appraisal. Psychological Bulletin, 81, 1053-1077.
Mitchell, T. R. (1983). The effects of social, task, and situational factors on motivation, performance and appraisal. In F. Landy, S. Zedeck, and J. Cleveland (Eds.), Performance measurement and theory. Hillsdale, NJ: Lawrence Erlbaum Associates Publishers, 39-59.
Mitchell, T. R. and Liden, R. C. (1982). The effects of the social context on performance evaluations. Organizational Behavior and Human Performance, 29, 241-256.
Mitchell, T. R. and Wood, R. E. (1980). Supervisor's responses to subordinate poor performance: A test of an attributional model. Organizational Behavior and Human Performance, 25, 123-138.
Mobley, W. H., Horner, S. D., and Hollingsworth, A. T. (1978). An evaluation of precursors of hospital employee turnover. Journal of Applied Psychology, 63, 408-414.
Mohrman, A. M. and Lawler, E. E. (1983). Motivation and performance appraisal behavior. In F. Landy, S. Zedeck, and J. Cleveland (Eds.), Performance measurement and theory. Hillsdale, NJ: Lawrence Erlbaum Associates Publishers.
Mount, M. K. and Thompson, D. E. (1987). Cognitive categorization and quality of performance ratings. Journal of Applied Psychology, 72, 240-246.
Murphy, K. R. and Balzer, W. K. (1986). Systematic distortions in memory-based behavior ratings and performance evaluations: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39-44.
Murphy, K. R., Martin, C. and Garcia, M. (1982). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67, 552-557.
Nathan, B. R. and Lord, R. G. (1983). Cognitive categorization and dimensional schemata: A process approach to the study of halo in performance ratings. Journal of Applied Psychology, 68, 102-114.
Nieva, V. F. and Gutek, B. A. (1980). Sex effects on evaluation. Academy of Management Review, 5, 267-276.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Padgett, M. Y. and Ilgen, D. R. (1988). The effect of ratee performance characteristics on alternative measures of rater accuracy. Organizational Behavior and Human Decision Processes. In press.
Park, O. S., Sims, H. P., and Motowidlo, S. J. (1986). Affect in organizations. In H. Sims, D. Gioia, and Associates (Eds.), The thinking organization. San Francisco: Jossey-Bass Publishers.
Parker, J. W., Taylor, E. K., Barrett, R. S. and Martens, L. (1959). Rating scale content: 3. Relationship between supervisory and self-rating. Personnel Psychology, 12, 49-63.
Paterson, D. G. (1922). The Scott Company graphic rating scale. Journal of Personnel Research, 1, 361-376.
Porter, L. W. and Lawler, E. E. (1968). Managerial attitudes and performance. Homewood, IL: Richard D. Irwin.
Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69, 581-588.
Rosen, B. and Jerdee, T. H. (1973). The influence of sex role stereotypes on evaluations of male and female supervisory behavior. Journal of Applied Psychology, 57, 44-48.
Rowe, K. H. (1964). An appraisal of appraisals. Journal of Management Studies, Vol. 1.
Salancik, G. R. and Pfeffer, J. (1978). A social information-processing approach to job attitudes and task design. Administrative Science Quarterly, 23, 224-253.
Sharon, A. T. and Bartlett, C. J. (1969). Effect of instructional conditions in producing leniency on two types of rating scales. Personnel Psychology, 22, 251-263.
Schmidt, F. L. and Johnson, R. H. (1973). Effect of race on peer ratings in an industrial setting. Journal of Applied Psychology, 57, 237-241.
Schmitt, N. and Hill, T. (1977). Sex and race composition of assessment center groups as a determinant of peer and assessor ratings. Journal of Applied Psychology, 62, 261-264.
Schmitt, N. and Lappin, M. (1980). Sex and race composition of assessment center groups as a determinant of peer and assessor ratings. Journal of Applied Psychology, 65, 251-254.
Scott, W. E. and Hamner, W. C. (1975). The influence of variations in performance profiles on the performance evaluation process: An examination of the validity of the criterion. Organizational Behavior and Human Performance, 14, 360-370.
Silverman, D. (1971). The theory of organisations: A sociological framework. New York: Basic Books, Inc., Publishers.
Smith, P. C. and Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.
Spool, M. D. (1978). Training programs for observers of behavior: A review. Personnel Psychology, 31, 853-888.
Taft, R. (1955). The ability to judge people. Psychological Bulletin, 52, 1-23.
Terborg, J. R. and Ilgen, D. R. (1975). A theoretical approach to sex discrimination in traditionally masculine occupations. Organizational Behavior and Human Performance, 13, 352-376.
Thayer, F. C. (1981). Civil service reform and performance appraisal: A policy disaster. Public Personnel Management, 10, 20-28.
Thomas, W. I. (1928). The child in America. New York: Knopf.
Thornton, G. C. III (1968). The relationship between supervisory and self appraisals of executive performance. Personnel Psychology, 21, 441-455.
Thornton, G. C. III (1980). Psychometric properties of self appraisals of job performance. Personnel Psychology, 33, 263-271.
Tucker, L. R. and Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
Tuckman, B. W. and Oliver, W. F. (1968). Effectiveness of feedback to teachers as a function of source. Journal of Educational Psychology, 59, 297-301.
Vroom, V. H. (1964). Work and motivation. New York: John Wiley.
Walsman, D. A. and Thornton, G. C. III (1978). A comparison of supervisors' self appraisals and their administrators' appraisals. Medical Group Management, 46, 42-46.
Weiner, B., Frieze, I., Kukla, A., Reed, L., Rest, S. and Rosenbaum, R. M. (1971). Perceiving the causes of success and failure. Morristown, NJ: General Learning Press.
Wexley, K. N. and Klimoski, R. (1984). Performance appraisal: An update. In K. Rowland & G. Ferris (Eds.), Research in Personnel and Human Resources Management, Vol. 2. Greenwich, CT: JAI Press, Inc., 35-79.
Wexley, K. N. and Snell, S. A. (1987). Managerial power: A neglected aspect of the performance appraisal interview. Journal of Business Research, 15, 45-54.
Wexley, K. N. and Youtz, M. A. (1985). Rater beliefs about others: Their effect on rating errors and rater accuracy. Journal of Occupational Psychology, 58, 265-275.
White, M. C., Crino, M. D. and DeSanctis, G. L. (1981). A critical review of female performance, performance training and organizational initiatives designed to aid women in the work-role environment. Personnel Psychology, 34, 227-248.
Williams, K. J., DeNisi, A. S., Blencoe, A. G. and Cafferty, T. P. (1985). The role of appraisal purpose: Effects of purpose on information acquisition and utilization. Organizational Behavior and Human Decision Processes, 35, 314-339.
Wyer, R. S., Srull, T. K., Gordon, S. E. and Hartwick, J. (1982). Effects of processing objectives on the recall of prose material. Journal of Personality and Social Psychology, 43, 674-688.
Zedeck, S. and Cascio, W. F. (1982). Performance appraisal decisions as a function of rater training and purpose of the appraisal. Journal of Applied Psychology, 67, 752-758.
Zedeck, S., Imparato, N., Krausz, M. and Oleno, T. (1974). Development of behaviorally anchored rating scales as a function of organizational level. Journal of Applied Psychology, 59, 249-252.