m wrmaas MP 622“! W HALO IN A MULTIDIMENSIONAL FORCED-CHOICE PERFORMANCE EVALUATION SCALE By Larry Mason King A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Psychology 1978 ABSTRACT HALO IN A MULTIDIMENSIONAL FORCED-CHOICE PERFORMANCE EVALUATION SCALE by Larry Mason King Many theorists in the industrial-organizational literature advocate the use of multidimensional measures of performance. However, attempts to construct subject multidimensional rating instruments have encountered persistent problems with rater halo. While rater halo has not been fully defined and explored in the literature in a truly substantive sense, it is generally believed to be a tendency for raters to allow overall impressions of the rater to influence scores on all performance dimensions, either high or low. The present study sought to reduce halo by constructing a forced-choice, multidimensional performance rating instrument. Rater halo was assessed by means of an analysis of variance model. Analysis of published data using this model found an average Of 28% of total variance due to rater halo, and an average of only 8.8% due to trait variance. The rating instrument was a forced—choice scale constructed using conventional methodology. However, unlike previous forced- choice rating scales, the scale used in this study was multidimensional Larry Mason King and was designed to permit the independent assessment of performance in important subareas of performance as well as overall performance. Three versions of the scale were constructed and administered within a large state police agency. Data were collected at eighteen month intervals over a three—year span of time; with a different version Of the scale being employed at each time period. Sample sizes at the three time periods ranged from 740 to 1060. Additional data collected included peer ratings (n-740), graphic ratings (n=300), and supervisor rank orders (n=900). An analysis of the data indicated that the forced-choice instru- ment reduced the amount of rater halo (22) of total variance) from that typically found in the literature (28% of the total variance). Trait variance was also increased (14% versus 8.8%). However, rater halo still made a substantial contribution toward total variance so that distinctions between the dimensions of the rating scale would tend to be blurred. Moreover, most of the trait variance (over 50%) was eliminated from the scale when interpersonal relations was dropped from the scale. Thus, most of the trait variance seemed to have arisen from the rater's ability to distinguish interpersonal relations competence from the other three dimensions: leadership, judgment, and report- writing. The forced-choice instrument was highly correlated with the simpler measure of performance: graphic ratings (r=.77), peer ratings (r-.62), supervisor rank order (r-.65), and promotional potential (r=.45). I Larry Mason King Peer raters give ng_weight to interpersonal relations when assess- ing overall performance, whereas supervisor raters did. Presumably, supervisors are sensitive to interpersonal relations because inter- personal relations are integral to the role of supervisors. However, since the zero-order correlations for the supervisors could not be corrected for attenuation (due to the presence of correlated errors), these results were viewed as tentative. The use of the forced-choice multidimensional rating instrument reduced but did not eliminate rater halo as defined by the analysis of variance model employed in the study. The forced-choice multi- dimensional rating form was highly correlated with other simpler rating instruments. So the incremental costs of constructing a forced-choice performance rating instrument must be weighed carefully against its incremental benefits when compared with the simpler measure. The use of additional raters is the cheapest and easiest way to reduce rater halo. ACKNOWLEDGMENTS I am very grateful to all those who helped me realize my goal of finishing this project. The Michigan State Police provided logistical support and access to needed data. Lieutenant R. T. Davis was a friend and unfailing supporter. I am in his debt. The members of my dissertation and guidance committee, Fred Wickert, Neal Schmitt, and Carl Frost, provided an excellent critical reading of the dissertation which resulted in numerous improvements. Jack Hunter, the dissertation chairman, was a friend, source of encouragement, advisor, and always so very, very helpful in every way imaginable. He spent many hours going over computer printouts and suggesting new and ingenious ways to test hypotheses. He and his wife, Ronda, opened their home to me so that I could complete the writing. I shall never forget their kindness, nor Jack's constant efforts to help me finish. He is what I would like to be. My parents not only paid for most of my education; they developed my interest in finishing school. They sacrificed so that I could have the commodities which stimulated my curiosity about the world which lay beyond West Virginia. Their gift to me was a desire to improve my lot in life. For that, I am eternally grateful. My employer, Mead Johnson & Company, and my supervisor, Lynn Johnson, have been generous in allowing me time to finish writing. Lynn's support and encouragement is very much appreciated. My secretary, Colleen Long, patiently typed and retyped this document. Her typing skills are unsurpassed and her sense of humor made several long Saturday writing and typing sessions more bearable. She gave up much of her free time to help me finish and I shall be eternally grateful. Finally, my wife, Janey, was my inspiration throughout all those years of graduate school. She always encouraged my efforts and supported them without complaint. Her work and devotion made this dissertation possible. Her presence guided me through the completion of every page. I dedicate this dissertation to Janey and our son, Alexander, in the hope that its completion marks the beginning of the best phase of our lives together. ii TABLE OF CONTENTS Page LIST OF TABLES ............ . ......... ...... ...... . ..... .......v CHAPTER I. INTRODUCTION .............. ..... ....... ..... ........... 1 me PrOblem000000000000000000000.000000000000000001 The Measurement of Halo...... ..... ................4 l IIO METEIOD000000 ......... 0000000000000000000000000000000016 Overview.........................................16 The Organization................... ............ ..16 The Measures ...... .............. ......... . ..... ..17 The Forced-Choice Instrument..................l7 Other Measures........... .............. .......22 Peer Ratings................... ..... .......22 Post Commander Ratings.....................23 Graphic Ratings............................23 Civil Service Test for Sergeants ...... .....24 III. RESULTS The Content Dimensionality of the Three Forced-ChOice SC81€S......o.............o......25 Halo malys1800000000000000000000000000000000000029 Stability of the Forced-Choice Measures..........36 Other Criterion Measures.........................37 The Determinants of Rank Order....... ...... ......39 IV. DISCUSSION0000000000000 .......... 0000000000000000000042 V0 CONCLUSION00000 ..... 00.0000000000000000 ...... 0000000046 REFERENCES. 000000000000 0.00000000000000000.0000000000000000047 APPENDICES A. THE MEASURES Time One Forced Choice Scale.....................53 iii Time Two Forced-Choice Scale....... ........... ...60 Time Three Forced-Choice Scale...................68 Graphic Rating Form at Time Two..................75 Post Commander Ranking Form............ ...... ....76 Peer Evaluation Form.............................77 B. TETRAD MEANS AND STANDARD DEVIATIONS AT ALL THREE TIME PERIODS Time 011600000000000.000000000000000000000000000.083 Time Two ........ . ....................... . ....... .84 Time Three.... ....... ........ ................... .85 iv LIST OF TABLES Table Page 1. Estimates of Halo, Trait, and General Variance Components from Published Multitrait, Multi- rater Studies000000000000000000.0000.0000000 0000000 13 2. Comparison of A Priori Tetrad Clusters with Tetrad Clusters by Cluster Analysis.................. ..... 27 3. Correlations between Dimensions at Each Point in Tim000000 00000 0000000...0.00....00000000000000028 4. Trait by Rater Correlations (Uncorrected for Attenu- at10n>atTime1.00oooooooooooooooooooooooooo000.0030 5. Trait by Rater Correlations (Uncorrected for Attenu- ation) at Time 2000.00000.000000000000000...00.000031 6. Trait by Rater Correlations (Uncorrected for Attenu- ation) at Time 3.0 00000 00.000.00.00000000000 000000 032 7. Standardized Variance Components for Each Point in Time (mcorreCted)00.00.0.00000000000000000.000000034 8. Standardized Variance Components Across Time Without Interpersonal Relations (uncorrected)..............34 9. The Correlation Among Forced-Choice Scores Across Time 00000 00000 00000000 00000000000000.000000000 00000 36 10. Forced-Choice Correlations with Peer Ratings, Post Commander Rankings, Promotional Potential, and Graphic RatingSOO0000000000000.0000000000000000000038 Chapter One Introduction The Problem For a variety of practical reasons, most job performance evaluation is done by supervisor ratings. This was originally achieved by having supervisors give a global or overall impression of performance. Global evaluations of job performance have been criticized by writers such as Guion (1965, 1976); Smith (1976); and Dunnette (1963, 1966), among others. These researchers have asserted that global evaluations of job performance do not fully take into account the many quasi—independent aspects which comprise total job performance. At best, a rater making an overall judgment of job performance may take into account only a few of the many components of performance, and those may be inappropriately weighted. To eliminate this alleged deficiency of performance ratings, Smith (1976), Guion (1976), and others have recommended that measures of job performance (criteria) be multidimensional in character. For subjective appraisals, each employee should be rated on each specific aspect of job performance. However, these multidimensional ratings of job performance have been subject to "rater halo," a ratings problem long noted in the personality literature (Johnson, 1945). Halo is a tendency for a rater to make each judgment within a person uniformly higher or lower than udght be expected for each separate trait. It is reflected collectively over many judges by correlations among trait scores which are higher if made by the same judge, than if made by separate judges. Two job performance rating techniques which have been widely used were developed specifically to deal with the problem of rater halo: behaviorally anchored rating scales (Smith and Kendall, 1963) and forced-choice rating scales (Sisson, 1948; Zavala, 1965). The behaviorally anchored rating scale attempts to reduce halo by anchoring the endpoints and middle of each rating scale with examples of behavioral incidents indicative of performance within that dimension at that level. The rater is asked to make his description of the job performance of the individual using specific behaviors rather than abstract traits. The forced choice scale seeks to eliminate halo by forcing the rater to choose which of two or moreequally desirable (socially desirable) adjectives or phrases most accurately describes the employee. Although the adjectives or phrases in the sub- set are equally desirable, only one or more discriminate between good and poor job performance. Neither of these methods has lived up to the original high hopes for them. Little reduction in halo has been ob- served by researchers who have utilized behaviorally anchored rating scales (Schwab and Heneman, 1975; Guion, 1976; Smith, 1976; Bernardin, 1977, 1978; Burnaska and Hollman, 1974). The forced choice approach, though controlling fairly effectively for leniency and halo effects, still ends up with an overall or global rating (Dunnette, 1966; Zavala, 1965; Smith, 1976). Moreover, since it utilizes a hidden or disguised scoring system, the forced-choice method has often proved to be unacceptable to the raters asked to use it; the experience with it in the United States Army, for example, led to its abandon- ment in 1950 because "raters...found it so unacceptable to rate without knowledge of the final outcome that they concentrated on finding ways to beat the system" (Dunnette, 1966, p. 96.) The overall measure of performance provided by typical forced-choice rating scales (while accurate) has been found inadequate by most raters because it yields virtually no performance counseling information. The purpose of this study was to develop and evaluate an improved job performance appraisal technique by synthesizing some aspects of the behaviorally anchored ratings scale method with the forced choice method. The objective was to obtain a performance appraisal instrument which is multi-dimensional in nature, but is subject to significantly less rater halo effect than is typically reported in the literature for other performance appraisal techniques. Such an instrument should yield psychomet— rically more accurate appraisals which are also meaningful in a performance counseling sense. The Measurement of Halo Halo has not been studied as a cognitive process in supervisor judgment, but rather has been cited as the explanation for certain "high" correlations. In early studies, one judge would rate workers on various aspects of job performance. These ratings were then "very high." However, correlated and the correlations were found to be this comparison was based on the assumption that independently defined job performance attributes should be uncorrelated; an assumption generally assumed to be untrue for human abilities in general. This. consideration then led to studies with more than one judge. Each of several supervisors would rate each worker on the various dimensions of job performance. Halo was again inferred by comparing correlations between trait judgments by different judes to trait judgments by the same judge. Correlations between judgments made by the same judge were found to be "much higher" than correlations between judgments made by different judges. The difference in correlations was thought to be produced by a tendency for a judge to let his general impression of the worker to color his other judgments. This interpretation can be restated mathematically in terms of average judgments across traits. The correlational definition of halo stated above is equivalent to the assertion that a judge will assign some workers uniformly higher and some workers uniformly lower scores across traits than would be assigned by other judges. That is, the correlational definition of halo is essentially equivalent to the rater by person interaction in a three way analysis of variance where data is coded as person by judge by trait. The measurement of halo in this study will be done using the model developed by Guilford (1954) and subsequently embellished by Stanley (1961) and by Zyzanski (1962). In this model, each score xjpt is expressed as the sum of several components: a general factor Gp which underlies all judgments of that person across traits and across judges, a trait score Tpt which measures the extent to which the judges see the person's ability on that trait as being above or below their level on the general factor, a halo score ij which measures the extent to which judge j gives generally higher or lower scores on all traits to person p, and an error term ejpt. Actually, in a data set on which there are multiple judgments by each judge on each trait, the "error" term ejpt can be further decomposed into two parts, that part which represents idiosyncracy in how the judge assesses the trait which is not accounted for by halo, and a part which represents random processes in the judgment. (This decomposition has received little attention since most studies have each judge make only one judgment per trait.) Thus, the measurement model can be written: xjpt = Gp + Tpt + “113 + eJ'pt: where p is person p, t is trait t, and j is judge j. Since each trait variable Tt and each halo variable Hj is defined to be a residual from the general factor G, the residualized trait variables and the residualized halo variables are uncorrelated with the general factor by definition. Since the halo variables characterizing judges are defined as residuals from the trait scores, the halo variables are uncorrelated with the trait variables by definition. However, the model makes further assumptions which may not be true in certain data sets. The model assumes that that residualized trait variables are uncorrelated with each other. The model presented above also assumes that the residualized halo variables in the data are uncorrelated with each other. This is equivalent to the assumption that the determinants of halo in one judge are independent of the determinants of halo in other judges. This is also equivalent to the assertion that a correlation matrix of summed judge scores would be a Spearman one factor matrix. There are other homogeneity assumptions which are not necessary to the model, but which make for much easier estimation if they hold. These assumptions can either be made in terms of raw scores if all traits are expressed in the same metric and have the same variance or which can be made in terms of standard scores if'not. The assumptions are homogeniety of variance assumptions: U: = 6::for all t, O: = d: for all j, t J If these assumptions hold for standard scores, then the parameters of the model can be obtained by averaging correlations between the various trait ratings of the different judges on the different traits. The homogeneity assumptions for population correlations are: 1. All correlations between variables involving the same judge have the same value rJ. 2. All correlations between variables involving the same trait have the same value rT. 3. All correlations between variables in which the traits are 1' G. different and the judges are different have the same value If the homogeneity assumptions are satisfied by a given set of ratings, then the Spearman assumptions for both traits and raters are also satisfied. To be concrete, suppose that a certain police precinct is managed by two sergeants, Sam and Bill, each of whom is familiar with all the officers who work from that precinct. Suppose that Sam and Bill each rate each officer on two traits, job knowledge and interpersonal skill. Each officer will then have four scores: Sam's rating on job knowledge, Sam's rating on interpersonal skill, Bill's rating on job knowledge, and Bill's rating on interpersonal skill. If these variables were numbered from one to four (instead of the two subscript notations employed in the model equations above), then there would be six correlations of three types. The first type is the pair of correlations r12 and r34 which are correlations between judgments of different traits by different raters; i.e., estimates of rG. the homogeneity assumptions are (1) r12 = r34 = rJ , (2) r13 = r24 . rT and (3) I14 . r23 = rO. These assumptions are for population correlations and would be sub- ject to sampling error in data gathered in real finite precincts. If the number of judges or the number of traits is larger than two, then the three types of correlations will no longer be present in equal numbers. If each of three judges assessed each person on four traits, then each person would have 12 scores and the correlation matrix would be 12 by 12. Of the 66 distinct correlations, 18 of the correlations would be between judgments made by a given rater or two different raters on the same trait, 12 correlations would be between- judgments made by two different raters on the same trait, and 36 correlations would be between judgments made by two different raters considering two different traits. If the homogeneity assumptions are met (to within sampling error), the parameters can be estimated by averaging the correlations of each type. This follows from the following formulae relating the correlations to variance components. Same rater — different traits: r = E(X x ) x x jt jt' jt jt' = E( (G + H + T + e ) (G + H + T + e'p ) ) p jp pt jpt p jp pt' lpt' = E(G2) + E(H? ) + E(G H ) + ... p JP 9 JP 62 + d2 + O + O + .... G J Same trait - different raters: r = E( x x ) X X jt j't jt j't = E( (G = H + T + e ) (G + H , + T + e ) ) p 39 pt th p J t pt J'pt = E(G2) + E(T2) + E(H H , ) + p p jp j p = 62-+ 02 + 0 + 0 +.... G T Different raters - different traits: rx E(X x x = ) Jt J/t/ jpt j'pt' E( (c + H + T + e ) (G + G + T + e ) ) p JP pt jpt p J'p pt' J'pt' ) +... 2 E(Gp) + E(ijHj,p) + E(TptT OE + O + O + .... pt' 10 That is, we have the population formulae: rJ =&+& G H r'1? =OZ+& G T rG :=q? If we estimate rJ by the mean of the correlations of that type r , and rT by the mean of the correlations of that type r , and rG by the mean T of the correlations of that type f , then we can estimate the model 8 variance components by: 2 __ U’ = r G G r—JQN ll HI I HI Stanley (1961) noted that the preceding model can be related to analysis of variance. The data matrix forms a three facet table which is of the type on which a three ANOVA could be performed; 11 i.e., a persons by traits by judges design. In this design, the general factor is analogous to the "person effect," the halo factor to the "person by judge interaction," and the trait effect ' However, in an often to the "person by trait interaction.' missed paragraph, Stanley notes that the ANOVA model will only give the correct estimates if all three factors are taken to be random; i.e., judges and traits as well as persons must be declared random factors. Furthermore, the appropriate variance estimates corresponding to the correlation averages given above are NOT the sume of squares in the ANOVA, but the variance estimates obtained from the expected mean squares in the all-random—factor model. Failure to note this para- graph has led to the use of slightly erroneous equations in Zyzanski (1962), Boruch, 353;. (1970), and Boruch and Wolins (1970). Estimation of Halo Within the Literature This study involved the construction and evaluation of a performance measuring instrument which would yield multidimensional ratings of job performance uncontaminated by rater halo. To gain an appreciation for the historical degree of halo within multidimensional ratings of job performance, the literature was 12 scanned for studies which utilized multidimensional measures of job performance and reported the data in the form of a multitrait, multimethod matrix. The analysis described in the preceding section requires multitrait, multimethod date as input to estimate general, rater (halo), and trait components of overall variance. After locating the studies, their data were analyzed according to the method outlined in the preceding section. The results of this analysis are displayed in Table 1. Although several writers (Lawler, 1967; Kavanagh, MacKinney & Wollins, 1971; James, 1973, and Smith, 1976) have urged that multidimensional criteria be evaluated through a multitrait, multimethod matrix approach (Campbell & Fiske, 1959), only eleven. studies which utilized the multitrait, multimethod or multitrait, multirater matrix approach to assess criteria could be located. The studies reported utilized primarily behaviorally anchored rating scales, peer nominations, and graphic rating scales. As the data in Table 1 indicate, these studies were singularly ineffective in eliminating the rater halo problem. An average of 30.6% of the criterion variance was due to rater halo effects, while only 8.1% of the variance was accounted for by the trait component. Only two studies produced more trait variance than rater halo variance. Keaveny and McGann (1975) found twenty percent of their variance was due to trait effects, 13 . A333 .053.— uo 105' 3.3!. 1.33 lo: 309;: 95.3 a no 09183 .104»- 03 «o 9.5 you ~15.— auguwa guinea out. uncontr- E .— Jam-«Padan- uo Honda 55!. uaou In) nun-us; use» I «.8 1o 0.3 0.3 c.~ 1n undue: n;— ousqu-u 8236—. 3 3 we on 63333-3 .3qu N N a nausea A: an: beefing: a .0533 .63»... 3 Z a 2 . 3&2: 3.1.8 u a .638 |.. '20 o3. - co ~. a“ census. oagaquu n a an .o .nuauon A3 :2 .3315 a £05.; 3 8 on 2 313 .53» 3.196 n 8 .8: £33.21 03.00 8.313083 «duo‘s—.3 muo— .588. an 8 3 3 £33. .32: 3.1.6 ON 2 u Stine ~. —~ an ea nod-o. snug-u gang-Lu an n no.— .uaasna ”no. .uualaom a 2 we no 2 .33. :5»: 9.21:0 ea . c .33 .uodush Annoy-u 55!: Ion-o. was: we 3 Nu us 1305:. hznuorueun 14. n :2 .uuluon «3.3005 :2 .00: so 3 2 3 ”Icon-5'0: not on n a Sandman odd-u- ncanny ~52 .ui an S a on 10.3423 31312.3: c~ n a daemon Ian-u. 93qu on: :33 an no ~n an ago-.95 53333:». on n a 3-»- o + l M N unison aucolluvsn 3:... .50 039:. . ngm V V W W Quanta... . 333m avg-SSH: «5-323. 352...... .8: 3608.00 093:» aquacoo was .39; .33 we scum-«you a can!" 14 but here the results of student ratings (n=4) were averaged across students for two methods: behaviorally based rating scales and graphic rating scales of professional performance. However, since the same students completed both performance rating forms, the trait correlations across methods were contam- inated to an unknown degree and the results of this study are suspect. In the Schmidt & Johnson (1973) study, the ratings were performed by peers. The subjects were all participants in an intensive forty hour human relations foreman training program. In addition, the ratings on the two dimensions assessed, drive and assertiveness and future success, were done using a forced- distribution rating format. Obviously, the results of this study, which employed peer ratings following intensive training and a forced-distribution rating format, might not generalize well to other performance rating situations such as are commonly encountered within the literature. The literature review reported above indicates a high level of rater halo effect and very little trait variance. In addition, seven of the ten reported studies averaged across raters so that halo within those studies is probably underestimated (since, as the previous section shows, averaging across raters tends to decrease the effect of halo within ratings.) Therefore, the 15 estimate of average halo produced by methods reported within the literature and portrayed in Table 1 is quite probably an underestimate to an unknown degree. Moreover, the type of averages struck within the reported studies might be quite unrealistic compared to the real world where one very seldom is able to average across a number of raters to obtain ratings of job performance. Chapter Two Method Overview The data for this study were gathered during the develop- ment and administration of a forced choice performance evaluation rating scale in the police agency of a large state. This chapter describes the state police organization; how the forced choice scale was developed and administered; differences in the format and administration of the scale at the three time periods studied; and the other measures utilized in the study. Copies of all measures used in the study are presented in Appendix A. The Organization The organization is a state policy atency in a large mid- western state. The state police perform a variety of duties ranging from road patrol (issuing traffic tickets and investigating accidents) through murder investigations. They have responsibilities in densely packed urban areas (cities Of more than one million) and rural areas that consist of farms and small, widely scattered towns and villages. The state police are organized along hierarchical lines, and the field personnel are grouped into posts. There are sixty-six posts in the state; each composed of fifteen to thirty troopers (the sizes varied from post to post), three to five sergeants (at least one per shift), and a lieutenant who is the post commander. The post commander and sergeants (the most immediate supervisors of the troopers) completed forced-choice scales evaluating each trooper at the post. 17 The Measures The Forced Choice Instrument Overview. Each item in the forced—choice scale consisted of a block of four statements called a tetrad. For example: 1. Does not act upon impulse. 2. Is very familiar with the work area. 3. Practices good first aid. 4. Has pride in himself and the department. Each statement in the tetrad wasa positively worded, descrip- tive phrase related to job performance. The rater‘wasasked to select those two phrases among the four which best describe the trOOper being considered. A score on a tetrad wasdetermined by summing the item weights of the chosen phrases from the tetrad. Since two items in the tetrad were always weighted zero, and the other two were always weighted one, the score from a particu- lar tetrad for a trOOper could have been two, one, or zero, depend- ing on which statements were chosen to describe the trOOper. Collecting Behavioral Phrases. All enlisted personnel, in- cluding officers, were asked to write a one-page essay describing the best trooper they had ever known. The purpose of this essay was to produce a list of statements referring to commonly occurring behaviors in the trooper's job. A total of 1200 essays were col- lected and analyzed, with 752 distinct phrases emerging from the analysis. These phrases were then edited to remove ambiguities, 18 misspelled words, or grammatical errors. Some phrases were re- worded to clarify their meaning. Seventy—two state troopers, including troopers, sergeants, and post commanders, then classified the statements into four categories: interpersonal relations, judgment, leadership, and clerical efficiency (report writing). If at least 51 of the 72 item raters (70%) did not agree as to which category a statement belonged, it was eliminated from the phrase pool. This was done in accordance with the method of constructing behaviorally anchored rating scales, to ensure that the statements which re- mained were meaningful and unambiguous in their meaning. After this step 510 phrases remained in the pool. Each of the remaining phrases was positively worded and behaviorally oriented. Constructing the Forced Choice Scale. There were three primary steps in the construction of the forced choice tetrads: 1. Development of the Discrimination Index: The phrases were collected into a booklet. This booklet was sent to all supervisory personnel at the posts throughout the state. The super— visors were asked to bring to mind the be§£_ trOOper they had ever known. The supervisors then rated that trooper on each of the phrases, using a five-point scale. 19 1. Not an accurate description of this man at any time. 2. Usually not an accurate description of this man. 3. Sometimes an accurate description of this man. 4. Usually an accurate description of this man. 5. Almost always an accurate description of this man. A few weeks later, the procedure was repeated, except that the supervisors were now asked to bring to mind the poorest trooper they had ever known. The discrimination index for each phrase was derived by subtracting the rating for poorest trooper from that the for the best trooper. A high difference score indicated that the statement applied very much to the best trooper, but very little to the poorest trooper. Conversely, a low difference score indicated that the statement applied equally well to the best and poorest trooper. 2. Development of the Job Importance Index: Again, the booklet of phrases was sent to the post super- visors. This time, they were asked to rate the job importance of each of the statements on a five-point scale: 1. Of no importance to the job. 20 2. Of minimum importance to the job. 3. Somewhat important to the job. 4. Quite important to the job. 5. Of great importance to the job. Putting Together the Scale: The tetrads were assembled so that all four phrases had similar job importance index scores. The aver- age job importance index was 4.24 on the five-point scale. However, two of the phrases had higher dis- crimination scale values than the other two phrases in the tetrad. Those statements which had higher scale values were given a weight of 1, those with a lower scale value were given a weight of zero. Weighted items had an average discrimination scale value of 2.35; zero weighted items had an average discrimination scale value of 1.54. In the following example of job importance and dis- crimination scale values for a tetrad, the first two items would be keyed l and the last two keyed 0: 21 STATEMENT DISCRIMINATION INDEX JOB IMPORTANCE INDEX 1 2.61 4.11 2 2.72 4.10 3 1.31 4.12 4 1.28 4.14 Each tetrad was assembled so that the differ- ences between any two phrases on the job importance index could not be greater than one standard error. However, the difference between any two phrases on the discrimination index had to be at least two standard errors. Administering the Forced Choice Scales. Three versions of the scale were constructed corresponding to the three time periods of the data collection. At time one, the scale consisted of twenty tetrads and was completed by the post commander and at least four post sergeants. At time two, twenty-two new tetrads were constructed and combined with three tetrads from time one to form the scale, and it was completed by the post commander and at least three post sergeants. At time three, the best tetrads from the scales at time one and time two were combined to construct a twenty tetrad scale which was completed by the post commander and at least three post sergeants. The post sergeants who completed the scale were the same for each trooper at that time period. However, over time, change due to attrition among sergeants did occur among those who completed the scale, so that raters were not 22 strictly comparable across time. Despite the variations in number of items and number of raters, across time, significant changes in scale reliability did not occur. The twenty item, four-rater format was finally adopted as reliable and administra- tively efficient. Other Measures Peer Ratings Peer ratings were obtained at each post. These ratings were in a forced distribution form; each trooper was asked to rank the troopers at his post into categories, including himself, to pre- serve anonymity: upper 102, next 207., middle 40%, next lowest 20%, and the lowest 10%. A letter signed by the state police commander was written in 1974 to each trOOper requesting his participation in the project. The letter explained the research nature of the peer rating and guaranteed the confidentiality and anonymity of the results. The trooper was explicitly told that the rating would not be used in any way to evaluate another trooper. In addition, the state police troopers' association allowed the writer and the assistant personnel director of the state police to address a meeting of the association's post representatives to solicit their support for this phase of the project and to again give assurance as to the purpose of the ratings. 23 Two hundred and fifty troopers responded to the request and completed the peer rating form which they received. This represents a 25% response rate for this phase of the project. However, the ratings were completed on 74% of the troopers in the state police (740) representing thirty-nine of the forty- eight posts. The peer ratings were averaged within posts (average coefficient alpha = .78), and all analyses reported in this paper on peer rating data used the average peer ratings. Post Commander Rankings In 1977, after completing the forced choice rating for the third time, rankings were obtained from the post commanders. The post commanders were asked to rank the troopers at their post from top to bottom in terms of overall performance. This data was transformed to a five-point scale: upper 10%, next 20%, middle 40%, lower 20%, and last 10% as was done with the peer ratings. All post commanders cooperated in completing this phase of the project. Graphic Ratings At time two, the post commander filled out an employee rating form consisting of fifteen statements in the four areas of per- formance assessed in the forced choice scale. For each of the fifteen statements, the response format asked the post commander to rate the trooper as: Not at all like this Not very much like this 24 Somewhat like this Very much like this Exactly like this Civil Service Test for Sergeants Each three years, the trOOpers were evaluated for possible promotion to sergeant. The promotional total score consisted of two parts: a written examination which tested job knowledge and "promotional potential points." The "promotional potential" is the average of the ratings given by a board of three raters after reviewing the post and district commander's recommendations which were supported by a brief narrative. This score could range from zero to twenty-five,’ though scores below fifteen were very rare. Promotion scores were obtained by randomly sampling three hundred personnel folders. Chapter Three Results The Content Dimensionality of the Three Forced Choice Scales The tetrads were originally grouped into clusters (dimensions) on an a priori basis determined by tetrad content. After administering the scales at each point in time, the tetrad scores were intercorrelated and the resulting correlation matrices were subjected to a cluster analysis (Hunter, 1977; Tyron and Bailey, 1970) using Hunter's and Cohen's (1969) PACKAGE system of computer routines. The cluster solutions for each point in time are presented in Appendix B. Item means and standard deviations are in Appendix C. The purpose of the cluster analysis was to test the a_ priori grouping of items for conformity with the data. A cluster analysis which confirmed the original grouping of items would add considerable confidence to the interpretation of a multidimensional criterion. The following criteria were used in forming homogeneous clusters: (a) internal consistency - all items in a cluster should be correlated more highly with their own cluster than with any other cluster and the pattern of correlation within the cluster should form a Spearman matrix, (b) external paral- lelism — the size and magnitude Of the correlation between all items within a cluster and any other cluster must be similar, and (c) homogenity of cluster content - it should be reasonable, based upon content, that the items which make up a cluster share some common variance. 26 Following the cluster analysis, actual cluster location for each of the items was compared with predicted cluster location. Table 2 contains the actual versus predicted clusters for each of the tetrads. The last row contains the residual tetrads which were not included in the final form of the cluster solution. In general, the residual tetrads had low reliability or ambiguous content. An inSpection of Table 2 indicates that the cluster analysis tended to confirm the a_priori item grouping. At time one, there were four items which deviated from the predicted structure, three of which entered the residual clusters. In the new items written for time two, there were six items which deviated from the pre- dicted scale structure, five of which entered the residual cluster. At time three, when tetrads from times one and two were repeated, only one deviation occurred and this tetrad entered the residual set. No new clusters emerged from the cluster analysis and the clusters which emerged were repeated, as predicted, at each new time period. New tetrads tended to enter the predicted clusters. Thus, the overall pattern of results strongly supports the original content analysis of the statement pool. The dimensions were intercorrelated at each point in time. These correlations (which are corrected for attenuation) are presented in Table 3 which also contains the sample sizes and 27 2 2.4.2 2 _ n 2.26 133qu a~.n~ .Na.» an . ca.- .53 no ma.na .a.o.~ a as.aa o~.m.n.n .euoa m «use o.a.ma an o~.a~ .Na.w . mu -.ma.s~ .aa.o~.~ a q~.n~ £13m; .33 «and. o~qn~ .e~.e.a on «a o_.n mo oa.m .o.n a 52.na .-.a.~ .euua a «can «H no a 33 an no a .33 A: no a .33 n «as» N «say a «any ounhfiunc wounsuu he auounanu vuuuya :uH3.nuuumsHo panama «nowum ¢.uo soauuammmm. N oases 28 on on mm mm ms nu ma om mm ~m on Ha mag maowumaum 0." an ac cn 0.” mm um mm o.” mm as ch HunonuonuwucH muauauuuuu o." h» we o.~ Nu mo o.~ on an Huuuuuflo 0.“ «a o.~ q» o.“ om acuamvan o.~ o.~ o.~ awsmuovm «H .uuHo .mvan .vmmq mH .uaHu «muse .vmwg _ «H .uuao .muan .uuua Aooo~lav m mafia Aooofilcv N «Bah Aochuzv H mafia oawH aw umwom now» an mcoamcuawn Guyana; mcouuufiouuoo .m «Hawk 29 cluster reliabilities (coefficient alpha). The average correla- tion between traits is .67. However, since these correlations are corrected for attenuation, it is reasonable to note that .67 is far below 1.00; i.e., the clusters are statistically distinct. The correlations were not 1.00, but they were much higher than expected. Most of the "low" correlations were between interpersonal relations and other traits. Thus, while the cluster intercorrelations supported the multidimensional structure of the scale, they were very high. Halo Analysis At each point in time, the trait by rater correlation matrix was subjected to halo analysis. These correlation matrices are presented in Tables 4, 5, and 6 with trait correlations underlined in the Table. Rater alphas (within trait) are in the diagonal of the matrix. Although each scale had only a few items at each point in time, rater alphas tended to be quite good (average alpha = 65). Therefore, raters appear to have responded in the same general fashion on each item of the various scales. Correlations between the leadership, clerical efficiency, and judgment clusters tend to be high at all three time periods, although correlations of interpersonal relations with the other traits appear to be smaller. Correlations within rater tend to TRAIT LEADERSHIP JUDGMENT REPORT RELATIONS PERSONAL INTER! TRAIT BY RATER CORRELATIONS (Uncorrected for Attenuation) AT TIME 1 30 TABLE 4 ‘ , INTERPRRSONAL RLPORT JUDGMENT LEADERSHIP RELATIONS RATER 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1, 2 3 4 5 1 65 2 3g_66 3 2.9g” . 4 23.22.29 69 seesaw 1 39 26 23 31 18 66 2 24 44 24 27 23 3g 64 3 23 31 38 31 25 ‘33.33 65 4 21 22 22 44 21 39_33_33_65 5 18 24 20 21 39 33 33 39 33 60 1 54 34 3o 38 27 64 39 4o 39 35 85 2 28 57 26 37 26 36 68 42 32 35 33_84 3 26 35 48 39 31 31 45 66 34 38 §§_§Q_82 4 26 29 31 53 25 36 32 36 64 35 .3g_33.33 81 5 28 38 32 29 45 34 41 37 37 62 £2.£_.§_.£Z 80 1 4o 28 24 19 17 46 35 33 24 23 62 45 35 31 35 74 2 19 45 22 24 19 27 58 31 27 32 34 67 36 32 37 34 79 3 15 28 4o 26 19 17 36 46 25 3o 31 42 61 35 38 .33.33 76 4 15 22 23 38 22 23 28 3o 48 26 32 36 35 6o 33 53.33_3g 75 5 19 25 23 21 34 22 29 29 27 43 29 34 34 35 57 .53.43 5 43 73 INTER- 31 TABLE 5 TRAIT BY RATER CORRELATIONS (Uncorrected for Attenuation) AT TIME 2 . INTERPERSONAL T : TRAI REPORT ' JUDGMLNT LEADERSHIP RELATIONS 11711811 12341234 1234 1234 1 65 ... 2 3g 69 ‘5‘ 3 3939 65 E; 4 22232.8. 71 1 53 3o 29 35 86 E 2 31 52 3o 34 3g 85 3 3o 27 so 27 3933 84 g 4 39323451 __7_2_6__5_5_84 a 1 51 3o 30 34 73 48 45 52 85 g, 2 31 60 31 35 46 71 39 47 5_8 84 1a 3 25 28 so 26 35 39 7o 42 3333 85 g 4 3836 35 57 444743 69 _5_8__@§{383 m 1 17 08 11 08 26 '11 14 14 4o 17 19 21 67 3% 2 16 22 06 09 22 3o 11 15 28 4o 17 24 3g 68 or; 3 07 07 28 07 1o 12 3o 09 19 21 45 19 3939 66 25 4 161109 34 171713 29 26 2519 43 3333363 In: ' Ba TRAIT LEADERSHIP JUDGMENT REPORT RELATIONS PERSONAL INTER- 32 TABLE 6 TRAIT BY RATER CORRELATIONS (Uncorrected for Attenuation) AT TIME 3 INTERFERSONAL REPORT JUDGMENT LEADERSHIP RELATIONS IRATER 1 2 3 4. 1 2 3 4 1 2 3 4 1 2 3 4 1 72 2 33 69 3 .2£.2£ 58 4 EQH§£.§§ 63 1 68 34 35 35 79 2 33 63 31 32 33 75 3 ' 29 33 56 34 .33 33 74 4 37333355 £33335 1 66 34 37 43 78 45 45 51 89 2 33 63 34 34 46 76 43 43 33_87 3 3o 30 54 37 41 39 73 47 .22.£Z 85 4 43 34 38 59 46 4048 75 33333387 1 15 04 07 13 25 12 O6 11 33 18 17 16 52 2 14 27 07 11 15 31 11 17 22 36 19 19 ‘33 59 3 13 17 17 07 17 16 25 14 22 18 33 17 .33.33 56 4 -16 15 11 20 21 16 15 32 27 17 21 35 _3H33.33 6o 33 be quite high, in general, with the exception of interpersonal relations again. This pattern of correlations across time suggests the presence of considerable halo within the ratings. To determine the exact amount of halo, trait, and general factor variance in the ratings, the trait by rater intercorrelation matrices were subjected to the halo analysis described in chapter one. Table 7 contains the results of the halo analysis. The average halo across the three time periods is 21% of the total variance (or 28% of the variance if measurement error is deleted). The general factor, halo and interaction components together, comprise 61% of the total variance (or 81%, if measurement error is neglected). The trait variance is only 15% of the total variance (18% of the variance if measurement error is eliminated). Interpersonal relations has average correlations lower than those for the other dimensions. Hence, the trait variance observed in the preceding analysis may have arisen from the difference between inter- personal relations and the other traits. A second analysis was performed omitting interpersonal relations from the trait by rater matrix. The results of this analysis are presented in Table 8. The trait effect dropped from 14% of the total variance to 6%. Thus, more than half the trait variance in the original four dimensions appears to be accounted for by the trait effect of interpersonal relations. While the trait 34 Tgple 7 Standardized Variance Components for Each Point in Time (Uncorrected) Time One Time Two Time Three Average Person 29 25 27 27 (General) . Rater 23 20 20 21 (Halo) Trait 13 16 14 14 PRT Inter- 13 13 13 13 action Measurement 22 26 26 . 25 Error M Standardized Variance Components Across Time Without Interpersonal Relations (Uncorrected) Time One Time Two Time Three Average Personal 32 34 37 34 (General) Rater 22 21 26 23 (Halo) ' Trait .06 .07 .05 .06 PRT Inter- 17 15 .06 13 action Measurement 23 23 26 24 Error 35 effect does not vanish after removing interpersonal relations from the trait by rater matrix, it does become very small. The measurement error component is 64% of the usual "error term," the other 36% is due to the person, rater, trait interaction representing idiosyncratic perceptions of traits by raters. The use of the forced choice multidimensional rating form to obtain performance ratings seems to have reduced the rater halo portion of total variance from the average noted in the literature in Chapter one (28%) to 21%. Thus, the use of the forced choice format does appear to have reduced the rater halo variance while increasing the trait variance from 9% to 14%. However, the rater halo contribution to the variance is still significantly larger than the trait variance component. This means that the use of dimensional scores in a counseling sense would be diffi- cult due to their inability to effectively discriminate among the job performance dimensions. 36 Stability of the Forced Choice Measures To assess stability, total scores were computed for each rater and then averaged. That is, the score for each trooper is the average total rating of four supervisors. The correla- tions among the forced choice measures at the three time periods are presented in Table 9 3 Forced choice scores correlated .71 and .72 for a time interval of 1.5 years and .55 for a time interval of three years. According to Heise (1970) this means a stability coefficient of .93. This is essen- tially identical to the average alpha of .91 from Table 9_; thus, there appear to be no transient factors in the error of the forced choice scale. Table 9 The Correlations Among Forced-Choice Scores Across Time Time 1 Time 2 Time 3 Time 1 1.0 Time 2 .71 1.0 Time 3 .55 .72 1.0 37 Other Criterion Measures Peer ratings, the post commander rank-ordering, the promo- tional potential, and the graphic rating were correlated with the forced choice scores at the appropriate time periods. The results of this analysis are presented in Table 10 . Peer ratings were obtained at Time 1 and correlated most highly with the forced choice measure at Time 1 (r = .62). The correla- tions across time showed a slight decline: Time 2,r = .57; Time 3, r =.55. The post commander rank ordering was obtained at Time 3 and correlated most highly with the Time 3 forced choice rating, r = .65 and less highly with earlier time periods: Time 2, r = .56, Time 1, r = .46. The correlation between post commanders' rank orderings at Time 3 and peer ratings at Time 1 was r = .49, nearly equal to the cross time correlation of .55 between the Time 1 and Time 3 forced choice measurements. The graphic rating equivalent of the forced choice scale was obtained at Time 2. As noted in the Method chapter, this rating was completed by only the post commander (though he was encouraged to seek input from the post sergeants in completing this scale). Hence, the correlation of .77 with the forced choice measure ob- tained at about the same point in time is much lower than the cor— relation that would have been obtained by averaging the graphic ratings of four judges to reduce halo, the idiosyncratic components, as well 38 a." fin. on. on. On. us. so. 0.“ he. ow. «m. an. o¢. o.H me. me. an. we. o.~ mm. pm. No. o.~ mm. mm. o.~ an. Arqfljoum u up EMS u mom 092E. 039 WWW 1800 mafia uEwH mafia mwcwumm uacmmuu van .Hmfiucouom chowuoaoum .mwnaxcmm Housmaaou umom .mwawumm hook now: ncoaunaohuoo oouonu vouuom OH «Hana magnum oweamuo Hmwucuuom Hmcoauoaoum Moonwaaoo umom wnwuau uoom n «Bah N «nah a weak 39 as measurement error. The correlation of .77 is also lower than had the forced choice score come only from the lieutenant since the idiosyncratic components would have been common to both variables. After time of administration and number of judges are taken into account, the forced choice scales, the graphic rating scale, and the post commander rank order appear to be essentially equivalent. The promotional potential appears equivalent also, but with lower reliability. The Determinants of Rank Order Do post commanders use the job performance dimensions assessed in the forced choice scales in arriving at overall judgments of performance? The post commanders' rank orders were regressed onto separate scores for the job competence factor (the average of the person's score on leadership, judgment, and clerical efficiency) and for interpersonal relations. The correlations were r11. .35; ,com = .34; and r - .64. The resulting regression Ir,rating com,rating equation was: post commander rank order = .58 (Competency)-+ .11 (Interpersonal Relationships), so both aspects of performance appear to be taken into account by the post commander in making his overall performance ranking 40 assessment, although more than three times as much weight is given the general job competency factor. Other researchers (Zedeck & Kafry, 1977) have discovered evidence which suggests that different raters might weight the dimensions of job performance differently in arriving at overall conclusions on job performance. That is, some research evidence suggests that raters might be reliably classified into subgroups which weigh the dimensions of job performance differently in arriving at overall performance assessment decisions. For example, commanders at rural posts might rate interpersonal relations more highly than commanders at urban posts. Posts were compared on three dimensions: (1) size (median split based on the number of troopers at the post); (2) rural versus urban; and (3) the rated diffi- culty of administering the post (posts were divided into two blocks by the state government on the basis of size, population, presence of absence of colleges nearby, presence or absence of nearby expressways and so forth. Separate regression equations were calculated for each of the separate blocks of post commander ratings and compared. However, no nontrivial differences were observed between any of the regression equation pairs. Other researchers have suggested (Guion, 1965; Klimoski and London, 1974; Lewis and Zwany, 1976) that peers and supervisors 41 may use different dimensions to evaluate job performance. For example, peers may attach more weight to interpersonal relations in making overall performance assessment. Previous results (Table 10) have indicated that peers ratings and post commanders' rank orders tend to be approximately equal 62; =.65) in regard to their degree (rpeer,FC1 =° rpostcom,fc3 of agreement with the forced choice ratings of performance, but peers and post commanders may have employed a different weighting scheme to arrrive at their respective overall conclusions. Peer ratings were regressed onto the general job competency and inter- personal relations factors. The individual correlations were r =.64; r, = .35; and r =.60. The r,comp 1r,rating comp,rating resulting regression equation was: Peer Rating = .64 (competency) -.06 (interpersonal relations). Thus, peers give no weight to interpersonal relations in arriving at overall judgments of job performance. Although the results appear to indicate a difference between peers and pcst commanders, the results must be interpreted with caution because the correlations are affected by the presence of correlated as well as uncorrelated errors in the prediction . Chapter Four Discussion The forced choice scale was successful by all the usual psychometric criteria. The parallel form reliability of the total score was .90, and the test-retest reliability was .72 over an 18 month interval. The scale was not subject to trans- ient errors, and it correlated .62 with peer ratings and .77 with graphic ratings. Moreover, both administrative officers and field commanders indicated through personal and group interviews that they believed the scale to be fair and valid. The conditions for multidimensionality of the scale were also met. The behavioral statements were successfully grouped into four content areas with a very high percentage of agreement among the raters. The initial cluster analysis of the forced choice items deviated only slightly from the predicted structure. Those failures were caused primarily by items entering the residual set; only two items changed from the predicted cluster to another, and no new clusters emerged from the analysis. New items written at Time 2 fell into the same pattern of predicted results. When items from Time 1 and Time 2 were repeated at Time 3, they fell within the same structural patterns and clusters noted at the previous two time periods. The clusters which emerged from the data analysis were unidimensional, parallel, and statistically distinct. The average correlation within raters was .48 compared with average cluster alpha of .62. 43 Despite having reliable, conceptually distinct rating dimensions, the halo effect averaged across the three time periods was 21% of the total variance and 28% of the true score variance. The trait variance was only 14% of the total variance and 21% of the true score variance. When the inter- personal relations cluster was removed from the analysis, the percentages for the trait variance were reduced to 5% and 7%, respectively. Thus, the use of behavioral items and the forced choice format did reduce halo from the level found using more conventional measures of job performance; however, it did not increase trait variance as much. Apparently, the use of the forced choice approach to multidimensional job performance evaluation increased the amount of trait variance and the amount of idiosyncratic variance in the scale. This hypothesis cannot be fully tested since the studies reported in the literature did not utilize a multiple judgments format which would permit the assessuent of the degree of idiosyncratic variance in published studies. The level of correlation between the forced choice scale, and the other simpler means of assessing job performance (i.e., the graphic rating scale, peer ratings, and post commander rank order) suggests that the forced choice ratings were not very different from those obtained by means of the simpler job performance measure- ment schemes. The forced choice scale is a multidimensional measure 44 of job performance; yet, if error of measurement is deleted, it was extremely highly correlated with the unidimensional, global assessment of performance whether peer ratings, rankings, or graphic ratings. These results tend to disconfirm the belief that multidimensional criteria are inherently superior to global assess- ment of performance. If the primary problem in a rating situation is ceiling effects, then the rank order is a simple alternative to the more expensive and time consuming forced choice scale. In this study, the post commander rank order correlated .65 with the forced choice instrument. The corresponding correlation for a one-rater forced choice scale is .79. Had this rank order been merged over four raters (as was the forced-choice scale, which_ reduced error variance, idiosyncratic variance, and halo variance by a factor equal to the number of raters employed), it would have been almostidenticalin.quality. If the goal is to reduce halo, the easiest solution would be to increase the number of people completing the rank-orders and average those rank orders. THis is a straightforward conclusion drawn from the AOV definition of halo as employed in this study. Another finding of the study was that lieutenants (post commanders) appeared to give greater weight to interpersonal relations than did peers. While this finding may be contrary to conventional wisdom (i.e., peers cannot be trusted to rate one 45 another because liking or friendship factors might interfere), it would follow from a consideration of the difference in role expectations for the two groups. Lieutenants are managers whose work frequently consists of solving human prdblems among peers or with the public. Thus, they would be more likely to consider interpersonal relations as part of "work" than would troopers. As noted earlier, these findings must be regarded with some suspicion because the multiple regression correla- tions were not corrected for attenuation. Conclusion The purpose of this study was to develop a rating format which would allow a multidimensional assessment of job performance while reducing the degree of rater halo from that typically found within the literature. The degree of rater halo was assessed by an analysis of variance model which essentially compares within rater correlations to between rater correlations. Eventfluause of behaviorally anchored item stems in a forced choice format does not completely eliminate rater halo as defined in the study. Indeed, rater halo in this study accounted for over 20% of the variance. Furthermore, there was no appreciable difference between the unidimensional global rank orderings of the troopers by their lieutenant and the multidimensional foréed choice scale. That is, this study found no support for the widely hypothesized superiority of multidimensional over unidimensional performance evaluation. Multi- dimensional measurement may be more useful for scientific purposes or counseling purposes, but it may offer little or no practical advantage if test validation is desired. The key to the reduction of rater halo as defined in this study in supervisor ratings is to use multiple supervisors. 46 REFERENCES LIST OF REFERENCES Bernardin, H. J. Behavioral expectation scales versus summated scales: A fairer comparison. Journal of Applied Psychology, 1977, 33, 444-427. Bernardin, H. J. Effects of rater training on leniency and halo errors in student ratings of instructors. Journal of Applied Psychology, 1978, 33, 301-308. Borman, W. C. The rating of individuals in organizations: An alternative approach. Organizational Behavior and Human Performance, 1974, 33, 105-124. Boruch, R. F., Larkin, J. D., Wolins, L., and MacKinney, A. C. Alternative methods of analysis: Multitrait - Multimethod data. Educational and Psychological Measurement, 1970, 3Q, 833-853. Boruch, R. F., and Wolins, L. A procedure for estimation of trait, method, and error variance attributable to a measure. Educa- tional and Psycholpgical Measurement, 1970, 39, 547-574. Burnaska, R. F., and Hollmann, T. D. An empirical comparison of the relative effects of rater response biases on three rating scale formats. Journal ofyApplied Psychology, 1974, 32, 307-312. Campbell, D. T., and Fiske, D. W. Convergent and discriminant valida- tion by the multitrait-multirater matrix. Psychological Bulletin, 1955, 33, 281-302. Dickinson, T. L., and Tice, T. E. A multitrait-multimethod analysis of scale developed by retranslation. Organizational Behavior and Human Performance, 1973, 2, 421-438. Dunnette, M. D. A modified model for test validation and selection research. Journal of Applied Psychology, 1963, 31, 317-323. Dunnette, M. D. Personnel Selection and Placement. Belmont, California: Brooks/Cole Publishing Company, 1966. Guilford, J. P. ,ggychometric Methods. New York: McGraw-Hill, 1954. 47 Guion, R. M. Personnel Testing. New York: McGraw—Hill, 1965. Guion, R. M. Recruiting, selection, and job placement. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology. Chicago: Rand McNally, 1976. Heise, D. Separating reliability and stability in test-retest correlations. American Sociological Review, 1969,.33, 93-101. Hunter, J. E. Cluster analysis: Reliability, construct validity, and the multiple indicators approach to measurement. Paper presented at workshop entitled: "Advanced Statistics," U. S. Civil Service Commission March 21, 1977. Hunter, J. D., and Cohen, S. H. Package: A system of computer routines for the analysis of co-relational data. Educational and Psychological Measurement, 1969, 32, 677-700. James, L. R. Criterion models and construct validity for criteria. gsychological Bulletin, 1973, 80, 75-83. Johnson, D. M. A systematic treatment of judgment. Psychological Bulletin, 1945, 31, 193-224. Kavanagh, M. J., MacKinney, A. C., and Wolins, L. Issues in managerial performance: Multitrait-Multimethod analyses of ratings. Psychological Bulletin, 1971, 13, 34-49. Keaveny, T. J., and McGann, A. F. A comparison of behavioral expectation scales and graphic rating scales. Journal of Applied Psychology, 1975, 39, 695-703. Klimoski, R. J., and London, M. Role of the rater in performance appraisal. Journal of Applied Psychology, 1974, 32, 445-451. Lawler, E. E. The multitrait—multirater approach to measuring managerial job performance. Journal of Applied Psychology, 1967, 33, 369-381. Lewin, A. Y., and Zwany, A. Peer nominations: A model, literature critique, and a paradigm for research. Personnel Psychology, 1976, 32, 423-447. 48 Nealy, S. M., and Owen, T. W. A multitrait-multimethod analysis of predictors and criteria of nursing performance. Organizational Behavior and Human Performance, 1970, 2, 348-365. Schmidt, F. L., and Johnson, R. H. Effect of race on peer ratings in an industrial situation. Journal of Applied Psychology, 1973, 21, 237-241. Schwab, D. P., and Heneman, H. C., III. Behaviorally anchored rating scales: A review of the literature. Personnel Psychology, 1975, 33, 549-562. Sisson, E. O. Forced-choice: The new army rating. Personnel Psy- chology, 1948, 1, 365-381. Smith, P. C. Behaviors, results, and organizational effectiveness: The problem of criteria. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology, Chicago: Rand McNally, 1976. Smith, P. C., and Kendall, L. M. Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 1963, 31, 149-155. Stanley, J. D. Analysis of unreplicated three-way classifications, with applications to rater bias and trait independence. Psychometrika, 1961, 33, 205-219. Tryon, R. C., and Bailey, D. E. Cluster analysis. McGraw—Hill: New York, 1970. Tucker, M. F., Cline, V. B.,\and Schmitt, J. R. Prediction of crea- tivity and other performance measures from biographical informa- tion among pharmaceutical scientists. Journal of Applied Psychology, 1967, 21, 131-138. Zavala, A. Development of the forced-choice rating and technique. Psychological Bulletin, 1965, 33, 117-124. Zedeck, S., and Baker, H. T. Nursing performance as measured by behavioral expectation scales: A multitrait-multirater analysis. Organizational Behavior and Human Performance, 1972, 2, 457-466. Zedeck, S., and Kafry, D. Capturing rater policies for processing evaluation data. Organizational Behavior and Human Performance, 1977, 33, 269-294. 49 Zyzanski, S. J. Analysis of variance applied to factors which do not have comparable scales. Unpublished master's theses, Iowa State University, Ames, Iowa, 1962. 50 APPENDICES APPENDIX A: THE MEASURES 53 PART I Select two phrases which best describe the trooper. Remember to treat each set of statements independently. 1 He makes good contacts with both the general public and public officials. He leaves a very good impression of the department with the younger generation. He does not accept or solicit gifts or services from the public. He knows the criminal element in the post area. .2. His attitude toward the job is one of sincerity and belief that the job is important. I He has pride in the department and himself. He keeps the firearms that he carries clean and in proper working order. He never compromises a principle or writes a ticket just to be number one on the activity sheet. .2 He seems to know when the letter of the law should be discarded in favor of the spirit of the law. He practices good first aid. He takes advantage of resource materials department-wide. His reports convey meaning without the use of superficial and excessive language. 3. 4. 1. 2. 54 3 He does not present a false front to command officers and fellow workers. He is diplomatic with the public. He knows his limitations. He knows the trouble areas in traffic and he works them. 2 He makes good informative reports that can easily be followed up by others if necessary. He writes reports that you can read and knows just what has taken place. He keeps the firearms that he carries clean and in prOper working order. He never compromises a principle or writes a ticket to be number one on the activity sheet. 2 He handles state property with respect. His reports convey meaning without the use of superficial or ex- cessive language. He practices good first aid. He treats the patrol vehicle like it was his own. 7 He is able to take a recruit and mold him into an efficient trOOper. He is sincere about his job. 55 At accident scenes he seems to know at once what is needed. He does not accept or solicit gifts or services from the public. 9. He refrains from becoming involved in compromising situations. His knowledge of his working area and use of informants is exceptional. When his issued equipment is in need of repair or replacement, he takes care of it. He is versatile in all aspects of law enforcement. .9. He can smile and listen to a citizen's story. He won't hurt a fellow officer's chance to get ahead. He gives credit for something you did while working together without trying to get it himself. He has developed a mutual respect between himself and command officers which permits him to question in a constructive way. 39 At accident scenes he seems to know at once what is needed. He is an excellent interrogator of both suspects and witnesses. When handling an investigation, he carefully organizes a case. He knows how to operate instruments and equipment related to his work. 56 .13 He is dedicated to the principles of police work. He knows the criminal element in the post area. He has the ability lead without necessarily commanding. He does not act upon impulse. _1_2_ He adheres to the rules set up by his employer. His police working tools, clip board, shoulder weapon, patrol car and equipment are checked and ready for use. He utilizes all investigative tools which are available to him. At accident scenes he seems to know at once what is needed. 33 He always sets an example which others endeavor to follow. He knows his limitations. His peers frequently look to him to take charge of a difficult or dangerous situation. He writes quality traffic summons rather than quantity. 14 He uses reason and logic to its fullest extent. He has a good knowledge of the area in which he works. 57 He is courageous, yet cautious. He makes excellent use of his discretion in making arrests, handling complaints, and general routine duties. 15 '— He makes it a point to stop in at other departments and visit from time to time. He is very highly thought of in the community by local citizens. This trooper can work with anyone at the post without arousing any ill feelings. He takes into account what is youthful exuberance, and what is criminal. l9. His reports are neat, thorough, accurate, and "on time." He is careful and serious in the preparation of reports. He knows the criminal element in the post area. He does not act upon impulse. 17 He has the ability to make other members of the department "want" to follow his example. He is actually looked upon by the recruits as a model they should strive to emulate. He knows the trouble areas in traffic and works them. He is capable of handling any departmental equipment made avail- able to him. 58 18 _— 1. He knows when to Speak and when not to speak. 2. He makes decisions promptly, but not hastily. 3. He does not act upon impulse. 4. He writes quality traffic summons rather than quantity. 19 -—.— 1. He takes good care of departmental equipment. 2. When his issued equipment is in need of repair or replacement, he takes care of it. 3. His reports convey meaning without the use of superficial and excessive language. 4. He refrains from becoming involved in compromising situations. 20 .— 1. He treats the public, other departments, courts, news media, etc., with respect and gains their respect in return. 2. His approach to the public is personable and polite. 3. He is very familiar with the area in which he works. 4. He uses restraint, instead of force, when possible. After completing Part I of the Inventory, please return it and the response sheet to the Personnel Division. The post commander should now evaluate Part II. He should consult ‘with the post sergeants in completing this part of the Inventory. You have eight weeks to submit Part II from date of receipt. 59 STATE POLICE 401121111123 4 02172101111531 INVENTORY (111320351: worm - PART I) W UHLlulnll l 1 1711mm ; 1: l ‘ First H . Enlistment Date:[lVI I 1 1 1 1 Race: [::] Post Number; [::E::] Describer's Vsme: Last ~ First H Civil Service Classification: i I I 4 12 3 4' 1 2 3 -1:]C31:1C]'11.[:][:]1:113' p . Phrase Number . . . .DDBD mama: -DDDD seams -DDDD mamas 3DDDD mamas -DDDD WEDGE ~DDDD mbmam -DDDD mamas USDDQ manna madam mamas N U 5 U 0 H Describcr‘s Signature: Date: NOTE: 6O ACHIEVEMENT AND DEVELOPMENT INVENTOR: (1) Select two phrases which best describe the trooper. (2) Remember to treat each set of statements independently. PART I Achievement Scale 1 Relays to fellow employees a feeling of genuine interest and understanding. Mature in a way that can only come with time, living with, and understanding people. Greets his fellow officers with a smile and pleasant remarks. Questions all things he/she doesn't understand. 3 Does not show his temper at all during working hours. Complaint arrests are far above the norm. Able to look up and cite court decisions pertinent to his pending cases. A good photographer. 3. Has the ability to take the initiative and originate projects and see that they are carried out. Even tempered. Leaves his personal problems at home. 61 Loyal to his fellow workers. §. Well versed in the departmental rules and regulations. Reports are neat, thorough, accurate, and "on time." Never does he/she belittle other departments. At no time does he/she permit anyone into a patrol car without checking the person for possible weapons. 2 Accepts group decisions without necessarily agreeing. Has the ability to make other members of the department "want" to follow his example. Very outgoing and truly likes people. His aggressiveness prompts supervisors to recommend duties with additional responsibility. 10 Strives to maintain a steady and well rounded performance in police work. Resourceful and imaginative in his investigation. Able to evaluate a situation easily. No force is ever used except as a last resort. 11 Socializes with persons other than police officers. Criminal investigations are a personal challenge. 62 Always sets an example which others endeavor to follow. 3 Has a friendly disposition. Never practices racial discrimination. Thoughtful. Can work with anyone at the post without any ill feelings on the part of either side. 2 An excellent organizer and planner. Realizes that traffic enforcement on the highway is a serious matter. Makes every effort to instill pride in younger officers. .9 Exhibits superior driving habits. Radiates a happy, friendly, enthusiastic attitude in his work. Never refuses c00peration or assistance from others. Has a pleasing personality and an even disposition that is con- tagious to persons around him. 7 Has respect for his fellow officers and command personnel. Thoughtful and considerate to his family. Respects danger and will not unnecessarily jeopardize his life or the lives of others. 63 A firm believer in giving verbal warnings and not always a traffic citation. Has an above average knowledge of all areas throughout the department. 12 Makes good informative reports that can be easily followed up by others if the need arises. Writes a report that you can read and know just what has taken place. Always listens to both sides of an issue before making a decision. As concerned with charging the right person as with making the arrest . _1_2 Takes constructive criticism well. Not one to gossip. In the deliverance of a death message, he treats the victim's relatives as if they were his own. Does not have any false fronts with officers or fellow troopers. 14 His patrol arrests are always high and of good quality.l Spends a great deal of time studying the law and criminal investiga— tion which makes him/her more effective. Treats the junior officers as equals. Does not seem to spend extra time on those complaints which do not warrant the extra time. 64 l§_ Does a good job of counseling his subordinates. Always gives the citizen the benefit of the doubt. Continually striving to be the best. Can smile and listen to a citizen's story. 16 Always ready and willing to assist other officers who approach him/her for guidance. Instills confidence in each of the younger officers he trains. Has excellent hygiene. Works just as hard to clear as to convict a suspect. .11 Follows orders to an exact point. His traffic work is more than fair to the violator. Makes it a point to stop in at other departments and visit from time to time. A good speller and usually uses the dictionary when in doubt. _lé Alert and aggressive as an investigative and patrol officer. His approach to the public is personable and polite. Always makes good arrests and obtains useful information due to his/her conscientious patrol efforts and intelligent inquiries. Cordial and fair with the people he/she deals with. 65 19 Has the ability to command the reSpect of both junior and senior officers. Demonstrates initiative and perseverance. Does not display the attitude of being better because he/she is a state police officer. Treats all people with dignity. 20 Careful and serious in the preparation of reports. Careful with his reports, makes them in detail, so they are of full value when used at a later time. Treats members of other police organizations as fellow police officers. A safe driver. 21 Productive in all areas of assigned work. Wants to continually improve his knowledge of new police tech- niques and policies. Respects the opinions of others. Has an ability to communicate with everyone. 22 Makes good contacts with both the general public and public officials. 66 Leaves a very good impression of the department with the younger generation. Does not accept or solicit gifts or services from the public. Knows the criminal element in the post area. a Knows when to speak and when not to speak. Makes decisions promptly, but not hastily. Does not act upon impulse. Writes quality traffic summons rather than quantity. 1‘: At accident scenes, he/she seems to know at once what is needed. An excellent interrogator of both suspects and witnesses. When handling an investigation, he/she carefully organizes a case. Knows hot to Operate instruments and equipment related to his work. 25 Treats the public, other departments, courts, news media, etc., with respect and gains their respect in return. His approach to the public is personable and polite. Very familiar with the area in which he/she works. Uses restraint, instead of force, when possible. 67 Don-imam oi Sun Polic- - ”"00va Division ACH|EVEMENT 8. DEVELOPMENT INVENTORY Rewome Form - Pan I ..., [Ti—Hi H l 1 ll 1.27.3.”‘Fi Ll [J lui‘lllT] mi 1 111mm; 1 1 1 I Hg] Cnlmmcmoau r] 1 '1 D So: D that] Post Numb" ED Desaibu's Name Last Civil Service Classification ED _ Fim M. PART I ACHIEVEMENT SCALE 1 2 3 4 .fiéfid .DDDD .DDDD .DDDD .DDDD .DDDD .DDDD nDDDD .DDDD Dumbo": SOWC'W. Phase No. 2. DUDE] s. DUDE] . DUDE] ... DUDE] ... DUDE] 11. DUDE] .0. BEDS ..gL—JDDC] .fififid' .DDDD .DDDD .DDDD .DDDD ".DDDD 2.. EDGE] ... DUDE] NOTE: 68 Department of State Police Personnel Division (1) Select two phrases which best describe the trooper. (2) Remember to treat each set of statements independently. PART I Achievement Scale 1 Usually prompt in answering complaints. Excellent interrogator of bothsuspects and witnesses. When conducting an investigation, the case is organized carefully. Knows how to operate instruments and equipment related to his/her work. 2 Relays to fellow employees a feeling of genuine interest and understanding. Mature in a way that comes with time, living with, and under- standing people. Greets fellow officers with a smile and pleasant remarks. Questions whatever he/she does not understand. .3. Has the ability to initiate projects and see that they are carried out. Even tempered. Leaves personal problems at home. Normally sets an example which others endeavor to follow. 69 5*. Makes informative reports that can be easily followed up by others if the need arises. Careful and serious in report preparation. Always listens to both sides before making a decision. As concerned with crime prevention as criminal apprehension. .2 Accepts group decisions without necessarily agreeing. An inspiration to other members of the department. Very outgoing and truly likes people. His/her aggressiveness prompts supervisors to recommend duties with additional responsibility. 6_ Does not become involved in compromising situations. Knowledge of working area and use of informants is exceptional. When issued equipment needs repair or replacement, it receives immediate attention. Does well in all aspects of law enforcement. 1 Is a doer - not a talker. Careful with reports, prepares them in detail, so they are of full value when used at a later time. Treats members of other police organizations as fellow police officers. Never officious; does not look down on others. 70 § Has respect for fellow officers and command personnel. Thoughtful and considerate to his/her family. Respects danger and does not unnecessarily jeopardize his life or the lives of others. Loyal to fellow workers. .9. Productive in all assigned work. Continuously tries to improve his/her knowledge of new police techniques and policies. Respects the opinions of others. Has the ability to communicate with everyone. 10 Excellent organizer and planner. Views traffic enforcement on the highway as a serious matter. Tries to instill pride in younger officers. Does not use force except as a last resort. 11 Makes informative reports that can easily be followed up by others if necessary. When in doubt, consults appropriate source for correct procedure. Keeps the firearms which are assigned clean and in proper working order. Listens to both sides before making a decision. our 4" -.-‘-n I... m. .—\I ‘ i- ‘5‘ 71 12 Makes good contacts with both the general public and public officials. Leaves a very good impression of the department with the younger generation. Does not accept or solicit gifts or services from the public. Knows the criminal element in the post area. 13 Strives to maintain a steady and well rounded performance in police work. Resourceful and imaginative in his/her investigations. Able to evaluate a situation easily. Force is only used as a last resort. _li Does a good job of counseling subordinates. Usually gives the citizen the benefit of the doubt. Continually striving to be the best. Smiles, and listens to citizens' complaints. 15 Treats the public, other departments, courts, news media, etc., with respect and gains their respect in return. Approach to the public is personable and polite. Very familiar with the work area. 72 Uses retraint, instead of force, when possible. 16_ Knows the departmental rules and regulations very well. Reports are neat, thorough, accurate, and "on time." Does not belittle other departments. Does not permit anyone to enter a patrol car without check- ing for possible weapons. 11 Commands the respect of both junior and senior officers. Demonstrates initiative and perseverance. Does not have an arrogant attitude because he/she is a state police officer. Treats all people with dignity. lé Knows when to speak and when not to speak. Makes decisions promptly, but thoughtfully. Usually does not act upon impulse. Writes quality traffic summons rather than quantity. .1: Has a friendly disposition. Never practices racial discrimination. Thoughtful. Can work with anyone at the post without arousing ill feelings. 73 .22 Aware of recent supreme court decisions. Respects the opinions of others. Takes advantage of resource materials throughout the department. Reports convey meaning without using excessive language. 714 NH" ’-" ‘ o I 0.5mm“! 0! Sun PoIiN - Nu IonMI Oimion ACHIEVEMENT a DilVELOWJHIT INVENTORY Response Faun — [um I .Zfijrgwliwlflumhtc l 1 Hum ‘I-E‘ I 1 [—] . . ‘. {12ml} 1 l I 11$] IFTIDFH [1114111].le Eniiumcmoaw L I 1 l [J J 801: C] ‘RmD : MtNumIm. ED Ocsaibu‘; Name tut - . . Hm _ M. CiviI Service Chailicuion CE] up. ‘ PART! , ACHiEVEMENT SCALE mm..annn\ .maas,x.aaaa ' .DDDD -DDDD'].DDDD .DDDD .DDDD]_.DDUD .DDDD .DDDD .DDDD .DDDD .DDDD-ngDDD .DDDD .DDDD ‘mDDDD .DDDD -DDDD ‘h‘stulr"| 31mm" ‘ 03R Canary Copy - l‘wsmsr-vlnwlausrs Whlt. Cr-svv - 'u.‘ '0 u; Ila] P..;;;.'.;~.m 'auk C \py ~~ Unlh I Head‘s-is Is". ACHIEVEMENT AND DEVELOPMENT INVENTORY 75 '“O-— c - Dana-Imam a! State Pulira - Personnel Division To“ -.-...— . ..‘v- Part II -— Troop” Dcvclounmnt and Counseling Guitle 0... TM; ls‘ “ms tori-rm Is'alhu‘l'l'" pun-r Nurse pro '1 fr: mly It" i-n u‘rrflnc. m J- t" l I. \a_—. u ‘- . s a' ‘o‘ 'LE«°.‘ J» m“. - ‘- a .n. .. .. .. _ SECTION I - Using the lullowuig scale. circle the one that but ticsCribcs the trooper Ior each statement: . PP.‘ eggpus ...-add“ 9‘99??? Exactly like this 2 Very much like this 3 Somewhat like this 4 Not very much like this Ooesaaoorllou oIcounselingIcllowttOODlltS. ......... ......... ..... ..... Work habits are excellent. ................................................. ................... ..... . ........ Is a valuable member ol the State Poliar. .......................... . ............... ..... .. ..... Completes assignments without detailed supervision. ., .............................. . ........... - Knowledge oI the job is excellent. ................................................................. . ......... Hes shown consistent impmvcment and is interested in further improvement. ............ ....... . ...... Maintains a proper state ol physical litni:ss. ....................... . ....... . ................ . ....... Shows good judgment in exercising his duties and responsibilities. . ........... .. ...... ..... ...... Accepts assignments willingly. . ............................................... . ........ Gets along wsll with others and works well as a team member. ...... Helps maintain high morale among Icllow troopers. ........... ........ ......... ..... Contacts with the public are vwil received. ............................. .......................... . ...... Participates in community/cuti:en activitii-s. ...................... . ....... . ......... ........ ..... . ......... Reports are neat, clearly Mitten. and to the point. ....................... Closely adheres to the d-‘partmcut's and post's rules and regulations. . .................................... 5 NO! I! III Iikt Ilus ‘d‘dd‘ d‘d ”NNQNNMNMNNUQN 2 UUUUUUUU'UUUUUUU zbbbbb&b.b.bbhbb WOU’OU’IOOOOOMGOU‘O I l l l 76 POST COMMANDER RANKING FORM STATION 14 Aaron, Tpr. E. Brown, Tpr. R. Butler, Tpr. 0. Charge, Tpr. F. Dale, Tpr. L. Flowers, Tpr. C., Jr. Kranter, Tpr. A. Nowles, Tpr. L. Peters, Tpr. A. Ronte, Tpr. E., Jr. Stall, Tpr. H. 77 DEPARTMENT OF STATE POLICE ACHIEVEMENT AND DEVELOPMENT INVENTORY PEER EVALUATION FORM‘ Performs in the top 10% of the Troopers at this Post: Name Performs in the next 20% of the Troopers at this Post: Name Performs in the middle 40% of the Troopers at this Post: Name Name Name Name Performs in the next 20% of the Troopers at this Post: Name Performs in the bottom 10% of the Troopers at this Post: Name APPENDIX B: FINAL CLUSTER SOLUTIONS 79 oo~ um um ecu an as ac oh Nb we «a on we us do Nh an an no mm am am an mm No hm cw No we no no on no we a~ he an no an mm mm «m «On Mon Oh wow mo NA «a as me on cm as we cos we as we on as we so an mm an an an on an we mm mm an mm km mm on an an on me co Nn an no we an an an an mm mm on no mm an me. ms nu an . we as an nn ma "a no on non n on a