A STUDY OF APPRAISAL METHODOLOGY: THE EFFECT OF THE COORDINATOR IN APPRAISAL

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
Norman Frisbey
1958

This is to certify that the thesis entitled

A Study of Appraisal Methodology: The Effect of the Coordinator in Appraisal

presented by Norman Frisbey has been accepted towards fulfillment of the requirements for the Ph.D. degree in Psychology.

Major professor

Date: February 17, 1958

A STUDY OF APPRAISAL METHODOLOGY: THE EFFECT OF THE COORDINATOR IN APPRAISAL

BY NORMAN FRISBEY

A THESIS

Submitted to the School for Advanced Graduate Studies of Michigan State University of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1958

ACKNOWLEDGMENTS

The writer wishes to express his sincere thanks to Dr. Carl F. Frost, major professor, for the guidance and valuable help he provided during the program of study and the course of this research. Each of the author's Guidance Committee members, Drs. D. M. Johnson, M. Ray Denny, H. C. Smith, and E. H. Jacobson, made suggestions and contributions for which he is grateful.

The writer appreciates the opportunity provided by Chrysler Corporation to conduct this investigation within its operation. Special thanks are due Mr. Wayne E. Grimm, Director of Management Development, Chrysler Central Personnel Staff, the person responsible for providing the opportunity to do the research and for making the data available for use in this study. In addition, Mr. Grimm gave direct assistance by providing the experimenter with the personal time required to do the work. Dr. Edwin F. Harris, Research Assistant, Chrysler Central Personnel Staff, showed a personal interest in the problem and gave advice on technical and operational matters. The writer is grateful for the assistance provided by Dr. Harris to help collect the data.
The writer is indebted to Dr. John Versace, Chrysler Engineering Division, for his advice on the statistical problems. Thanks go to the Personnel Staff at the Jet Engine Plant for their cooperation and help during the collection of the data.

It is impossible to fully express my gratitude and thanks to my wife, Ardeth, for typing this manuscript and for her encouragement and help throughout the period of graduate school and during the time of this research.

VITA

Norman Frisbey
candidate for the degree of Doctor of Philosophy

Final Examination: February 19, 1958, 2:00 P.M.

Dissertation: A Study of Appraisal Methodology: The Effect of the Coordinator in Appraisal

Outline of Studies:

Degree   Major         Minor        Granted
B.S.     Mathematics   Psychology   March, 1950
M.A.     Psychology    Statistics   August, 1952
Ph.D.    Psychology                 March, 1958

Biographical Items:
Born, June 8, 1920, Ferndale, Michigan
Undergraduate Studies, Michigan State College, 1946-1950
Graduate Studies, Michigan State University, 1950-1958

Experience:
Part-time Instructor, Michigan State College, Spring and Fall, 1954
Research Specialist, Chrysler Corporation Central Personnel Staff, 1955-1958

Membership:
American Psychological Association
Midwestern Psychological Association
Michigan Psychological Association

A STUDY OF APPRAISAL METHODOLOGY: THE EFFECT OF THE COORDINATOR IN APPRAISAL

BY NORMAN FRISBEY

AN ABSTRACT

Submitted to the School for Advanced Graduate Studies of Michigan State University of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1958

Approved

Sixty-four first-line supervisors in an industrial plant each rated or appraised the job performance of three subordinates.
Two methods of administration, two coordinators, and two locations were incorporated into a factorial design. Method of administration, which involved comparing coordinated with non-coordinated appraisals, was the variable of primary interest. The coordinator was a personnel staff person who conducted the appraisals by questioning the appraiser and recording the responses in a modified Field Review type interview. The objectives of appraisal were (1) evaluation of present performance and (2) planning for individual improvement. It was predicted that the coordinated appraisals would be superior to the non-coordinated in meeting these aims.

The relative merits of the methods as a system of evaluation were inferred by comparing the treatment groups on secondary criteria of a rating method. It was predicted that the ratings of the coordinated appraisals would be improved by increased discrimination, reduced leniency, reduced halo, increased coverage, and increased comparability between ratings. The findings did not support these predictions. No significant differences were found between the methods, coordinators, or locations on these factors. It was concluded that the coordinator did not improve the effectiveness of the appraisal as a rating instrument.

Immediate criteria used to evaluate the appraisal as a development instrument were established from an examination of its function in the development procedure.
Subsequent to the appraisal session, the information recorded on the form was to be used for (1) a review by the appraiser's supervisor, (2) a performance interview with the employee, (3) training and development of the subordinate, and (4) follow-up on action by the coordinator. The areas of the form important to the above steps were the supporting facts and the performance summary, which contained sections for a summary of individual performance strengths, development needs, and development plans. It was predicted that the coordinated appraisals would be superior to the non-coordinated in terms of the quantity and quality of responses recorded in these sections.

The findings did support these predictions. The supporting facts of the coordinated appraisals contained a greater amount of information and were more descriptive of specific performance than those of the non-coordinated group. The non-coordinated group of appraisals contained a greater number of appraisals with Sections 1, 2, and 3 of the performance summary omitted. The performance summaries of the coordinated appraisals contained a larger number of performance strengths, development needs, and methods of handling development needs. The development needs for the coordinated group were more frequently related to the supporting facts. The coordinated appraisals more frequently placed responsibility for development action on the supervisor and more frequently mentioned on-the-job coaching as a method of development than the non-coordinated appraisals. It was concluded that the coordinator did effect an improvement on the appraisals as an instrument for development in terms of the procedure outlined and the criteria used for evaluation. In fact, it can be said that the coordinator plays an essential role and makes a significant contribution to the development procedure.

TABLE OF CONTENTS

ACKNOWLEDGMENTS . . . . . . . . . . . . . . . .
VITA . . . . . . . . . . . . . . . . . . . . . .
ABSTRACT . . . . . . . . . . . . . . . . . . . .
LIST OF TABLES . . . . . . . . . . . . . . . . . xi
LIST OF FIGURES . . . . . . . . . . . . . . . . xiv

INTRODUCTION . . . . . . . . . . . . . . . . . . 1
    General Background . . . . . . . . . . . . .
    Chrysler Appraisal System . . . . . . . . . .
    Field Review Method . . . . . . . . . . . . .
    Pertinent Research . . . . . . . . . . . . .
    Rationale . . . . . . . . . . . . . . . . . .
    Criteria of Evaluation Method . . . . . . . .
        Reliability . . . . . . . . . . . . . . .
        Validity . . . . . . . . . . . . . . . .
        Discrimination or Spread . . . . . . . .
        Leniency . . . . . . . . . . . . . . . .
        Halo . . . . . . . . . . . . . . . . . .
    Criteria of a Development Method . . . . . .
    Purpose and Criteria of Present Study . . . .

STATEMENT OF PROBLEM . . . . . . . . . . . . . .
    Evaluative Aspects . . . . . . . . . . . . .
    Development Aspects . . . . . . . . . . . . .

EXPERIMENTAL PROCEDURE . . . . . . . . . . . . .
    Population . . . . . . . . . . . . . . . . .
    Design . . . . . . . . . . . . . . . . . . .
    Sample Selection . . . . . . . . . . . . . .
    Instrument . . . . . . . . . . . . . . . . .
    Appraiser Training . . . . . . . . . . . . .
    Coordinators . . . . . . . . . . . . . . . .
    Administration . . . . . . . . . . . . . . .
    Data Collection . . . . . . . . . . . . . . .
    Coding . . . . . . . . . . . . . . . . . . .
    Statistical Tools . . . . . . . . . . . . . .

RESULTS AND DISCUSSION . . . . . . . . . . . . . 44
    Evaluative Aspects . . . . . . . . . . . . . 45
        Discrimination or Spread . . . . . . . . 45
        Leniency . . . . . . . . . . . . . . . . 47
        Halo Effect . . . . . . . . . . . . . . . 50
        Coverage . . . . . . . . . . . . . . . . 52
        Comparability between Ratings . . . . . . 53
        Item Correlations . . . . . . . . . . . . 56
    Development Aspects . . . . . . . . . . . . . 63
        Supporting Facts . . . . . . . . . . . . 65
        Performance Summary . . . . . . . . . . .
88
    Operational Aspects . . . . . . . . . . . . . 103

GENERAL DISCUSSION . . . . . . . . . . . . . . . 108
SUMMARY AND CONCLUSIONS . . . . . . . . . . . . 117
APPENDIX . . . . . . . . . . . . . . . . . . . . 121
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . 129

LIST OF TABLES

Table                                           Page

1   Comparison of overall rating distributions for the coordinated and non-coordinated groups . . . 45
2   Analysis of variance of the standard deviations of overall ratings for each appraiser . . . 46
3   Analysis of variance of the mean overall ratings for appraisers . . . 49
4   Analysis of variance of the mean of the standard deviations of item ratings . . . 51
5   Analysis of variance of the mean proportion of identical ratings used by an appraiser . . . 52
6   Analysis of variance of the mean number of "Don't Know" or omitted responses per appraiser . . . 54
7   Correlation between rank score ratings and mean ratings and between rank score ratings and overall ratings . . . 55
8   Correlations between items, overall judgments, and potential ratings . . . 57-58
9   Comparison between coordinated and non-coordinated groups on the number of not significant correlations of item ratings with potential ratings . . . 59
10  Distributions of overall ratings for first level supervisors previous to study . . . 61
11  Distributions of overall ratings for coordinated and non-coordinated groups from present study . . . 61
12  Comparison of overall distribution of Dodge with distribution of coordinated group from present study . . . 62
13  Analysis of variance of the mean number of words per appraisal . . . 67
14  Analysis of variance of the mean number of major thoughts per appraisal . . . 69
15  Analysis of variance of the mean number of minor thoughts per appraisal . . . 70
16  Analysis of variance of Scale I mean values. How complete or explicit is the answer? . . . 72
17  Analysis of variance of the mean number of dimensions or criteria per appraisal . . . 74
18  Analysis of variance of Scale II mean values. Kind of information or evidence . . . 75
19  Analysis of variance of mean scores from expert evaluation, on amount and specificity of information . . . 78
20  Correlation between overall ratings of the appraisers and ratings estimated from the item content . . . 80
21  Analysis of variance of Scale III mean values. Tone or affect . . . 82
22  Analysis of variance of the mean number of 1 and 2 values for Scale III . . . 83
23  Analysis of variance of the mean number of 4 and 5 values for Scale III . . . 84
24  Analysis of variance of the mean number of extreme or dogmatic words . . . 86
25  Analysis of variance of the mean number of qualifying words . . . 87
26  Comparison of coordinated and non-coordinated groups for responses to Sections 1, 2, and 3 of Part II . . . 90
27  Analysis of variance of the number of performance strengths, Section 1 . . . 92
28  Analysis of variance of the number of development needs, Section 2 . . . 93
29  Comparison between coordinated and non-coordinated groups on categories of development needs . . . 95
30  Analysis of variance of the number of methods of handling development needs . . . 97
31  Comparisons between coordinated and non-coordinated and between plant and office in terms of person primarily responsible for initiation of development action . . . 98
32  Comparisons between coordinated and non-coordinated and between plant and office in terms of stated methods of handling development needs . . . 100
33  Analysis of variance of sort of development plans according to specificity-generality . . . 102
34  Average time spent completing an appraisal . . . 104
35  Analysis of variance of the average time in minutes per appraisal for each appraiser . . . 105

LIST OF FIGURES

Figure                                          Page

I     Experimental design . . . 35
II    Prescribed guide for word and thought counts . . . 66
III   Scale I: Definition and categories . . . 71
IV    Examples of criteria or dimensions of performance . . . 73
V     Scale II: Definition and categories . . . 73
VI    Directions and categories for expert evaluation of item content . . . 77
VII   Scale III: Definition and categories . . . 81
VIII  Examples of so-called dogmatic and qualifying words . . . 85
IX    Directions and categories for sorting of development plans . . . 101
X     Listing of departments in the sample by function and group size . . . 122-123
XI    Appraisal instrument . . . 124-126
XII   Instructions for appraisal . . . 127-128
INTRODUCTION

General Background

Rating methods, which are today among the recognized means of measuring conduct or behavior, are almost as old as any experimental methods in psychology. The beginnings of rating methods can be traced back to the work of Fechner and others in psychophysics. In fact, the method of paired comparisons, a much used technique today, was developed from Fechner's method of impression in 1894. Symonds (35) stated that the first rating scale, in its modern sense, was that published by Galton in 1883. This scale even today may be considered a model of its kind.

One significant development of rating was the man-to-man principle developed by Walter Dill Scott of the Carnegie Institute of Technology during World War I. The Army Rating Scale using this principle was first framed by Scott in May, 1917. Soon after this, in 1920, the true graphic scale was developed by the Scott Company Laboratory. A more complete historical treatment of rating scale developments can be found in Symonds (35).

Rating methods have been widely used for a number of years, both for purposes of investigation and research and in practical ways to evaluate personnel in industry, school, armed forces, and civil service. A bibliography of merit
The recent work of psycholo- gists and personnel research persons since‘World war II has been characterized by an increasing number of articles and studies concerned with the problems of rating. Although much has been written and many innovations suggested, some of the basic problems which faced the early researchers still remain essentially unsolved. For thorough coverage of the various types of rating techniques, purposes of rating, pitfalls, advantages and disadvantages of rating, the reader is referred to industrial psychology texts such as Bellows (2) and Tiffin (hO) or general tests such as Poffenberger (26) and Symonds (35). Two practical handbooks of rating are Dooher and Marquis (6) and Smyth and Murphy (29). In addition, information can.be found in the general management type periodical such as Personnel and Personnel Journal. Only the general background material pertinent to the method under investigation has been presented by the investigator. Chrysler Appraisal System It was the purpose of this study to investigate experi- mentally a rating or appraisal method which was devised to meet some of the short-comings common to rating systems. This particular method is the appraisal system used by the Chrysler Corporation as part of its Management Development Program. The commonly stated purposes of an appraisal or merit rating program are the following: 1. 2. 3. Promotion or transfer. Management is able to identify persons capable of greater responsibility or with skills needed for other positions. Employee improvement. The strong and weak points of employees are identified so that both.manage~ ment and the employee can direct their efforts toward the develOpment of skills for increased efficiency. Research. The identification of better and poorer groups of employees serves as criteria for the validation of selection procedures. 
The aims of the Chrysler Management Development Program are consistent with those above and, broadly stated, are to improve managerial competency and to build a reserve of well-trained management personnel. This is accomplished through an evaluation of present performance and planned development action on an individual basis. The appraisal and development of management persons includes the following steps:

1. Performance appraisal of the management person by his immediate superior
2. Appraisal review by the appraiser's immediate superior
3. Performance interview with the subordinate appraised
4. Training and development of the subordinate.

The focus of attention for this study was directed toward the performance appraisal, and more specifically, the appraisal session itself. The action indicated by the steps following the appraisal, that is, the utilization of the appraisal information, is dependent upon the appraisal session.

The management appraisal instrument is a five-point graphic scale containing 35 to 40 items. The items vary somewhat in wording from plant to plant, but the content is essentially the same. The items represent managerial responsibilities which are characteristic of a level of supervision rather than specific positions. These items are arranged under four general areas of responsibility:

1. Planning and directing operations
2. Maintaining a working force
3. Controlling costs
4. Organizational relationships.

Prior to the assignment of a rating, evidence of performance pertinent to the item is recorded in a space provided and labeled "Supporting Facts". After rating the entire group of items, the rater assigns an overall performance rating in terms of the same five-category scale as used for the items. This overall rating is judgmental and does not represent any quantitative combination of item ratings. The next step in the appraisal process is a "Performance Summary".
Here the supporting facts for some of the higher ratings are listed as "Strengths" of the ratee. Likewise, some of the evidence for lower ratings is listed and called "Development Needs". A third section in the summary is titled "Development Plans". In the latter section, development plans or recommended courses of action are suggested to help meet the needs previously mentioned.

The distinctive aspect of the appraisal session is that a personnel staff person assists the rater or supervisor in the appraisal of each of his subordinates. This staff man is called a Management Development Coordinator.

The function of the coordinator in the appraisal situation itself is crucial to subsequent phases of activity. With the appraisal instrument as a guide, the coordinator interviews the appraiser to assist him in the evaluation, performance summary, and development planning for each subordinate. The role of the coordinator in the appraisal session is outlined briefly as follows:

1. To conduct the appraisal interview and record the ratings and supporting facts
2. To interpret the items and to clarify as to specific application
3. To ask pertinent probe questions which assist in complete coverage
4. To assure that the ratings and facts represent typical rather than infrequent behavior
5. To encourage a spread in the ratings by distinguishing between degrees of performance
6. To encourage that the items be considered as distinct from one another
7. To assist in a summary of performance
8. To guide in determination of an overall rating and potential rating
9. To assist in outlining a sound and attainable individual development plan.

The coordinator is a person thoroughly acquainted with the appraisal system. His participation in the appraisal session is expected to increase:

1. The rater's understanding of the procedures and materials
2. The objectivity or clarity of judgments
3. The coverage of pertinent information
4.
The uniformity of application from rater to rater.

There are about sixty persons acting as full-time coordinators in Chrysler, where the management force totals approximately twelve thousand.

It is difficult to determine how extensively the coordinator concept is being used by other companies. The usual surveys of rating methods, such as those by Spicer (30) and Benjamin (3), do not give information of this type. Surveys do not usually indicate procedures unique to one or two respondents in the sample.

The investigator has personal knowledge that a large manufacturer of electronic products uses coordinators in its management development program. This company's manual stated that the coordinator acts as the secretary, "recording the remarks which the supervisor wishes to make and answering any questions which he may have regarding the use of the form". He also assists on the summaries and conducts the reviews with the personnel supervisor and head of the activity involved. However, it is suggested that after experience, the supervisor make the appraisals alone and that the coordinator go over the forms with him subsequent to rating. General Mills, Inc. follows this practice. Balch stated:

We use a trained interviewer to question each appraiser individually concerning a man and to record the appraiser's opinions. The interviewers frequently are personnel men, but some of the most successful have been operating men who were trained and used for short periods on this work. (1, p. 12)

The Dartnell Corporation materials developed by Robert N. McMurry and Company, a management consulting firm, suggested a Patterned Merit Review. This is conducted by a personnel department or home office representative. However, their manual stated, "The interviewer should never interject his own views. He is not evaluating the employee; he is merely providing the supervisor with a frame of reference" (21, p. 4).
The above programs appear to use a coordinator in the appraisal session in a way similar to that of Chrysler. All obtain supporting information for each item rated, summarize strengths and weaknesses, and obtain an overall rating.

Field Review Method

Historically, the notion of a coordinated appraisal appears to be derived from the Field Review Method of Employee Evaluation and Internal Placement. Wadsworth (41), in a series of six articles, presented the rationale and procedure for use of this method of evaluation. He indicated that this procedure was first published in manual form by the Army Service Forces (ASF M213, May 1945) from his pen. Wadsworth further stated that the method itself was an outgrowth of extended employee evaluation and internal placement research carried on initially in Southern California Gas Company and Southern Counties Gas Company of California in the 1930's.

As outlined by Wadsworth, the Field Review Method "is essentially a program of planned supervisor contacts undertaken to make the employee evaluation and placement job effective, and includes everything that ordinarily transpires in these contacts" (41, p. 103). He proposed that the job of the personnel department includes "guidance of the supervisors in the handling of their responsibility for assignment of jobs, evaluation of employee performance and personnel planning" (41, p. 99). Specifically, in regard to employee evaluation, Wadsworth stated the following:

Supervisor's impression of an employee is seldom developed as precisely as should be desired without some prompting. His opinion may, in actuality, be based only upon casual impressions, colored in some degree by his personal reactions to the employee as an individual. . . . That our ideas can be wrong is frequently demonstrated when we are called upon to say explicitly where, when, and under what circumstances a given impression has developed.
Even the most fair-minded supervisor is not very likely to check himself, or to look for evidence which might revise his opinion of an employee, unless something happens to prompt him to do so. (41, p. 135)

Wadsworth proposed a pattern of inquiry and questioning which begins with an overall evaluation of performance on a three-point scale. This rating is followed by a series of suggested probe questions designed to elicit evidence in support of the rating.

Shaeffer, in a general discussion of merit rating plans which included a comparison of methods, stated that "the field review rating procedure is by far the most flexible, comprehensive, and practical rating procedure" (27, p. 698). Balch, in an article mentioned previously, indicated he had recently heard Walter Mahler, psychologist and consultant, present "a research report on the relative effectiveness of various appraisal methods in which the field review method scored the highest for over-all effectiveness" (1, p. 13).

The Chrysler appraisal system appears to deviate from the field review method as outlined by the above authors in the following respects:

1. Items based on job responsibilities are rated individually
2. Supporting facts are elicited before a rating is obtained
3. The rating is based on a five-point graphic scale
4. The overall performance rating is obtained subsequent to item ratings.

Pertinent Research

A search of the literature by the investigator did not reveal any published studies regarding the effect of the coordinator upon the appraisal. Two studies dealing with supervised ratings which appeared to be relevant to the present problem have been published. In the first study, by Taylor and Manson, the evaluations were made in the presence of and under direct supervision of a qualified personnel technician.
The technician worked with one supervisor at a time and after the preliminary introduction, the technician read the first trait description aloud and discussed it briefly as it applied to the specific position. . . . Then the technician asked the supervisor to examine the list of subordinates to be rated and to indicate which was his best worker with respect to that characteristic. When the individual was named the supervisor was asked to justify the choice and indicate where on the scale the rated individual belonged. (36, p. 508)

This study was conducted with five populations of office employees varying in size from 54 to 613 persons. All groups received the same treatment, with no control population. The authors evaluated their results in terms of reliability, halo, and distribution skew. Inter-rater reliability coefficients were presented for seven common factors on two of
This study incorporated more in terms of controls and statistical comparisons than the earlier one. The pepulation of 712 office employees was divided among four conditions of administration. These were (1) traditional graphic ratings, (2) StevensAWOnderlic, (3) Group Stevens- ‘Wonderlic, and (h) Supervised StevensAHonderlic. The Stevens- ‘Honderlic is a practice suggested by them (33) which stated that each trait should occupy a page by itself and each 12 individual in a group of ratees should be rated on the first trait before considering the next, etc. The evaluations were made on a ten point scale for eight variables common to all four treatments. The relative merits of the four methods were inferred by comparison of what the authors called ”secondary criteria of rating system qualities”. These were inter-rater reliability, halo, variability, and leniency. As in the previous study the comparisons were primarily on the basis of observation of the results with a lack of tests for significance. The results were negative; no differences were indicated for either format or methods of administration. For all four conditions the findings were: 1. Almost the entire range of ratings was used by the raters 2. The standard deviations indicated a desirable dispersion 3. Rather than peaked, the distributions were somewhat platykurtic u. Means of the scales were only slightly above the mid—point of the range 5. Only slight negative skewness and no marked asymmetry 6. Correlations among the traits were generally consider- ably lower than are usually found in graphic rating scales. The authors considered, "These are rather unusual find- ings” (38, p. 20h). They suggested, by way of explanation, [1 r. 13 that these findings may be affected by the following three factors: 1. The ratings were done for research purposes 2. The situation was conducive to the production of desirable ratings 3. The rating scale was well defined in behavioral terms rather than trait names. 
Rationale

The purpose of coordination is to increase the efficiency of the appraisal method. This study was directed toward evaluating the influence of the coordinator upon the appraisal. In other words, to what extent, if any, did the presence of a coordinator affect the accuracy or usefulness of the appraisal? The important question then became: How is improvement in a rating method determined? A discussion of the criteria of rating excellence seems appropriate at this point. As in the case of the program objectives, there are two goals in appraisal:

1. Evaluation or measurement of present performance
2. Improvement of individual performance or planning for personal development.

In order to clarify the problems involved, the two aspects, measurement and development, were considered separately in the following discussion.

Criteria of Evaluation Method

Over thirty years ago Freyd made the statement:

Ratings are ultimate things, and the comparison of the various systems cannot be found by recourse to an external criterion. In the writer's opinion there are no flawless methods of evaluating rating scales. The criteria which have been advanced may be divided roughly into those which appeal to such factors as ease of administration and scoring, popularity, and so forth, and those which employ statistical reasoning. (12, p. 89)

In regard to the non-statistical criteria he stated, "These are important criteria unless one has access to trained judges with unlimited patience." Freyd listed the following seven statistical criteria:

1. Comparison of ratings with intelligence test scores
2. Ratings on the same men by the same judge for different months
3. Ratings on same men by different judges
4-5. The form of the distribution -- its normality and its spread
6. Absence of halo
7. Have a person other than the rater sort the ratings and indicate to whom they apply.
The above were not definitive criteria and actually the situation has not changed significantly since that time. Likewise, as was indicated above in 2 and 3, some confusion still exists as to what is reliability and what is validity for a rating method.

Reliability. Two methods of determining the reliability of ratings are commonly used:

1. Comparison of re-ratings by the same rater
2. Agreement between independent raters or inter-rater reliability.

These methods present certain problems. In the first case, re-ratings close together in time frequently produce high reliability coefficients because the rater recalls the previous rating and desires to be consistent. If a long interval of time elapses between ratings a difference in the ratings may reflect a true change on the part of the ratee. The second method assumes that each of the raters is equally familiar with the performance of the ratees and that they assign ratings independently of each other, insuring discreteness not only in rating procedure but also in content. In an industrial situation this is usually not true. Driver stated regarding this method:

The problem as to whether agreement between a number of independent raters indicates validity or reliability is one that has caused much of this confusion. Some investigators feel that if three or more raters all agree as to a particular quality, this indicates that they are all rating the same trait in the performance they are observing, and hence, that this method is a measure of validity. Others have argued that agreement between judges does not mean they are all drawing the correct conclusions, but that they might agree on some erroneous answer, and that this method is a measure of consistency of rating rather than validity. This problem is difficult to resolve, but in most investigations such agreement is desired. Whenever individual raters do agree, it is felt that the ratings are of greater value. (7, p. 191)

Whitla and Tirrell (42) in a study concerned with the validity of ratings at various levels of supervision found:

1. The level of raters closest to the ratees was best able to rate them
2. The level of raters closest to the ratees, and only this level, was able to discriminate between sections of the rating instrument.

In the present study it was not possible to obtain ratings for either re-rating or inter-rater reliability. The study was conducted as part of an ongoing program and the management felt that immediate re-ratings would create undue confusion and misunderstanding. Organizational structure was such that the second level of supervision was not close enough to the ratees for rating purposes.

Validity. The problem of validity is the most critical and the most difficult. A reliable rating may not be an actual picture of performance. Driver listed the following as methods employed to obtain validity measures, with varying degrees of success.

Methods of Determining Validity
1. Comparison with some direct measurement of performance, i.e., production records, etc.
2. Comparison with psychological tests purporting to measure the same ability.
3. Comparison with work-samples.
4. Analysis of distribution of results.
5. Analysis to determine the presence or absence of 'halo effect'.
6. Follow-up procedures.
7. Miscellaneous methods. (7, p. 185)

In reference to numbers 4 and 5, he mentioned the dangers involved in making the assumptions necessary for their use. However, he stated that while these methods do not furnish conclusive proof of validity they are considered by many investigators to be fairly good indicators. Mahler suggested:

Among the methods of testing the adequacy of the ratings the following are most common:
1. Reliability - how consistent are the ratings?
2. Validity - how accurate are the ratings?
3. Distribution - do the ratings result in adequate spread and does it tend to a normal distribution?
4. Halo effect - is there a tendency for the ratings of one or more traits to influence the ratings given to other traits?
5. Inter-correlation between traits - do the ratings on different traits tend to be discrete?
6. Variation in average rating - is there a wide variation in the average rating of individual raters, the average rating of employees in different occupations? (22, p. 318)

Patterson had the following to say about objective measures, "Many apparently objective criteria are not objective at all. What is usually meant by objective is quantitative. But the most subjective judgments can be expressed in an arbitrary quantitative form" (25, p. 277). He further indicated that military rank is not a valid index of success. Salary is not necessarily related to ability or output. Both output and salary are frequently dictated by union rules.

Taylor and Hastman, on the question of validity, said,

The most important criterion of the format or administration of a rating scale would thus be its validity. But ratings are themselves frequently the criterion against which predictors are validated. To validate the ratings would require construction of a more ultimate criterion. If such an ultimate criterion could be quickly and economically constructed there would, of course, be no justification for the use of more remote measures of performance such as ratings. (38, p. 184)

Below are listed a few random references and the type of criterion employed to validate rating methods.

Ferguson (9) - paired comparison ratings
Mahler (24) - volume scales
Stockford and Bissel (34) - rating of second supervisor
Chi (5) - consensus ratings
Taylor, et al. (37) - salary after bonus proved unfruitful
Sisson (28) - consensus of fellow-officers.

It appeared to the investigator that a more comprehensive survey would indicate that the most commonly used criteria found in the literature were another rating method, consensus ratings, and ratings of peers or coworkers.
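Validation against a second rating method often reduces to a rank correlation. The sketch below (Python; the figures are fabricated and are not data from any of the studies cited) correlates a set of overall ratings with an independently obtained ranking, using Spearman's formula for untied ranks.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation (no-ties formula):
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def ranks_from_scores(scores):
    """Convert overall ratings to ranks (1 = highest); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, 1):
        ranks[i] = r
    return ranks

# Hypothetical example: five ratees, overall ratings vs. supervisor's ranking
overall = [4.2, 2.1, 3.6, 4.8, 2.9]   # fabricated overall ratings
supervisor_rank = [2, 5, 3, 1, 4]     # fabricated independent ranking
print(spearman_rho(ranks_from_scores(overall), supervisor_rank))  # -> 1.0
```

A coefficient of this kind is what Hypothesis 8 below compares between the coordinated and non-coordinated groups.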
Regarding the latter, Springer (31) found a low positive relationship between the ratings given by supervisors and coworkers. There was higher agreement between pairs of coworkers and pairs of supervisors. The evidence seems to indicate that the picture is not clear regarding suitable criteria for the validation of rating methods.

In this study rankings were obtained following the ratings from the same raters for a limited portion of the sample. The question was whether this is an index of reliability or validity.

Discrimination or Spread. Another approach to this problem can be made in terms of measurement theory and the basic assumptions involved. In other words, an instrument should satisfy certain fundamental requirements to achieve any measurement at all. There are two characteristics of scales which require further attention. The first involves discrimination; the second involves constant and random errors.

A basic assumption in evaluation is that differences do exist between persons in terms of job performance. In order to measure these differences a scale should produce discriminations between the individuals rated. In an industrial situation this is an important problem. Frequently appraisals are obtained which make no discriminations in the group of employees. In this case correlation breaks down and hence reliability and validity cannot be calculated. Consequently, a basic requirement is some variability in ratings.

On the other hand, increased dispersion or variability does not insure reliability or validity for in the extreme case this can be due solely to random error. In the case of appraisal, it is assumed that the rater is not making haphazard responses but that the judgments are based upon observable facts of performance. The question which arises is whether increased emphasis on a spread of ratings or more discrimination results in an artificial spread or a more accurate measurement of performance.
It is a matter of degree rather than an all-or-none proposition. Various methods of increasing spread are employed, such as rater training, group supervised ratings, forced distribution, etc. Coordination is more than rater training or supervision because it focuses attention upon a specific individual at the time of the rating process rather than dealing with general principles of rating. It is expected that the coordinator obtains increased discrimination which is a more accurate picture of performance. This is achieved through increased clarification of the scale values and explanation of the items plus the continual probing for supporting facts. In other words, the coordinator assists the appraiser in making a more objective evaluation of the employee being rated. Under these circumstances it is assumed that there is less distortion due to forced discrimination than to lack of variability.

Following Guilford (14) the rating a person receives can be thought of as a summation of the "true" value and an error factor. The error term can be further broken down into types of constant errors and residual or random error. He mentions logical error, contrast error, and proximity error with emphasis on the well known types of errors in ratings such as leniency, central tendency, and halo. Guilford explains in detail an analysis of variance procedure which can be used to identify leniency, halo and contrast. With these error components isolated it is possible to adjust the ratings to eliminate their biasing effects. This analysis of variance procedure can be used only in a situation where two or more raters evaluate the same group of individuals.

Leniency. The tendency of raters to be generous in the evaluation of their subordinates is one that has been frequently noted in the literature by critics of graphic scales.
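The decomposition Guilford describes can be illustrated numerically. In the simplified two-rater sketch below (Python; fabricated ratings, and only a fragment of the full analysis-of-variance procedure), the rater main effect serves as a leniency index, and the rater-by-ratee interaction, computed across traits, as a halo index.

```python
# ratings[rater][ratee][trait]: 2 raters x 3 ratees x 4 traits (fabricated)
ratings = [
    [[4, 4, 5, 4], [2, 3, 2, 3], [5, 4, 4, 5]],   # rater 0
    [[5, 5, 5, 5], [4, 4, 5, 4], [5, 5, 4, 5]],   # rater 1 (more lenient)
]

def mean(xs):
    return sum(xs) / len(xs)

grand = mean([x for r in ratings for e in r for x in e])
rater_means = [mean([x for e in r for x in e]) for r in ratings]
ratee_means = [mean([x for r in ratings for x in r[j]]) for j in range(3)]

# Leniency: a constant elevation of one rater's ratings (rater main effect)
leniency = [rm - grand for rm in rater_means]

# Halo (after Johnson and Vidulich): the rater-by-ratee interaction -- a
# rater's general impression of one ratee spreading over all of that
# ratee's trait ratings
halo = [[mean(ratings[i][j]) - rater_means[i] - ratee_means[j] + grand
         for j in range(3)] for i in range(2)]

print("leniency per rater:", [round(x, 2) for x in leniency])
print("halo (rater x ratee):", [[round(x, 2) for x in row] for row in halo])
```

In a balanced layout like this one the leniency deviations sum to zero across raters, and the interaction terms sum to zero across each row and column; that property is what makes the components separable.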
A scale or method which produces ratings with a mean near the mean of the scale and with ratings distributed symmetrically about the mean is considered more useful than one with a high mean and considerable negative skew. The assumption that normality is the most desirable distribution can be in error. The true distribution of a trait is not known. However, with a large number of cases a spotty or highly skewed distribution indicates that certain portions of the scale are being neglected by raters, or steps in the scale are not equal in value.

Halo. Halo may be defined as a tendency to rate any given employee on the basis of the rater's overall general impression. This is usually noted by the tendency of presumably independent scales to correlate extremely high with each other. The presence of halo is probably the most common criticism of graphic rating scales. The absence of halo is considered by most authorities of rating methods as desirable. The presence of halo tends to lower the validity of ratings. Closer scrutiny of the literature regarding this topic indicated that the question was not that simple.

This phenomenon called halo effect was first noted by Wells (35) in 1907. Thorndike (39) noticed this tendency for general impression to spread to specific traits and in 1920 gave it the name "halo effect". Several methods have been suggested to reduce halo in ratings such as training raters and better definition of items. The best known technique for graphic scales is that of placing each trait on a separate sheet and rating all stimuli on each trait before proceeding to the next as suggested by Stevens and Wonderlic (33).
Bingham, however, stated that all halo is not invalid, that there is "a halo which cannot and should not be eliminated because it is inherent in the nature of personality, in the perceptual process, and in the very act of judgment" (4, p. 222). A complete discussion of this topic was not practical at this place but a few comments were made to identify the problem to some extent. Johnson (17) has presented the most thorough coverage of this topic and its relation to the judgment process.

Lynch, in a discussion of the theory of rating scales, indicated that "more than a slight hint of an underlying psychological position" (20, p. 497) is implicit in the assumptions of appraisal.

The general impression of a person - which is valued by the Gestaltians as reflecting the nature of the whole - is, to the rating-scale practitioner, a phenomenon to be guarded against. Behavior is reduced to isolated parts each of which is evaluated independently, and then combined into a whole. Here, again, the underlying assumption that the individual is a summation of distinct response systems is in accord with the behavioristic conception. A person may indeed behave as a whole, but specific response groups may be isolated and independently evaluated. The whole is exactly equal to the sum of the parts. (20, p. 500)

This would imply that perhaps the view a person takes regarding halo is to some extent a function of his theoretical position. More than likely the answer is somewhere between these two extremes. The validity or invalidity, as designated by Bingham (4), of halo is no doubt a function of several variables. One that has been generally overlooked is the variation of the stimuli. Johnson (17) pointed out that the actual independence of the traits is of prime importance in order to determine whether the correlation is due to general impression or objective correlation between the traits.
In the treatment of this topic writers have dealt with the range from ill-defined personality traits to well-defined aspects of specific behavior without clearly indicating the importance of this factor. Symonds (35) gave five reasons for a large halo which are also indicative of important variables. The trait or habit:

1. Is not easily observed
2. Is not commonly observed or thought about
3. Is not clearly defined
4. Involves reactions with other people rather than "self-contained" behavior
5. Is one with high moral importance in its usual connotation.

Bingham, however, indicated that there probably is an invalid halo "which marks a judgment as vague and undiscriminating, carelessly recorded when the observer's attention has been focussed, not on the trait in relation to its setting, but on the ground alone" (4, p. 223). It is this kind of halo which the experimenter was interested in eliminating from the rating process.

A variety of methods have been used to measure halo. The most common of these is intercorrelations between traits. Factor analytic studies of ratings such as those by Tiffin (40), and Grant (13) have pointed out the presence of a large general factor which is commonly called halo. Johnson and Vidulich (18), using an analysis of variance procedure suggested by Guilford (14), calculated the halo effect as the variance due to interaction between rater and ratee. These authors were able to isolate a significant halo by comparison of conditions designed to maximize and minimize the halo effect. For purposes of this study the experimenter used an index suggested by Bingham (4) to indicate relative halo effect.
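An index in the spirit of Bingham's suggestion — the proportion of appraisals in which every trait receives an identical rating — can be computed directly. The sketch below uses fabricated appraisals (Python; illustration only, not data from the study).

```python
def identical_proportion(appraisals):
    """Proportion of appraisals in which every trait received the same
    rating -- a simple index of relative halo."""
    flat = sum(1 for a in appraisals if len(set(a)) == 1)
    return flat / len(appraisals)

# Fabricated appraisals: one list of trait ratings per appraisal
coordinated     = [[3, 4, 2, 4], [5, 4, 4, 3], [2, 2, 3, 2], [4, 3, 5, 4]]
non_coordinated = [[3, 3, 3, 3], [4, 4, 4, 4], [2, 3, 2, 2], [5, 5, 5, 5]]
print(identical_proportion(coordinated))      # -> 0.0
print(identical_proportion(non_coordinated))  # -> 0.75
```

A lower proportion for the coordinated group would be read as less halo, which is the direction Hypothesis 6 below predicts.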
This was the proportion of identical ratings on all traits.

Criteria of a Development Method

Whether the appraisal method serves to stimulate individual improvement is equally as important as ratings or evaluation. This, of course, depends upon the goals or emphasis of management. The investigator earlier indicated these to be of equal importance in this program. A thorough evaluation of the effectiveness of the method as a development tool would entail a comprehensive judgment of the entire program. The question would be, does the program produce more effective employees over an extended period of time? An investigation of this sort involves measurement of employee worth in terms of ultimate criteria. This was desirable but impossible to perform because of situational, economic and individual variables which were beyond the control of this experimenter.

Flanagan (10) has criticized conventional approaches to rating and presented the "critical incident" technique as a more satisfactory method of evaluating personnel. He considers the critical incident method primarily as a development tool. In fact, Flanagan and Burns stated:

The Performance Record is not a yardstick. It is not a rating method. It is a procedure for collecting the significant facts about employee performance. These facts are gathered in such a way that they will be of maximum usefulness to supervisors and management both in improving the employee's understanding of the requirements of his present job and in developing his potential for more responsible positions. It is not simply a new form but a new approach. (11, p. 102)

The critical incident method, like the system under study, attempts to obtain written evidence of behavior. Likewise, these facts are summarized for an interview with the ratee.
The research by Flanagan and Burns (11) did not involve comparing the critical incident with other systems. Their work was concerned with a classification of incidents and the development of the approach rather than a verification of its usefulness.

Without exception authors of appraisal plans designed to improve individual performance agree that a summary of the appraisal is essential and that in some fashion this material must be given back to the ratee. It is only reasonable that an individual must know his weaknesses or development needs in order to do something about them.

The coordinator plays an important role in assisting development through consistent and thorough follow-up subsequent to the appraisal. He also has an important function at the origination of development planning during the appraisal session. The assumption is made that an effective follow through depends to a large degree upon the original analysis of the individual's needs and whether or not plans were made to effectively meet these needs. The sequence of events during the appraisal and as indicated on the form are as follows:

1. Supporting facts are recorded to justify each rating
2. A performance summary of strengths and weaknesses summarizes the higher and lower ratings
3. The coordinator and appraiser plan tentative development action to meet these specific needs.

If this procedure is followed more realistic and specific development plans should be the result. An evaluation of this aspect of the appraisal can give immediate criteria of the method as a development technique. In the extreme case, if no summary or development plans are made, none can be suggested to the employee being considered.

Purpose and Criteria of Present Study

It was the objective of this research to determine the effect of the coordinator on the appraisal with respect to the dual aims of evaluation and development. No studies were found which bore directly on this problem.
The studies on supervision of ratings were not conclusive. Research aimed at the appraisal as a development instrument has been primarily for the purpose of constructing tools rather than comparing their contribution. The experimenter was also interested in determining the utility of the procedure used for evaluating the development aspects of the appraisal.

Evidence has been discussed which indicates that criteria of a rating method are not easily established nor free from question. In this study no effort was made to compare the methods in terms of outside criteria. The validity and usefulness of the coordinated appraisals were inferred from their relative merits by comparing coordinated appraisals with non-coordinated appraisals in terms of "secondary criteria of a rating system" and criteria for the appraisal as a development tool. The latter were established by an examination of the function of the appraisal in the complete process. These were:

1. Discrimination or spread
2. Leniency
3. Halo
4. Ratings versus rankings
5. Quantity and quality of supporting facts
6. Quantity and quality of appraisal summaries
7. Quantity and quality of development plans.

STATEMENT OF PROBLEM

Evaluative Aspects

The evaluative aspects are those concerned with the ratings alone, the scale values of the judgments and their relationships. It is expected that coordination of appraisals will result in improved evaluation of performance. This improvement of the ratings will result from an increase in discrimination and a decrease in the constant errors of leniency and halo. Specifically, it is predicted that the coordinated appraisals will differ from the non-coordinated in these respects stated in the hypotheses listed below.
Discrimination or Spread

Hypothesis 1 - The spread of overall ratings or the use of the scale categories other than the central scale value is greater for the coordinated appraisals.
Hypothesis 2 - The variability of overall ratings for each appraiser is greater for the coordinated group.

Leniency

Hypothesis 3 - The mean of overall ratings for the coordinated group is nearer the central scale value.
Hypothesis 4 - The distribution of overall ratings is more symmetrical, with about the same number of high and low ratings for the coordinated group.

Halo effect

Hypothesis 5 - The variability of item ratings is greater for the coordinated appraisals.
Hypothesis 6 - The coordinated appraisals will contain a smaller proportion of identical ratings.

Coverage

Hypothesis 7 - The coordinated appraisals are more complete and hence contain fewer omitted or "don't know" responses.

Comparability between ratings

Hypothesis 8 - The correlation between separately obtained rank score ratings and overall ratings is greater for the coordinated appraisals.

Development Aspects

The supporting facts or evidence recorded for each item is the basic source of information for determining the performance strengths and needs for each appraisee. A summary of strong points is considered necessary for rounding out the feed-back interview. The summary of needs is the source of material for development planning. This plan should state explicitly how these needs can best be met, the method of training and the responsibility of the employee and/or supervisor for action.

It is expected that coordinated appraisals will contain more adequate supporting facts, more adequate summaries of strengths and needs, and more meaningful development plans than the non-coordinated. The following predictions are made in regard to the supporting facts and performance summary.

Supporting Facts

Hypothesis 9 - The supporting facts of the coordinated appraisals will:
a. contain a greater amount of information
b. be more descriptive of specific performance
c. contain more criteria of performance
d. contain more examples or illustrations of specific instances.

Performance Summary

Hypothesis 10 - The non-coordinated group of appraisals will contain a greater number of appraisals with sections 1, 2, and 3 completely omitted.
Hypothesis 11 - The coordinated appraisals will contain a larger number of performance strengths (section 1).
Hypothesis 12 - The coordinated appraisals will contain a larger number of development needs (section 2).
Hypothesis 13 - The coordinated appraisals will contain a larger number of methods of handling the development needs.
Hypothesis 14 - The coordinated appraisals will more frequently indicate action to be initiated by the supervisor or both supervisor and employee, rather than employee only.
Hypothesis 15 - On-the-job counselling or coaching is suggested more frequently in the coordinated appraisals as a method of training.
Hypothesis 16 - The development plans of the coordinated appraisals are more specific or concrete than those of the non-coordinated appraisals.

EXPERIMENTAL PROCEDURE

Population

The study was conducted at C.O.N.I.R.A.P. (Chrysler Operated Naval Industrial Reserve Aircraft Plant) formerly known as Chrysler Jet Engine. The plant was operating on a large scale job shop basis, with only limited production line type activity. The employee population represented a wide range of skills and jobs. The specific group involved in the study could be considered from two points of view:

1. A population of ratees, or
2. A population of raters.

The latter was considered as the unit for the basic analysis. A sample of 192 persons was appraised or rated by a group of 64 first-line supervisors. First-line supervisors are those who supervise non-management employees, either hourly or salaried. Each supervisor appraised three persons randomly selected from his work group.
The sample of supervisors or appraisers was the starting point for the design layout. The total number of first-line supervisors on the day shift was approximately 70, on the afternoon shift about 10. The sample of 64 supervisors used included all of the total group who satisfied the criteria essential for the study which were as follows:

1. A supervisor must have three or more persons in his work group (design requirements)
2. The persons appraised must have been under the present supervisor a minimum of three months.

Two of the original group of supervisors left the Corporation before appraisals were begun. Two others were not able to participate because of extremely large groups and extenuating circumstances at the time. These were replaced by supervisors on the afternoon shift. As the design below indicates, the supervisor sample had to be in multiples of eight in order to simplify calculations. A total of eight conditions or treatments was used and it was desirable to have groups of equal size. Two of the supervisors were women with work groups consisting of only women employees. An additional ten women employees were present in the other groups giving a total of sixteen.

Design

The basic design involved a division of the entire supervisory sample into two groups:

1. Experimental group or coordinated appraisals
2. Control group or non-coordinated appraisals.

A further breakdown of these two main groups by office vs. plant and by coordinators resulted in the design shown in Figure 1, appropriate for a 2 by 2 by 2 analysis of variance factorial design.

            Coordinated Group          Non-coordinated Group
            (Experimental)             (Control)
  Plant     Coordinator A | B          Coordinator A | B
  Office    Coordinator A | B          Coordinator A | B

            Figure 1. Experimental Design

The above design consisted essentially of eight treatments with eight supervisors in each group.
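The eight-cell layout can be sketched as a simple random assignment. The Python sketch below is an illustration only: it ignores the functional pairing described under Sample Selection, and the supervisor identifiers and seed are fabricated.

```python
import random

random.seed(7)  # arbitrary seed, so the illustration is reproducible

# 64 supervisors: 32 office, 32 plant (identifiers fabricated for this sketch)
office = [f"O{i:02d}" for i in range(32)]
plant = [f"P{i:02d}" for i in range(32)]

design = {}
for location, supervisors in (("office", office), ("plant", plant)):
    random.shuffle(supervisors)
    # Split each location 16/16 into coordinated vs. non-coordinated,
    # then 8/8 between coordinators A and B: eight cells of eight.
    for k, s in enumerate(supervisors):
        condition = "coordinated" if k < 16 else "non-coordinated"
        coordinator = "A" if k % 16 < 8 else "B"
        design.setdefault((location, condition, coordinator), []).append(s)

for cell, members in sorted(design.items()):
    print(cell, len(members))   # every cell should contain 8 supervisors
```

The equal cell sizes are what make the 2 by 2 by 2 analysis of variance computations straightforward.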
Each of two coordinators coordinated appraisals with 16 supervisors and contacted 16 supervisors in the non-coordinated group. Half of each coordinator's work was with plant supervisors and half with office supervisors.

Sample Selection

The group of 64 supervisors, 32 office and 32 plant, were arranged in pairs according to the function of their work group. This gave two lists of raters in the office and two lists of raters in the plant matched according to the type of work performed. Some of the functional pairs of supervisors were then reversed so that each list contained about the same number of large and small work groups. The supervisors for each location were then assigned in a random fashion to the coordinated or non-coordinated group and to one of two coordinators. See Figure X, in the appendix, which presents the final listing of supervisors by function and group size. The condition and coordinator to which each was assigned is also shown.

The design called for three appraisals from each supervisor. Random numbers were used to select the ratees from an alphabetical listing of personnel in each supervisor's group.

Instrument

The appraisal form presented in appendix Figure XI was prepared for this study. It represented a modification of the management appraisal form. The sequence of procedure and the content required were basically the same as for the management form. The significant changes were in terms of items. The items represented fairly general, yet typical, types of performance rather than managerial responsibilities. No attempt was made to justify the selection of these particular items. Perhaps it would have been desirable to prepare a scale oriented to more specific job duties but the heterogeneity of the ratee population made this unfeasible. The set of instructions, also shown in appendix Figure XII, served as an outline of appraisal procedure for the introductory sessions and the non-coordinated group of supervisors.
The instrument used was common to both the experimental and control groups with differences in administration as the critical variable.

Appraiser Training

At the outset, all supervisors were given an introduction to the program and instructions in appraising through supervisory conferences. These sessions lasted about one hour each, with groups of from 8 to 15 supervisors. A set of instructions and an appraisal form were distributed to each supervisor and the experimenter went through the steps of completing the form and then answered questions which came up. The majority of appraisals, both coordinated and non-coordinated, were completed in a period of three months although complete collection of the data covered a period of five months.

Coordinators

Two coordinators were used in the study. The experimenter served as a coordinator and was assisted by another person from the Central Personnel Staff. Neither coordinator had had previous experience as a coordinator. In order to compensate for this fact, both were given opportunities to observe experienced coordinators at work. Both coordinators had attended workshops on the technique of coordination given by Central Personnel Staff for the training of plant coordinators. Following this, the coordinators spent several hours discussing technique and establishing rough ground rules of operation. Before beginning work with the supervisors in the sample, each coordinator conducted an appraisal while the other observed. In this way it was possible to establish uniformity in approach and begin work with the sample group at a high level of efficiency. During the second week of operation each coordinator was observed at work by the Management Development Program Assistant to determine if his method of coordination was in line with that typically performed.

Administration

The typical pattern was the coordination of three appraisals during a three hour session with a supervisor.
The three appraisals were usually completed within this time. The appraisals were conducted away from the supervisor's work area, in a private office.

The coordinator questioned the supervisor about the performance of each ratee in regard to each item. He then recorded the evidence on the appraisal form and asked the supervisor to rate the item on the five-point scale. The supporting facts and the rating were both elicited from the supervisor. The coordinator might question in either case for more detail or evidence. The three ratees were all considered in respect to each item before going on to the next item. The performance summaries were written following the same pattern of inquiry. At the close of the coordination period each supervisor was asked to rank the persons in his group on overall performance.

The non-coordinated supervisors picked up the materials at the office, at which time they were given the opportunity to ask individual questions about the procedure. When the non-coordinated appraisals were distributed, a tentative date was set for the supervisor to return them; a week to ten days was the average length of time set. When the supervisor returned his completed appraisals, he was asked to rank his entire group and to estimate the time spent in making the appraisals. Very few supervisors completed and returned the appraisals during the initial period of time agreed upon, so it appeared fruitful to record the number of calls made to each person and the amount of time which elapsed between distribution and return of the appraisals for each supervisor.

Data Collection

Each supervisor was assigned a code number. As the appraisals were collected, each was marked with the supervisor's number and an employee number. The content of each appraisal was typed in order to expedite coding and to eliminate identification of procedure which might have been possible through handwriting.
The typed copy contained only the appraisal number as identification.

Coding

In order to evaluate the content of the written item responses and performance summaries, content codes or scales were developed by two methods:
1. A priori, based on assumptions stated as hypotheses
2. Empirically, by an analysis of several appraisals.

The scales were established by the experimenter. The coding was conducted by a person trained for this purpose. The coder was unacquainted with the specific experimental aspects of the study. She was aware that the appraisals were administered under various conditions but did not know what these were nor which appraisals were conducted under a particular condition. For each scale the experimenter and coder established a common frame of reference by working together on a few appraisals. After definitions and rules of approach were established, the coder evaluated the entire group of appraisals. The experimenter evaluated one-third of the appraisals independently for each scale, which served as a reliability check for the coding process.

The item response was the basic unit for the content scales. These were then averaged to obtain a mean scale value for each appraisal. These latter values were the scores used to compare the appraisals in the final analyses.

Statistical Tools

Product-moment correlation, chi-square, and analysis of variance served as the basic statistical tools for analysis of the data presented.

Chi-square is a non-parametric test which is very useful in comparing distributions about whose form little is known. Hoel stated that "the X² distribution is concerned with the values of the ei but not with the form of the distribution from which they might have been obtained as samples" (15, p. 189). The primary concern with the ei is that no expected cell frequency be less than five. Where this does occur, cell frequencies may be combined until the condition is satisfied, or corrections made for the discrepancy.
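The expected-frequency rule just described can be sketched with a modern routine. The contingency table below is hypothetical; the point is only the check that flags any cell whose expected frequency falls below five, the situation in which cells would be combined or a correction applied.

```python
# Sketch of the expected-cell-frequency rule: compute expected counts for a
# 2 by 2 table and flag any cell whose expected frequency falls below five.
# The observed counts are hypothetical.
from scipy.stats import chi2_contingency

observed = [[2, 30],
            [8, 25]]

chi2, p, df, expected = chi2_contingency(observed)  # Yates correction applied for 2x2

small_cells = [(i, j) for i, row in enumerate(expected)
               for j, e in enumerate(row) if e < 5]

print(f"chi-square = {chi2:.3f}, df = {df}")
print("cells with expected frequency below five:", small_cells or "none")
```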
In using X² it was possible to make the required corrections. Edwards (8) suggested that the correction for continuity is very easy and should always be made in a 2 by 2 table. This suggestion was followed.

In a discussion of the assumptions underlying the F-test, Lindquist stated, "It is very important, in any application of the simple-randomized design, to consider very carefully the assumptions underlying the F-test of the null hypothesis and the effects on the validity of this test of the failure to satisfy one or more of these assumptions." He further stated, "Generally, if one or more of the conditions is not satisfied, the distribution of msa/msw will be more variable than the F-distribution. This means that if a 'significant' mean square ratio is obtained in an experiment, it could have resulted from a failure to satisfy any one of these conditions" (19, p. 72).

Bartlett's test of homogeneity of variance was applied in each instance before using analysis of variance, to check the assumption that "the variance of the criterion measures is the same for each of these treatment populations". Lindquist stated that this test is needed only when the treatment groups are quite small (probably three or four) or when inspection of the data indicates that heterogeneity is marked.

In a few instances Bartlett's test was significant; a transformation of the data also resulted in a significant Bartlett's test, while the F-ratio was significant. According to Lindquist, we have occasion to doubt the validity of the F-test when this situation occurs.

The Norton studies on the "Effects of Non-normality and Heterogeneity of Variance" reported by Lindquist (19) served as a reference for the experimenter's subsequent decision. Norton constructed card populations of 10,000 cases each to study empirically the effects of the above factors.
Based on the results of Norton's studies, Lindquist made the following statement:

Accordingly, where marked (but not extreme) heterogeneity is expected, it is desirable to allow for the discrepancy by setting a slightly higher "apparent" level of significance for this test than one would otherwise employ (the "apparent" level being that indicated by the F-table). For example, if one wished the risk of a Type I error to be less than 5%, he might require that the obtained F exceed the 2.5% point in the normal-theory F-distribution. The "apparent" level of significance would then be the 2.5% level, but the actual level would be the 5% level. (19, p. 83)

It appeared safe to assume significance in the above-mentioned situation where the F value is not borderline but sufficiently large to allow for any discrepancy which might possibly be due to heterogeneity of variance. In the data reported here the significant F values were considerably larger than double the table values suggested by Lindquist. The experimenter cites the above as justification for accepting the results in a few such situations as good evidence of significant differences between treatments.

The other alternative was to use a non-parametric test. The use of such a test would prohibit the use of the factorial design and hence reduce considerably the amount of information regarding subgroups.
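Lindquist's rule quoted above can be sketched numerically: under marked heterogeneity, demand that the obtained F exceed the 2.5% critical point so that the actual Type I error stays near 5%. The degrees of freedom (1 and 55) follow the analysis tables in this chapter; the obtained F of 33.97 is the word-count result reported later.

```python
# Numerical sketch of the adjusted-significance rule for heterogeneous
# variances: compare the obtained F against the stricter 2.5% critical point.
from scipy.stats import f

df_between, df_within = 1, 55

f_05 = f.ppf(0.95, df_between, df_within)         # nominal 5% point (about 4.02)
f_01 = f.ppf(0.99, df_between, df_within)         # nominal 1% point (about 7.12)
f_adjusted = f.ppf(0.975, df_between, df_within)  # 2.5% point -> actual level near 5%

obtained_f = 33.97  # the word-count F reported in Table 13
print(f"adjusted criterion = {f_adjusted:.2f}, obtained F = {obtained_f}")
```

Because the obtained F exceeds even double the tabled value, the conclusion survives the adjustment, which is the justification the text offers.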
RESULTS AND DISCUSSION

The results of the experiment are presented according to the numerical order of the hypotheses. Hypotheses 1 through 8 are concerned with the evaluative aspects. The development aspects of the appraisal are considered in Hypotheses 9 through 16. Findings dealing with the operational aspects were not outlined by hypotheses. The results are followed by a general discussion.

The statistical design for the experiment was shown in the experimental procedure, Figure I (page 35). Although this design was a 2 by 2 by 2 analysis of variance factorial resulting in eight treatments, the primary purpose of the study was to determine the effect of the coordinator upon various aspects of the appraisal. A consideration of differences between coordinators and between locations was of secondary interest.

The majority of the analyses were conducted by analysis of variance. The results are presented with the emphasis upon the differences between the coordinated and non-coordinated groups. In the few instances where chi-square was used for analysis, the only comparison was between the coordinated and non-coordinated groups. The within-groups mean square was used as the error term, or denominator of the F ratio, in all cases.

Following a suggestion by Edwards (8, p. 182), no F values were computed if the value of the numerator of the F ratio was smaller than that of the denominator. Therefore, only F values larger than one are given in the analysis summaries. This procedure facilitates inspection of the results presented in the tables.

Evaluative Aspects

No attempt was made to compare the coordinated appraisals to outside criteria of rating excellence. The tenuousness of such criteria and the reasoning behind this decision were discussed in the introduction.
Instead, the relative merits of the system were inferred by comparing the coordinated with the non-coordinated appraisals on characteristics of ratings considered desirable or essential.

Discrimination or Spread. It is necessary in any system of evaluation or measurement that distinctions be made between the individuals or stimuli in question. The failure to discriminate between the persons being rated has been a problem with many rating methods. This lack of discrimination is usually indicated by a pile-up of ratings in one category or portion of the scale. Because this concentration of ratings is most frequently found in the middle of the scale, it is also called central tendency. Hypotheses 1 and 2 were directed toward this problem.

Hypothesis 1 - The spread of overall ratings, or the use of scale categories other than the central scale value, is greater for the coordinated appraisals.

A comparison of the overall rating distributions for the two groups was made by chi-square. These data are shown in Table 1. The value of chi-square was not significant; consequently, the hypothesis was not confirmed. Inspection of the table indicates there were fewer ratings in the central category for the non-coordinated group, whereas the coordinated group had fewer ratings in categories 3 and 4 combined. This was a rather gross method of comparison, but it can be effective in detecting large differences between distributions. It was possible that spread or discrimination measured in this fashion could be a function of rater differences rather than of differences in ratee performance.

TABLE 1
COMPARISON OF OVERALL RATING DISTRIBUTIONS FOR THE COORDINATED AND NON-COORDINATED GROUPS

Category             1     2     3     4     5   Total
Coordinated          0    13    51    29     3      96
Non-coordinated      1     7    46    39     3      96
Total                1    20    97    68     6     192

x² = 3.435
For 2 df: .05 level x² = 5.991, .01 level x² = 9.210
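The Table 1 comparison can be approximated with a modern chi-square routine. The collapsing of the sparse end categories (1 with 2, and 5 with 4) is an assumption made here to reach the reported 2 df, so the statistic need not match the printed value exactly; the conclusion of non-significance is the same.

```python
# Chi-square comparison of the Table 1 overall-rating distributions, with the
# sparse end categories collapsed (an assumed scheme) to give 2 df.
from scipy.stats import chi2_contingency

coordinated     = [0, 13, 51, 29, 3]
non_coordinated = [1,  7, 46, 39, 3]

def collapse(d):
    """Combine category 1 with 2 and category 5 with 4."""
    return [d[0] + d[1], d[2], d[3] + d[4]]

chi2, p, df, _ = chi2_contingency([collapse(coordinated),
                                   collapse(non_coordinated)])

print(f"chi-square = {chi2:.3f}, df = {df}")
assert chi2 < 5.991   # below the .05 critical value for 2 df: not significant
```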
Ideally, spread or discrimination should also be considered within each rater; in other words, each appraiser should indicate greater differences between his people under the coordinated condition. This was the basis for the next hypothesis.

Hypothesis 2 - The variability of overall ratings for each appraiser is greater for the coordinated group.

Each appraiser rated three persons selected at random from his work group. If discrimination was increased between the ratees of the individual appraisers, the variability of these three ratings should be greater for the coordinated group. The index of variability used was the standard deviation. These data are presented in Table 2. No significant differences were found; therefore, Hypothesis 2 was not confirmed.

The coordinated and non-coordinated distributions did not differ significantly in regard to discrimination or central tendency.

Leniency. A scale or method which produces ratings with a mean near the mean of the scale, and with ratings distributed symmetrically about that mean, is considered more useful than one with a high mean and considerable negative skew. Both of these factors were considered important. A curve could be skewed considerably without affecting the mean significantly, owing to the influence of extreme values; on the other hand, a distribution could be symmetrical about some point other than the mean. These factors were considered in Hypotheses 3 and 4, respectively.

TABLE 2
ANALYSIS OF VARIANCE OF THE STANDARD DEVIATIONS OF OVERALL RATINGS FOR EACH APPRAISER

Source of variation                     Sum of squares   df   Mean square      F
Between:
  Methods                                        1,903    1         1,903   2.05
  Coordinators                                      75    1            75
  Locations                                      1,651    1         1,651   1.78
Interactions:
  Methods x Coordinators                           405    1           405
  Methods x Locations                            3,437    1         3,437
3.71
  Coordinators x Locations                         293    1           293
  Methods x Coordinators x Locations             2,104    1         2,104   2.27
Within groups                                   51,932   56           927
Total                                           61,800   63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

Hypothesis 3 - The mean of overall ratings for the coordinated group is nearer the central scale value.

These data are shown in Table 3. The hypothesis was not confirmed.

TABLE 3
ANALYSIS OF VARIANCE OF THE MEAN OVERALL RATINGS FOR APPRAISERS

Source of variation                     Sum of squares   df   Mean square      F
Between:
  Methods                                           41    1            41   2.28
  Coordinators                                      51    1            51   2.83
  Locations                                         41    1            41   2.28
Interactions:
  Methods x Coordinators                             6    1             6
  Methods x Locations                               21    1            21   1.17
  Coordinators x Locations                           1    1             1
  Methods x Coordinators x Locations                21    1            21   1.17
Within groups                                    1,027   56            18
Total                                            1,209   63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

Hypothesis 4 - The distribution of overall ratings is more symmetrical, with about the same number of high and low ratings, for the coordinated group.

Table 1 contains the data for a comparison of the overall rating distributions in terms of symmetry. The distribution for the coordinated group was more symmetrical: it had more ratings in the 2 category and fewer in the 4 category than the non-coordinated group. However, both distributions were somewhat skewed, and the differences between them were not significant. Hypothesis 4 was not confirmed.

The appraisal groups did not differ on the above indexes of leniency.

Halo Effect. Hypotheses 5 and 6 were both directed toward the problem of relative halo. The experimenter felt that these hypotheses were getting at different aspects of the same factor. As indicated in the introduction, a variety of measures have been used in the past to indicate the presence or absence of halo. The method used by Johnson and Vidulich (18) was the first to accurately measure and isolate invalid halo.
This method required that several raters rate the same stimuli and so was not feasible in this study. The index which was used was based on the proportion of identical ratings, as suggested by Bingham (4).

Hypothesis 5 - The variability of item ratings is greater for the coordinated appraisals.

Hypothesis 6 - The coordinated appraisals will contain a smaller proportion of identical ratings.

The data are presented in Tables 4 and 5, respectively. Neither hypothesis was confirmed. The coordinated ratings did not differ from the non-coordinated ratings in the amount of halo present.

TABLE 4
ANALYSIS OF VARIANCE OF THE MEAN OF THE STANDARD DEVIATIONS OF ITEM RATINGS

Source of variation                     Sum of squares   df   Mean square      F
Between:
  Methods                                          127    1           127
  Coordinators                                     105    1           105
  Locations                                        638    1           638   3.69
Interactions:
  Methods x Coordinators                             3    1             3
  Methods x Locations                                2    1             2
  Coordinators x Locations                          10    1            10
  Methods x Coordinators x Locations               232    1           232   1.34
Within groups                                    9,683   56           173
Total                                           10,800   63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

TABLE 5
ANALYSIS OF VARIANCE OF THE MEAN PROPORTION OF IDENTICAL RATINGS USED BY AN APPRAISER

Source of variation                     Sum of squares   df   Mean square      F
Between:
  Methods                                          169    1           169   1.69
  Coordinators                                      10    1            10
  Locations                                        121    1           121   1.21
Interactions:
  Methods x Coordinators                             2    1             2
  Methods x Locations                               11    1            11
  Coordinators x Locations                           9    1             9
  Methods x Coordinators x Locations                 5    1             5
Within groups                                    5,609   56           100
Total                                            5,936   63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

Coverage. Basic rules of rating state that valid ratings cannot be obtained unless there is an opportunity for the behavior to occur and an opportunity for the rater to observe it. Therefore, rather than force a rater to use every item, the directions for appraisal suggested that the appraiser record "DK" for "don't know" wherever he lacked information on the ratee.
The coordinators discouraged the use of this category by questioning directed toward making sure that the rater really did not know. It was felt that the non-coordinated appraisers would use this category more frequently as an "out" than the coordinated appraisers.

Hypothesis 7 - The coordinated appraisals are more complete and hence contain fewer omitted or "don't know" responses.

Table 6 contains these data. The hypothesis was not confirmed. The non-coordinated appraisals did not contain more "DK" or omitted ratings.

TABLE 6
ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF "DON'T KNOW" OR OMITTED RESPONSES PER APPRAISER. DATA TRANSFORMED TO SQUARE ROOT OF X + .5

Source of variation                     Sum of squares   df   Mean square      F
Between:
  Methods                                            1    1             1
  Coordinators                                      20    1            20
  Locations                                         52    1            52   2.08
Interactions:
  Methods x Coordinators                            81    1            81   3.24
  Methods x Locations                                5    1             5
  Coordinators x Locations                           2    1             2
  Methods x Coordinators x Locations                38    1            38   1.52
Within groups                                    1,412   56            25
Total                                            1,611   63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

Comparability between Ratings. Subsequent to the appraisal of his people, each supervisor was asked to rank the persons in his group according to overall performance on the job. A few supervisors refused to do this ranking because their groups were either too large (30 to 40 persons) or too heterogeneous. The ranks which were obtained were converted to standard scores from a table by Richardson, Bellows, Henry and Company (32, p. 13). This procedure of converting ranks to standard scores is not recommended for groups of fewer than five persons; therefore, the ranks for groups of three and four ratees were omitted from this sample. Within these limitations, rank scores on 65 of the coordinated group and 63 of the non-coordinated group were correlated with the mean of the item ratings and the overall
The question Of whether this was evidence of reliability or validity was not crucial. In either case it was felt that the relationship, and hence the index, should be greater for the coordinated group. 55 Hypothesis 8 e The correlation between separately obtained rank score ratings and overall ratings is greater for the coordinated appraisals. These data are shown in Table 7. Differences between the coordinated and nonpcoordinated groups were not signifi- cant. In addition the differences were in the Opposite direction to that predicted. Hypothesis 8 was not confirmed. The correlations were all large enough to reject the hypothesis that the pOpulation correlation was zero. They were not large enough to be considered evidence for satis- factory reliability, but compare favorably with correlations -of this type frequently presented in the literature. TABLE 7 CORRELATION BETWEEN RANK SCORE RATINGS AND MEAN RATINGS AND BETWEEN RANK SCORE RATINGS AND OVERALL RATINGS W Mean ratings Overall ratings n r z' n r z' Coordinated 65 .51 .563 65 .A? .510 Non-coordinated 63 .56 .633 63 .65 .775 z z .388 z = 1.u78 Differences also in Opposite direction to that predicted. 2.05 '3 1°96 Zoe]. :3 2.58 For 60 df: .05 level rzz .250, .01 level r== .325 _ _ . _ V , l x . . x » l r . . t n . . _ _ 1.. x . t , A O — X . . . l. _ . l ,_ , . \ x. . a _ n . \ . n _ _ u M . . \ . 9 . . r\ ’ . , 1.. . a \J .. . a . ~ . l I o b. , . _ . y _ l 2 \ ._ . - . * Q Q . Q - 1 VJ _ . ... s ,u , 1 \ l . i . l — . a , xi \ I A .x v. 1 C O i u A K i ._ I 0 a i O I OI. . 56 133m1Correlations. A step beyond those factors considered in the above hypotheses was taken to explore further the possibility Of differences between the coordinated and non- coordinated groups. This consisted of calculating item- overall and item-potential correlations for the coordinated and nonpcoordinated groups individually and combined. These correlation coefficients are shown in Table 8. 
Inspection of the table did not indicate any large differences between the coefficients of the coordinated and non-coordinated groups for items with overall judgments. In addition, neither group was consistently above the other in terms of the size of the coefficients. The same comments apply to the item-potential correlations, with one exception: the non-coordinated group contains a significantly greater number (at the .05 level) of non-significant coefficients than the coordinated group. The data for this comparison are shown in Table 9. This was some evidence that the non-coordinated group of appraisers did not see the same relationship between the potential rating and these particular items as did the coordinated appraisers. The concept of potential proved difficult for many supervisors to understand. Explanation by the coordinators no doubt helped to clarify the potential rating. It does not follow, however, that increased understanding should result in a stronger relationship of the potential ratings to these items alone.

TABLE 8
CORRELATIONS BETWEEN ITEM RATINGS AND OVERALL JUDGMENTS AND POTENTIAL RATINGS
[The body of Table 8, printed sideways in the original, is not legible in this copy.]
TABLE 9
COMPARISON BETWEEN COORDINATED AND NON-COORDINATED GROUPS ON THE NUMBER OF NOT SIGNIFICANT CORRELATIONS OF ITEM RATINGS WITH POTENTIAL RATINGS

                    Not significant   Significant   Total
Coordinated                       1            17      18
Non-coordinated                   8            10      18
Total                             9            27      36

x² = 5.333*
For 1 df: .05 level x² = 3.841, .01 level x² = 6.635
*Significant beyond .05 level

No evidence was found in this study to indicate that the method of coordination of appraisals had any effect on the ratings in those respects stated in Hypotheses 1 through 8. At this point it might appear that these hypotheses were not originated on a sound basis. In order to justify, at least in part, the position stated in the hypotheses, two items of information are presented and discussed.

Taylor and Manson (36), in a study mentioned earlier, concluded that ratings individually supervised by a personnel technician were more reliable, contained less halo, and showed less skew than non-supervised ratings. In that particular study there was no control group; the comparisons were gross, with no tests of significance. Perhaps the authors reached conclusions not adequately supported by their data.
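The Table 9 result can be reproduced exactly with a 2 by 2 chi-square carrying the correction for continuity that Edwards recommends and that this study applies:

```python
# Reproducing Table 9: 2 by 2 chi-square with Yates's correction for
# continuity on the counts of non-significant item-potential correlations.
from scipy.stats import chi2_contingency

table_9 = [[1, 17],   # coordinated: not significant, significant
           [8, 10]]   # non-coordinated

chi2, p, df, _ = chi2_contingency(table_9, correction=True)

print(f"chi-square = {chi2:.3f}")  # the thesis reports 5.333
assert df == 1
assert chi2 > 3.841   # significant beyond the .05 level for 1 df
```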
Regarding the second item, Table 10 shows the overall rating distributions of coordinated appraisals from several Chrysler plants not included in this study. Two things were indicated in these distributions: 1) a definite central tendency, and 2) a leniency tendency (more 4 ratings than 2 ratings). These data were some of the first available at the outset of the appraisal program. Increased emphasis on discrimination by coordinators later improved the distributions considerably. It was expected, therefore, that non-coordinated appraisals would show even more central tendency and/or leniency than that noted in Table 10.

Table 11 shows the distributions for both the coordinated and non-coordinated groups, with the percent of responses in each category. A visual comparison of the percentages in each category indicated less central tendency and leniency in both distributions in Table 11 than in those in Table 10. This finding had not been anticipated for the non-coordinated group (Hypotheses 1 and 4).

In Table 12 a comparison was made between the Dodge distribution at the bottom of Table 10 and the coordinated group from this study. A significant chi-square was found. Since the groups of this study did not differ significantly, a general improvement over the Dodge distribution was indicated. If this general improvement had not been found in the non-coordinated group, a significant chi-square would have been found and Hypotheses 1 and 4 would have been confirmed.

TABLE 10
DISTRIBUTIONS OF OVERALL RATINGS FOR FIRST LEVEL SUPERVISORS PREVIOUS TO STUDY

Plant           1      2      3      4      5   Total
Dodge           2     15    134     41      0     192
Mopar           0     15     58      7      0      80
L.A.
                0     31     65     11      0     107
Canada          0      5    123     21      1     150
Total           2     66    380     80      1     529
Percent        .4   12.5   71.8   15.1     .2     100

Mack            0      3    408     77      1     489
Percent         0     .6   83.4   15.7     .3     100

Dodge           0      5    184     65      0     254
Percent         0    2.0   72.4   25.6      0     100

TABLE 11
DISTRIBUTIONS OF OVERALL RATINGS FOR COORDINATED AND NON-COORDINATED GROUPS FROM PRESENT STUDY

Category          1      2      3      4      5   Total
Coordinated       0     13     51     29      3      96
Percent           0   13.5   53.1   30.3    3.1     100
Non-coordinated   1      7     46     39      3      96
Percent         1.0    7.3   47.9   40.7    3.1     100

TABLE 12
COMPARISON OF OVERALL DISTRIBUTION OF DODGE WITH DISTRIBUTION OF COORDINATED GROUP FROM PRESENT STUDY

Category         2      3      4   Total
Dodge            5    184     65     254
Coordinated     13     51     32      96
Total           18    235     97     350

x² = 22.633**
For 2 df: .05 level x² = 5.991, .01 level x² = 9.210
**Significant beyond .01 level

In the study by Taylor and Hastman (38), no differences were found among four groups of ratings in regard to alternate rater reliability, dispersion, halo, and leniency. The four groups involved two types of format and two types of administration. One type of administration was individually supervised ratings, which was relevant to this study. These authors attribute their lack of positive findings to the acceptance of the rating system and to a situation conducive to the production of desirable ratings. As in this study, the ratings of the control group were more satisfactory on the characteristics measured than was expected.

Development Aspects

A second, but equally important, objective of the appraisal program was the individual development or improvement of job performance for each appraisee. The appraisal instrument served as a record of present performance and of the plans for proposed development action. The results and discussion presented in this section were directed toward the evaluation of the appraisal in this function, and the extent to which the coordinator affected its usefulness as such an instrument. Specifically, the experimenter was interested in the quantity and quality of the:

1.
Supporting facts
2. Appraisal summaries
3. Development plans.

The evaluation or classification of written material for particular content, or the detection of degrees of emphasis intended by the respondent to a question, is called content analysis or coding. Like rating, it is a somewhat subjective technique and involves the scaling or categorization of the written material on the basis of a selected characteristic or factor. It is the same process as rating: the coder or analyst reads the material and evaluates or classifies it according to a specific factor previously established. This method of analysis served as the basic device for the evaluation of the information recorded on the appraisal form. The investigator used "A Manual for Coders" (16) as the primary source of information on the utilization of this technique.

Because of its subjective nature, it is recommended practice that the coding be done by a person other than the experimenter. An advanced college student was selected and reimbursed as coder for this investigation. She was given a copy of the manual to study, and time was spent with the experimenter in a discussion of the general principles involved. The coder was not aware of the experimental design or purpose of the experiment. The appraisal material was typed on sheets identified by code numbers known only to the experimenter.

The scales were set up by the investigator. The experimenter then went over each scale with the coder to explain and illustrate its particular application. Following this initial practice, both the coder and experimenter independently evaluated one-third of the appraisals. This one-third consisted of one appraisal for each supervisor and hence covered all treatments of the design. A correlation was computed between the values assigned the appraisals by the coder and the experimenter. This served as an index of reliability.
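The reliability check just described amounts to a product-moment correlation over the jointly coded third of the appraisals. A minimal sketch with hypothetical paired scale values:

```python
# Sketch of the coding reliability index: Pearson correlation between the
# scale values assigned by the coder and by the experimenter to the same
# appraisals. The paired values are hypothetical.
from scipy.stats import pearsonr

coder        = [3, 4, 2, 5, 3, 4, 2, 3, 4, 5]
experimenter = [3, 4, 3, 5, 3, 4, 2, 2, 4, 4]

r, p = pearsonr(coder, experimenter)
print(f"coding reliability r = {r:.2f}")
```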
If satisfactory agreement was evident, the coder then evaluated the remaining two-thirds of the appraisals.

In the case of more objective evaluation, such as the counting of words, the counting of particular words, and the recording of the presence or absence of data, reliability checks were not conducted. In these instances some of the work was done entirely by the coder and some by the experimenter.

The supporting facts were evaluated on the basis of content per item. These item values were averaged to obtain an appraisal score. For the performance summaries and development plans an appraisal score was obtained directly, because there was only one occurrence per appraisal.

Supporting Facts. The supporting facts, or evidence for the ratee's behavior on a specific item, were considered the key to subsequent planning because they were the initial record of details of performance. Hypothesis 9 covered four aspects of the supporting facts.

Hypothesis 9 - The supporting facts of the coordinated appraisal will:
a) contain a greater amount of information
b) be more descriptive of specific performance
c) contain more criteria of performance
d) contain more examples or illustrations of specific instances.

Figure II contains the rules or definitions used to evaluate two aspects of Hypothesis 9a, the amount of information. In both instances these guides were developed by considering a cross section of the raw data. Both of these counts were directed toward the measurement of the amount of material present on the appraisal form.

Table 13 contains the analysis summary for the word count. A very significant F ratio was found between methods. The coordinated appraisals contained more words than the
Word Count

All words counted except:
    Personal pronouns
    Articles
    Conjunctions
    Two-letter words

Thought Count

Major thought - Direct or fairly direct answer to the question
    Physically present, not only implied
    Usually contains adjective or adverb stating how person does it
    Only one

Minor thought - Examples or illustrations
    Explanatory or qualifying statements - under what conditions, amount of time, etc.
    Conditional statements
    Supports major thought
    None or several

Reliability: Major count, r = .77
             Minor count, r = .96

Figure II. Prescribed Guide for Word and Thought Counts

TABLE 13

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF WORDS PER APPRAISAL

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                           85,923     1        85,923   33.97**
  Coordinators                       7,548     1         7,548    2.98
  Locations                         10,177     1        10,177    4.02*
Interactions:
  Methods x Coordinators            17,789     1        17,789    7.03*
  Methods x Locations                1,234     1         1,234
  Coordinators x Locations           1,269     1         1,269
  Methods x Coordinators
    x Locations                        747     1           747
Within groups                      141,636    56         2,529
Total                              266,323    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
*Significant beyond .05 level
**Significant beyond .01 level
Bartlett's test: Significant at .05 level

non-coordinated. Heterogeneity of variance between treatments, indicated by the significant Bartlett's test, may have contributed to the significant F ratio. However, the F was large enough to assume that this was not the sole cause of significance. This was the first occurrence where a transformation of the data did not reduce the heterogeneity. Norton's studies are referred to in support of the acceptance of such situations.

Table 14 shows the data for the major thought count. No significant differences were found. As the definition indicates, the major thought was any answer to the item.
These were identified and counted primarily to allow more accurate isolation and counting of the minor thoughts. The latter were considered a better index of the amount of information and, as defined, of the kind of information considered good evidence of performance. Because the major thoughts were such elementary responses, it was understandable that no differences were indicated.

The data for the minor thought count are shown in Table 15. A significantly greater number of minor thoughts was found in the coordinated appraisals. In addition, the appraisals done by coordinator B contained more minor thoughts than those done by coordinator A.

As defined by the word and minor thought counts, the coordinated appraisals contained a greater amount of information. Hypothesis 9a was supported.

TABLE 14

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF MAJOR THOUGHTS PER APPRAISAL

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                              877     1           877     1.51
  Coordinators                         206     1           206
  Locations                            819     1           819     1.41
Interactions:
  Methods x Coordinators             1,114     1         1,114     1.92
  Methods x Locations                   21     1            21
  Coordinators x Locations             147     1           147
  Methods x Coordinators
    x Locations                      1,066     1         1,066     1.84
Within groups                       32,409    56           579
Total                               36,659    63

Bartlett's test: Significant beyond .01 level

TABLE 15

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF MINOR THOUGHTS PER APPRAISAL
Source of variation          Sum of squares   df   Mean square         F
Between:
  Methods                          504,278     1       504,278   159.38**
  Coordinators                      37,977     1        37,977    12.00**
  Locations                          8,441     1         8,441     2.67
Interactions:
  Methods x Coordinators            50,120     1        50,120    15.84**
  Methods x Locations                2,244     1         2,244
  Coordinators x Locations              29     1            29
  Methods x Coordinators
    x Locations                        251     1           251
Within groups                      177,175    56         3,164
Total                              780,515    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Not significant

Figure III contains a description of Scale I and an index of its reliability. In Table 16 the analysis summary is shown. The F ratio between methods was large enough to be considered very significant even though Bartlett's test was also significant. The item responses for the coordinated

Scale I

How complete or explicit is the answer? "What has he done (any evidence or explanation)?"

Categories:
1. Not stated
2. Vague - incomplete
3. Understandable - somewhat complete
4. Clear - complete

Reliability: r = .91

Figure III. Scale I: Definition and Categories

groups were scaled higher in regard to completeness or explicitness. Hypothesis 9b was confirmed.

Figure IV contains illustrations of criteria or dimensions. These illustrations indicate what was considered in this respect. The criteria were selected through examination of several appraisals. A summary of the analysis is shown in Table 17. Significant F ratios were found for the three main effects and one interaction. There were more criteria present in the coordinated than the non-coordinated appraisals, and more in the appraisals of coordinator B than A. The interaction meant that one of the coordinators, B, made more use of coordination than the other. Actually neither coordinator had much effect on the non-coordinated appraisals in respect
TABLE 16

ANALYSIS OF VARIANCE OF SCALE I MEAN VALUES. HOW COMPLETE OR EXPLICIT IS THE ANSWER?

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                              506     1           506    25.95**
  Coordinators                          14     1            14
  Locations                             14     1            14
Interactions:
  Methods x Coordinators                39     1            39     2.00
  Methods x Locations                   39     1            39     2.00
  Coordinators x Locations              .3     1            .3
  Methods x Coordinators
    x Locations                         .7     1            .7
Within groups                         1091    56          19.5
Total                                 1704    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Significant beyond .01 level

to this scale. Therefore, a significant F for this interaction only substantiates the difference between coordinators noted in the main effect. Hypothesis 9c was confirmed.

Figure V shows Scale II. The category definitions are self-explanatory and indicate rather clearly what was

Item    Criteria or Dimensions

1 - What is the quality of       Errors - scrap - rework
    work done?                   Correction - clear - concise
                                 Tolerance - standards

8 - Assuming added duties        Emergency
    in order to get the          Vacation time
    job done.                    Out of classification
                                 Repairs equipment
                                 Trains others

9 - Ability to work well         Co-workers
    with others.                 Supervisor
                                 Other departments
                                 Vendors

11 - Observing rules and         Absence - tardiness
     regulations.                Safety - relief time
                                 Accidents - care of tools

Reliability: r = .92

Figure IV.
Examples of Criteria or Dimensions of Performance

Scale II

Kind of information or evidence - type of material.

Categories:
1. Only adjectives or adverbs - restatement of item - verbal expression of scale category - personality characteristics or traits
2. Restatement plus qualification - enlargement or explanation in general terms
3. Example by class of work - reference to total skill area - performance on the job
4. Example or illustration by reference to a specific job or occasion. A for-instance.

Reliability: r = .91

Figure V. Scale II: Definition and Categories
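Each analysis-of-variance table in this chapter comes from the same balanced 2 x 2 x 2 factorial design (methods x coordinators x locations, eight appraisers per cell). A minimal sketch of how the main-effect sums of squares and an F ratio are obtained for such a design is given below; the cell data and group sizes are illustrative, not the study's.

```python
from itertools import product

def anova_2x2x2(cells):
    """cells maps (method, coordinator, location) level-triples (0/1) to
    lists of scores.  Returns main-effect and within-cell sums of squares."""
    scores = [x for v in cells.values() for x in v]
    N = len(scores)
    grand = sum(scores) / N

    def level_mean(axis, level):
        vals = [x for key, v in cells.items() if key[axis] == level for x in v]
        return sum(vals) / len(vals), len(vals)

    ss = {}
    for axis, name in enumerate(("methods", "coordinators", "locations")):
        ss[name] = sum(n * (m - grand) ** 2
                       for m, n in (level_mean(axis, lev) for lev in (0, 1)))
    # Within-cells (error) sum of squares; interaction terms follow the
    # same pattern from the cell means and are omitted here for brevity.
    ss["within"] = sum(sum((x - sum(v) / len(v)) ** 2 for x in v)
                       for v in cells.values())
    return ss

# Illustrative data: a strong "methods" effect and no other effects
cells = {key: ([1, 2, 3] if key[0] == 0 else [5, 6, 7])
         for key in product((0, 1), repeat=3)}
ss = anova_2x2x2(cells)
ms_within = ss["within"] / (24 - 8)          # df = N minus number of cells
f_methods = (ss["methods"] / 1) / ms_within  # each main effect has df = 1
```

The resulting F for each effect is referred to the F distribution with 1 and (N - 8) degrees of freedom, as in the table footnotes.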
TABLE 17

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF DIMENSIONS OR CRITERIA PER APPRAISAL

Source of variation          Sum of squares   df   Mean square         F
Between:
  Methods                          120,235     1       120,235   169.35**
  Coordinators                       8,977     1         8,977    12.64**
  Locations                          5,814     1         5,814     8.19**
Interactions:
  Methods x Coordinators            11,583     1        11,583    16.31**
  Methods x Locations                  473     1           473
  Coordinators x Locations             127     1           127
  Methods x Coordinators
    x Locations                         61     1            61
Within groups                       38,766    56           710
Total                              187,036    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Not significant

considered in each. The analysis summary is shown in Table 18. The F ratio between methods was large enough to be considered significant even with a significant Bartlett's test. The coordinated appraisals were found to contain more examples by

TABLE 18

ANALYSIS OF VARIANCE OF SCALE II MEAN VALUES. KIND OF INFORMATION OR EVIDENCE

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                              473     1           473    26.28**
  Coordinators                           6     1             6
  Locations                             39     1            39     2.17
Interactions:
  Methods x Coordinators                18     1            18     1.00
  Methods x Locations                  100     1           100     5.56*
  Coordinators x Locations               1     1             1
  Methods x Coordinators
    x Locations                          1     1             1
Within groups                         1006    56            18
Total                                 1644    63
For df 1 and 55: F.05 = 4.02, F.01 = 7.12
*Significant beyond .05 level
**Significant beyond .01 level
Bartlett's test: Significant beyond .01 level

class of work and more examples of specific jobs or occasions than the non-coordinated appraisals. The F ratio for the interaction of methods by locations may have been a function of heterogeneity of variance, as it was only slightly beyond the .05 level of significance. Hypothesis 9d was confirmed.

The statements in Hypotheses 9a through 9d and the indexes used to check them were established by the experimenter. They were developed from the stated objectives of the appraisal program and the rather global criteria for good supporting facts suggested by persons experienced in appraisal and responsible for the operation of the program and the training of new coordinators. The statements or hypotheses may not have represented completely independent factors. No index of their independence was established. They were directed toward what seemed to be different aspects of the quantity and quality of supporting facts. It was not considered as important that each scale measure a specific factor as it was that quantity and quality be investigated thoroughly. An attempt was made to confirm that these were important factors. This was done by an expert evaluation of the appraisals.

Figure VI shows the scale and directions used for an expert evaluation of the supporting facts. Two experienced coordinators, not involved in the experiment, were asked to sort the entire sample of appraisals using these instructions.

Expert Sort of Item Content

Directions: What is the amount of information given about a person's performance? How much does it tell you about what and how the person does?

Consider both:
    Quality or specificity
    Quantity or coverage

Sort into five stacks, then clip the category card on the front of each group and return. Consider only the supporting facts in Items 1 - 18.

Categories:
1. Very poor - very unsatisfactory - least
2. Poor - less than expected
3. Fair - average - good as can be expected
4. Good - more than expected
5. Very good - very satisfactory - most

Reliability: r = .72, n = 192

Figure VI. Directions and Categories for Expert Evaluation of Item Content

They did not evaluate each item separately but the complete group of Items 1 through 18. The correlation between these two sorts was taken as an index of reliability and is shown in Figure VI. An analysis of variance was conducted using the mean score for the two experts. This summary is shown in Table 19. A significant Bartlett's test was found, indicating

TABLE 19

ANALYSIS OF VARIANCE OF MEAN SCORES FROM EXPERT EVALUATION, ON AMOUNT AND SPECIFICITY OF INFORMATION

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                           52,041     1        52,041    29.47**
  Coordinators                      18,666     1        18,666    10.57**
  Locations                          7,119     1         7,119     3.11
Interactions:
  Methods x Coordinators            31,907     1        31,907    18.07**
  Methods x Locations                1,000     1         1,000
  Coordinators x Locations             900     1           900
  Methods x Coordinators
    x Locations                      1,515     1         1,515
Within groups                       98,875    56         1,766
Total                              212,023    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Significant beyond .01 level

heterogeneity of variance. The F ratio between methods was large enough to assume significance in light of the Bartlett's test. The coordinated appraisals were rated highest on both the quantity and quality of information given in the supporting facts. The F ratio found between coordinators was not much beyond the .01 level of significance. It showed that the appraisals done by coordinator B were evaluated higher than those by coordinator A.
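Bartlett's test, reported beneath each analysis table, checks whether the treatment groups have homogeneous variances before the F ratios are interpreted. A minimal sketch of the chi-square form of the statistic follows; the group data are illustrative, not the study's.

```python
import math

def bartlett_stat(groups):
    """Bartlett's chi-square statistic (df = k - 1) for homogeneity of variance."""
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    # Unbiased within-group variances
    var = [sum((x - sum(g) / len(g)) ** 2 for x in g) / (len(g) - 1)
           for g in groups]
    pooled = sum((ni - 1) * vi for ni, vi in zip(n, var)) / (N - k)
    m = (N - k) * math.log(pooled) - sum((ni - 1) * math.log(vi)
                                         for ni, vi in zip(n, var))
    # Correction factor that improves the chi-square approximation
    c = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (N - k)) / (3 * (k - 1))
    return m / c

# Eight identical groups: variances are homogeneous, statistic near zero
same = [[1, 2, 3, 4]] * 8
# One group far more variable than the rest: statistic becomes large
mixed = [[1, 2, 3, 4]] * 7 + [[0, 10, 20, 30]]
```

A statistic beyond the chi-square critical value for k - 1 degrees of freedom signals the heterogeneity discussed in the text.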
The F ratio for methods by coordinators confirms this difference between coordinators. This finding was considered a confirmation of the counts and Scales I and II, and consequently support for all sections of Hypothesis 9.

An effort was made to determine whether the increased quantity and quality of supporting facts for the coordinated appraisals actually represented better evidence for the ratings given by the appraisers. Four persons were asked to read the supporting facts and assign an overall rating of performance to the ratees based on this information. Correlations were computed between the actual and estimated ratings. The entire group of appraisals was done by the coder and a naive person. Two experienced coordinators each did one-third of the appraisals. The coefficients were computed for each one-third of the appraisals and broken down by coordinated versus non-coordinated. The coefficients and their z' score equivalents are given in Table 20. The difference between these two groups of coefficients was
TABLE 20

CORRELATION BETWEEN OVERALL RATINGS OF THE APPRAISERS AND RATINGS ESTIMATED FROM THE ITEM CONTENT

             Coordinated               Non-coordinated
Person     n     r      z'            n     r      z'
A         32   .73    .929           32   .63    .741
A         32   .83   1.188           32   .73    .929
A         32   .74    .950           32   .87   1.333
B         32   .81   1.127           32   .56    .633
C         32   .86   1.293           32   .35    .365
D         32   .68    .829           32   .58    .662
D         32   .76    .996           32   .66    .793
D         32   .63    .741           32   .52    .576

               m = 1.007                   m = .754

diff = 2.279*
z.05 = 1.96, z.01 = 2.58
For 30 df: .05 level r = .349, .01 level r = .449

significant beyond the .05 level. This indicates that some additional information was found in the content of the coordinated appraisals which would allow a person other than the appraiser to duplicate the appraiser's rating.

Additional exploratory analyses, not directly related to the stated hypotheses, were conducted on the item content. Figure VII shows a scale of general tone or affect. The summary data are given in Table 21. No significant F ratios were found. It was suggested that this result may have been distorted by the use of mean values and scores per appraisal. In order to check on this, an analysis of variance was conducted of the number of 1 and 2 values and of the number of

Scale III

General tone or affect (feeling).

Categories:
1. Very negative - entirely negative
2. More negative than positive
3. Neutral - fifty-fifty
4. More positive than negative
5. Very positive - entirely positive

Reliability: r = .96

Figure VII. Scale III: Definition and Categories
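The z' values in Table 20 are Fisher's r-to-z' transformation, which makes correlation coefficients approximately normal so that their means can be compared. A minimal sketch, using the coordinated-group r values from Table 20:

```python
import math

# r values for the coordinated group, from Table 20
r_coordinated = [.73, .83, .74, .81, .86, .68, .76, .63]

# Fisher's r-to-z' transformation: z' = arctanh(r) = 0.5 * ln((1 + r) / (1 - r))
z_prime = [math.atanh(r) for r in r_coordinated]
mean_z = sum(z_prime) / len(z_prime)

# A mean z' can be converted back to an equivalent r with the inverse, tanh
r_equiv = math.tanh(mean_z)
```

Averaging in the z' metric, as Table 20 does, avoids the bias that comes from averaging r values directly.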
TABLE 21

ANALYSIS OF VARIANCE OF SCALE III MEAN VALUES. TONE OR AFFECT

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                               30     1            30     1.35
  Coordinators                           7     1             7
  Locations                             12     1            12
Interactions:
  Methods x Coordinators                .6     1            .6
  Methods x Locations                 42.0     1          42.0     1.89
  Coordinators x Locations              .6     1            .6
  Methods x Coordinators
    x Locations                       28.8     1          28.8     1.30
Within groups                         1243    56          22.2
Total                                 1364    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

4 and 5 values for Scale III per appraisal. These data are shown in Tables 22 and 23, respectively. No significant F ratios were found. In regard to negative and positive responses to the items, no differences were found between groups.

TABLE 22

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF 1 AND 2 VALUES FOR SCALE III

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                            1,038     1         1,038     2.50
  Coordinators                           1     1             1
  Locations                             86     1            86
Interactions:
  Methods x Coordinators                16     1            16
  Methods x Locations                1,296     1         1,296     3.12
  Coordinators x Locations              11     1            11
  Methods x Coordinators
    x Locations                        210     1           210
Within groups                       23,221    56           415
Total                               25,879    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

TABLE 23

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF 4 AND 5 VALUES FOR SCALE III

Source of variation          Sum of squares   df   Mean square
Between:
  Methods                                2     1             2
  Coordinators                          85     1            85
  Locations                            663     1           663
Interactions:
  Methods x Coordinators               110     1           110
  Methods x Locations                  256     1           256
  Coordinators x Locations              28     1            28
  Methods x Coordinators
    x Locations                        553     1           553
Within groups                       41,456    56           740
Total                               43,153    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

Figure VIII shows a list of typical words used to modify statements about performance.
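Tables 24 and 25 below (and Tables 27 and 28 later) analyze counts after a square root of x + .5 transformation, a standard device for stabilizing the variance of small counts before an analysis of variance. A minimal sketch, with illustrative counts:

```python
import math

def stabilize(counts):
    """Square-root transformation, sqrt(x + .5), used to stabilize the
    variance of small counts before analysis of variance."""
    return [math.sqrt(x + 0.5) for x in counts]

# Hypothetical qualifying-word counts for six appraisals
counts = [0, 1, 1, 2, 4, 7]
transformed = stabilize(counts)
```

The constant .5 keeps zero counts usable, since the square root of zero alone would compress the low end of the scale.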
Examination of the appraisals gave rise to the question of whether the coordinated appraisals differed from the non-coordinated in the use of such words. The analysis summaries for the so-called dogmatic and qualifying words are shown in Tables 24 and 25, respectively. No significant F ratios were found in the first case, indicating no difference in the use of dogmatic words. A significant F ratio between methods was found in regard to qualifying words. Words of this sort were used more frequently in the coordinated appraisals than in the non-coordinated appraisals. This indicated that the appraiser was perhaps more cautious in his statements to a coordinator than when he recorded them himself.

Classes of Modifying Words

Dogmatic        Qualifying
Never           Occasionally
None            Sometimes
Always          Usually
Constantly      Frequently

Figure VIII. Examples of So-called Dogmatic and Qualifying Words

A significant Bartlett's test, which indicated heterogeneity of variance between treatments, lowered the precision of the F test. In those cases where this was true, a much

TABLE 24

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF EXTREME OR DOGMATIC WORDS. DATA TRANSFORMED TO SQUARE ROOT OF x + .5

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                            5,166     1         5,166     2.08
  Coordinators                       1,754     1         1,754
  Locations                             58     1            58
Interactions:
  Methods x Coordinators             5,184     1         5,184     2.09
  Methods x Locations                1,360     1         1,360
  Coordinators x Locations           3,235     1         3,235     1.30
  Methods x Coordinators
    x Locations                      1,704     1         1,704
Within groups                      139,041    56         2,483
Total                              157,502    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
Bartlett's test: Not significant

TABLE 25

ANALYSIS OF VARIANCE OF THE MEAN NUMBER OF QUALIFYING WORDS. DATA TRANSFORMED TO SQUARE ROOT OF x + .5

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                           37,153     1        37,153    20.47**
  Coordinators                         218     1           218
  Locations                          2,998     1         2,998     1.65
Interactions:
  Methods x Coordinators             5,184     1         5,184     2.86
  Methods x Locations                  225     1           225
  Coordinators x Locations             132     1           132
  Methods x Coordinators
    x Locations                        161     1           161
Within groups                      101,645    56         1,815
Total                              147,716    63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Not significant

greater F ratio was needed to assume the existence of significant differences. This was no problem with these data, particularly in regard to the main effect of methods, which was of primary interest. On the other hand, the Bartlett's test did give useful information regarding the differences between the coordinated and non-coordinated groups of appraisals. In every case where heterogeneity was indicated, it was a function of the reduction in variance found in the coordinated groups. Coordination reduced the variation between appraisers for the various treatment groups on these particular scores. This means that for the coordinated groups the appraisals were more uniform or alike than for the non-coordinated. This was true for the following factors:

1. Number of words
2. Number of major thoughts
3. Scale I - completeness or explicitness
4. Scale II - kind of evidence
5. Expert evaluation.

Coordination, then, assures more consistency between appraisers on these aspects of the quantity and quality of the supporting facts.

Performance Summary. The performance summary, Part II of the appraisal, contained three sections: (1) strengths and abilities, (2) development needs, and (3) methods of development. The thorough and detailed use of this part of the appraisal form was considered essential to any subsequent
action. Preliminary observation of the completed forms showed that some appraisers had failed to fill in this part of the appraisal. For this part of the form, satisfactory completion appeared to be a contribution of the coordinator.

Hypothesis 10 - The non-coordinated group of appraisals will contain a greater number of appraisals with Sections 1, 2, and 3 completely omitted.

Table 26 shows a comparison between the two groups. The three chi square values are all significant beyond the .01 level.
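The Table 26 comparisons are 2 x 2 chi squares. The reported values for Sections 2 and 3 are reproduced by the formula with Yates' continuity correction, sketched below on those counts.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square for a 2x2 table [[a, b], [c, d]] with Yates' continuity
    correction: N * (|ad - bc| - N/2)^2 over the product of the marginals."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Section 3 of Table 26: coordinated 94 with / 2 without a response,
# non-coordinated 64 with / 32 without
chi3 = chi_square_2x2(94, 2, 64, 32)
# Section 2 of Table 26: coordinated 89 / 7, non-coordinated 71 / 25
chi2 = chi_square_2x2(89, 7, 71, 25)
```

Each value is referred to the chi-square distribution with 1 degree of freedom, for which the .01 critical value is 6.635.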
Only 14 non-coordinated appraisals lacked a summary of strengths and abilities, while 25 did not contain any development needs. This difference may have indicated a reluctance on the part of the supervisor to record what he considered unfavorable, or a lack of understanding of what he should record. He may have considered the employee entirely satisfactory and not in need of improvement.

Thirty-two non-coordinated appraisals lacked development plans. The written instructions for appraisal, as well as the introductory oral instructions, emphasized the completion of Part II of the appraisal form. In both cases definite suggestions were made regarding the type of material expected in the section on methods of development. It is unlikely that strengths, needs, or plans would be communicated to the ratee in a feedback interview unless these appeared on the appraisal form. Likewise, follow-up with the supervisor on the course of development action would be impossible if no development plans were suggested.
TABLE 26

COMPARISON OF COORDINATED AND NON-COORDINATED GROUPS FOR RESPONSES TO SECTIONS 1, 2, AND 3 OF PART II

                    With response   Without response   Total
Section 1
  Coordinated             94                2             96
  Uncoordinated           82               14             96
  Total                  176               16            192
                                            x2 = 8.035**
Section 2
  Coordinated             89                7             96
  Uncoordinated           71               25             96
  Total                  160               32            192
                                            x2 = 10.837**
Section 3
  Coordinated             94                2             96
  Uncoordinated           64               32             96
  Total                  158               34            192
                                            x2 = 30.057**

For 1 df: .05 level x2 = 3.841, .01 level x2 = 6.635
**Significant beyond .01 level

The following analyses concerning the number of strengths, needs, and methods of development exclude the appraisals where no material was recorded. This was done to eliminate the bias caused by the appraisals where the sections were omitted.

Hypothesis 11 - The coordinated appraisals will contain a larger number of performance strengths (Section 1).

Table 27 contains these data. Hypothesis 11 was confirmed. The F ratios for locations and methods by locations were barely significant at the .05 level. Because the Bartlett's test was also significant at the .05 level, these were not considered large enough for significance.

Hypothesis 12 - The coordinated appraisals will contain a larger number of development needs (Section 2).

Table 28 shows the analysis summary for these data. A significant F ratio was found between methods. The coordinated appraisals contained more development needs; hence Hypothesis 12 was confirmed.

The appraisal instructions suggested that the supporting facts from the items be used directly in the summary to indicate strengths and needs. However, this procedure was followed in only a few cases by the non-coordinated appraisers. In most instances completely new or much briefer statements were made in these sections. In order to eliminate credit given for merely following instructions, these short summaries
TABLE 27

ANALYSIS OF VARIANCE OF THE NUMBER OF PERFORMANCE STRENGTHS, SECTION 1. DATA TRANSFORMED TO SQUARE ROOT OF x + .5

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                           11,575     1        11,575    15.58**
  Coordinators                          72     1            72
  Locations                          3,039     1         3,039     4.09*
Interactions:
  Methods x Coordinators               255     1           255
  Methods x Locations                3,069     1         3,069     4.13*
  Coordinators x Locations              38     1            38
  Methods x Coordinators
    x Locations                        255     1           255
Within groups                       40,110    54           743
Total                               58,413    61

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
*Significant beyond .05 level
**Significant beyond .01 level
Bartlett's test: Significant at .05 level

TABLE 28

ANALYSIS OF VARIANCE OF THE NUMBER OF DEVELOPMENT NEEDS, SECTION 2. DATA TRANSFORMED TO SQUARE ROOT OF x + .5

Source of variation          Sum of squares   df   Mean square        F
Between:
  Methods                           21,894     1        21,894    55.15**
  Coordinators                           0     1             0
  Locations                            341     1           341
Interactions:
  Methods x Coordinators               239     1           239
  Methods x Locations                  281     1           281
  Coordinators x Locations             179     1           179
  Methods x Coordinators
    x Locations                        248     1           248
Within groups                       20,634    52           397
Total                               43,816    59

For df 1 and 50: F.05 = 4.03, F.01 = 7.17
**Significant beyond .01 level
Bartlett's test: Significant at .05 level

were considered satisfactory regardless of the adequacy of the explanation. Only the thought had to be present. A word count was not made for the same reason.
There was no intention to make comparisons based on verbiage alone.

A classification of development needs was made using the items and other recurring statements as reference points. For instance, a problem of getting along with others was classed as an Item 9 need even though the item number was not specified. Two additional categories immediately became evident: experience and education. Needs which recurred infrequently (attitude, leadership, confidence, responsibility, etc.) were all put in the "other" category, which was not large enough to break down further. Table 29 shows the number of needs per category.

For the coordinated appraisals, the items referred to most frequently were:

  9 - Ability to work well with others
  11 - Observing rules and regulations
  12 - Efforts to learn about other phases of the work than his own

A slight difference was noted between plant and office, with the latter emphasizing:

  17 - Ability to express himself orally
  18 - Ability to express himself in writing

In the non-coordinated group not enough references were made to the items to indicate a need trend.

TABLE 29

COMPARISON BETWEEN COORDINATED AND NON-COORDINATED GROUPS ON CATEGORIES OF DEVELOPMENT NEEDS

Need         Coordinated   Non-coordinated   Total
Items            188              41           229
Experience        31              28            59
Education         14              14            28
Other             17              19            36
Total            250             102           352

χ² = 40.817**
For 3 df: .05 level χ² = 7.815; .01 level χ² = 11.341
**Significant beyond .01 level

In Table 29 data are shown where a comparison was made between the coordinated and non-coordinated groups in regard to the need categories. A significant chi square indicated a difference between the groups. For the coordinated group, 76 percent of the needs referred to items; for the non-coordinated group this figure was 40 percent.
Eighteen percent of the needs for the coordinated group were in experience and education, while this figure was 41 percent for the non-coordinated group.

The methods of handling development needs listed in Section 3 were analyzed according to number, responsibility for action, and type.

Hypothesis 13 - The coordinated appraisals will contain a larger number of methods of handling the development needs.

Table 30 shows the analysis summary. A significant F ratio was found between methods. The coordinated group contained a larger number of methods; thus Hypothesis 13 was confirmed.

The number of strengths, needs, and methods of handling the needs was not important in itself. However, a reasonable number of these must be present to assume an adequate coverage of each topic and to furnish a satisfactory amount of material for the feedback interview and subsequent action. There was no evidence of padding or excessive listing of information. The amount of information was usually minimal, and the analyses of quantity were within this framework. The appraisal instructions contained detailed directions for Section 3. Variety and number of methods, therefore, only indicated a thoughtful consideration of the problem.

Hypothesis 14 - The coordinated appraisals will more frequently indicate action to be initiated by the supervisor, or by both supervisor and employee, rather than by the employee.

The appraisal instructions and the coordinators emphasized the point that development action was the responsibility of the supervisor or jointly that of the supervisor and employee.
This sort of involvement was considered essential for active interest and participation.

TABLE 30

ANALYSIS OF VARIANCE OF THE NUMBER OF METHODS OF HANDLING DEVELOPMENT NEEDS

Source of variation                     Sum of squares   df   Mean square       F
Between:
  Methods                                      413         1        413     16.59**
  Coordinators                                   5         1          5
  Locations                                     62         1         62      2.49
Interactions:
  Methods x Coordinators                        24         1         24
  Methods x Locations                           17         1         17
  Coordinators x Locations                       2         1          2
  Methods x Coordinators x Locations             5         1          5
Within groups                                1,220        49         25
Total                                        1,748        56

For df 1 and 50: F.05 = 4.03, F.01 = 7.17
**Significant beyond .01 level
Bartlett's test: Not significant

Table 31 gives data where the groups were compared on this factor.

TABLE 31

COMPARISONS BETWEEN COORDINATED AND NON-COORDINATED AND BETWEEN PLANT AND OFFICE IN TERMS OF PERSON PRIMARILY RESPONSIBLE FOR INITIATION OF DEVELOPMENT ACTION

             Coordinated   Non-coordinated   Total
Employee           2              26            28
Supervisor        33              22            55
Both              59              16            75
Total             94              64           158

χ² = 39.116**

             Plant   Office   Total
Employee        14       14      28
Supervisor      28       27      55
Both            31       44      75
Total           73       85     158

χ² = 1.333

For 2 df: .05 level χ² = 5.991; .01 level χ² = 9.210
**Significant beyond .01 level

The coordinated appraisals placed the responsibility for action on the supervisor, or on supervisor and employee, more frequently than the non-coordinated. Hypothesis 14 was confirmed. No difference was found between plant and office.

A statement of responsibility for action on the appraisal form did not necessarily mean that the supervisor accepted this fact. This was no doubt more true in the coordinated than in the non-coordinated appraisals. It should be a step in the right direction, however, and in the follow-up by the coordinator the appraisal form served as a basis for fixing the responsibility.
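The 2 x 2 comparisons in Table 26 can be cross-checked with the shortcut chi-square formula. With Yates's continuity correction the computation reproduces the printed values for Sections 2 and 3 almost exactly, which suggests that the correction was applied. The sketch below is a modern recomputation offered only for illustration; it is not part of the original analysis.

```python
def yates_chi_square(a, b, c, d):
    """Chi square for the 2 x 2 table [[a, b], [c, d]] with
    Yates's continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Table 26, Section 2: coordinated 89 with / 7 without a response,
# uncoordinated 71 / 25.
print(yates_chi_square(89, 7, 71, 25))   # 10.8375, printed as 10.837
# Table 26, Section 3: coordinated 94 / 2, uncoordinated 64 / 32.
print(yates_chi_square(94, 2, 64, 32))   # about 30.058, printed as 30.057
```

Both values exceed the .01-level critical value of 6.635 for 1 df quoted beneath the table.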
Hypothesis 15 - On-the-job counseling or coaching is suggested more frequently in the coordinated appraisals as a method of training.

Table 32 shows these data. Hypothesis 15 was confirmed. No differences were found between locations.

Previous analyses of development needs at Chrysler have indicated that the majority of the problems are of such a nature that they could best be handled within the department by extra assistance from the supervisor. This is true because of the specific nature of the problems and the generality of classroom training. Therefore, the emphasis has been in this direction through the coordinators and the suggestions for development planning.

It was noted that classroom training received the greatest emphasis in the non-coordinated appraisals even though experience was considered an important need. This choice of training method seems to contradict the need for experience. Coaching by the supervisor can be the most profitable method of training and of meeting the needs in the experience area.

TABLE 32

COMPARISONS BETWEEN COORDINATED AND NON-COORDINATED AND BETWEEN PLANT AND OFFICE IN TERMS OF STATED METHODS OF HANDLING DEVELOPMENT NEEDS

Method                  Coordinated   Non-coordinated   Total
Rotation                     15              9             24
Coaching, counseling         67             17             84
Classroom course             36             31             67
Self study (reading)         15              5             20
More responsibility          18              6             24
Total                       151             68            219

χ² = 12.557*

Method                  Plant   Office   Total
Rotation                   12       12      24
Coaching, counseling       46       38      84
Classroom course           24       43      67
Self study (reading)        6       14      20
More responsibility         9       15      24
Total                      97      122     219

χ² = 8.741

For 4 df: .05 level χ² = 9.488; .01 level χ² = 13.277
*Significant beyond .05 level

Hypothesis 16 - The development plans of the coordinated appraisals are more specific or concrete than those of the non-coordinated appraisals.

An important quality aspect of the development plans was labeled specificity or concreteness.
It was thought that plans which gave definite direction as to what should be done, by whom, and how were more meaningful than those which were vague and ill-defined. An evaluation of this factor of the development plans was made according to the directions and categories shown in Figure IX. This sorting was done by the coder, with a reliability check by the experimenter. The analysis summary is shown in Table 33. A significant F ratio was found between methods. Hypothesis 16 was confirmed.

Sort of Development Plans

Sort according to specificity-generality, or concrete-abstract. Sort into five stacks according to the categories:

1. Very poor - very unsatisfactory - least
2. Poor - less than expected
3. Fair - average - good as can be expected
4. Good - more than expected
5. Very good - very satisfactory - most

Reliability: r = .69

Figure IX. Directions and Categories for Sorting of Development Plans

TABLE 33

ANALYSIS OF VARIANCE OF SORT OF DEVELOPMENT PLANS ACCORDING TO SPECIFICITY-GENERALITY

Source of variation                     Sum of squares   df   Mean square       F
Between:
  Methods                                      672         1        672      8.40**
  Coordinators                                 112         1        112      1.40
  Locations                                    253         1        253      3.16
Interactions:
  Methods x Coordinators                        52         1         52
  Methods x Locations                           36         1         36
  Coordinators x Locations                      71         1         71
  Methods x Coordinators x Locations            18         1         18
Within groups                                3,916        49         80
Total                                        5,130        56

For df 1 and 50: F.05 = 4.03, F.01 = 7.17
**Significant beyond .01 level
Bartlett's test: Not significant

Operational Aspects

Two additional factors were investigated which were of practical importance for the administration of such a program but were not directly connected with the information on the appraisal form. These were:

1. The actual time required by the appraiser, or by the appraiser and coordinator, to complete the appraisal form
2. The time lapse between the delivery of the forms to the appraiser and the return of the completed forms
In the non-coordinated group the supervisors were contacted individually following the introductory conferences. At this time they were given the opportunity to ask questions about the procedure and appraisal techniques. Each supervisor was given three appraisals, an instruction sheet, and a list of the three employees to be appraised. A time was set as to when the appraiser expected to have the appraisals completed, with the understanding that he would be called at that time and asked to return the forms. This initial period of time was usually a week or ten days.

For the coordinated group the supervisors were contacted for three-hour appointments, at which time three appraisals were conducted. Preliminary experience with coordination using this form had indicated that three hours was usually sufficient time to conduct three appraisals.

The coordinators recorded the approximate time spent with each supervisor in the coordinated group. When the
non-coordinated supervisors returned their appraisals, they were asked to estimate the time spent in completing them. In this way it was possible to determine the approximate time spent per appraisal in each case. Summary information is given in Table 34.

TABLE 34

AVERAGE TIME SPENT COMPLETING AN APPRAISAL

                                  Coordinated   Non-coordinated
Number of supervisors                  32              32
Minutes per appraisal (average)        58             113
Range (time in minutes)             30-70          20-360

Table 35 shows an analysis of variance summary for the same data. A significant F ratio was found between methods. A significant Bartlett's test indicated heterogeneity between groups, which was anticipated from a comparison of the ranges in Table 34.

The important finding here was that the coordinator served as a pacer. He controlled the time involved and insured that the appraisal was given adequate consideration. Twenty minutes was probably not sufficient time to do a good job of appraisal. On the other hand, 360 minutes (six hours) was no doubt excessive; a good appraisal should be done in considerably less time. This extreme variability in time was no doubt to some extent an important factor in the variation of quantity and quality of appraisals noted earlier.

TABLE 35

ANALYSIS OF VARIANCE OF THE AVERAGE TIME IN MINUTES PER APPRAISAL FOR EACH APPRAISER

Source of variation                     Sum of squares   df   Mean square       F
Between:
  Methods                                   49,562         1     49,562     13.79**
  Coordinators                                 969         1        969
  Locations                                    722         1        722
Interactions:
  Methods x Coordinators                       311         1        311
  Methods x Locations                          285         1        285
  Coordinators x Locations                   4,641         1      4,641      1.29
  Methods x Coordinators x Locations         1,859         1      1,859
Within groups                              201,335        56      3,595
Total                                      259,684        63

For df 1 and 55: F.05 = 4.02, F.01 = 7.12
**Significant beyond .01 level
Bartlett's test: Significant beyond .01 level

The average time per appraisal was almost twice as much for the non-coordinated group as for the coordinated.
In the coordinated situation, of course, two persons were involved rather than one.

From the information on time it was possible to get an approximate idea of the relative costs involved. Based on average salaries for supervisors and coordinators, it was found that the actual appraisal manpower costs were about:

1) Coordinated - $6.31 per appraisal
2) Non-coordinated - $6.48 per appraisal

These figures were approximations, but it seemed safe to state that in regard to manpower costs there was essentially no difference between the coordinated and non-coordinated appraisals.

It should be noted that the above costs were based on the average salary for first-level supervisors. In the management program only supervisors of the second level and above were appraisers. Therefore, the average cost of the supervisor's time was greater and increased with successive levels of supervision. In this case the coordinated appraisals probably cost less per appraisal than the non-coordinated.

The second factor mentioned is difficult to name. Essentially it involved getting the appraisals finished and returned. In the case of coordination, the coordinator provided the time and place to get the job done. Problems of setting appointments and of cancelled appointments were encountered. Nine of the 32 coordinated supervisors cancelled their original appointments and had to be rescheduled. It was necessary to have two sessions with seven supervisors and three sessions with two supervisors because they could not leave their work area for three hours at a time.

On the other hand, 15 of the 32 supervisors in the non-coordinated group returned their appraisals when called once as scheduled. The period of time here was from four to ten days. Twelve supervisors had to be called from one to seven times more in order to get the appraisals returned.
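Two pieces of the arithmetic above can be cross-checked directly: the F ratio for methods in Table 35 is the between-methods mean square divided by the within-groups mean square, and a per-appraisal manpower cost is the elapsed time multiplied by the hourly salary cost of everyone present. The hourly rates in the sketch below are hypothetical; the text reports only the resulting totals, and these rates were chosen merely to be consistent with the printed $6.31 and $6.48.

```python
# F ratio for methods in Table 35: between-methods MS / within-groups MS.
f_methods = 49_562 / 3_595
print(round(f_methods, 2))  # 13.79, as printed

# Per-appraisal manpower cost = (minutes / 60) * hourly cost of staff present.
# These hourly rates are hypothetical; the dissertation gives only the totals.
rate_supervisor = 3.45    # assumed $/hour, first-line supervisor (1958)
rate_coordinator = 3.10   # assumed $/hour, coordinator (1958)

cost_coordinated = 58 / 60 * (rate_supervisor + rate_coordinator)  # two people present
cost_non_coordinated = 113 / 60 * rate_supervisor                  # one person present
print(cost_coordinated, cost_non_coordinated)  # near the printed $6.31 and $6.48
```

Under any similar pair of rates the two totals come out close together, which is the point made in the text: the cost of the coordinator's time is roughly offset by the shorter appraisal.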
For the non-coordinated supervisors the time required to get the appraisals returned ranged from four to 75 days, with a mean of 18 days. This count was based on working days, five per week. Five of the supervisors had the appraisals over seventy days before they returned the forms. It appeared that without coordination the completion of an appraisal program might be extended indefinitely. It must also be kept in mind that only three appraisals were involved in each case.

It was impossible to estimate the cost of the additional calls, contacts, and delays encountered in the non-coordinated situation. These extra costs do add up, however, and would increase the cost per appraisal as previously estimated.

GENERAL DISCUSSION

The evaluative aspects were those concerned with the ratings alone: the scale values of the judgments and their relationships. In lieu of outside criteria of validity, secondary criteria of a rating method were used to compare the two methods of appraisal administration. It was expected that coordination of appraisals would result in improved evaluation of performance. It was thought that this improvement would be reflected in the ratings by an increase in discrimination, a decrease in the constant errors of leniency and halo, fewer omitted items, and greater reliability for the coordinated appraisals than for the non-coordinated.

Failure to confirm Hypotheses 1 through 8 meant that the coordinated ratings did not differ from the non-coordinated ratings on these criteria. Therefore, an improvement in the validity of the ratings through the reduction of these errors was unlikely.

The experimenter suggested that the lack of positive findings appeared to be due to the unexpected characteristics of the non-coordinated ratings. The non-coordinated ratings were more satisfactory on the indexes used than was anticipated. The non-coordinated ratings did not contain the
extreme amount of constant errors usually found in such ratings. Distributions from other plants showed less spread and more central tendency than was found in the non-coordinated ratings. Taylor and Hastman (38), likewise, suggested that their lack of positive findings might be attributed to a situation where the control group gave more desirable ratings than usual.

A more definite emphasis by the coordinators on spread and omissions might have given positive results. The coordinators were trained and directed to do a careful job of coordination in an effort to duplicate as nearly as possible the operation of other coordinators currently engaged in the ongoing program. This approach was adopted in order to avoid possible criticism that the experimental group, or coordinated ratings, were atypical or represented an artificial forcing of ratings.

Subsequent observation and experience, not verified experimentally, has indicated differences between coordinators and plants in distributions of ratings. Apparently many situational factors, such as acceptance of the program, size of work groups, general management climate, etc., operate to affect the amount of error present in ratings. The coordinator probably effects more of a change in some situations than in others.

In addition, special emphasis by the coordinator can be effective in obtaining more spread. In some cases this may occur at the expense of good supporting facts. For example, in the present experiment the coordinators could have been directed to insist on a forced distribution of ratings, and they likely would have come close to attaining this. Such a distribution for the coordinated ratings in this study would no doubt have given differences regarding spread and skew.
Likewise, insistence that no items be omitted would likely have produced differences. The experimenter has suggested that the effect of the coordinator upon the ratings is probably a function of two factors: (1) the situation, and (2) the emphasis of the coordinator.

This study did not support the notion that the coordinator consistently effects a change in the ratings on the characteristics used as indexes. An improvement in the ratings for the coordinated appraisals in terms of the criteria used was not supported by the findings relevant to Hypotheses 1 through 8.

The criteria of the appraisal as a development tool were established from an examination of its function in the development process. It was necessary to derive the criteria in this fashion because the specific role of the appraisal was considered important and because there was no previous research on development devices which appeared relevant to this system. The steps in the development procedure are:

1. Performance appraisal of the employee by his immediate supervisor
2. Appraisal review by the appraiser's immediate superior
3. Performance interview with the subordinate by his supervisor
4. Training and development of the subordinate
5. Follow-up on the course of action by the coordinator

Hypotheses 9 through 16 focused attention on the development procedure itself. The supporting facts and the performance summary are both used in the review to acquaint the reviewer with the status of present performance and the
definitive findings of this study. A satisfactory interview with specific suggestions for development, definite development action, and a detailed follow-up cannot be expected unless information is available to give direction to the process.

The coordinated appraisals were more satisfactory in regard to completion of the form than the non-coordinated. In the latter group a significantly greater number of sections were completely omitted. It appears that a coordinator is essential to insure that material is recorded at this point in the process for use later.

The number of strengths, needs, and methods of development recorded in the performance summary was not in itself considered important. Too much material in the summary could lead to confusion and be as detrimental as a lack of information. An over-abundance of information was not a problem. The average numbers of strengths, needs, and methods for the non-coordinated group were two, one, and one; for the coordinated group these numbers were three, three, and two, respectively. The coordinated appraisals did contain a greater number of strengths, needs, and methods than the non-coordinated appraisals. These findings indicate that a more adequate amount of information for the performance interview and subsequent development action was obtained in the coordinated appraisals.

The analysis of development needs indicated that the needs were more frequently related to the items for the coordinated appraisals than for the non-coordinated.
In the non-coordinated appraisals, needs in the areas of experience and education were more frequent than in the coordinated. These results suggest that the summary of needs was more truly a summary of the item facts in the coordinated appraisals than in the non-coordinated. The item facts are more definitive of the needs because of their specificity and individuality. The needs of experience and education represent rather general needs which might apply to everyone.

The responsibility for the training and development of employees is considered a line management function. The personnel staff assists line management with techniques and people in carrying out this function. Each supervisor has the responsibility for the training and development of his subordinates. Furthermore, experience has indicated that the majority of development needs can be handled most satisfactorily in the department by the supervisor.

An aim of the appraisal program was to begin the education of supervision in terms of the above principles. Efforts were made to aid supervision in the recognition of this responsibility for training and development. The coordinators were directed to emphasize the appraiser's role in the entire process. The appraisal instructions were also slanted in this direction.

The findings indicated that the coordinated appraisals placed the responsibility for action on the supervisor, or on supervisor and employee, more frequently than the non-coordinated appraisals. Coaching or on-the-job counseling was suggested more frequently by the coordinated appraisers than by the non-coordinated. The recording of these approaches on the appraisal form does not mean acceptance by the supervisor, but it does indicate what will be expected from him through follow-up.
A significant F ratio between coordinators was found on three factors: minor thought count, number of criteria, and expert evaluation. The appraisals conducted by coordinator B contained more minor thoughts and criteria and were valued higher by the experts than the appraisals of coordinator A. Differences between coordinators can be expected. When they become too large, however, this may be indicative of a training need for coordinators.

No significant differences were found between locations. Coordination was equally effective in the office and the plant. It might be expected that the office supervisors would benefit less from coordination than the plant supervisors because their work is primarily paperwork. No differences of this kind were observed.

Although heterogeneity of variance reduced the precision of the F test in several instances, the information gained from a significant Bartlett's test was also useful. Coordination produced increased uniformity between appraisers on the quantity and quality of supporting facts. Likewise, the extreme variation in time spent per appraisal was reduced by the coordinator.

Evidence has been produced which indicated that the coordinator did effect an improvement in the appraisal as a development tool. In fact, if the appraisal is to fulfill its intended function in the development process, the coordinator is essential.

The procedure used here for the evaluation of the appraisal as a development tool was not new. The elements of the procedure were found in other types of research; a systematic and thorough application of these techniques to a problem such as this, however, was not found in the literature. The emphasis in management circles today is on development, although little has been done in the way of research because of the magnitude of the problem.
The experimenter suggests that a solution might be a series of studies at various steps in the program. The next important phase for study in the Chrysler Management Development program is the performance interview. Individual development depends upon the communication of the appraisal information to the ratee and his acceptance of the development plans.

SUMMARY AND CONCLUSIONS

Sixty-four first-line supervisors each rated or appraised the job performance of three subordinates. Two methods of administration, two coordinators, and two locations, identified as coordinated vs. non-coordinated, coordinators A and B, and plant vs. office, respectively, were incorporated into a factorial design. Method of administration, with or without a coordinator, was the variable of primary interest. The coordinator was a staff person who assisted in appraisal by questioning the appraiser and recording the information in a modified Field Review type of interview.

The objectives of appraisal were: (1) evaluation of present performance and (2) planning for individual improvement. These goals were considered separately in the experiment. It was predicted that the coordinated appraisals would be superior to the non-coordinated in meeting these aims.

The relative merits of the methods as a system of evaluation were inferred by comparing the treatments on secondary criteria, or characteristics, of a rating method. Satisfactory outside criteria were not available. It was predicted that the ratings of the coordinated appraisals would be improved by increased discrimination, reduced leniency, reduced halo, increased coverage, and increased comparability between ratings. Specifically, the coordinated
appraisals would differ from the non-coordinated in the following respects. The coordinated ratings would show:

1. Increased spread of overall ratings
2. Increased variability of ratings for an appraiser
3. Mean of ratings nearer the central scale value
4. More symmetrical distribution of ratings
5. Greater variability of item ratings
6. Smaller proportion of identical ratings
7. Fewer omitted responses
8. Greater relationship to separately obtained ranks

The above statements were not supported by the findings. No differences were found between the methods, coordinators, or locations on these factors. The conclusion was that the coordinator did not improve the effectiveness of the appraisal as a rating instrument. The experimenter offered additional evidence which suggested that the negative findings might be due to the fact that the non-coordinated ratings were better on these characteristics than is usually found.

Immediate criteria used to evaluate the appraisal as a development instrument were established from an examination of its function in the complete development procedure. Subsequent to the appraisal session, the information recorded on the form was to be used for (1) a review by the appraiser's supervisor, (2) a performance interview with the employee, (3) training and development of the subordinate, and (4) follow-up on action by the coordinator.

A content analysis was conducted on the response material found in certain sections of the appraisal to determine whether the coordinated appraisals exceeded the non-coordinated in quantity and quality of information. The areas of the form important to the above steps were the supporting facts and the performance summary, which contained sections for a summary of individual performance strengths, development needs, and development plans.
The findings indicated that, as compared with those of the non-coordinated appraisals, the supporting facts of the coordinated appraisals:

1. Contained a greater amount of information
2. Were more descriptive of specific performance
3. Contained more criteria of performance
4. Contained more examples or illustrations of specific instances
5. Contained more qualifying words

The supporting facts of coordinator B were superior to those of coordinator A on three factors, which indicated variation between coordinators.

There was no difference between the coordinated and non-coordinated supporting facts in terms of the number of negative and positive comments or the use of dogmatic words.

The non-coordinated group of appraisals contained a greater number of appraisals with Sections 1, 2, and 3 of the performance summary omitted.

The performance summaries of the coordinated appraisals contained a larger number of the items listed below than the non-coordinated appraisals:

1. Performance strengths
2. Development needs
3. Methods of handling development needs

The development needs for the coordinated group were more closely related to the supporting facts. The coordinated appraisals placed the responsibility for the initiation of development on the supervisor, or on both supervisor and employee, rather than on the employee only, and suggested on-the-job counseling or coaching as a method of development. The development plans for the coordinated appraisals were more specific than those for the non-coordinated.

There were no differences between coordinators in terms of the performance summary. No differences were found between the plant and office on any of the development criteria.
It was concluded that the coordinator did effect an improvement on the appraisals as a tool for development in terms of the outlined procedure and the criteria used for evaluation. In fact, it can be said that the coordinator plays an essential role and makes a significant contribution to the development procedure.

APPENDIX

Coordinator A

Coordinated                          Non-coordinated
Function                 Group Size  Function                  Group Size
Janitor                      29      Janitor                       14
Milling Machine              15      Milling Machine               35
Inspection                    9      Inspection                    30
Tool Room                    13      Heat Treat                    16
Heat Treat                    5      Steamfitter                   13
Electrician                  17      Tool Cribs                    16
Tool Stores                   5      Boilerhouse                   15
Sheet Metal Equipment         6      Machining Office              12
Sub-assembly                  4      Laboratory                    11
Tool Engineering             13      Tool Engineering              13
Tool Engineering             11      Time Study
Time Study                    7      Telephone Operators
Accounting                    6      Laboratory
Comptometer                   5      Cost Estimate
Laboratory                    4      Machine and Tool Layout
Budget                        3

Figure X. Listing of Departments in the Sample by Function and Group Size

Coordinator B

Coordinated                          Non-coordinated
Function                 Group Size  Function                  Group Size
Milling Machine              22      Milling Machine                5
Planning                      6      Planning                      32
Inspection                    6      Inspection                     8
Tool Room                    36      Tool Room                     17
Heat Treat                    4      Heat Treat                     3
Millwrights                  15      Machine Repair                19
Sheet Metal and                      Sheet Metal and
  Machine Assembly           14        Machine Assembly            25
Inspection Laboratory         4      Machine Laboratory             3

Office
Machine and Tool Layout      14      Planning                       8
Planning                      3      Tool Engineering               4
Plant Engineering            13      Plant Engineering              5
Accounting                    6      Accounting                     7
Accounting                    5      Accounting                     4
Master Mechanics              7      Master Mechanics               8
Resident Engineering          8      Engine Test                    7
Tabulating                    8      Traffic                        3

Figure X. Continued

Figure XI. Appraisal Instrument (Chrysler Corporation personnel development performance review form)