Zambezia (1999), XXVI (i).

ASSESSING TEACHER PERFORMANCE: A COMPARISON OF SELF- AND SUPERVISOR RATINGS ON LENIENCY, HALO AND RESTRICTION OF RANGE ERRORS

T. J. NHUNDU
University College of Distance Education

Abstract

Self- and supervisor ratings of the performance effectiveness of teachers on 30 teaching and teaching-related tasks were obtained and compared to determine the potential usefulness of self-ratings. The study also compared self- and supervisor ratings to identify areas of agreement in the perceived performance effectiveness of teachers and to determine whether the rating scores from the two sources were correlated. Supervisor rating scores were found to be more inflated, and supervisors tended to rate teachers globally instead of attending to specific performance tasks. However, a significant Spearman rank-order correlation (rho = 0.97) between the rank-ordered mean scores of the two groups showed that teachers and supervisors held similar perceptions of the areas of lesser and greater performance effectiveness. Finally, Pearson correlation coefficients computed to determine the comparability of the rating scores of the two groups were statistically significant, indicating that both groups were measuring the same performance behaviours.

IN EDUCATION THE question of who should evaluate teacher performance is less of an issue than the purposes of evaluation, what should be evaluated and how it should be evaluated. There is greater concern over methodological issues than over the key players in the evaluation process. It has almost become axiomatic in schools that teacher evaluation is carried out only by supervisors and administrator-supervisors, not by peers or students and, least of all, by supervisees themselves.
There is, therefore, virtual dependence on teacher performance profiles provided by immediate, local, district, regional and central office personnel, contrary to emerging research findings which raise serious questions about the utility of continued dependence on supervisor appraisals as the only teacher evaluation approach in schools (Nhundu, 1992).

The question of who evaluates merits serious consideration because of the advantages and disadvantages associated with a given evaluator, group or combination of evaluators. For example, the use of administrator evaluators in assessing teacher performance is likely to induce fear in the evaluatee because of the perceptual dilemma created by the contradictory bureaucratic and professional expectations inherent in administrative and supervisory roles which reside in the same person. It is difficult to completely allay a supervisee's fears associated with the bureaucratic position occupied by the supervisor, whose dual authority bases bring a threatening atmosphere to the supervisory process. Teachers, therefore, see the role of supervisors who also occupy administrative positions as not directly related to the improvement of instruction, and instead associate them with supervision for administrative decisions.

Traditional performance ratings by superiors may, therefore, not be the best teacher evaluation method. In contrast, teachers view self-ratings as the most appropriate evaluation method, ranking supervisor/administrator and peer evaluation second and third, respectively (Stark and Lowther, 1984, 97). Research findings also show that teachers are not happy with traditional assessment practices (McLaughlin, 1984; Reavis, 1978; and Wolf, 1973).
Levin (1979) and Paulin (1980, 10) have likewise found that teachers, individually or through their professional organizations, have expressed unwillingness to be evaluated, especially when they do not trust the evaluator's expertise and when they are not represented in both the design and the implementation of the evaluation.

Self-evaluations, on the other hand, have the greatest potential for producing changes in teaching practices because they give teachers a rare opportunity to reflect on their teaching and modify it accordingly. Johnston (cited in Balzer, 1973) compared the effects of traditional and self-evaluation practices on behaviour modification and found that self-ratings showed greater potential for changing teaching behaviour than traditional approaches. This finding is also supported by Natriello (1977), who cites similar evidence from his studies with the US armed forces.

Unfortunately, when self-ratings are obtained on a compared-to-others basis, research shows greater leniency, less halo error and less variability (restriction of range error) on the part of self-evaluations (Ash, 1980; Heneman, 1974; Hubert and Dueck, 1985; Johnston and Sackeney, 1982; Klimoski and London, 1974; Levin, 1980; Meyer, 1980; and Thornton, 1968 and 1980). Restriction of range error occurs when one of two sets of paired, corresponding standard deviations, obtained from independent performance ratings by two rating sources, is significantly smaller than the other. Smaller standard deviations indicate a relatively narrow range in the distribution, or spread, of the rating scores on which they are based.

Leniency error, on the other hand, arises when the inflated performance effectiveness scores from one rating source are significantly higher than the ratings obtained from other sources on the performance of the same group of ratees. This, in turn, often presents a measurement problem because it narrows the range (spread) of possible performance ratings of ratees.
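Both definitions rest on simple item statistics computed separately for each rating source: the item mean (for leniency) and the item standard deviation (for restriction of range). The Python sketch below is illustrative only and assumes that each source's ratings are held as a mapping from item name to a list of 1 to 5 scores for the same ratees; the item shown and its scores are hypothetical. It flags, item by item, which source is more lenient (higher mean) and which is more range-restricted (smaller sample standard deviation). It is purely descriptive: whether such item-level differences are significant across all items is a separate question, which this study addresses with a Wilcoxon matched-pairs signed-rank test.

```python
from statistics import mean, stdev


def compare_item(self_scores, supervisor_scores):
    """Compare one item's ratings of the same ratees from two sources.

    Returns which source has the higher mean (more lenient) and which has
    the smaller sample standard deviation (more range-restricted).
    """
    s_mean, v_mean = mean(self_scores), mean(supervisor_scores)
    s_sd, v_sd = stdev(self_scores), stdev(supervisor_scores)

    def larger(s, v):
        # Name the source whose statistic is larger; 'tie' if equal.
        return "self" if s > v else "supervisor" if v > s else "tie"

    def smaller(s, v):
        # Name the source whose statistic is smaller; 'tie' if equal.
        return "self" if s < v else "supervisor" if v < s else "tie"

    return {
        "more_lenient": larger(s_mean, v_mean),
        "more_restricted": smaller(s_sd, v_sd),
    }


def compare_sources(self_ratings, supervisor_ratings):
    """Apply compare_item to every item rated by both sources."""
    return {
        item: compare_item(self_ratings[item], supervisor_ratings[item])
        for item in self_ratings
        if item in supervisor_ratings
    }


# Hypothetical ratings of the same four teachers on one item (1 = poor ... 5 = exceptional).
self_ratings = {"Keeping accurate records": [3, 3, 3, 4]}
supervisor_ratings = {"Keeping accurate records": [2, 4, 4, 5]}
print(compare_sources(self_ratings, supervisor_ratings))
```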
According to Holzbach (1978, 579),

Leniency errors present a measurement problem to the extent that restriction range on the performance ratings limits the magnitude of the potential relationship between ratings of performance and other variables of interest.

In addition, inflated rating scores may send incorrect signals to the ratee, since they misrepresent the person's performance effectiveness.

Finally, halo error is a type of rater bias which arises when a rater fails to distinguish among specific job dimensions in the appraisal process and, instead, makes a global assessment. The relative incidence of halo effect is assessed by comparing the magnitude of the intercorrelations among supervisor ratings with those among self-appraisals, both obtained independently with the same performance rating instrument. A rating source which produces larger intercorrelations therefore indicates greater halo effect. In this connection, previous research has shown intercorrelations for supervisor ratings to be consistently higher than the corresponding intercorrelations for self-ratings, indicating greater halo error in supervisor ratings (Heneman, 1974, 642).

Whilst a number of studies have reported higher leniency errors with self-ratings, Heneman (1974, 642) and Miner (1968) argue that the high leniency errors of self-ratings may be attributable to the purpose to which the ratings will be put. They conclude that when self-ratings are collected for research purposes only, rather than for administrative ones, they may not be as inflated as other studies report. Subsequent studies by Holzbach (1978) and Nhundu (1992) concur with Heneman and Miner.

Nevertheless, supervisor ratings remain the dominant methodology for evaluating teachers in spite of continued teacher discontent with this method.
Furthermore, the assessment of teacher performance using supervisor ratings is often too sporadic: supervisory visits are so few and far between that they may have no meaningful effect on the modification of teaching behaviour. Such supervisory practices are also often superficial because they are too broadly focused and all-encompassing to stimulate teacher change and growth. In addition, the traditional supervisor-teacher rating relationship often creates insecurity and induces fear in the supervisee. According to Ness (1980, 405),

There is an assumption, both implied and stated, that the authority to evaluate personnel carries with it fear of being judged, and this fear stands in the way of helping teachers . . . Little or no growth occurs as a result of a formal observation and instruction does not improve as a result of summative evaluations.

Such criticisms of traditional supervisor ratings raise serious questions about the potential usefulness of supervisor-based teacher performance evaluations compared with self-ratings. The merits of self-ratings lie notably in their capacity to bring about changes in teaching practices. In addition, self-appraisals may also be associated with teacher self-esteem and higher productivity (Meyer, 1980). Accordingly, if the crucial question in performance evaluation is accepted as the extent to which it produces changes in teaching practices and results in teacher growth, then self-appraisals hold the greatest potential in this regard.
Thus, in view of the lack of consensus concerning the current status of self-ratings compared with supervisor ratings, more research should be undertaken to assess the potential usefulness of self-ratings in terms of their relative leniency, halo effects and variability.

In short, the main purpose of this study was to assess the potential usefulness of self-ratings as an alternative evaluation method by (a) obtaining self- and supervisor performance rating scores on selected teaching and teaching-related behaviours, (b) assessing areas of greater and lesser performance effectiveness of teachers using self- and supervisor ratings, (c) determining and comparing self- and supervisor ratings in terms of their leniency, halo effects and variability (range error), and (d) assessing the rating scores of teachers and supervisors to determine the level of agreement in their selection and ranking of performance effectiveness dimensions.

RESEARCH METHOD

Supervisors were asked to assess the performance of their teachers using a thirty-item performance assessment questionnaire graduated on a five-point rating scale ranging from 1 = poor to 5 = exceptional. The questionnaire was sent to a large, randomly selected sample of teachers and their corresponding supervisors as part of a larger study on job satisfaction (Nhundu, 1994). The questionnaires for teachers and supervisors contained identical performance scales: teachers were asked to rate their own performance in teaching and teaching-related tasks, while supervisors rated these teachers on the same scales. Participation in the study was voluntary, and the participants were assured of the confidentiality of their responses. Respondents were further informed that their responses would be used for research purposes only.

Participation was pairwise, yielding a dependent random sample of 229 teachers and their corresponding supervisors (N=229).
Only certificated teachers with more than three years of teaching experience took part in this study. It was felt that since untrained teachers were not professionals, their understanding of the teaching profession would be limited. Teachers who had not persisted beyond the heavy attrition period of three years, on the other hand, were considered less experienced and were thus assumed to be unfamiliar with the rating scales, the measurement constructs and their expectations. According to Thornton (1980, 450), the objectivity of self-assessments depends, in part, on the accuracy with which rating scales are interpreted. Familiarity with rating scales leads to a clearer understanding of the meaning of the concepts being measured, which, in turn, results in accurate interpretations and objective measurements.

RESULTS

Comparison of Areas of Greater and Lesser Performance Effectiveness

Teachers' Ratings of the Effectiveness of their Performance

Teachers were requested to rate their performance on selected teaching and teaching-related tasks using a five-point Likert-type rating scale: 1 (poor), 2 (fair), 3 (good), 4 (very good) and 5 (exceptional). The thirty tasks on which they rated the effectiveness of their performance were further classified into four broad job dimensions, namely Curriculum and Instruction (CI), Human Relations (HR), Personal Development (PD) and School-Community Relations (SCR). Means and standard deviations were computed for each of the thirty tasks, and the task means were rank-ordered to reveal areas of greater and lesser effectiveness.

Table 1 below lists the ten tasks which teachers rated as their areas of greatest performance effectiveness. The overall mean score for these ten tasks was 3.52, which is greater than the theoretical mean score of 3.00 (assuming a normal distribution of responses).
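The rank-ordering step can be sketched as follows, assuming the 30 task means have already been computed. For illustration the sketch uses the teachers' self-rating means reported in Table 5 (task labels lightly normalised); sorting them and averaging the ten highest and the ten lowest reproduces the overall means of 3.52 and 3.01 quoted for Tables 1 and 2. The helper names are hypothetical and not part of the original analysis.

```python
# Teachers' self-rating means for the 30 tasks, as reported in Table 5.
SELF_MEANS = {
    "Preparation of long and short term plans": 3.18,
    "Designing appropriate objectives": 3.29,
    "Organising students' activities": 3.33,
    "Keeping accurate records": 3.24,
    "Cooperating with colleagues in lesson planning": 3.40,
    "Reinforcing students": 3.34,
    "Developing interesting and challenging learning activities": 3.34,
    "Responding to students' needs, aptitudes and learning styles": 3.14,
    "Using a variety of appropriate questioning techniques": 3.29,
    "Suitability of learning aids, illustrations, etc.": 3.27,
    "Appropriateness of lesson introduction and closure": 3.16,
    "Ingenuity and innovativeness": 3.07,
    "Showing empathy for students": 3.18,
    "Student counselling": 3.01,
    "Consistence and fairness with students": 3.45,
    "Encouraging parental involvement in students' work": 2.91,
    "Holding parental conferences": 2.60,
    "Student supervision": 3.39,
    "Participation in community activities": 2.63,
    "Participation in staff meetings": 3.46,
    "Maintaining good rapport with colleagues": 3.74,
    "Initiativeness": 3.25,
    "Providing good leadership": 3.44,
    "Assisting in extra-curricular activities": 3.49,
    "Reflecting and acting upon supervisory advice": 3.41,
    "Ability to make independent decisions": 3.39,
    "Maintaining good rapport with superiors": 3.60,
    "Developing own teaching approaches": 3.39,
    "Ability to make friendship with other teachers": 3.61,
    "Classroom management and control": 3.64,
}
THEORETICAL_MEAN = 3.00  # midpoint of the 1-5 rating scale


def rank_tasks(means):
    """Return (task, mean) pairs ordered from highest to lowest mean."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def overall_mean(pairs):
    """Unweighted mean of the task means in a list of (task, mean) pairs."""
    return sum(m for _, m in pairs) / len(pairs)


ranked = rank_tasks(SELF_MEANS)
top_ten, bottom_ten = ranked[:10], ranked[-10:]
print("Top ten overall mean:", round(overall_mean(top_ten), 2))
print("Bottom ten overall mean:", round(overall_mean(bottom_ten), 2))
print("Bottom-ten tasks above 3.00:", sum(m > THEORETICAL_MEAN for _, m in bottom_ten))
```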
All the top ten tasks had mean scores above the theoretical mean. Of the ten top-rated tasks appearing in Table 1, nine were classified as "Human Relations" and one as "Curriculum and Instruction". This preponderance of human relations tasks among the ten top-rated tasks suggests that teachers in the research sample were more concerned with the idiographic dimensions of their job than with its nomothetic aspects. The teachers' perceived performance effectiveness in idiographic tasks was generally superior to that in other tasks. These results thus suggest that teachers in the sample valued, and performed better in, their relationships with superiors, fellow teachers and students than other aspects of their job.

Table 1
TOP TEN TASKS RANKED ACCORDING TO TEACHERS' RATINGS OF THE EFFECTIVENESS OF THEIR PERFORMANCE ON SELECTED TEACHING AND TEACHING-RELATED TASKS (N=229)

Rank  Category*  Performance Task                                Mean  sd
1     HR         Maintaining good rapport with colleagues        3.74  0.93
2     CI         Classroom management and control                3.64  0.83
3     HR         Ability to make friendship with colleagues      3.61  0.95
4     HR         Maintaining good rapport with superiors         3.60  0.98
5     HR         Assisting in extra-curricular activities        3.49  1.00
6     HR         Participating in staff meetings                 3.46  0.97
7     HR         Consistence and fairness with students          3.45  0.88
8     HR         Providing good leadership                       3.44  1.12
9     HR         Reflecting and acting upon supervisory advice   3.41  0.87
10    HR         Cooperating with colleagues in lesson planning  3.40  0.92

*Category: HR = Human Relations; CI = Curriculum and Instruction

The above finding further highlights the importance of human relations as a possible source of job satisfaction among teachers in Zimbabwe. According to Pigge and Lovett (1985) and Siegel and Bowen (1971), cited by Nhundu (1992), job satisfaction is both a result of, and dependent on, good performance.
Accordingly, job satisfaction for teachers in this sample would more likely derive from the human relations aspects of teaching, where their perceived performance effectiveness was greatest, than from other areas of their job.

Table 2, which lists the ten lowest-rated performance tasks, shows that four of these tasks belonged to the "Curriculum and Instruction" job dimension, three to "School-Community Relations", one to "Human Relations" and two to "Personal Development". According to Table 2, all but three of the mean scores for the least-rated tasks were above the theoretical mean score of 3.00. The lowest-rated task had a mean score of 2.60 and the highest-rated a mean of 3.24, while the overall mean score for the ten lowest-rated tasks was 3.01. This overall mean shows that teachers in the sample rated their performance on these tasks as "good". Compared with the corresponding overall mean of 3.52 for the top ten tasks, however, the teachers' self-rated performance on the bottom ten tasks is substantially lower.

Table 2
BOTTOM TEN TASKS RANKED ACCORDING TO TEACHERS' RATINGS OF THEIR PERFORMANCE EFFECTIVENESS ON SELECTED TEACHING AND TEACHING-RELATED TASKS (N=229)

Rank  Category*  Performance Task                                              Mean  sd
21    CI         Keeping accurate records                                      3.24  0.94
22    HR         Showing empathy for students                                  3.18  0.87
23    CI         Preparation of long and short term plans                      3.18  0.87
24    CI         Appropriateness of lesson introduction and closure            3.16  0.81
25    CI         Responding to students' needs, aptitudes and learning styles  3.14  0.84
26    PD         Ingenuity and innovativeness                                  3.07  0.84
27    PD         Student counselling                                           3.01  0.95
28    SCR        Encouraging parental involvement in student learning          2.91  1.11
29    SCR        Participation in community activities                         2.63  1.25
30    SCR        Holding parental conferences                                  2.60  1.11

*Category: CI = Curriculum and Instruction; PD = Personal Development; SCR = School-Community Relations; HR = Human Relations

The preeminence of "Curriculum and Instruction", followed by "School-Community Relations", among the ten bottom-rated tasks shows that teachers in the sample considered themselves relatively less competent in carrying out tasks which are central to the teaching profession (curriculum and instruction) and tasks arising from recent government policy towards greater community and parental involvement in local school governance. The fact that a preponderance (70%) of the ten least-rated performance tasks belonged to these two job dimensions suggests that teachers perceived their performance in these areas to be relatively weaker. Hence, in view of the centrality of curriculum and instruction to a school's mission and the emerging parental role in school-based decision making, the predominance of these two job facets among the ten bottom-rated tasks should be considered from the perspective of the effect which diminished teacher performance in these areas may have on teaching and educational standards.

From this finding, it also appears that concern for the human side of the school enterprise (of which only one item appeared among the ten bottom-rated tasks) is emphasized at the expense of pedagogy and pedagogy-related issues. This finding should be a cause for concern for policymakers, educationists and school administrators in Zimbabwe. Accordingly, Government's recent shift from quantitative expansion to qualitative improvement in primary and secondary education and enhanced local school governance should take cognisance of related research findings so that appropriate teacher training intervention programmes (including pre-service) can be designed to prepare, improve and strengthen teacher performance in curriculum and instruction-related areas.
This finding also has important implications for in-service and other staff development programmes which seek to raise the teaching competencies of practising teachers.

Supervisors' Ratings of the Effectiveness of Teachers

Supervisors were requested to assess the performance of teachers on the same dimensions, using a performance assessment questionnaire identical to that used by teachers. Their mean rating scores, which appear in Tables 3 and 4 below, were rank-ordered to determine areas of lesser and greater teacher performance effectiveness.

The overall mean score for supervisor ratings of the ten top-rated tasks listed in Table 3 was 3.69, compared with 3.52 for teacher ratings. Table 3 also shows that the areas of greater teacher performance (according to supervisor ratings) were again predominantly in human relations, which accounted for seven of the ten top-ranked tasks. A comparison of the teachers' ratings of their own performance (Table 1) with the supervisors' ratings of the teachers' performance effectiveness (Table 3) shows remarkably close agreement between the two independent ratings.

Firstly, there is general agreement that teacher performance effectiveness is greatest in the area of human relations. Secondly, Tables 1 and 3 show that seven of the nine human relations tasks rated highly by teachers were also rated highly, and in nearly the same rank order, by supervisors. Finally, the four tasks that received the highest performance effectiveness scores in the teachers' ratings are also among the top five on the supervisors' list (the first three in identical order), although the supervisors' mean scores for them are slightly higher than those of their supervisees.

Table 3
TOP TEN TASKS RANKED ACCORDING TO SUPERVISORS' RATINGS OF THE PERFORMANCE EFFECTIVENESS OF TEACHERS ON SELECTED TEACHING AND TEACHING-RELATED TASKS (N=229)

Rank  Category*  Performance Task                                Mean  sd
1     HR         Maintaining good rapport with colleagues        3.92  0.90
2     CI         Classroom management and control                3.83  0.90
3     HR         Ability to make friendship with colleagues      3.77  0.94
4     HR         Providing good leadership                       3.65  0.93
5     HR         Maintaining good rapport with superiors         3.64  0.93
6     HR         Assisting in extra-curricular activities        3.63  1.07
7     HR         Consistence and fairness with students          3.63  0.88
8     HR         Participation in staff meetings                 3.62  1.09
9     PD         Initiativeness                                  3.60  0.90
10    PD         Ability to make independent decisions           3.59  0.87

*Category: HR = Human Relations; PD = Personal Development; CI = Curriculum and Instruction

The areas of least teacher performance effectiveness according to the supervisors' assessment appear in Table 4 below. Six of these tasks belong to 'Curriculum and Instruction', two to 'School-Community Relations' and one each to 'Human Relations' and 'Personal Development'. The overall mean performance rating score for the ten bottom-rated tasks computed from supervisor ratings was 3.20, compared with 3.01 obtained from the teachers' self-ratings. Thus, although the overall mean scores of both sub-groups were above the theoretical mean of 3.00, indicating that both rated the teachers' performance on the ten bottom-rated tasks as good, the supervisors' overall and item-by-item rating scores were generally higher than the corresponding self-ratings.

Furthermore, the supervisors' assessment of the areas of least teacher performance effectiveness agrees with the teachers' own assessment in six of the ten bottom tasks. There was also general agreement between teachers and their supervisors that the job dimension in which teachers performed least well was 'Curriculum and Instruction'.
For teachers, four of the bottom ten tasks belonged to this job facet, while the supervisors' assessment placed six of the bottom ten tasks in the same facet. The next least-performed job facet was 'School-Community Relations', whose tasks were the least performed of all the ten bottom-rated tasks. However, the supervisors' ratings remained consistently, but slightly, inflated compared with those of the teachers. Differences in mean ratings ranged from 0.02 to 0.08 between the least-performed and the best-performed of the ten bottom-rated tasks, respectively.

Table 4
BOTTOM TEN TASKS RANKED ACCORDING TO SUPERVISORS' RATINGS OF THE PERFORMANCE EFFECTIVENESS OF TEACHERS ON SELECTED TEACHING AND TEACHING-RELATED TASKS (N=229)

Rank  Category*  Performance Task                                              Mean  sd
21    CI         Organising student learning activities                        3.33  0.86
22    CI         Developing challenging teaching activities                    3.32  0.92
23    CI         Appropriateness of lesson introduction and closure            3.31  0.89
24    CI         Preparation of long and short term plans                      3.29  0.78
25    HR         Cooperating with colleagues in lesson planning                3.19  0.92
26    CI         Responding to students' needs, aptitudes and learning styles  3.18  1.00
27    CI         Suitability of learning materials, illustrations, etc.        3.18  0.99
28    PD         Ingenuity and innovativeness                                  3.13  0.79
29    SCR        Encouraging parental involvement in student learning          3.04  1.09
30    SCR        Participation in community activities                         2.62  1.24

*Category: HR = Human Relations; CI = Curriculum and Instruction; SCR = School-Community Relations; PD = Personal Development

Comparison of the Perceptions of Teachers and their Supervisors Concerning Relative Leniency, Restriction of Range and Halo Errors

Leniency: Means and standard deviations were computed for all 30 job dimensions and were used to compare relative leniency and range errors, respectively. The results of this analysis appear in Table 5 below. Although these results show that 24 of the 30 mean supervisor rating scores were larger than the corresponding self-ratings, and that only one mean rating score was the same for both groups, this alone does not allow the rejection of the null hypothesis that there is no difference between the ratings of the two groups. To test the hypothesis, a Wilcoxon matched-pairs signed-rank test was computed. The test takes into account both the magnitude and the direction of the differences between the paired mean rating scores of teachers and their supervisors obtained using the same rating scale.

The results of a two-tailed Wilcoxon matched-pairs signed-rank test (N=29, T=64, p<0.01) showed that the ratings of supervisors were significantly higher than the corresponding ratings of teachers, even at the 0.01 level of significance. This finding shows that supervisor ratings had greater leniency error than self-ratings.

Table 5
MEANS AND STANDARD DEVIATIONS OF SELF AND SUPERVISOR RATINGS COMPUTED TO ASSESS RELATIVE LENIENCY AND RANGE ERRORS (N=458)

Performance Dimension                                         Sup. mean  Sup. sd  Self mean  Self sd
Preparation of long and short term plans                      3.29       0.78     3.18       0.87
Designing appropriate objectives                              3.36       0.84     3.29       0.84
Organising students' activities                               3.33       0.86     3.33       0.78
Keeping accurate records                                      3.42       0.96     3.24       0.94
Cooperating with colleagues in lesson planning                3.19       1.04     3.40       0.92
Reinforcing students                                          3.55       0.96     3.34       0.89
Developing interesting and challenging learning activities    3.32       0.92     3.34       0.86
Responding to students' needs, aptitudes and learning styles  3.18       0.92     3.14       0.84
Using a variety of appropriate questioning techniques         3.37       0.91     3.29       0.90
Suitability of learning aids, illustrations, etc.             3.18       0.99     3.27       0.83
Appropriateness of lesson introduction and closure            3.31       0.89     3.16       0.81
Ingenuity and innovativeness                                  3.13       0.79     3.07       0.84
Showing empathy for students                                  3.39       0.88     3.18       0.87
Student counselling                                           3.43       1.00     3.01       0.95
Consistence and fairness with students                        3.62       0.88     3.45       0.88
Encouraging parental involvement in students' work            3.04       1.09     2.91       1.11
Holding parental conferences                                  2.39       1.15     2.60       1.11
Student supervision                                           3.52       0.85     3.39       1.01
Participation in community activities                         2.62       1.24     2.63       1.25
Participation in staff meetings                               3.62       1.09     3.46       0.97
Maintaining good rapport with colleagues                      3.92       0.90     3.74       0.93
Initiativeness                                                3.60       0.90     3.25       0.82
Providing good leadership                                     3.65       0.93     3.44       1.12
Assisting in extra-curricular activities                      3.63       1.07     3.49       1.00
Reflecting and acting upon supervisory advice                 3.44       0.89     3.41       0.87
Ability to make independent decisions                         3.59       0.87     3.39       0.83
Maintaining good rapport with superiors                       3.64       0.93     3.60       0.98
Developing own teaching approaches                            3.54       0.91     3.39       0.90
Ability to make friendship with other teachers                3.77       0.94     3.61       0.95
Classroom management and control                              3.83       0.90     3.64       0.83

Restriction of Range Error: The corresponding paired standard deviations used to assess the restriction of range error (the relative variability of rating scores) of supervisor and self-ratings also appear in Table 5. According to these results, 19 of the 30 standard deviations were larger for supervisor ratings, suggesting greater variability of supervisor ratings on these items. Of the remaining 11 standard deviations, nine were larger for self-ratings and two were the same for both groups. To determine whether this preponderance of larger supervisor standard deviations indicated an overall significant difference between the ratings of teachers and supervisors, a two-tailed Wilcoxon matched-pairs signed-rank test was computed. The results of this analysis (N=28, T=145, p>0.10) showed no statistically significant difference between the variability of supervisor ratings and that of the corresponding self-ratings, indicating that the observed differences might have been due to chance.

Halo Error: Intercorrelation matrices of the performance dimensions for self-ratings and for supervisor ratings were used to assess the relative halo error in the performance ratings of the two groups.
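The matrix comparison can be sketched in Python, assuming the raw ratings are available as one row of item scores per ratee for each source. Because the study's raw data and its triangles are not reproduced here, the sketch runs on randomly generated scores that are purely illustrative and say nothing about the study's results. It computes the Pearson intercorrelation for every pair of items, keeps the upper triangle, and pairs each self-rating coefficient with the supervisor coefficient for the same item pair. With 30 items each triangle holds 30 x 29 / 2 = 435 coefficients, the count given in the text.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(ratings):
    """Map each item pair (i, j), i < j, to its intercorrelation.

    `ratings` holds one row per ratee, each row listing that ratee's item
    scores in a fixed item order.
    """
    n_items = len(ratings[0])
    columns = [[row[k] for row in ratings] for k in range(n_items)]
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(range(n_items), 2)
    }


def paired_intercorrelations(self_ratings, supervisor_ratings):
    """Pair the self-rating r and the supervisor r for the same item pair."""
    self_tri = upper_triangle(self_ratings)
    supervisor_tri = upper_triangle(supervisor_ratings)
    return [(self_tri[key], supervisor_tri[key]) for key in self_tri]


# Purely illustrative random data: 229 ratees x 30 items on a 1-5 scale.
rng = random.Random(1)
self_data = [[rng.randint(1, 5) for _ in range(30)] for _ in range(229)]
supervisor_data = [[rng.randint(1, 5) for _ in range(30)] for _ in range(229)]

pairs = paired_intercorrelations(self_data, supervisor_data)
supervisor_larger = sum(sup_r > self_r for self_r, sup_r in pairs)
print(len(pairs), "item pairs;", supervisor_larger, "with the larger r for supervisors")
```

The resulting list of paired coefficients is what a matched-pairs test then compares.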
The monomethod-heterotrait triangles (which were too large to include in this article) were used to investigate the incidence of halo error. The method involves comparing the sizes of the intercorrelations obtained from more than one rating source (e.g. supervisors and teachers), based on rating scores obtained independently with the same rating instrument. When halo effect is measured using this method, a rating source which produces larger intercorrelations indicates greater halo effect.

According to the intercorrelation matrices obtained for this analysis, there were 435 possible comparisons between self- and supervisor ratings (each triangle had 435 intercorrelations). However, since 10 of these intercorrelations were identical for the two groups, only 425 comparisons could be made. Because the level of halo effect in this study was obtained by comparing the magnitude of the item intercorrelations from the ratings of teachers and supervisors, the 425 pairs of intercorrelations were compared; the intercorrelations for supervisors were greater in 229 of them. To determine whether this preponderance of larger supervisor intercorrelation coefficients indicated an overall significant difference between the ratings of teachers and supervisors, a two-tailed Wilcoxon matched-pairs signed-rank test was computed.
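The matched-pairs signed-rank test used for the leniency, range and halo comparisons can be sketched as below. This is an illustrative implementation, not the author's software: zero differences are dropped, tied absolute differences share their average rank, T is the smaller of the positive and negative rank sums, and the two-tailed p value uses the large-sample normal approximation without a tie correction (an assumption; the study may have used tabulated critical values). For illustration it is applied to the supervisor and self-rating item means of Table 5, entered in hundredths so that tied differences compare exactly. Dropping the one item with identical means leaves N = 29 pairs, as in the text; because the tabled means are rounded to two decimals, T need not match the reported value exactly.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-tailed Wilcoxon matched-pairs signed-rank test for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    t = min(w_plus, w_minus)
    # Normal approximation to the null distribution of T (no tie correction).
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = erfc(abs(t - mu) / sigma / sqrt(2))
    return n, w_plus, w_minus, t, p


# Table 5 item means in hundredths (e.g. 3.29 -> 329), in the table's row order.
SUPERVISOR = [329, 336, 333, 342, 319, 355, 332, 318, 337, 318,
              331, 313, 339, 343, 362, 304, 239, 352, 262, 362,
              392, 360, 365, 363, 344, 359, 364, 354, 377, 383]
SELF = [318, 329, 333, 324, 340, 334, 334, 314, 329, 327,
        316, 307, 318, 301, 345, 291, 260, 339, 263, 346,
        374, 325, 344, 349, 341, 339, 360, 339, 361, 364]

n, w_plus, w_minus, t, p = wilcoxon_signed_rank(SUPERVISOR, SELF)
print(f"N={n}, W+={w_plus}, W-={w_minus}, T={t}, two-tailed p~{p:.4f}")
```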
The results of the halo analysis (N=425, T=477, p<0.05) show that the intercorrelations for supervisors were significantly larger than those obtained for self-ratings, indicating that supervisor ratings had significantly greater halo error than self-ratings.

Overall Performance Assessment

An analysis of the ratings given by teachers and their supervisors on all 30 items of the questionnaire, and on the top and bottom ten tasks, revealed general agreement in their selection and ranking of the performance effectiveness of teachers. The overall mean performance score obtained from the ratings of teachers was 3.29, compared with a slightly inflated overall mean score of 3.47 for supervisors. A Spearman rank-order correlation coefficient computed for the rank-ordered means of teachers and supervisors on the 30 items was high (rho = 0.97), indicating a very strong positive relationship between the perceptions of the two groups. This high coefficient further indicates very high consistency in the rankings of teachers and supervisors, and shows that the two groups held similar perceptions of the areas of lesser and greater task performance by the teachers.
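Spearman's rho can be sketched as Pearson's r computed on ranks, with tied values given their average rank. The four task means below are hypothetical and chosen only to keep the arithmetic visible; with no ties, the result agrees with the textbook formula rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), which here gives 1 - 6(2) / (4 x 15) = 0.8.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean ratings of the same four tasks by the two groups.
teacher_means = [3.2, 3.5, 2.9, 3.7]
supervisor_means = [3.6, 3.4, 3.1, 3.9]
print(round(spearman_rho(teacher_means, supervisor_means), 3))
```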
At the same time, the supervisors' mean scores were slightly higher on most tasks (24 of the 30 in Table 5).

A two-tailed t-test was run on each of the 30 items to determine whether the observed differences between the mean scores of teachers and supervisors were statistically significant.

Table 6
A T-TEST ANALYSIS OF THE PERCEPTIONS OF TEACHERS AND THEIR SUPERVISORS CONCERNING THE PERFORMANCE EFFECTIVENESS OF TEACHERS ON SELECTED TEACHING AND TEACHING-RELATED TASKS (N=458)

Category*  Performance Task                        Supervisor mean  Teacher mean  t-value  p value**
CI         Classroom management and control        3.83             3.64          1.99     0.048
PD         Initiativeness                          3.60             3.25          3.63     0.000
PD         Ability to make independent decisions   3.59             3.39          2.05     0.041
CI         Reinforcing students                    3.55             3.34          1.99     0.048
PD         Student counselling                     3.43             3.01          3.83     0.000
HR         Showing empathy for students            3.39             3.18          2.10     0.037

*Category: CI = Curriculum and Instruction; HR = Human Relations; PD = Personal Development
**Probability value based on two-tailed test of significance

The results of the t-test analysis in Table 6 show that only six of the 30 job tasks produced statistically significant differences between the ratings of teachers and those of supervisors. Of these six tasks, three were in 'Personal Development', two in 'Curriculum and Instruction' and one in 'Human Relations'.

DISCUSSION

Supervisor ratings are commonly valued and used by school jurisdictions to gain insight into teacher performance effectiveness and to assist them in making administrative decisions, because they are considered more objective and robust than other performance assessment methods. According to Holzbach (1978, 587), the objectivity of supervisor ratings is attributable to the supervisors' wide experience of, and responsibility for, evaluating job performance, as well as their familiarity with, and sensitivity to, differences among the specific job-related behaviours of individual supervisees.
The findings of previous studies on rater bias in terms of leniency, halo and restriction of range errors were only partly replicated in this study. Contrary to most previous research findings (Ash, 1980; Hubert and Dueck, 1985; Johnston and Sackeney, 1982; and Levin, 1980), this study showed that supervisor ratings produced greater leniency error than the corresponding self-ratings. On variability and halo effects, however, the results of the present study are in agreement with previous research by Heneman (1974), Holzbach (1978) and Lawler (1967).

The results of this study on leniency therefore lend more support to those previous studies (Heneman, 1974; Miner, 1968; and Nhundu, 1992) which are at variance with the more prevalent finding that self-ratings have higher leniency error than counter-position ratings. While supervisor ratings were consistently inflated, the ratings of the two groups differed significantly on only six of the 30 rating scales. Comparability in performance assessments between the two groups was determined using Spearman and Pearson correlation coefficients. The results also showed that self-ratings were significantly correlated with supervisor ratings on identical performance tasks.

A possible explanation of these findings on leniency is that rater bias may be influenced by the rater's sensitivity to, and awareness of, the specific performance scales and job-related behaviours that contribute to measures of performance effectiveness. It is therefore possible that, because the highly selective study sample comprised experienced teachers only, the teachers in the sample were familiar with, and had a clear understanding of, the rating scales and the job behaviours being measured.
This would then make it possible for teachers to carry out more objective diagnostic assessments of their individual performance behaviours than supervisors, who may have a more general, global understanding of the job-related behaviours of supervisees.

Additionally, the fact that the ratings were obtained under conditions where the findings were to be used for research purposes only may have contributed to more objective self-assessments, as previously suggested by Heneman (1974). However, if research is to contribute to meaningful improvement of performance assessment practices, it should also seek to identify the sources of rater bias. It is not enough for research to show that leniency error is attributable to rating sources without identifying the sources of that leniency error. Accordingly, future research should seek to identify the sources of leniency attributable to rating sources. At the same time, future research should use more rating sources, such as peers, superiors and students, so that multiple comparative analyses can be carried out to establish how self-ratings behave against these sources. The results of such studies will contribute towards a better understanding of current performance assessment practices, especially in universities, where multiple rating sources are routinely used in evaluating lecturer performance.

The evidence obtained from this study clearly shows that, although supervisor ratings exhibited greater leniency error than self-ratings, the Pearson correlation coefficients computed to determine the comparability of supervisor and self-ratings revealed that the two sets of ratings were significantly correlated, contrary to previous research findings by Holzbach (1978). The current finding indicates that teachers and supervisors were measuring the same performance behaviours and that they had a common understanding of the measuring scales used.
Similarly, the very high Spearman rank order correlation coefficient obtained for the rank-ordered mean scores of the two groups indicates that the teachers and their supervisors held similar perceptions of the areas of greater and lesser teacher performance effectiveness.

The restriction of range analysis, in which supervisor ratings produced the larger variance in 19 of the 30 comparisons, failed to produce significant differences (p<=0.10, two-tailed Wilcoxon matched-pairs signed-rank test), indicating that the differences may instead have been due to chance. Since restriction of range error attributable to specific rating sources occurs only when the variances from different rating sources concerning the same ratee group are significantly different, the present finding suggests that the variability of self- and supervisor ratings was generally similar. This explanation is further supported by the significant correlations, which showed that the two groups shared similar perceptions of the performance rating scales used in this study and, consequently, agreed closely over areas of greater and lesser performance effectiveness.

The fact that the variability of the ratings of teachers and their supervisors was generally the same may further indicate that the two groups had a clear understanding of the concepts measured, of the rating scales used and of the job performance behaviours of supervisees. A clearer understanding of the meaning of the concepts being measured increases accuracy in the interpretation of rating scales (Thornton, 1980), which, in turn, might have helped narrow the gap in variance between the two groups.

The finding on the incidence of halo error is consistent with previous research (Heneman, 1974; Holzbach, 1978; and Klimoski and London, 1974), which found that self-ratings contained less halo error than supervisor ratings. Since the halo effect is a form of rater bias that results when a rater assesses the performance of a ratee globally because of a failure to differentiate among specific job performance behaviours, the results of this study suggest that supervisors tend to assess the performance of their supervisees globally. Their long experience and responsibility for routine evaluation of subordinates' performance may inadvertently lead supervisors to evaluate their subordinates globally, without much reference to specific job performance behaviours. Thus, although supervisors may readily distinguish a good teacher from a bad one on global grounds because of their long years of experience in performance evaluation, the findings of this study suggest that data on the training needs of teachers should be based on an assessment of their performance in specific job dimensions rather than on global performance. Accordingly, since self-ratings showed less halo error, indicating an ability to discriminate among specific performance dimensions, the teachers in the research sample tended to have greater awareness of the strengths and weaknesses in their performance than their supervisors, who assessed them globally.

In conclusion, the findings of this study show that self-ratings have the greatest potential for providing the database from which a compendium of the training needs of teachers can be compiled. Such a database provides scope for incorporation into pre-service teacher preparation programmes. The value of self-performance assessments highlighted in this study further suggests the need to provide teachers, during both pre-service and in-service training, with basic supervisory skills that will enhance their capacity for self-assessment.
Equally important is the need to alert supervisors to the value of self-assessments and to how these can inform and enhance the supervisory process.

Meanwhile, the greater halo error associated with the performance ratings of supervisors indicates that supervisor ratings tended to provide a global picture of teacher performance without identifying the specific job-related behaviours that are of interest in designing staff development programmes aimed at improving teacher performance. Effective staff development programmes should address identifiable areas of deficiency in teacher performance, and such programmes can, therefore, benefit more from self-ratings. Hence, the lower halo error reported in this study may further explain why Balzer (1973) and Natriello (1977) concluded that self-ratings had greater potential for producing changes in teaching behaviours than supervisor ratings. What the present study has further found is that self-ratings are of potentially superior value to educational managers and educationists because they contain less leniency and halo error than supervisor assessments. Whether self-ratings will retain less leniency and halo error under conditions where the ratings are used for non-research purposes is a subject for further investigation. Similarly, future research should also seek to identify the sources of leniency and halo effects so that performance counselling can be directed at the areas where teachers need it most.

References

Ash, R. A. (1980) 'Self-assessment of five types of typing abilities', Personnel Psychology, XXXIII, 273-282.
Balzer, L. (ed.). (1973) A Review of Research on Teacher Behavior (Columbus, Ohio, ERIC/SMEAC).
Heneman, H. G. (1974) 'Comparisons of self and supervisor ratings of managerial performance', Journal of Applied Psychology, LIX, 638-642.
Holzbach, R. L. (1978) 'Rater bias in performance ratings: Superior, self-, and peer ratings', Journal of Applied Psychology, LXIII, (v), 579-588.
Hubert, B. D. and C. G. Dueck. (1985) 'On-the-job training of assistant principals in selected tasks in the Calgary School system', Alberta Journal of Educational Research, XXXI, (xxxi), 270-287.
Johnston, J. M. and L. E. Sackeney. (1982) Principals' Classroom Supervisory Practices (Saskatoon, University of Saskatchewan, College of Education).
Klimoski, R. J. and M. London. (1974) 'Role of the rater in performance appraisal', Journal of Applied Psychology, LIX, 445-451.
Lawler, E. E. (1967) 'The multitrait-multirater approach to measuring managerial performance', Journal of Applied Psychology, LI, 369-381.
Levin, B. (1979) 'Teacher evaluation: A review of research', Educational Leadership, XXXVII, (iii), 240-245.
Levin, E. L. (1980) 'Introductory remarks for the symposium on organizational applications of self-appraisals and self-assessment: Another look', Personnel Psychology, XXXIII, 259-262.
McLaughlin, M. W. (1984) 'Teacher evaluation and school improvement', Teachers College Record, LXXXVI, (i), 193-207.
Meyer, H. H. (1980) 'Self-appraisal of performance', Personnel Psychology, XXXIII, 291-295.
Miner, J. B. (1968) 'Management appraisal: A capsule review and recent references', Business Horizons, I, 83-96.
Natriello, G. (1977) A Summary of Recent Literature on the Evaluation of Principals, Teachers and Students (ERIC Document Reproduction Service).
Ness, M. (1980) 'The administrator as instructional supervisor', Educational Leadership, XXXVII, (v), 404-406.
Nhundu, T. J. (1992) 'Job performance, role clarity, and satisfaction among teacher interns in the Edmonton Public School system', The Alberta Journal of Educational Research, XXXVII, (iv), 335-353.
Nhundu, T. J. (1994) 'Facet and overall satisfaction with teaching and employment conditions of teachers in Zimbabwe', Zimbabwe Journal of Educational Research, VI, (ii), 153-194.
Paulin, P. (1981) 'The Politics of Evaluation at the Local Level: A View Through Teachers' Perspectives', Paper presented at the Annual Meeting of the American Educational Research Association, 13-17 April 1981, Los Angeles, California.
Pigge, F. L. and M. T. Lovett. (1985) Job Performance and Job Satisfaction of Beginning Teachers (ERIC Document Reproduction Service).
Reavis, C. (1978) 'Clinical supervision: A review of research', Educational Leadership, XXXV, (vii), 580-584.
Stark, J. S. and M. A. Lowther. (1984) 'Predictors of teachers' preferences concerning their evaluation', Educational Administration Quarterly, XX, (iv), 76-106.
Thornton, G. C. (1968) 'The relationship between supervisor and self-appraisal of executive performance', Personnel Psychology, XXI, 441-455.
Thornton, G. C. (1980) 'Psychometric properties of self-appraisals of job performance', Personnel Psychology, XXXIII, 263-271.
Wolf, R. (1973) 'How teachers feel toward evaluation', in Ernest House (ed.) School Evaluation (Berkeley, California, McCutchan).