This is to certify that the

dissertation entitled

The Relationship of Rating Error to
Personality Characteristics of the
Myers-Briggs Type Indicator

presented by

Thomas R. Holmes

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Counseling Psychology

Major professor

Date October 26, 1983


THE RELATIONSHIP OF RATING ERROR TO PERSONALITY
CHARACTERISTICS OF THE MYERS-BRIGGS TYPE INDICATOR

BY

Thomas Holmes

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1983

ABSTRACT
THE RELATIONSHIP OF RATING ERROR TO PERSONALITY
CHARACTERISTICS OF THE MYERS-BRIGGS TYPE INDICATOR
by

Thomas Holmes

This study explored the relationship between the
personality of raters and the type of rating errors they
make. The personality typology of Carl Jung, as
operationalized in the Myers-Briggs Type Indicator (MBTI),
was explored and several personality types were selected as
most related to factors involved in rating error.

The history of the study of rating identified the
rating errors: Leniency/Severity, Range Restriction, Halo
and low Interrater Reliability. The personality types were
then used to predict the nature and degree of rating error
expected.

A sample of fifty-six raters, undergraduate students,
rated six therapist-client interactions and three
speeches. There was a total of seventy-two ratings from
each rater. The raters, who had been tested on the MBTI
prior to making their ratings, were categorized according to
personality types. Their patterns of rating and the nature
of their rating errors were then analyzed to see if there
were significant differences between types.

The analysis yielded a number of significant results.
It was found that, as predicted, the Sensing/Judging
managerial style persons made ratings which were
consistently more severe than those of the Intuitive/Feeling
managerial style persons. In addition, it was found that the
Sensing/Judging type was less accurate in their ratings of
Unconditional Positive Regard than the Intuitive/Feeling
types, while there was no difference in their accuracy in
rating Accurate Empathy. This same result was found for the
MBTI Judging type versus the Perceiving type.

The implications are two-fold: 1) that certain
personality types will have predictably different levels of
accuracy in their ratings; and 2) that these errors tend to
vary according to the task they are rating.

ACKNOWLEDGEMENTS

I would like to express my appreciation to the
following people: to William Farquhar for his patient and
skilled guidance throughout the dissertation process; Ralph
Kron for his intuitive insight and caring support; Marcia
Carlyn for sharing her enthusiasm and expertise in the
Myers-Briggs Type Indicator; and William Mehrens for
assisting with my dissertation. I want also to acknowledge
the computer expertise of my son, Dan, who innovated
programs when SPSS was inadequate; and the editorial help
and support of my wife, Lauri. In addition I want to thank
Tim Trichler for his diligence and patience in the
preparation of the manuscript.

TABLE OF CONTENTS

List of Tables

CHAPTER I. INTRODUCTION
    Purpose of the Study
    Research Hypotheses
    Theory
        Rating Error Constructs
        Response Set Theory
        Jung's Personality Theory
            Extraversion/Introversion
            Perception: Sensing/Intuiting
            Evaluative Processes: Thinking/Feeling
        Myers Briggs Type Indicator
            Judging/Perceiving
            Psychological Types
    Overview

CHAPTER II. LITERATURE REVIEW
    Literature on Rating
        Rating Scales
        Rating Error
        Response Set Theory
    Literature on Personality Type
        The Myers Briggs Type Indicator
        Combination Types: Managerial Styles
    Summary

CHAPTER III. DESIGN OF THE STUDY
    Sample
    Measures
        The Myers Briggs Type Indicator
        Structural Qualities of MBTI
        Rating Scales
    Design of the Study
    Operational Definitions
    Methods of Analysis
        Hypothesis One
        Hypothesis Two
        Hypothesis Three
        Hypothesis Four
        Hypothesis Five
    Summary

CHAPTER IV. PRESENTATION OF FINDINGS
    Hypothesis One
    Hypothesis Two
    Hypothesis Three
    Hypothesis Four
    Hypothesis Five
    Exploratory Findings
        Hypothesis Six
        Hypothesis Seven
    Summary

CHAPTER V. SUMMARY AND CONCLUSIONS
    Summary of the Study
    Discussion of the Findings
        Hypothesis One
        Hypothesis Two
        Hypothesis Three
        Hypothesis Four
        Hypothesis Five
        Exploratory Hypotheses
    Limitations of the Current Study
    Recommendations for Further Research
        Sample Composition
        Design Considerations
        Personality Type Considerations
        Rating Scales and Rating Tasks
    Conclusions

APPENDIX A. RATING SCALES USED IN THE RESEARCH

APPENDIX B. VIGNETTES USED TO ESTABLISH CORRECT
    RATING FOR ACCURATE EMPATHY AND
    UNCONDITIONAL POSITIVE REGARD

APPENDIX C. RESEARCH PROCEDURES

BIBLIOGRAPHY

LIST OF TABLES

Table
1.1  Definitions of Measures of Quality
1.2  Four Dimensions of the Myers-Briggs Type Indicator
1.3  Table of Sixteen Personality Types
     of the Myers-Briggs Type Indicator
3.1  Sample Distribution
3.2  Reliability of MBTI Type Categories
3.3  Reliability of Rating Scales
3.4  Design of the Two-Way ANOVA for Rater
     by Rater Interaction
4.1  Comparison of Mean Ratings of
     Sensing/Judging and Intuitive/Feeling Types
4.2  Comparison of Ratee Main Effects
     for MBTI Personality Types
4.3  Comparison of Mean Ratings of
     Thinking and Feeling Types
4.4  Reliability for Six MBTI Types
     Across All Scales
4.5  Reliability for Six MBTI Types
     for each Scale
4.6  Correlation Between Raters' Ratings
     Within MBTI Type and for the Sample
     as a Whole
4.7  Comparison of the Mean Variance From
     the Correct Rating for Sensing/Judging,
     Intuitive/Feeling Types
4.8  Comparison of the Mean Variance From
     the Correct Rating for Judging vs.
     Perceiving Types

CHAPTER 1

INTRODUCTION

The task of having one person judge another person's
performance is a common activity in the fields of counseling
research, applied psychology, and in clinical settings.
Rating scales are the most popular device used in this task,
and considerable effort has gone into developing and
improving the accuracy of these scales. Researchers have
identified specific patterns of error, causing further
efforts to develop scales which are less vulnerable to error
patterns. Extensive research has been conducted on various
methods of construction and analysis of rating scales.

That the source of some patterns of rating error
lies beyond the rating scales and in the personality of the
raters themselves is acknowledged by writers and researchers
but has not been directly investigated. Mehrens and Lehmann
see the personality of the rater as one of four sources of
error: "Error may be due to the scale itself (ambiguity),
the personality of the rater, the nature of the traits being
rated, and the opportunity offered the rater for adequate
observation."1

1William A. Mehrens and Irvin J. Lehmann,
Measurement and Evaluation in Education and Psychology,
Holt, Rinehart and Winston, New York, p. 380.

The studies of Ford,2 Gross,3 and Crow and Hammond4
have shown that individual rater variables,
independent of training and experience, accounted for
significant amounts of rating error. Thus these studies lend
support to the notion that rater personality is an important
factor in rating error. These observations, however, were
only incidental findings for the researchers, whose attention
was focused elsewhere. The relationship between personality
type and rating error has not been directly investigated, but
these earlier studies provide a basis from which to start.

The value in identifying the relationship between
personality and rating error is three-fold. First, it
provides empirical evidence regarding the assumption
concerning personality and its implicit relationship to
rating error. Second, it might enable researchers to limit
rater bias attributable to identifiable personality traits;
if such traits were dominant in the sample, they could
produce high interrater reliability but poor validity.
Third, it would test the predictive validity of the
personality constructs and measures involved. Positive
results would lend further support to the use of personality
considerations when training and supervising counselors,
managers, and other personnel involved in making evaluations.

2Adelbert Ford, "Neutralizing Inequalities in
Rating," Personnel Journal, 1931, Vol. 9, pp. 466-469.

3C.F. Gross, "Intrajudge Consistency in Ratings of
Heterogeneous Persons," Journal of Abnormal Psychology,
1961, Vol. 62, pp. 605-620.

4W.J. Crow and H.R. Hammond, "The Generality of
Accuracy and Response Sets in Interpersonal Perception,"
Journal of Abnormal and Social Psychology, 1957, Vol. 54,
pp. 384-369.

Purpose of the Study
The purpose of this study is to examine the relationship
between personality characteristics of raters and the
type of rating errors they tend to make. Fifty-four raters
were categorized according to personality type defined by
the Myers-Briggs Type Indicator,5 a personality test based
on the theories of Carl Jung. Predictions were made about
the kind of rating error which might be expected from
certain different personality types. After being given the
MBTI, the raters were asked to make a series of ratings of
taped interview interactions and speeches. These ratings
were analyzed and compared with each rater's MBTI
personality type to determine whether the nature and degree
of the rating errors varied according to the personality
types as predicted.

Research Hypotheses
Five research hypotheses were tested. Each postulated
a relationship between personality type and the nature of
rating errors expected. The five hypotheses are as follows:

1. Ratings made by the Sensing/Judging personality
type will be more Severe than those made by
Intuitive/Feeling personality types.

2. The Range Restriction error of Perceiving
personality types will be greater than that found
with Judging types.

3. Ratings made by Feeling personality types will have
more Leniency than those of Thinking types.

4. The Introvert's ratings will have less Reliability
than will the Extravert's.

5. There will be more Interrater Reliability within
personality type than in the sample as a whole.

5Isabel Briggs Myers, The Myers-Briggs Type
Indicator Manual, The Educational Testing Service,
Princeton, N.J., 1962.

Theory
The concepts underlying this research have their roots
in three areas: the rating error theory of the applied
psychologist, the response set theorist's work with
personality, and the personality theory of C.G. Jung as
operationalized in the Myers Briggs Type Indicator (MBTI).

Rating Error Constructs

 

The applied psychologists, in their work with the
development of criteria for rating the quality of rating
scales, have focused on four primary categories of rating
quality. These are Halo error, Leniency error, Range
Restriction/Central Tendency, and Interrater Reliability.
The error terms commonly used are analyzed in a
comprehensive work by Saal, Downey, and Lahey,6 where
the definitions of previous researchers are reviewed.

6Frank E. Saal, Ronald G. Downey, and Mary Anne
Lahey, "Rating the Ratings, Assessing the Quality of Rating
Data," Psychological Bulletin, 1980, Vol. 88, pp. 413-428.

Halo error is the "tendency to attend to a global
impression of each ratee rather than to carefully
distinguish among levels of different performance
dimensions...a rater's inability or unwillingness to
distinguish among the dimensions of a given ratee's job
behavior."7 The Leniency/Severity errors are defined as
ratings which are consistently too high or too low in
relation to the mid-point of the scale or in relation to
some established standard. Range restriction refers to
raters who use only a narrow part of the rating scale, thus
reducing the extent to which obtained ratings can
discriminate among different ratees' performance levels.
Interrater reliability is the fourth measure of rating
quality, and is probably the most widely referred to in the
use of rating scales. Interrater reliability is defined
here as the "extent to which two or more raters
independently provide similar ratings on given aspects of
the individual's behavior...." Reliability is generally
accepted as a form of consensual or convergent validity.8
The conceptual definitions of the rating errors can be seen
in Table 1.1.

The four types of rating error are the more frequently
used concepts in applied psychology. In a related field of
study, response set theory, different constructs are used,
yet the research done by response set theorists has
implications for rating error research.

7Ibid., p. 415.

8Ibid., p. 419.

Table 1.1

Definitions of Measures of Rating Quality

Halo Error                "Tendency to attend to a global
                          impression of each ratee rather
                          than to carefully distinguish
                          among levels of different
                          performance dimensions."9

Leniency/Severity Error   Ratings are given by a rater
                          which are consistently too high
                          or too low in relation to the
                          midpoint of a scale or in
                          relation to some established
                          standard.10

Range Restriction         "The extent to which obtained
                          ratings discriminate among
                          different ratees in terms of
                          their respective performance
                          levels."11

Interrater Reliability    "extent to which two or more
                          raters independently provide
                          similar ratings on given
                          aspects of the individual's
                          behavior..."12

9Saal et al., p. 415.

10Ibid., p. 417.

11Ibid., p. 417.

12Ibid., p. 419.
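
As a rough illustration of how the four measures of rating quality in Table 1.1 might be computed from raw ratings, the following Python sketch uses simple stand-ins: the mean deviation from the scale midpoint for Leniency/Severity, the spread of a rater's overall ratings across ratees for Range Restriction, the average spread across dimensions within each ratee for Halo, and the mean pairwise correlation between raters for Interrater Reliability. These operationalizations, the function names, and the toy data are assumptions made for illustration only; they are not the statistical procedures used in this study.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical ratings on a 1-5 scale: one row per ratee,
# one column per rated dimension.
RATER_A = [[5, 5], [4, 4], [5, 5]]
RATER_B = [[2, 4], [1, 3], [3, 5]]


def flatten(ratings):
    """Return all of one rater's ratings as a single list."""
    return [score for ratee in ratings for score in ratee]


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def leniency(ratings, midpoint):
    """Mean rating minus midpoint: > 0 lenient, < 0 severe."""
    return mean(flatten(ratings)) - midpoint


def rating_spread(ratings):
    """Spread of overall ratings across ratees.
    A small value suggests range restriction."""
    return pstdev([mean(ratee) for ratee in ratings])


def dimension_spread(ratings):
    """Average spread across dimensions within each ratee.
    A small value suggests halo (dimensions not distinguished)."""
    return mean([pstdev(ratee) for ratee in ratings])


def interrater_reliability(raters):
    """Mean pairwise correlation between raters' ratings."""
    pairs = combinations([flatten(r) for r in raters], 2)
    return mean([pearson(a, b) for a, b in pairs])
```

With this toy data, `leniency(RATER_A, 3)` is positive (a lenient rater), and `dimension_spread(RATER_A)` is zero because Rater A gives every ratee the same rating on both dimensions, which is the pattern the halo definition describes.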

Response Set Theory

Response set theorists differ from applied psychologists
in that much of their work has been done with
objective tests rather than with rating scales. To them the
response sets were seen as contaminating variables affecting
the quality of their tests, much as rating error was seen by
the applied psychologists. "In recent years, there has been
considerable interest in treating the response set component
of test scores, not as error variance, but as an expression
of a personal stylistic variable."13

Efforts to understand the impact of personality on
response sets led to studies which found correlations
between extreme response sets and such personality traits as
concreteness, rigidity, authoritarian personality, and
intolerance of ambiguity. However, the results of such
studies were not always consistent. Some studies found no
significant correlations between response set and
personality traits and others found results which
occasionally contradicted earlier studies. The mixed
results pointed to a weakness in the response set concept
which becomes apparent when seen in conjunction with rating
error theory.

13Richard R. Schuz and Robert J. Foster, "A Factor
Analytic Study of Acquiescence and Extreme Response Set,"
Educational and Psychological Measurement, Vol. XXIII, No.
3, 1963, p. 435.

The extreme response set construct is represented by
two constructs in rating error theory, leniency and
severity. Each of these two terms has been shown to
characterize opposite rater tendencies. Thus, research
studying the relationship of personality to extreme response
set would actually measure only traits common to both types
of raters or traits common to the type of rater most
predominant in the sample. This finding could explain the
mixed results obtained in past research. Research on the
relationship of extreme response set to personality does
indicate that rating error may be related to personality.
These indications were used to form the hypotheses of this
study, along with the theory of Carl Jung.
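
The argument about the extreme response set can be illustrated with a small worked example in Python. The numbers are hypothetical and chosen only to make the point: a trait that perfectly predicts the direction of a rater's error (from severe to lenient) can show no correlation at all with the extremity of the ratings, because extremity treats lenient and severe raters alike.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


MIDPOINT = 3                     # midpoint of a 1-5 rating scale
trait = [1, 2, 3, 4, 5]          # hypothetical trait scores, five raters
mean_rating = [1, 2, 3, 4, 5]    # each rater's mean rating (hypothetical)

# Signed error keeps leniency (+) and severity (-) apart.
signed_error = [r - MIDPOINT for r in mean_rating]
# Extremity ignores direction, as an extreme response set measure would.
extremity = [abs(e) for e in signed_error]

r_direction = pearson(trait, signed_error)
r_extremity = pearson(trait, extremity)
```

Here `r_direction` is 1 while `r_extremity` is 0 (up to floating-point rounding): the same raters show a perfect or a null relationship depending on whether leniency and severity are kept separate.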

Jung's Personality Theory

In this section the personality theory of Carl Jung
will be outlined and the operationalization of his
constructs in the Myers Briggs Type Indicator (MBTI) will be
presented. The management styles which were derived from
the MBTI will be discussed and related to rating error
theory.

In his work Psychological Types14 Jung reviewed and
documented attempts since ancient times to characterize the
typical differences between people. From his study of these
past systems and from his own clinical experience, Jung
developed his theory of psychological types. His typology
is related to the task of rating error theory in a
fundamental fashion. Jung states that a person's
psychological type determines and limits his judgment
throughout life.

Jung's primary concept of type was that each person has
a preference for one of two attitudes toward the world,
Introversion or Extraversion. Jung also posited four
psychological functions. These consist of two perceiving
functions, Sensation and Intuition, and two judging
functions, Thinking and Feeling. According to Jung, one of
the psychological functions will become the dominant force
in shaping a person's psychological processes as well as his
adaptation to the world.

Extraversion/Introversion

 

Jung sees the Extravert as the person whose life
centers on the external conditions of life. "When
orientation by object predominates in such a way that
decisions and actions are determined not by subjective views
but by objective conditions, we speak of an extraverted
attitude. When this is habitual, we speak of an extraverted
type. If a man thinks, feels and acts and actually lives in
a way that is directly correlated with the objective
conditions and their demands, he is extraverted."15 The
extraverted type then is more comfortable with the
environment and usually more at ease with people and things.

Jung conceptualized the Introvert as differing from the
Extravert in that instead of orienting himself to objective
factors in the world he orients himself to subjective
factors within his own disposition. In responding to
external events the Introvert tends to rely on a subjective
response rather than on a direct response to the event
itself. Under stress, the Introvert tends to draw into
himself rather than to move towards people as the Extravert
would tend to do. Where the Extravert has the gift of
action the Introvert has the gift of conceptualization and
inner illumination.

It was therefore hypothesized that the Extravert would
likely be more in tune with environmental demands made
by rating scales and would be less likely to make the
subjective judgments the Introvert would make. This would
translate into less rater reliability for the Introvert than
for the Extravert.

In addition to the attitudes of Extraversion-Introversion,
Jung postulated four psychological functions. The
four functions consisted of two perceiving functions,
Sensation and Intuition, and two evaluative or judging
functions, Thinking and Feeling.

14C.G. Jung, Psychological Types, Princeton
University Press, Princeton, N.J., 1971.

15Jung, Op. Cit., p. 333.

Perception: Sensing/Intuiting

The process of perception referred to as Sensation
involves direct perception of the concrete physical
properties and details of the environment. The focus is on
practical facts, known qualities, and actualities. The
Sensing type person is known for precise work and attention
to details and routine. The Sensing type is usually
impatient with complexity and abstraction, being a steady
and realistic worker who enjoys using skills which have been
developed.

The Intuitive process is an indirect rather than
a direct mode of perception. The person in whom Intuition
is the primary mode of perception looks at the relationship
between the object being perceived and other objects,
mediating perceptions in an unconscious way. So, rather
than looking at the individual tree, as the Sensing type
would, the Intuitive would tend to see the tree as part of a
forest, looking at the bigger picture rather than at
details. The Intuitive generally enjoys learning new skills
more than actually applying old ones over a long period of
time and tends to see things from a global rather than a
specific perspective.

In relation to rating scales, the Intuitive could be
expected to differentiate between dimensions, since the
Intuitive's strength is in looking at relationships on a
theoretical level, whereas the Sensing type may get lost in
the details of the ratings and not make good dimensional
differentiations. This would be particularly so where there
were not clear behavioral definitions of each dimension.

Evaluative Processes: Thinking/Feeling

Thinking, according to Jung, is the psychological
function which connects and orders ideas and thoughts.
Persons in whom the thinking mode of evaluation is
predominant utilize a logical process in objective,
impersonal analysis to make judgments on the contexts of
ideation. The thinking type tends to be critical of himself
and others on the basis of their intellectual ideas, tending
not to be aware of the affective components of people's
perceptions.

Feeling is the psychological function which imparts a
value rather than an objective judgment to the things a
person perceives. Thus: "feeling is a kind of judgment,
differing from intellectual judgment in that its aim is not
to establish conceptual relations but to set up a subjective
criterion of acceptance or rejection."16 The focus for
the feeling type then is on making judgments according to
cultural and personal experiences. Feeling types operate
best in activities involving human relationships and in
activities which conform to their central values and
beliefs.

The primary characteristic relevant to a feeling type's
activity as a rater is the sensitivity to the feelings and
impulses of others and the value placed on harmony with
others. One could expect that this tendency would make them
more lenient as raters, in contrast with the thinking type,
whose inclination to be critical of self and others might
lead to severity errors on rating scales.

16Jung, Op. Cit., p. 434.

Myers Briggs Type Indicator

The MBTI translates Jung's concepts into four bipolar
dimensions. The first is the attitude dimension,
Introversion/Extraversion; the second is the perceiving
dimension, with the functions of Sensation and Intuition at
opposite poles; and the third is the judging dimension,
with Thinking and Feeling at the poles. The final dimension
of the MBTI was created to determine the preferred
Extraverted psychological process, that of Perceiving or
Judging. The result is a scale with Judging on one pole and
Perceiving on the other. These four dimensions are
presented in Table 1.2.

Table 1.2

Four Dimensions of the Myers-Briggs Type Indicator

(E) Extraversion        Introversion (I)
(S) Sensation           Intuition    (N)
(T) Thinking            Feeling      (F)
(J) Judging             Perceiving   (P)

Judging/Perceiving

The dominant psychological process used in adaptation to
the environment determines the style with which the person
adapts to the world. If the dominant process is the Judging
one, the person will find decision-making easy. Because of
this preference for making decisions, the person's life will
be ordered and planned. This creates a life-style which is
regulated and controlled, and opinions which are readily
made and reluctantly changed.

On the other hand, a person whose dominant process is
one of the perceiving functions will find decisions are hard
to make because they always feel the need of more
information. They will have a life-style which emphasizes
more spontaneity and adaptability, and they will be
reluctant to judge themselves or others.

The person whose dominant function is a Judging one
could be expected to make ratings which give a clear
preference at one extreme or the other. Thus we would
expect them to be lower on range restriction error. The
person who prefers the Perceiving process would be expected
to make considerable range restriction error since the
perceiving-dominant person would be reluctant to judge
themselves or others.

Psychological Types

The four functions of the MBTI have been studied
extensively during the past thirty years and considerable
work has been compiled concerning their reliability and
validity. The different combinations of the four dimensions
form sixteen personality types. Table 1.3 shows the sixteen
types generated from the eight personality preferences.
Each preference is indicated by an initial representing the
direction scored; thus a person preferring Extraversion (E),
Sensing (S), Thinking (T), and Judging (J) would be referred
to as ESTJ.

Table 1.3

Table of Sixteen Personality Types of the
Myers-Briggs Type Indicator

ISTJ    ISFJ    INFJ    INTJ
ISTP    ISFP    INFP    INTP
ESTP    ESFP    ENFP    ENTP
ESTJ    ESFJ    ENFJ    ENTJ

(E) Extraversion        (I) Introversion
(S) Sensation           (N) Intuition
(T) Thinking            (F) Feeling
(J) Judging             (P) Perceiving
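
The way the sixteen type codes arise from the four bipolar dimensions can be sketched in a few lines of Python. This is only an illustration of the combinatorics; the names `DIMENSIONS`, `all_types`, and `type_code` are hypothetical and are not part of the MBTI or of this study's procedures.

```python
from itertools import product

# The four bipolar dimensions, each given as its two poles.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def all_types():
    """Return every four-letter type: one pole from each dimension."""
    return ["".join(letters) for letters in product(*DIMENSIONS)]


def type_code(preferences):
    """Build a type code from four preference letters, given in the
    order attitude, perception, judgment, Judging/Perceiving."""
    if len(preferences) != len(DIMENSIONS):
        raise ValueError("expected one preference per dimension")
    for letter, poles in zip(preferences, DIMENSIONS):
        if letter not in poles:
            raise ValueError(f"{letter!r} is not one of {poles}")
    return "".join(preferences)
```

Two poles on each of four dimensions give 2 x 2 x 2 x 2 = 16 combinations, so `all_types()` returns sixteen distinct codes, and `type_code(["E", "S", "T", "J"])` returns `"ESTJ"`.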

The sixteen personality types have been combined into
four managerial styles by Keirsey and Bates. Two of these
styles are used in this study to predict rater error because
they describe characteristics of managers which relate to
how they evaluate and interact with personnel. These two
styles are termed the Sensing/Judging style and the
Intuitive/Feeling style. These are described as follows by
Keirsey and Bates:17

The Sensing/Judging individual is described as a
Traditional/Judicial manager. Persons with this style are
seen as deciding things quickly and firmly. They have a
tendency to see people as good or bad and they tend to
emphasize the negative while taking the positive for
granted. A personality style such as the Sensing/Judging
would be expected to be most prone to severity errors.

The Intuitive/Feeling manager is known as the catalyst.
Persons with this style are known for their sensitivity to
staff morale, and for their ability to bring out the
positive in people. Their weakness is a tendency to place
individuals' personal needs above organizational needs. The
Intuitive/Feeling style rater should be prone to making more
leniency errors.

17David Keirsey and Marilyn Bates, Please
Understand Me, Prometheus Nemesis Books, Del Mar, Ca.,
1978, ch. V.
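
The grouping of four-letter types into the two managerial styles used here, together with the rating error each is expected to show, can be sketched as follows. The function names and the lookup table are hypothetical conveniences; only the two styles described in this section are covered, so any other type returns `None`.

```python
# Expected direction of rating error for the two styles used in this
# study, as predicted in the text above.
PREDICTED_ERROR = {"SJ": "severity", "NF": "leniency"}


def managerial_style(type_code):
    """Return "SJ" or "NF" for a four-letter type code, or None when
    the type belongs to neither of the two styles described here."""
    code = type_code.upper()
    if len(code) != 4:
        raise ValueError("expected a four-letter type code")
    perception, judgment, orientation = code[1], code[2], code[3]
    if perception == "S" and orientation == "J":
        return "SJ"   # Sensing/Judging
    if perception == "N" and judgment == "F":
        return "NF"   # Intuitive/Feeling
    return None


def predicted_error(type_code):
    """Return the rating error predicted for the type, if any."""
    return PREDICTED_ERROR.get(managerial_style(type_code))
```

For example, `managerial_style("ESTJ")` gives `"SJ"` and `predicted_error("ENFJ")` gives `"leniency"`, while a type such as ENTP falls outside both styles.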

Overview

The research studies and theoretical works which
explore the relationship between personality and rating
error will be reviewed in Chapter II. In Chapter III the
design of the study is described, the test instruments are
presented, and the method of analysis outlined. The results
of the analysis are described in Chapter IV, and in Chapter
V the study is summarized, the conclusions are drawn, and
directions for future research are suggested.

CHAPTER II

 

LITERATURE REVIEW

The literature relevant to this research is drawn from
three areas: applied psychology studies of rating scales
and the nature of rating errors; response-set theory
focusing on the relationship between response style and
personality factors; and literature regarding Jung's
theories and their operationalization in the Myers Briggs
Type Indicator.

Literature on Rating
Rating Scales
A number of different rating scales are described in
the literature: numerical, graphic, standard, cumulative
points, forced choice,1 comparative, paired comparison,2

3 Those most used in

and the Behavioral Expectation Scale.
applied psychology research are the numerical rating scale
and some form of the graphic scale.

In the numerical scale "a sequence of defined numbers

 

1J.P. Guilford, Psychometric Methods, McGraw-Hill,
New York, 1954, p. 263.

2W.A. Mehrens, and I. Lehmann, Measurement and
Evaluation in Educatign and Psychology, Holt, Rinehart and
Winston, New York, 1978, p. 355.

3John A. Bernardin, and P.C. Smith, ”A Clarification
of Some Issues Regarding the Development and Use of
Behaviorally Anchored Rating Scales,“ Journal of Applied
Psychology, 1981, Vol. 66, No. 4, p. 458.

 

 

19

20

is supplied to the observer."4 Here the rater must select
a numerical value which represents his rating:
How would you rate the applicant's composure?

1    2    3    4    5

5 = very good
4 = good
3 = average
2 = poor
1 = very poor

The graphic rating scale consists of a continuum which
may or may not contain numbers. Even if it does, the rater
is not forced to select a number but may place the rating
anywhere on the scale:

calm, self-assured |______________________________| very nervous, uncertain

Rating Error

 

Since the early use of rating scales, researchers have
observed that certain rater response patterns reduced the
quality and meaning of the rating results. Edward Thorndike
addressed this in his 1920 article "A Constant Error in
Psychological Ratings."5 In this article Thorndike
described halo error, still considered one of the common
forms of rater error today. He observed that certain raters
were "unable to analyze out these different aspects of the
person's nature and achievement and rate each in independence
of each other....Their ratings were apparently

4Sanford, p. 263.

5Edward Thorndike, "A Constant Error in Psychological
Ratings," Journal of Applied Psychology, 1920, Vol. 4, pp.
25-29.


affected by a marked tendency to think of the person in
general as rather good or rather inferior and to color the
judgments of the qualities by this general feeling."6
Other types of rater errors were first described by
Kingsbury in 1922 when he discussed high and low raters, and
the rater's "fear" of making distinctions.7 This work was
the first formulation of the concepts of
Leniency/Severity and Central Tendency Error. They were not
labeled as such until Kneeland's work,8 which addressed the
tendency of raters to "rate well above the midpoint of the
scales used"9 and defined this as leniency.

In Ford's 1930 article,10 the term "severe" was
first used. He analyzed ratings of factory foremen, noting
that some of the foremen used only the high end of the scale
while others used only the lower end. He labelled those who
gave only high ratings as lenient and stated that they "may
give too many men the benefit of the doubt." Ford labelled
those who always rated low as severe and said of them that
they have "possibly an unreasonably high standard of

 

6Thorndike, p. 25.

7F.A. Kingsbury, "Analyzing Ratings and Training
Raters," Journal of Personnel Research, 1922, I, pp.
377-383.

8Natalie Kneeland, "That Lenient Tendency in Rating,"
Personnel Journal, 1929, pp. 356-366.

9Kneeland, p. 356.
10Adelbert Ford, "Neutralizing Inequalities in
Rating," The Personnel Journal, 1930, Vol. IX, No. 6,
pp. 466-489.


performance." He observed another group of men who rated
"good men very high and poor men low,"11 and these he saw
as the most effective raters. Ford also noted that both the
Lenient and the Severe rater showed range restriction
(failure to use the full range of the scale).

In addition, Ford attempted to reduce the error in
ratings. He noted, "we found evidence of wide differences in
severity standards even where the greatest patience had been
exercised in giving the foreman directions for scoring."12
In fact, this error was so resistant to training and so
stable that he developed instead a system for correcting the
error by designing a "correlation factor" which could be
developed for each foreman and then applied to his ratings so
that they would have more universal meaning. Ford's clear
delineation of these rater tendencies forms an early basis
for the idea that personality variables have a significant
relationship to the type of rating error.
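
The idea of a per-rater correction can be illustrated with a brief sketch. Ford's own computations are not reproduced in this review, so the method below is an assumption chosen for illustration only: each rater's scores are rescaled to a common mean and standard deviation, which removes stable differences in severity between raters. The function name, the data, and the target values of 50 and 10 are all hypothetical.

```python
from statistics import mean, pstdev


def standardize_by_rater(ratings_by_rater, target_mean=50.0, target_sd=10.0):
    """Rescale each rater's scores to a common mean and standard deviation.

    This is a generic standardization, assumed here for illustration;
    it is not Ford's published procedure.
    """
    adjusted = {}
    for rater, scores in ratings_by_rater.items():
        m = mean(scores)
        sd = pstdev(scores)
        if sd == 0:
            # A rater who gives everyone the same score carries no
            # information about differences between ratees.
            adjusted[rater] = [target_mean for _ in scores]
        else:
            adjusted[rater] = [target_mean + target_sd * (s - m) / sd
                               for s in scores]
    return adjusted


# Hypothetical data: a lenient and a severe foreman rating the same three men.
raw = {"lenient_foreman": [8, 9, 10], "severe_foreman": [1, 2, 3]}
adjusted = standardize_by_rater(raw)
```

After adjustment the two hypothetical foremen assign the same scores to the same men, since they differed only in the level of the scale they used and not in how they ordered the men.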

The first complete and systematic analysis of rating
errors appeared in Guilford's book in 1954.13 Here he
describes the best-known rating errors as the error of leniency
and negative leniency or "hard rater error," the error of
central tendency, and the halo effect. These error types are
given operational definitions in this work. The less common

 

11Ford, p. 466.

12Ford, p. 467.

13Guilford, op. cit.


error types, logical error, contrast error, and proximity
error are grouped into what Guilford called a residual error
category. Guilford was very thorough in his explication of
the statistical methods used to determine rating errors and
this material will be reviewed in the analysis section.
Recent work with rating errors in applied psychology
has been well summarized and elaborated in a work by Saal,
Downey, and Lahey.14 This work not only reviews the
literature on rating errors, it compiles and summarizes
current conceptual and operational definitions of the
primary rating errors and offers evidence as to the
soundness of those definitions. Their work was central in
developing both the conceptual and operational definitions
used in this study and will be elaborated on further in the
appropriate sections.

While the typology of rating errors has become more
specific in the field of applied psychology in recent years,
it is necessary to turn to the parallel field of response
set theory to find research on the relationship between
personality type and response styles.

Response Set Theory
Response set theory differs from rating error research
in that the primary focus is on response styles as

 

14F. Saal, R. Downey, and M.A. Lahey, "Rating the
Ratings: Assessing the Quality of Rating Data,"
Psychological Bulletin, 1980, Vol. 88, pp. 413-428.


"consistent patterns of responding to objective test
items."15 These response sets were usually seen as error
variance which needed to be eliminated as much as possible.
Berg16 outlined five elements which determined response:
chance, stimulus variables, response alternatives available,
fractional antedating responses, and subject variables. The
subject variables category includes personality characteristics
and is the area of research in response set theory
which will be focused on in this study. This concern with
subject variables is what led some response set theorists to
begin to interpret the response style not merely as error
but as a potential indicator of personality characteristics.17
Hamilton,18 in his literature review,
summarized response styles as falling into four categories:
acquiescence, deviation, social desirability, and extreme
response set. It is the extreme response set studies which
will be analyzed in this study, since this concept closely
parallels the rating error categories of leniency and
severity.

Hamilton demonstrated that the extreme response style
is a reliable response set which exists over time and across

 

15David Hamilton, "Personality Attributes Associated
with Extreme Response Style," Psychological Bulletin,
1968, Vol. 69, p. 192.

16I.A. Berg (Ed.), Response Set in Personality
Assessment, Aldine Publications, Chicago, 1966.

17Hamilton, p. 192.

18Hamilton, op. cit.


tests. In addition he pointed to a number of studies which
indicated extreme response set to be related to a number of
personality attributes. These attributes were concreteness,
abstractness, rigidity/flexibility, and intolerance of
ambiguity. White and Harvey19 carried out a correlational
study between the concreteness-abstractness dimension and
extreme response set and concrete modes of conceptual
functioning as described by Harvey, Hunt, and Schroder.20
Schutz and Foster21 designed a study to investigate
the functional structure of several test response set
measures. Analyzing the extreme response set of 150
college students, they found loadings on Authoritarian
and Inflexibility factors, supporting the contention
that authoritarian personalities tend toward extreme
response sets. In another study Brim and Hoff22
obtained significant correlations between extreme response
set and the desire for certainty or intolerance of
ambiguity. Further support was lent to this contention

 

19B.J. White and O.J. Harvey, "Effects of
Personality and Stand on Judgment and Production of
Statements about a Central Issue," Journal of Experimental
and Social Psychology, 1965, I, pp. 334-347.

20O.J. Harvey, D.E. Hunt, and H.M. Schroder,
Conceptual Systems and Personality Organization, Wiley,
New York, 1961.

21R.E. Schutz and R.J. Foster, "A Factor-analytic
Study of Acquiescent and Extreme Response Set," Educational
and Psychological Measurement, 1963, 23, pp. 435-447.

22O. Brim and D. Hoff, "Individual and Situational
Differences in the Desire for Certainty," Journal of
Abnormal and Social Psychology, 1957, 54, pp. 225-228.


in a review of Cattell's studies by Damarin and Messick,23
which found several factors associated with extreme response
sets which could be interpreted as a need for certainty.
These constructs closely parallel the Judging-Perceiving
dimensions of the MBTI and were used in this study to
predict the nature of rating error.

The results of studies in this area have not been
uniform, however. A number of studies have failed to find
correlations between extreme response set and personality.
Borgatta and Glass24 cited a number of studies of the
relationship between extreme response set and Cattell's 16
PF. No significant relationships were found within college
student samples. In a mental patient sample several
relationships did occur. With a male sample of 17 there was
a significant relationship between extreme response set and
shrewdness, confident adequacy and phlegmatic/composed.
Within the female sample of 10 there was a significant
relationship to realistic/tough and radicalism. In a
population of ten female prisoners there was a correlation
with control, exacting, will power.

Borgatta and Glass25 also examined studies of the

 

23F. Damarin and S. Messick, "Response Styles as
Personality Variables," Research Bulletin # RB-65-10,
Princeton, N.J., E.T.S., 1965.

24E.F. Borgatta and D.C. Glass, "Personality
Concomitants of Extreme Response Set," The Journal of
Social Psychology, 1961, 55, pp. 213-221.

25Ibid.


correlation between response sets and the Edwards Personal
Preference Scale in college students. For the 84 female
students there was a significant negative relationship
between extreme response set and exhibition score on the
Edwards and a significant positive relationship to the
deference score. For the 183 college males the only
significant relationship was a negative relationship to the
change score. In the study as a whole there was no
consistent relationship between the personality variables
measured by the Edwards and extreme response set. It should
be noted, however, that the Edwards and Cattell's 16 PF do
not measure characteristics which have shown the strongest
relationship to extreme response set.

A factor analytic study done by Zuckerman and
Norton26 found results which appear to contradict the
extreme response set result reported by Schutz and Foster, mentioned above.
In this study the extreme response set was correlated with a
non-authoritarian attitude. This suggests that the division
of extreme responses into severe and lenient by the rating
error theorists may lead to a more consistent correlation
with personality types than merely using the general term
"extreme response set."

If indeed extreme response set was not a single pattern
but a combination of two patterns, the results of studies

 

26M. Zuckerman, J. Norton, and D.S. Sprague,
"Acquiescence and Extreme Sets and Their Role in Tests of
Authoritarianism and Parental Attitudes," Psychiatric
Research Reports, 1958, I, pp. 28-40.

 


would vary according to the dominant feature of the response
set. For instance, if the extreme responses were all in the
severity direction one might get a correlation with the
authoritarian personality.

Research from the field of rating error indicates that
leniency and severity ratings are not usually characteristics
of the same person. This being so, it would appear
that while important directions have been pointed out by
extreme response set research, the refinement of rating
error theory should yield even more accurate predictions of
the relationship between rating error and the personality of
the rater.

When generalizing the response set research to rating
error constructs it should be mentioned that response set
theory is based largely on responses made to
self-description questions. Ratings are generally of someone
else's performance. The difference between how a person
rates himself and how he rates others would limit the
generalizability between response set research and rating
error research. For the purposes of this study, however,
the response set literature has been used as a source of
trends, since there has been so much more research
correlating personality with response set than with rating
error. For this use the difficulties in generalizability
are not a serious problem.


Literature on Personality Type

The Myers-Briggs Type Indicator

A number of authors have assessed the MBTI's correspondence
to Jungian theories. Carlyn's analysis of studies
done by Stricker and Ross found that the
Extraversion-Introversion (E-I), Sensing-Intuition (S-N), and
Thinking-Feeling (T-F) scales were all "generally consistent with the
content of Jung's typological theory."27

Other content validity was shown in a study
by Bradway28 which compared the self-classification of
Jungian analysts to their results on the MBTI. The
comparison found 100% agreement on the E-I classification,
68% agreement on the S-N dimension, and 61% agreement on the
T-F classification. These levels of agreement were similar
to those in another Bradway study,29 where MBTI classifications
were compared to the Gray-Wheelwright, also an indicator
designed to measure Jungian type. Here it was found that
there was 96% agreement on E-I, 75% on S-N, and 72% on
T-F when Jungian analysts were studied. Another study cited
by Carlyn as supporting the content validity was done

 

27Marcia Carlyn, "An Assessment of the Myers-Briggs
Type Indicator," Journal of Personality Assessment, 1977,
Vol. 41, p. 468.

28K. Bradway, "Jung's Psychological Types:
Classification by Test Versus Classification by Self,"
Journal of Analytical Psychology, 1964, Vol. 9, p. 130.

29K. Bradway, p. 34.


by Stricker and Ross,30 involving a comparison of
continuous scores between the Gray-Wheelwright and the MBTI.
The results showed a correlation of .79 between the two E-I
scales, .58 between the S-N scales, and .60 between the
T-F scales. All of the correlations were significant at the
.01 level.

The MBTI has been used in a number of studies as
a predictive instrument. Goldschmid31 found it to have a
moderate ability to predict the choice of major by college
undergraduates. Other studies reported by Carlyn indicated
that the MBTI has some ability to predict grade point average
and dropout rate, but that this predictability was not
consistent. While some predictive studies have been done
using the MBTI, the literature here is not as extensive as
that of the construct validity studies.

The construct validity literature on the MBTI is that
which gives the basis for the predictions made in this
study. A considerable number of correlational studies have
been done with the MBTI, many of which have been summarized by
Carlyn.32 Correlations with the E-I scales have shown
the extravert to be "talkative, gregarious, and impulsive,
with underlying needs for dominance, exhibition, and

 

30L.J. Stricker and J. Ross, "Some Correlates of a
Jungian Personality Inventory," Psychological Reports, 1964,
14, pp. 623-643.

31M.L. Goldschmid, "Prediction of College Majors by
Personality Type," Journal of Counseling Psychology, 1967,
Vol. 14, pp. 302-308.

32Carlyn, op. cit.


affiliation."33 They tend to prefer active careers where
they interact with others. The introverts were found to want
to reflect before acting and to prefer working alone. On
aptitude tests they show strengths in abstract reasoning,
reading abilities, and aesthetic values.

Sensing types were shown in Carlyn's literature review
to have interests in that which is solid and real. They
tend to work consistently and have respect for authority.
They have a factual orientation and a strong need for order.
The Intuitive types, on the other hand, have a high
tolerance of complexity and they prefer open-ended
instruction. They have a strong need for autonomy and
change. The Intuitive type tends to be rated high in
imagination by faculty.

The studies summarized by Carlyn further showed the
Thinking types to be objective, analytical, and logical in
making decisions. They have a strong need for order,
autonomy, dominance, achievement and endurance. The Feeling
types, on the other hand, have been shown through
correlative studies to be extremely interested in human
values and interpersonal relationships. They have strong
needs for affiliation and further nurturance, are
generally seen as "pleasant" and have more free-floating

anxiety than Thinking types.

 

33Carlyn, p. 469.


Judging types cited in the Carlyn article were shown to
be responsible, steady, industrious workers. They have a
strong need for order and like to have things decided and
settled. They have a high capacity for endurance and tend
to prefer vocations requiring administrative skills,
particularly business careers. The Perceiving types were
found to be spontaneous, flexible, and open-minded. They
tended to score high on measures indicating impulsiveness
and showed a strong need for autonomy. The Perceiving type
did better on tests of abstract reasoning and scholastic
aptitude but tended to get lower grades in school. The
research showed that Perceiving types enjoyed change and had
a high tolerance for complexity.

Combination Types: Managerial Styles

Carlyn in this review also noted that combination types
have been shown to be valid constructs. The major
research cited showed type combinations predominating in
various fields. The ST type predominates in business and
administration; the SF type in sales and professions; the NF
were reported to outnumber other types in fields involving
counseling and writing; and the NT tended to go into science
and research. More recent work on type combinations has
been done by Keirsey and Bates.

The work done by Keirsey and Bates which is of
particular interest for the purpose of this study is their
work with managerial styles.34 They conceive of
temperaments resulting in the four managerial styles referred
to in Chapter I: the Sensing/Judging SJ manager, the
Intuitive/Feeling NF manager, the Sensing/Perceiving SP
manager, and the Intuitive/Thinking NT manager. They see each
of the managerial types as having particular strengths and
weaknesses.
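
The grouping of the sixteen MBTI types into these four styles can be stated as a simple rule, sketched below under the assumption that a type is written as the usual four-letter code (for example "ISFJ"). Sensing types are grouped by their Judging-Perceiving letter and Intuitive types by their Thinking-Feeling letter. The function name is hypothetical and is used only for illustration.

```python
def managerial_style(mbti_type):
    """Map a four-letter MBTI code onto the SJ, SP, NF, or NT style."""
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError("expected a four-letter type code such as 'ISFJ'")
    perceiving_function = letters[1]   # S or N
    judging_function = letters[2]      # T or F
    attitude = letters[3]              # J or P
    if perceiving_function == "S" and attitude in "JP":
        return "S" + attitude          # SJ or SP
    if perceiving_function == "N" and judging_function in "TF":
        return "N" + judging_function  # NT or NF
    raise ValueError("unrecognized type code: " + mbti_type)


styles = [managerial_style(t) for t in ("ISFJ", "ESTP", "ENFP", "INTJ")]
```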

The SJ manager, according to Keirsey and Bates, is
decisive, enjoys the decision-making process, and is a
persevering and patient worker. According to their theory
the SJ types seldom make errors of fact and they tend to be
outstanding at precision work. The SJ manager likes to get
things cleared, settled, and wrapped up. They are people
who know, respect, and follow rules. The weaknesses that
come with this style are that the SJ manager may decide
issues too quickly, or become impatient with delays and
complications. The SJ also has a tendency to believe that
some people are good and some bad, and that the latter
should be punished. SJ managers tend to respond to
negative elements as they become tired and may become
blaming or denigrating. This last attribute of the SJ type
is most directly related to the process of rating: the SJ
manager may rate people low. This particular style
contrasts most with the Intuitive/Feeling NF style of
management.

 

34David Keirsey and Marilyn Bates, Please Understand
Me, Prometheus Nemesis Books, Del Mar, CA, 1978.

 


The NF managers tend to see people's strengths. They
are comfortable with unstructured meetings and quite
sensitive to the organizational climate. The NFs easily
forget negative disagreeable events of the past and look
toward the future from a somewhat romantic position. The NF
managers when at their best are very skilled at turning
liability into asset. A weakness of the NF managers which
may affect how they make ratings is a tendency to avoid
unpleasantness. This, combined with the tendency to see
people's strengths, would make them vulnerable to making
leniency errors.

The other managerial types, Sensing/Perceiving SP and
Intuitive/Thinking NT, have styles which are not as
easily translated into rating error constructs. The SP
managers have the strengths of being very practical and
concrete in problem solving. They can observe a system and
see where it breaks down. They are adaptable, create change
easily, and have acute powers of observation. If this
theory is true, this type should be a most accurate rater
and make fewer errors than the other types. The NT manager
has the strength of being a visionary. They see the inner
workings of systems in both long and short-term perspective.
Their weaknesses are that they have vision but would rather
that someone else carry out the construction and execution.
The NTs tend to be unaware of others' feelings and may be
seen as cold and distant, but neither their strengths nor
weaknesses appear to relate directly to the type of rating
errors this type would make.

The literature on managerial styles from Keirsey and
Bates provides a clear indication of the types of errors
which can be expected from the SJ type and the NF type.
Hypothesis One is based on their premises, and positive
results on this hypothesis should not only lend support to
the notion that personality traits can be used to predict
the nature of rating error, but also support the predictive
validity of their particular use of the MBTI manager styles.

Summary

In this literature review the history and current
trends in rating error theory, the contributions of response
set theory, and the validity of the MBTI were discussed
along with other literature which might indicate the nature
of the rating error different personality types might make.
Rating error research began with the early works of
Thorndike, who studied Halo error, and progressed to the
current status where a range of rating errors are
identified. The operational definitions of these errors are
diverse. The major rating errors discussed were
Leniency/Severity error, Halo error, and Range restriction
error. Leniency/Severity error is universally understood to
mean the tendency to rate high or low. Halo error is understood
to mean a given rater's carrying over of a general bias across
the traits being rated. Range restriction error is the failure
to use the full range of the scale.
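
These three conceptual definitions can be turned into simple indices for a single rater, as sketched below. Because the operational definitions in the literature are diverse, the particular formulas here are assumptions chosen for illustration and are not necessarily those used in this study. Leniency is taken as the mean rating minus the scale midpoint, range restriction as the standard deviation of all ratings, and halo as the average spread of a rater's scores across dimensions within each ratee. All function names and data are hypothetical.

```python
from statistics import mean, pstdev

# One rater's data: one inner list per ratee, one score per rated dimension.


def leniency_index(ratings, midpoint):
    """Mean rating minus scale midpoint: positive = lenient, negative = severe."""
    all_scores = [score for ratee in ratings for score in ratee]
    return mean(all_scores) - midpoint


def range_restriction_index(ratings):
    """Standard deviation of all ratings: smaller = more restricted range."""
    all_scores = [score for ratee in ratings for score in ratee]
    return pstdev(all_scores)


def halo_index(ratings):
    """Mean within-ratee spread across dimensions: smaller = more halo."""
    return mean(pstdev(ratee) for ratee in ratings)


# Hypothetical rater using a seven-point scale (midpoint 4) on three ratees.
rater = [[6, 7, 6], [7, 7, 6], [6, 6, 7]]
indices = (leniency_index(rater, 4),
           range_restriction_index(rater),
           halo_index(rater))
```

The hypothetical rater above uses only the top two points of the scale, so the leniency index is well above zero while the other two indices are small.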


The relationship of response set theory to rating error
constructs was explored in the context of error variance in
test responses which could be attributed to personality
characteristics. The error variance was found to be similar
to Leniency and Severity rating error. With this in mind
the literature relating extreme response set to personality
characteristics was explored as a source for predictions
concerning the relationship of rating errors to personality.

The final section covered the literature validating the
scales of the MBTI and looked at literature which led to
the predictions found in the research hypotheses.

CHAPTER III

 

DESIGN OF THE STUDY

In this chapter the sample, the measures, and the
design will be described. The hypotheses and method of
analysis will be presented.

Sample

Fifty-six students from two undergraduate classes
composed the sample of raters for this study. The students
were asked as a class if they would volunteer to participate
in the research in exchange for interpretations of their
MBTI personality profiles. The first group, consisting
primarily of college sophomores, was an introductory
Sociology class at Western Michigan University. Twenty-nine
students, 17 females and 12 males, participated from this
class of forty. The second class consisted of juniors in
the Nutritional Science program at Michigan State
University. Twenty-seven students, all of them female,
participated from this class of thirty.

Certain personality characteristics in the Myers-Briggs
Type Indicator are highly correlated with gender
differences, and this meant that the study sample, being
predominantly female, reflected a higher proportion of
certain traits. These traits will be described specifically
later in the chapter.


Measures
Two types of test instruments were used in this study.
The Myers-Briggs Type Indicator1 was used to assess the
personality characteristics of the raters in the sample, and
several rating scales were used by the subjects in their
tasks as raters. These rating scales were as follows: two
interpersonal process scales developed by Truax;2 a
counselor effectiveness scale developed by Ivey;3 and a
rating scale used by judges in speaking contests to rate
speeches.4

The Myers-Briggs Type Indicator
The MBTI is designed so that it measures four bipolar
dimensions stemming from Jungian personality typology:

(E) Extraversion ...... Introversion (I)
(S) Sensation ............ Intuition (N)
(T) Thinking ............... Feeling (F)
(J) Judgment ............ Perception (P)

 

1Isabel Myers, MBTI Manual, Consulting
Psychologists Press, Princeton, N.J., 1962.

2Charles B. Truax, "A Scale for the Rating of
Accurate Empathy," and "A Tentative Scale for the Rating of
Unconditional Positive Regard," in Rogers, Gendlin, Kiesler
& Truax (Eds.), The Therapeutic Relationship and Its Impact,
Madison, Wisc., 1967, pp. 555-579.

3A.E. Ivey, Microcounseling: Innovations in
Interviewing Training, Charles C. Thomas, Springfield, Ill.,
1971, p. 183.

4Waldo W. Braden (Ed.), Speech Methods and
Resources, Harper and Row, New York, 1971, p. 126.

 


Forced-choice items are used to indicate a preference for
one pole of each dimension. Each question has one item
which indicates a preference for one pole and one item for
its opposite. Some items have been weighted more heavily in
an attempt to offset social desirability.5 The highest
score on each dimension represents the type preference. The
scoring manual provides a procedure for breaking ties.
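
The step from pole scores to a type can be sketched as follows. The sketch assumes that the weighted points for each of the eight poles have already been totaled; the item weights themselves are not reproduced here. Since the manual's tie-breaking procedure is not described in this chapter, a tie is simply reported rather than resolved. The function name and the scores are hypothetical.

```python
DIMENSIONS = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def type_from_pole_scores(scores):
    """Return the four-letter type given a weighted point total for each pole.

    The higher total on each dimension is taken as the preference. A tie
    raises an error, because the manual's tie-breaking rule is not
    reproduced in this sketch.
    """
    letters = []
    for first, second in DIMENSIONS:
        if scores[first] == scores[second]:
            raise ValueError(
                "tie on %s-%s: apply the manual's tie-breaking procedure"
                % (first, second))
        letters.append(first if scores[first] > scores[second] else second)
    return "".join(letters)


# Hypothetical point totals for one respondent.
example = {"E": 10, "I": 15, "S": 20, "N": 5, "T": 8, "F": 12, "J": 14, "P": 9}
example_type = type_from_pole_scores(example)
```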

In this study both Form F and Form G were used. Form F
is the original form and consists of 166 items. Form G has
been developed more recently and consists of 126 items.
Studies have shown that Form G is equivalent to Form F and
that the two forms may be used interchangeably.6

A preference on each dimension yields a possible
sixteen different personality types. These types were
discussed in more depth in the theory section of Chapter I
and Chapter II. In the sample the distributions
were roughly evenly divided on the primary dimensions with the
exception of the Thinking-Feeling dimension. There the
sample had 27 percent Thinking and 73 percent Feeling. This
distribution is similar to that found in the female
population at large, as would be expected in a sample
which is predominantly female. The distribution in the
sample of raters can be seen in Table 3.1.

 

5Isabel Myers, p. 86.

6Isabel Myers, MBTI Form G Manual, p. 4.


Table 3.1

Distribution of MBTI Types for the Present Sample

(E) Extraversion    N = 29    % = 52        Introversion (I)    N = 27    % = 48
(S) Sensing         N = 31    % = 55        Intuition (N)       N = 25    % = 45
(T) Thinking        N = 15    % = 27        Feeling (F)         N = 41    % = 73
(J) Judging         N = 34    % = 60        Perceiving (P)      N = 22    % = 40


Structural Qualities of the MBTI. A considerable
amount of testing has been done on the independence and
reliability of the scales of the MBTI. In a comprehensive
assessment of the MBTI, Carlyn found that the three type
categories directly related to Jung's theory
(Extraversion-Introversion, Sensing-Intuition, and Thinking-Feeling) were
all relatively independent of each other. The Judging-Perceiving
dimension was found to be consistently correlated
to the Sensing-Intuition scale and occasionally correlated
to several of the other dimensions.7

Two aspects of reliability have been investigated:
internal consistency and stability of type category. In
her assessment of the MBTI, Carlyn described the two primary
methods of measuring internal consistency with the MBTI.
In the first, Phi Coefficient estimates are used with the
Spearman-Brown prophecy formula; this tends to underestimate
the reliability. In the second, the tetrachoric correlation
coefficient is used together with the Spearman-Brown prophecy
formula; this tends to give an inflated estimate of the
reliability. Carlyn
summarized the reliability estimates as follows: the low
estimates for the Extraversion-Introversion scale range from
.55 to .65 and the high estimates from .70 to .81; for
Sensing-Intuition the low estimates were .64 to .73 and
the high from .82 to .92; for Thinking-Feeling the estimates

 

7Marcia Carlyn, "An Assessment of the Myers-Briggs
Type Indicator," Journal of Personality Assessment, 1977,
Vol. 41, 5, p. 462.


range from .43 to .75 on the low side and .66 to .90 on
the high side; and for the Judging-Perceiving scale the lows
were .58 to .84 and the high estimates .76 to .84. Although
there is considerable range in the estimated reliabilities
they appear to be satisfactory.8
The split-half reliability of the MBTI type categories
for the present sample of raters was found using the more
conservative Phi Coefficient estimates. The reliabilities
are displayed in Table 3.2. For Sample A, using the MBTI
Form G, the reliability was higher than that found in Sample
B on three of the four scales, the S-N scale being the
exception. The difference was especially great on the E-I scale and
the T-F scale. In Sample A using Form G, the E-I
reliability was .79 while for Sample B using Form F the
reliability was .59. On the T-F scale, for Sample A with Form G
the reliability was .90 while for Sample B using Form F the
reliability was .47. These results confirm the improvement
in reliability which some had predicted for Form G. The
reliability on the other scales is good considering the
conservative nature of the statistics used. The E-I and the
T-F reliabilities for Sample B could, however, weaken the
study with unclear distinctions between personality types.
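
The computation behind such split-half estimates can be sketched briefly. The sketch assumes that each rater has been classified on one dimension twice, once from each half of the items, and that the two classifications are coded 1 and 0 (for example, 1 for E and 0 for I). The phi coefficient between the two half-test classifications is then stepped up to full test length with the Spearman-Brown prophecy formula, r_full = 2r / (1 + r). The function names and the data are hypothetical.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length sequences of 0/1 codes."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a category is empty")
    return (a * d - b * c) / denominator


def spearman_brown(r_half, length_factor=2):
    """Project a part-test correlation to a test length_factor times as long."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Hypothetical E-I classifications of eight raters from two half-tests.
half_a = [1, 1, 1, 0, 0, 0, 1, 0]
half_b = [1, 1, 0, 0, 0, 1, 1, 0]
r_half = phi_coefficient(half_a, half_b)
r_full = spearman_brown(r_half)
```

In this hypothetical case the two halves agree for six of the eight raters, giving a phi of .50, which the Spearman-Brown formula steps up to about .67 for the full-length scale.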
Test-retest studies of the stability of type category
were also summarized by Carlyn. The four
studies which were summarized found that the proportion of
agreement between the first testing and the retesting was

 

8Carlyn, p. 465.

Table 3.2

Reliability* of MBTI Type Categories

                                      MBTI Type Category
Sample                              E-I    S-N    T-F    J-P

Sample A
17 females, 12 males
MBTI Form G                         .79    .72    .90    .88

Sample B
27 females
MBTI Form F                         .59    .86    .47    .80

*Calculated using Phi Coefficients and applying the
Spearman-Brown prophecy formula.


greater than chance. The majority of the subjects showed
shifts on no more than one of the four dimensions. In three
of the studies the stability of each scale was studied
separately. All of these studies produced test-retest
results which were reasonably stable.9

Rating Scales

Three of the rating scales used in the study, Scales 1,
2, and 4, were numerical and one, Scale 3, was graphic. The
two Interpersonal Process scales of Truax, "A Scale for the
Rating of Accurate Empathy"10 and "A Tentative Scale for the
Rating of Unconditional Positive Regard,"11 are
single-dimension numerical scales with well-defined rating levels.
Ivey's Counselor Effectiveness Scale,12 a graphic scale,
has 25 dimensions, 15 of which were used in this study.
These dimensions are defined by a key word describing the
extremes on each end of a line with seven blank spaces
between the extremes. The fourth scale, that used for the
evaluation of speeches,13 has seven dimensions, with each
dimension briefly described and given a rating scale of one
through seven, with one being poor and seven designated as

 

9Carlyn, p. 467.

10Truax, p. 555.

11Truax, p. 569.

12Ivey, p. 183.

13Braden, p. 126.


excellent. Copies of the scales used are found in
Appendix A.

The two scales of Truax were designed to assess brief
interactions between a client and a counselor. They have
been used to assess interactions ranging from episodes as
brief as two counselor statements and one client statement
to interactions lasting up to four minutes.

Forms of the two interpersonal process scales have been
used in many studies. The reliability of these scales based
on correlations between raters' ratings has been moderately
good, according to Truax and Carkhuff. The Accurate Empathy
scale showed a higher reliability than the Unconditional
Positive Regard scale, ranging from a high of .95 to a low
of .43. The median from twenty-five studies was better than
.80. The reliability for the Unconditional Positive Regard
scale ranged from a high of .95 to a low of .25 with a
median of .60. The range is great for both of these scales
and probably reflects the differences in the type of rater
and degree of training.14

The Truax Accurate Empathy scale is a numerical scale
with nine levels of empathy. The lowest level is "an
almost complete lack of empathy” and the scale continues to

"a level where the therapist unerringly responds to the

client's full range of feeling and recognizes each emotional

 

14C.B. Truax and R.R. Carkhuff, Toward Effective
Counseling and Psychotherapy, Chicago, Aldine Press, 1967.

nuance and deeply hidden feeling."15 The Truax
Unconditional Positive Regard scale is a nominal scale with
five levels. This scale has a continuum, “beginning with an
almost complete lack of Unconditional Positive Regard and
continuing to a level where the therapist unerringly
communicates to the client a deep and genuine caring for him
as a person with human potentialities, uncontaminated by
evaluation of his thoughts and behaviors."16

The reliability scores for our sample were opposite the
trends in the studies cited by Truax and Carkhuff. The
Unconditional Positive Regard scale had a reliability of .95
while the Accurate Empathy scale's reliability was .43.
These reliability scores were determined through the
intraclass correlation method, and are displayed with the
reliability of the other two scales in Table 3.3.

Both of the Truax scales were transformed into seven-level
scales for this study. Audio-taped versions of dialogues
written by Truax were played to the subjects, thus assuring
that the "correct levels" of interpersonal functioning of
the counselors were as Truax defined them. The vignettes

used to represent the different levels of counselor response

can be seen in Appendix B.

 

15C.R. Rogers, E.T. Gendlin, D. Kiesler, and C.B.
Truax, The Therapeutic Relationship and Its Impact: A Study
of Psychotherapy with Schizophrenics. Madison: University
of Wisconsin Press, 1966, p. 569.

16Rogers et al., p. 555.

47

Table 3.3

Reliability of Rating Scales

Scale #   What Was Measured                Reliability*

1         Accurate Empathy                 .43
2         Unconditional Positive Regard    .95
3         Counselor Effectiveness          .58
4         Speaker Evaluation               .79

* The reliability was calculated by the
intraclass correlation method.

48

Ivey's Rating Scale of Counselor Effectiveness17 was
created to measure both counselor effectiveness and client
attitude. This instrument has been shown to be reliable
and valid even when used by inexperienced raters.
In a parallel-form reliability study, Ivey found a
coefficient of equivalence of .975.18 The rating scale
consists of twenty-five items placed in a semantic
differential format describing counselor qualities. There
is a clear valence to each item since the scale is designed
to differentiate between "good" and "bad" counselors. The
extreme "good" rating was designated a seven and the extreme
"bad" a one on the scale. Five intermediate levels were
provided.

Unlike the interpersonal scales, there is no specific
process being rated which would have a correct level of
response. Instead, the scale was used to rate the global
impressions gained by the raters as they listened to the
counselors respond to the clients in the taped vignettes
used for the first two rating scales. In this study the
scale was reduced from 25 to 15 items. This may have
lowered the reliability of the instrument, but that actually
strengthened the research design, since a range of

reliability was desirable.

The reliability of Ivey's Counselor Effectiveness scale

 

17Ivey, p. 183.

18Ivey, op. cit.


as used in this study was .58, which was better than that of
the Accurate Empathy scale but lower than the reliability of
the other two scales. That the reliability was no higher
than this is not surprising since the ratings were made on
relatively little data and with little explanation of the
meaning of the traits measured. This was done intentionally
to generate a measure with lower reliability so that the
impact of a poor rating scale on the rating error and
personality interaction could be observed.

The scale for the rating of speakers was taken from a
college level textbook in debate and public speaking19
and is representative of many scales developed to guide
the evaluation of speeches. It is a standard numerical
scale, with ratings of "one" indicating a poor performance
on a particular dimension and "seven" indicating an
excellent performance. The activity of rating a speech
was added to the study design in order to control for the
effects on rater judgments which might result from the
interactional nature of the material being judged in
counselor-client vignettes. The final scale developed to
rate speakers was found to have a .79 reliability, thus
providing a reliable scale which was not rating counselors

or the counseling process.

 

19Braden, p. 126.

50

Design of the Study

The major premise of this study was that the nature of
rating errors can be predicted by assessing the personality
characteristics of the rater. Prediction of rating error
was based on personality characteristics measured by the
Myers-Briggs Type Indicator. The MBTI was administered to
each of two undergraduate classes; then the classes heard
taped counselor-client vignettes and three speeches. Three
counselors were rated on three different scales and seventeen
different dimensions. Three speakers were rated on
seven dimensions. A detailed outline of how the scales and
the taped vignettes were presented is found in Appendix C.

The ANOVA method of testing rating error was presented
by Saal et al. as a method best used when a complete design
is possible.20 This type of analysis allows the
comparison of discrete components of the variance found in
ratings. The design of this study makes possible the use of
ANOVA statistical procedures by providing a sample where all
raters observed all ratees on all dimensions. This allows
more powerful analysis of rater error.

Operational Definitions
In this study the ANOVA method was used whenever

possible, but some types of error were best measured by more

 

20Frank Saal, Ronald Downey, Mary Anne Lahey, "Rating
the Ratings: Assessing the Quality of Rating Data,"
Psychological Bulletin, 1980, Vol. 88, p. 424.

 


traditional means. The rating error terms are defined as
follows:

Leniency/Severity

Leniency and Severity error was defined as the
relationship of mean ratings to each other. The
higher ratings were considered Lenient and the
lower scores considered Severe.21

Range Restriction

A comparison of ratee main effect is the basis
for Range Restriction calculation. The absence
of ratee main effect is considered Range
Restriction.22

Interrater Reliability

Two methods of measuring Interrater Reliability
were used. First, intraclass correlations were
used to measure reliability when sample units
were small enough for the two-way analysis of
variance procedure to be carried out. The second
method was used when comparing larger groups of
raters. Here correlations were calculated
between pairs of raters rating the same
individual on the same dimension. These
correlations are summed through the use of z
transformations, and larger correlations indicate
greater reliability.23

Rating Accuracy

It was possible to define accuracy for scales one
and two. Rating accuracy was defined as the mean
distance between the rater's rating and the
predetermined correct rating.24
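The Rating Accuracy definition can be sketched as follows. The sketch assumes that "distance" means the absolute difference between a rating and the correct rating, which the definition above does not state explicitly; the function name and the sample ratings are hypothetical.

```python
def rating_accuracy(ratings, correct_ratings):
    """Mean distance between a rater's ratings and the correct ratings.

    Assumption: distance is taken as the absolute difference, so that
    ratings above and below the correct level do not cancel out.
    Smaller values indicate a more accurate rater.
    """
    if not ratings or len(ratings) != len(correct_ratings):
        raise ValueError("ratings and correct_ratings must be non-empty and equal in length")
    total = sum(abs(given - correct) for given, correct in zip(ratings, correct_ratings))
    return total / len(ratings)


# Hypothetical ratings by one rater of three vignettes on a seven-level scale.
given = [3, 5, 6]
correct = [2, 4, 7]
print(rating_accuracy(given, correct))  # prints 1.0
```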

 

21Saal, Downey and Lahey, p. 417.
22Ibid., p. 418.
23Ibid., p. 422.
24Ibid., p. 417.

52

Methods of Analysis
Each hypothesis, with its method of analysis, is
presented below. The reasoning for the alternate hypotheses
is also given.

Hypothesis One:

Null hypothesis: There is no difference between the
mean ratings of Sensing/Judging and
Intuitive/Feeling types.

Alternate hypothesis: Sensing/Judging types have lower
(more severe) ratings than Intuitive/Feeling types
who have higher (more lenient) ratings.

The alternate hypothesis was derived from the theory of
managerial styles developed by Keirsey and Bates, who
postulate that the Sensing/Judging type will be more
critical in their style of management and tend to see the
negatives more than the positives. This should result in
more severe ratings by the Sensing/Judging types. The
Intuitive/Feeling types, on the other hand, tend to be more
aware of employees' feelings, and this should, if anything,
make their ratings more lenient. The method for determining
the difference between the means was a one-way ANOVA.

Hypothesis Two:

Null hypothesis: There is no difference in the
frequency of ratee main effect between Perceiving
types and Judging types.

Alternate hypothesis: The frequency of ratee main
effect for Perceiving types is less than the
frequency for Judging types, indicating more Range
Restriction in the Perceiving type.

53

The alternate hypothesis was developed from the MBTI
descriptions of personality type, which suggest that the
Judging type will have clear opinions about the events they
encounter and that they readily make decisions. The
Perceiving type, on the other hand, is described as reluctant
to make decisions and prefers to withhold judgment. These
factors should result in the Judging type being less prone
to Range Restriction error than the Perceiving type. Range
Restriction was determined by assessing the frequency of
ratee main effect; the absence of ratee main effect is
considered a measure of Range Restriction.

Ratee main effect was determined for both personality
groups, Judging and Perceiving. Once the significance of
ratee main effect for each group was determined the
frequencies were compared between groups. The ratee main
effects were determined according to the formula:25

\[ \frac{MS(\text{Ratees})}{MS(\text{Raters} \times \text{Ratees})} \]

 

A two-way ANOVA of Rater x Ratee produced the mean squares
used. The two-way ANOVA was done for each rater group on
each of the four rating scales. The design is presented in
Table 3.4.
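The ratio of mean squares described above can be sketched in a few lines. The sketch assumes one rating per Rater x Ratee cell, so that the Rater x Ratee mean square serves as the error term; the function name and the small data set are hypothetical and do not come from the study.

```python
def ratee_main_effect(ratings):
    """F ratio for the ratee main effect in a Rater x Ratee design.

    ratings[i][j] is the rating that rater j gave to ratee i.
    Assumption: exactly one rating per cell, so MS(Raters x Ratees)
    is the denominator, as in the ratio given above.
    Returns (F, (df_ratees, df_interaction)).
    """
    n_ratees = len(ratings)
    n_raters = len(ratings[0])
    if n_ratees < 2 or n_raters < 2:
        raise ValueError("at least two ratees and two raters are required")

    grand_mean = sum(sum(row) for row in ratings) / (n_ratees * n_raters)
    ratee_means = [sum(row) / n_raters for row in ratings]
    rater_means = [
        sum(ratings[i][j] for i in range(n_ratees)) / n_ratees
        for j in range(n_raters)
    ]

    # Sum of squares for ratees (between-ratee variation).
    ss_ratees = n_raters * sum((m - grand_mean) ** 2 for m in ratee_means)
    # Sum of squares for the Rater x Ratee interaction (residual).
    ss_interaction = sum(
        (ratings[i][j] - ratee_means[i] - rater_means[j] + grand_mean) ** 2
        for i in range(n_ratees)
        for j in range(n_raters)
    )

    df_ratees = n_ratees - 1
    df_interaction = (n_ratees - 1) * (n_raters - 1)
    ms_ratees = ss_ratees / df_ratees
    ms_interaction = ss_interaction / df_interaction
    if ms_interaction == 0:
        raise ValueError("interaction mean square is zero; F is undefined")
    return ms_ratees / ms_interaction, (df_ratees, df_interaction)


# Hypothetical ratings: three ratees (rows) rated by four raters (columns).
example = [
    [2, 3, 2, 3],
    [4, 5, 5, 4],
    [6, 6, 7, 6],
]
f_ratio, degrees_of_freedom = ratee_main_effect(example)
print(f_ratio, degrees_of_freedom)
```

With three ratees and twelve raters, the limit mentioned for this design, the degrees of freedom would be (2, 22).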

Some data was lost in using this design because the
computer could not handle more than twelve raters and six

ratees at one time. In order to minimize the data loss

 

25Ibid., p. 422.

54
Table 3.4

Design of the Two-Way ANOVA for Rater by Ratee Interaction

                 Rater (of a Given Personality Type)

           1   2   3   4   5   6   7   8   9  10  11  12

Ratee A    *

Ratee B

Ratee C

* Ratings from the scale being analyzed fill each cell.

55

and improve the research design, the rater groups were
divided into Sample A and Sample B. The groups were further
reduced by removing those raters whose MBTI scores were less
than 9 on a given type dimension, that is, those whose
preferences were less clearly differentiated. This score of 9
is commonly accepted as the level at which the score attains
greater type stability. There is, however, no published
research to support the validity of this common practice. For
the purpose of this study it was a convenient method of
reducing the sample size while increasing the probable
reliability of the type categories. The final design yielded
eight tests for significance of ratee main effect for each
group of raters. The difference in the frequencies of ratee
main effect between the Perceiving type and the Judging type
was tested with a Chi Square statistic.

Hypothesis Three:

Null Hypothesis: The mean of the ratings made by Feeling
types is the same as the mean of the ratings made by
Thinking types.

Alternate Hypothesis: The mean of the ratings made by Feeling
types is higher (more lenient) than the mean of
the ratings made by Thinking types.

The alternate hypothesis was developed from MBTI
descriptions of the Thinking and Feeling types, which
suggest that Thinking types base their judgments on
logic and systematic evaluation. Feeling types base
their judgments on other, subjective value systems and are
often influenced by the impact of their judgment on the
person being judged. This could give the Feeling type a
tendency to rate leniently, while the Thinking type should not
be susceptible to that error.

Differences between the mean ratings of Thinking types
and Feeling types were tested with a one-way ANOVA, as was
done with Hypothesis One.

Hypothesis Four:

Null hypothesis: The interrater reliability, as
measured by intraclass correlation, is the same
for the Introverted type as it is for the
Extraverted type.

Alternate hypothesis: The interrater reliability, as
measured by intraclass correlation, is greater for
the Extraverted type than for the Introverted
type.

The alternate hypothesis was developed from the theory
of Jung, which suggests that the Introvert's basic stance
toward the world is more subjective, whereas the stance of
the Extravert is primarily objective. This means that the
Introvert responds more to internal stimuli than to external
stimuli. The Extravert, on the other hand, is more responsive
to external events. This would result in ratings for
the Introvert which are more variable between raters, since
the Introverted rater would be less responsive to the
external event and more responsive to their own subjective
experience. The Extravert, on the other hand, should have
greater Interrater Reliability, since they are theoretically
more responsive to the environmental cues, the ratee, than
they are to their own subjective experience.

The testing of this hypothesis involves the calculation
of intraclass correlations for each rater group. The
intraclass correlations were calculated with the following
formula:26

\[ \frac{MS(\text{Ratees}) - MS(\text{Raters})}{MS(\text{Ratees})} \]

The reliability was calculated for each rater group on each
of the scales using Sample A and Sample B separately as was
done in Hypothesis two. The differences between reliability
scores were calculated with the standard formula found in

Blalock as follows:27

\[ Z = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}}, \qquad \sigma_{z_1 - z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \]
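The test for the difference between two reliability coefficients can be sketched as follows. The sketch assumes the standard large-sample test in which each coefficient is converted with Fisher's r to z transformation and the difference is divided by its standard error; the function name, the coefficients, and the group sizes are hypothetical.

```python
import math


def z_test_difference(r1, n1, r2, n2):
    """Z statistic for the difference between two correlations.

    Assumption: the standard Fisher z test, with
    z = atanh(r) and standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than three observations")
    if abs(r1) >= 1 or abs(r2) >= 1:
        raise ValueError("correlations must lie strictly between -1 and 1")
    z1 = math.atanh(r1)  # Fisher's r to z transformation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Hypothetical coefficients and group sizes, not values from the study.
print(round(z_test_difference(0.70, 30, 0.40, 30), 2))
```

A value of Z beyond about 1.96 in absolute size would indicate a difference at the .05 level for a two-tailed test.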

Hypothesis Five:

Null hypothesis: The Interrater Reliability is the same
within type groups of raters as it is for the whole
population of raters.

Alternate hypothesis: Interrater Reliability is
greater within type groups of raters than it is
for the population of raters as a whole.

The alternate hypothesis was developed from the notion
that the ratings of similar groups should be more highly
correlated with each other than those of groups of divergent
natures.

To test this hypothesis, correlations of all possible

 

26Saal et al., p. 422.

27Hubert M. Blalock, Social Statistics,
McGraw-Hill, Inc., 1972, p. 406.

58

combinations of raters were calculated. The average

correlation based on Fisher's r to Z transformation28 was
calculated for both the sample as a whole and for each of
the personality types. Tests for the difference between

correlations were calculated using the same formula used in

Hypothesis Four.
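The averaging of correlations across all possible pairs of raters can be sketched as follows. The sketch assumes an unweighted mean of the Fisher z values, converted back to a correlation at the end; the function names, the clipping constant, and the sample ratings are hypothetical, and the clipping of perfect correlations is an added safeguard rather than part of the method described above.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant ratings")
    return covariation / math.sqrt(var_x * var_y)


def average_interrater_correlation(ratings_by_rater, clip=0.999999):
    """Average correlation over all pairs of raters via Fisher's r to z.

    ratings_by_rater holds one list of ratings per rater, in the same
    ratee-and-dimension order for every rater.
    Assumption: an unweighted mean of z values; correlations of exactly
    +1 or -1 are clipped because their z value is infinite.
    """
    if len(ratings_by_rater) < 2:
        raise ValueError("at least two raters are required")
    z_values = []
    for first, second in combinations(ratings_by_rater, 2):
        r = max(-clip, min(clip, pearson(first, second)))
        z_values.append(math.atanh(r))
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical ratings from three raters on the same four items.
raters = [
    [1, 2, 3, 5],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
print(round(average_interrater_correlation(raters), 3))
```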

Summary

In chapter three the sample was presented, the measures
used were discussed, the design was outlined and the
testable hypotheses presented with their methods of
analysis.

The sample of fifty-six undergraduate raters was
presented along with its distribution on the MBTI. The
distribution was fairly even on every scale except
the Thinking/Feeling scale, which was divided 27%
Thinking and 73% Feeling. This was attributed to the fact
that the sample was predominantly female and that the
distribution found is similar to the distribution found in
the female population at large.

The MBTI was shown to be a personality test with
moderately good reliability with the individual scale
reliabilities ranging for the most part from .65 to .85 in a
wide range of studies. The rating scales used were shown to

have adequate reliability in past studies, and the

 

28Saal et al., p. 422.

 

59

modification for this research was discussed.

The design of the research was presented with its
unique characteristic of allowing a large number of raters
to rate the same ratees on the same dimensions. This allows
both for the comparison of large groups of raters and the
use of ANOVA procedures when assessing for rating error.
The operational definitions of the rating error terms were
presented, as were the testable hypotheses. The methods of
analysis included one-way ANOVA to compare means, two-way ANOVA
to test for ratee main effect and to be used in intraclass
correlations, as well as Pearson product-moment correlations used
with Fisher's r to Z transformation.

CHAPTER IV

PRESENTATION OF FINDINGS

In this chapter the results of the analysis are
presented. The findings for the original five hypotheses and
two additional hypotheses stemming from those findings are
reported.

Hypothesis One

Null hypothesis: There is no difference between the
mean ratings of Sensing/Judging types and those of
Intuitive/Feeling types.

Alternative hypothesis: Sensing/Judging types have
more Severe ratings than those made by
Intuitive/Feeling types.

According to the personality theory of Carl Jung as
interpreted by Keirsey and Bates, Sensing/Judging (S/J)
managerial types would be expected to be very critical in
their style, while Intuitive/Feeling (N/F) types would have
difficulty being critical when they need to be. These
tendencies should result in the S/J's ratings being lower
(more Severe) than the N/F's. A one-way analysis of
variance was used to test for differences between the two
groups.

The null hypothesis was rejected and the alternate
accepted. While significant differences were found in the
predicted direction, less than one percent of the variance

is accounted for by the difference in the mean ratings.

Table 4.1 shows the results of the analysis.

60

61

Table 4.1

Comparison of Mean Ratings of Sensing/Judging and
Intuitive/Feeling Types

Source of variation   df     MS       F       Probability

Between groups        1      21.425   8.231   .004*
Within groups         3016   2.603

* Rejected null at .05 level.
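The figures in Table 4.1 can be checked with a short calculation. The sketch assumes that "variance accounted for" refers to eta squared, the between-groups sum of squares divided by the total sum of squares; the source does not name the index used, and the function name is illustrative.

```python
def anova_summary(ms_between, df_between, ms_within, df_within):
    """Return the F ratio and eta squared for a one-way ANOVA table.

    Assumption: proportion of variance accounted for is taken to be
    eta squared = SS_between / (SS_between + SS_within), with each
    sum of squares recovered as mean square times degrees of freedom.
    """
    ss_between = ms_between * df_between
    ss_within = ms_within * df_within
    f_ratio = ms_between / ms_within
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, eta_squared


# Mean squares and degrees of freedom as printed in Table 4.1.
f_ratio, eta_squared = anova_summary(21.425, 1, 2.603, 3016)
print(round(f_ratio, 3), round(eta_squared, 4))  # prints 8.231 0.0027
```

Under this assumption the difference between the two groups accounts for roughly 0.3 percent of the variance in ratings, consistent with the statement that less than one percent is accounted for.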

62

Hypothesis Two
Null Hypothesis: There is no difference in the
frequency of ratee main effect between Perceiving
types and Judging types.

Alternate Hypothesis: Perceiving types have less
frequent ratee main effect than Judging types.

The analysis of the frequency with which ratee main
effect was found in the Judging and Perceiving types yielded
no significant differences using a chi square analysis.
While the differences were not significant, there was a
pattern in the direction opposite to that predicted. The
pattern was especially apparent when all personality types
were compared. This indication was used to develop the
exploratory Hypothesis Seven. The distribution of the ratee
main effect for all testable type categories can be seen in

Table 4.2.

Hypothesis Three

Null hypothesis: There is no difference between the
mean ratings of Feeling types and the mean rating
of Thinking types.

Alternative hypothesis: The mean rating of Feeling
types is higher than the mean rating of Thinking
types.

A one-way analysis of variance comparing the mean
ratings of the two groups failed to show significant
differences. The results of the analysis can be seen in
Table 4.3.

63

Table 4.2

Comparison of Ratee Main Effects for MBTI Personality Types

Extraverted Raters                     Introverted Raters

sample  scale  df      F               sample  scale  df      F

A       1      (2,22)  2.09            A       1      (2,22)  .2
A       2      (2,22)  35.98 *         A       2      (2,22)  36.18 *
A       3      (2,22)  6.28 *          A       3      (2,22)  1.67
A       4      (2,22)  7.55 *          A       4      (2,22)  3.84
B       1      (2,16)  8.9 *           B       1      (2,16)  6.8
B       2      (2,16)  13.2 *          B       2      (2,16)  13.9
B       3      (2,16)  3.51            B       3      (2,16)  1.83
B       4      (2,16)  3.79 *          B       4      (2,16)  7.76
Freq. Ratee Main Effect = 6            Freq. Ratee Main Effect = 3

Sensing Raters                         Intuitive Raters

sample  scale  df      F               sample  scale  df      F

A       1      (2,22)  .94             A       1      (2,14)  .5
A       2      (2,22)  23.5 *          A       2      (2,14)  40.3
A       3      (2,22)  4.94 *          A       3      (2,14)  6.24
A       4      (2,22)  4.77 *          A       4      (2,14)  2.3
B       1      (2,26)  4.5 *           B       1      (2,20)  5.49
B       2      (2,26)  17.5 *          B       2      (2,20)  41.8
B       3      (2,26)  2.7             B       3      (2,20)  2.55
B       4      (2,26)  14.3 *          B       4      (2,20)  6.54
Freq. Ratee Main Effect = 6            Freq. Ratee Main Effect =

Judging Raters                         Perceiving Raters

sample  scale  df      F               sample  scale  df      F

A       1      (2,24)  1.36            A       1      (2,22)  1.624
A       2      (2,24)  25.99 *         A       2      (2,22)  54.46
A       3      (2,24)  2.9             A       3      (2,22)  6.5
A       4      (2,24)  2.29            A       4      (2,22)  6.67
B       1      (2,14)  3.21            B       1      (2,10)  .599
B       2      (2,14)  18.1 *          B       2      (2,10)  13.77
B       3      (2,14)  2.26            B       3      (2,10)  .81
B       4      (2,14)  11.4 *          B       4      (2,10)  4.58
Freq. Ratee Main Effect = 3            Freq. Ratee Main Effect =

* Statistical significance at .05 level.

64
Table 4.3

Comparison of Mean Ratings of Thinking and Feeling Types

Source of variance   df     MS      F      Probability

Between groups       1      .720    .278   .598*
Within groups        4020   2.591

* Failed to reject null at .05.

65

Hypothesis Four
Null Hypothesis: The Interrater Reliability as
measured by intraclass correlation is the same for
the Introverted type as it is for the Extraverted type.

Alternate Hypothesis: The Interrater Reliability as
measured by intraclass correlation will be greater
for the Extraverted type than for the Introverted
type.

The analysis showed no significant difference between
the Interrater Reliability of Introverts and Extraverts.
The Extraverts' Interrater Reliability was .80 and the
Introverts' .62 across all the scales; see Table 4.4. Given
that there was some difference, further analysis was done on
each scale to determine whether the reliability of the scales
would affect the Interrater Reliability of the raters. The
analysis showed that the difference between the Extraverts'
Interrater Reliability and the Introverts' Interrater
Reliability was negligible on the scales which were very
reliable, but the difference was considerable on scales with
low reliability. On Scale 2, the most reliable scale, the
Interrater Reliability scores were identical at .95, and yet
on the two least reliable scales, 1 and 3, the
Interrater Reliability for the Extraverts was .70 and .77
compared to .42 and .29 for the Introverts. The Interrater
Reliability of all the personality types can be seen for
each scale in Table 4.5.

Hypothesis Five

Null Hypothesis: The Interrater Reliability will be
the same within rater type groups as it is for the
whole population of raters.

66
Table 4.4

Reliability1 for Six MBTI Types Across All Scales

MBTI Type      Reliability     MBTI Type      Reliability

Extraversion   .80             Introversion   .62
Sensing        .71             Intuition      .81
Judging        .70             Perceiving     .59

1Reliability derived from intraclass correlations.

67
Table 4.5

Reliability1 for Six MBTI Types for Each Scale

MBTI Type      Reliability     MBTI Type      Reliability

Scale 1: Accurate Empathy

Extraversion   .70             Introversion   .42
Sensing        .47             Intuition      .19
Judging        .38             Perceiving     .41

Scale 2: Unconditional Positive Regard

Extraversion   .95             Introversion   .95
Sensing        .95             Intuition      .98
Judging        .95             Perceiving     .95

Scale 3: Counselor Effectiveness

Extraversion   .77             Introversion   .29
Sensing        .60             Intuition      .72
Judging        .71             Perceiving     .42

Scale 4: Speaker Evaluation

Extraversion   .80             Introversion   .84
Sensing        .81             Intuition      .70
Judging        .78             Perceiving     .81

1Reliability derived from intraclass correlation.

68

Alternate Hypothesis: Interrater reliability is
greater within rater type groups than it is for
the whole population of raters.

Comparison of the correlations of raters within type
group with the correlations of raters in the whole rater
population showed no significant differences in the level of
correlation. The results yielded correlations both above

and below that of the population of all raters. The results

can be seen in Table 4.6.

Exploratory Findings

Two additional hypotheses were developed from the results
of the original five hypotheses. The exploratory hypotheses
were developed to follow up trends observed in the original
analysis by studying the accuracy of the ratings. Accuracy
was determined by comparing the rating given by the rater with the
predetermined "correct" rating. "Correct" ratings were
available for Scales 1 and 2, where Truax's examples of
different performance levels were used to create the vignettes
which were rated. Comparisons were made between the
personality style of the raters and the mean variance from the
"correct" rating.

Hypothesis Six
Null Hypothesis: There will be no difference in
accuracy of the ratings of the Sensing/Judging
type and those of the Intuitive/Feeling type.
The analysis showed that the ratings of the

Sensing/Judging type were significantly less accurate on

69

Table 4.6

Correlation Between Raters' Ratings Within MBTI Type
and for the Sample as a Whole

MBTI Type      Correlation     MBTI Type      Correlation

Extraversion   .27             Introversion   .23
Sensing        .24             Intuition      .26
Thinking       .22             Feeling        .26
Judging        .23             Perceiving     .29

Overall Correlation Between Raters' Ratings was .25.

70

Scale Two, which rates Unconditional Positive Regard, but not
significantly less accurate than the Intuitive/Feeling types on
the Accurate Empathy Scale. Interestingly, Scale Two is the
more reliable of the two scales and yet that is where the
differences in accuracy occurred. The results of the analysis

can be seen in Table 4.7.

Hypothesis Seven
Null Hypothesis: There is no significant difference

between the accuracy of ratings of the Judging
type and the Perceiving type.

Alternate Hypothesis: The ratings of the Judging type
are less accurate than the ratings of the
Perceiving type.

The alternate hypothesis was developed from the trends observed
in Hypothesis Two where the Judging types appeared to be less
reliable raters than the other types. The results indicated
that on Scale Two again the ratings of the Judging type were
significantly less accurate than those of the Perceiving type,
and that on Scale One there was no difference in the accuracy
of the two types' ratings. This finding substantiates the
indication in Hypothesis One that the Judging type's ratings

appeared less reliable than the Perceiving type's. These

findings are displayed in Table 4.8.

Summary
A relationship of personality to rating error was found in
three of seven hypotheses tested. It was found that the

ratings of Sensing/Judging types were significantly more Severe

Table 4.7

Comparison of the Mean Variance From the Correct Rating
for Sensing/Judging vs. Intuitive/Feeling Types

Scale   Source of variance   df   MS      F

1       Between group        1    5.2     .49
1       Within group         40   10.45
2       Between group        1    24.68   5.06*
2       Within group         40   4.87

* Significant at .05 level.

Table 4.8

Comparison of the Mean Variance From the Correct Rating
for Judging vs. Perceiving Types

Scale   Source of variance   df   MS      F

1       Between group        1    17.75   3.63*
1       Within group         54   4.88
2       Between group        1    .85     .083
2       Within group         54   10.14

* Significant at .05 level.

72

than the ratings of Intuitive/Feeling types across all of the scales
used in the study.

Ratings of the Sensing/Judging types were significantly
less accurate than those of the Intuitive/Feeling types when
rating Unconditional Positive Regard; there was, however, no
difference when rating Accurate Empathy. It was also found
that the Judging type rater was less accurate than the
Perceiving type when rating Unconditional Positive Regard, and
again there was no difference in the accuracy when rating
Accurate Empathy.

No significant relationship was found between the Range
Restriction error of Judging vs. Perceiving types; nor was any
difference found in Severity/Lenience error between Thinking
and Feeling types. The data also failed to find any
statistically significant relationship between Extraversion and
Introversion and rating error, though some consistent patterns
did emerge. There was no significant difference in the
reliability of ratings within type group vs. the reliability of

ratings in the sample as a whole.

CHAPTER V

SUMMARY AND CONCLUSIONS

The question addressed in this study was what influence
the personality of raters has on the ratings they
make. Types of rating error were explored with the goal of
finding those errors which appeared to have the strongest
theoretical and empirical link to the personality constructs
of C.G. Jung, as operationalized in the Myers-Briggs Type

Indicator (MBTI).1

Summary of the Study

The review of literature on rating error yielded a
consensus on the primary measures of rating quality:
Leniency/Severity error, Halo error, Range restriction, and
Interrater Reliability. Since little research had been done
relating rating error to personality type, parallel
literature was searched for empirical indications of the
relationship of rating patterns to personality. Response
set research yielded indications of the relationship between
personality traits and extreme response set. Research in
this area, combined with the theories of C.G. Jung,
particularly the concept on which the MBTI is based,
resulted in five hypotheses. These hypotheses predicted the

existence of relationships between rating error and

 

1Isabel Briggs Myers, The Myers-Briggs Type
Indicator Manual, Consulting Psychologists Press, 1962.

73

74

personality, and the nature of those relationships.

In the study each participant was presented with four
different rating tasks. The tasks involved the use of
selected rating scales. Two of the scales were developed by
Truax, measuring Accurate Empathy and Unconditional Positive
Regard.2 Another scale, developed by Ivey, measured
counselor effectiveness,3 and the fourth was a scale
designed especially for this study to measure the
effectiveness of public speakers. All of the scales were
modified to a 1 to 7 Likert format. The raters used the
scales to rate audio-taped vignettes of counselor-client
interaction and three speeches designed for use with the
scale. All of the raters rated all of the taped
interactions and speeches. Having all the raters rate all
the segments on all the dimensions (a total of 72 ratings
per rater), while using such a large number of raters,
allowed for the use of a wide range of statistical
procedures to analyze the rating errors of the different
personality groups.

A variety of methods were chosen as measures of rating

error. For the purposes of this study the rating

errors were operationally defined as follows:

 

2Charles B. Truax, Op. Cit.

3G.E. Ivey, Op. Cit.

4Saal et. al., p. 417.

75

1. Leniency/Severity error was defined as the
relationship of mean ratings to each other. The
higher ratings were considered Lenient and the
lower scores considered Severe.4

2. Range Restriction error was defined as the
absence of ratee main effect.5

3. Interrater Reliability was calculated in two
ways. First, intraclass correlations were used to
measure reliability when units of comparison were
small enough to permit the use of ANOVA. The
second method was used when large group comparisons
were needed. In this case correlations were
calculated between pairs of raters rating the same
individual on the same dimension. These
correlations were summed through the use of z
transformation and larger correlations were assumed
to represent greater reliability.6

4. The rating error for scales 1 and 2 was determined
by the mean difference between the rating given by a
rater and the predetermined correct rating.7

The Myers-Briggs Type Indicator (MBTI) measures four
dimensions of personality: Extraversion-Introversion,
Sensing-Intuition, Thinking-Feeling, and Judging-Perceiving.
The hypotheses developed for this study were
based upon Jungian theory, on which the MBTI is based, as
well as upon the recent work of Keirsey and Bates,8 which
related the MBTI profiles to management style. When these
theories were combined with the rating error constructs, the
following hypotheses were generated:

1. Ratings made by Sensing/Judging types will be more
Severe than those made by Intuitive/Feeling types.

 

5Ibid., p. 422.

6Ibid., p. 422.

7Ibid., p. 417.

8David Keirsey and Marilyn Bates, Op. Cit.

2. The Range Restriction error of Perceiving types
will be greater than that of the Judging types.

3. Ratings made by Feeling types will have more
Leniency than those of Thinking types.

4. The Introvert's ratings will have less Reliability
than will the Extravert's.

5. There will be more Interrater Reliability within
personality type than in the sample as a whole.

These hypotheses were tested using 56 raters from
undergraduate classes in sociology and nutritional science.
The sample was predominantly female, and the raters' personality
types as measured by the MBTI were fairly evenly distributed,
with the exception of the Thinking-Feeling dimension.
Because the sample was largely female, there was a
preponderance of Feeling types, as there is in the
female population as a whole.

The analysis yielded a number of significant
relationships between personality type and rating errors.
These relationships were found primarily for those
personality characteristics which clearly have an impact on
the evaluative process, and for those measurements of
rating error which allowed the use of powerful statistical
procedures.

Discussion of the Findings
Hypothesis One
Hypothesis One was supported by the analysis, which showed a
significant difference between the Sensing/Judging types'
ratings and those of the Intuitive/Feeling types. These
differences were in the predicted direction, with the
Sensing/Judging types' ratings being more Severe and the
Intuitive/Feeling types' being more Lenient. While these
differences were not great in magnitude, they were
consistent across all scales, thus yielding a statistically
significant result. This result supports the notion that
the nature of rating errors can be predicted according to
personality type. It begins to define the nature of that
relationship, and it gives support to the notion of
management styles of Keirsey and Bates, which indicates that
a contrast between Sensing/Judging and Intuitive/Feeling
types would occur.

Hypothesis Two

The analysis showed that the Range Restriction error
of Perceiving personality types and Judging types did
not differ significantly. The hypothesis predicted
that the Judging types, who are characterized as making
quick decisions and having strong opinions, would make less
Range Restriction error than the Perceiving
types, who are characterized as being hesitant to make
decisions. This prediction did not hold true in this sample.

Hypothesis Three

Hypothesis Three predicted that the Thinking types
would make lower ratings than the Feeling types. The
analysis showed that such was not the case. The lack of
difference between the two populations could be attributed
to two factors: one, that there was a strong imbalance
between the number of Thinking and Feeling types (15 to 41),
and, two, that the variable reliability was .90 in one
sample and .47 in the other. Despite these difficulties, the
number of ratings made was so large that differences between
the two types had a high chance of being identified if they
did, in fact, exist.

Hypothesis Four

Hypothesis Four predicted that the reliability of the
ratings made by the Extravert would be greater than that of
the Introvert. The hypothesis was based on the concept that
the Introvert is more subjectively oriented and therefore
would make less accurate observations of the world than
would the Extravert, whose orientation is more toward the
external world. The results showed no statistically
significant difference between the reliability of the
Extravert and the Introvert on all the scales taken
together. When the results were compared by individual
scales, there were still no statistically significant differences, yet the
spread between the scores of the two types showed a pattern
which could indicate direction for future research. The two
personality types had similar reliability scores on the
scales which had high reliability, but on the scales which
had lower reliability, the scores were quite divergent.


Hypothesis Five

Hypothesis Five predicted that the reliability within
personality groups would be greater than the reliability of
the population of raters as a whole. The results of this
analysis did not show any differences between the reliability of the ratings of
given type groups and the combined reliability of all the
raters. There was some variation in reliability between
the different types, but none that was significant. The
direction of the differences supported the other measures of
rating error, so it is possible that, if a design could be
developed to increase the power of the study, some
difference might be found in comparisons of reliability on
this level. However, other directions for research appear
more promising on the basis of this study.

Exploratory Hypotheses

 

The two exploratory hypotheses were used with Scales
One and Two, which had predetermined correct ratings, thus
allowing for a comparison of rater accuracy.

The first comparison made was between the Sensing/Judging
types and the Intuitive/Feeling types. The analysis showed
no difference in accuracy on Scale 1, which rates Accurate
Empathy, but showed the Sensing/Judging types to be significantly
less accurate in their ratings than the Intuitive/Feeling
types on Scale 2, which rates Unconditional Positive Regard.
The result is interesting for several reasons. First, the
Unconditional Positive Regard scale was the most reliable in
the study; thus differences in ratings between
different personality types are not necessarily most likely
to occur when the scale has low reliability, though it is
logical to assume that they might. Other variables may have
more impact. In this case it is possible to
conjecture that the difference lies in the interaction
between the rater's personality and the nature of the rating
task. It does seem likely that the Sensing/Judging types,
who are described as viewing people as either good or bad,9
would have difficulty accurately assessing Unconditional
Positive Regard.

The second exploratory hypothesis again used the first
two scales to check a pattern observed in Hypothesis Two
which was not statistically significant. The pattern was
that the Judging type personality appeared to have less
ratee main effect. It was hypothesized that if the Judging
type had low ratee main effect, which is seen as an
indicator of Range Restriction and poor Reliability, it
would show up as a lack of accuracy in their ratings on
Scales One and Two. The Judging type was no less accurate on
Scale One, but was significantly less accurate than the
Perceiving type on Scale Two. The finding supported the
indications of lower Reliability and suggests that pursuing
research in this direction would be productive.

9David Keirsey and Marilyn Bates, Op. Cit., p. 142.

If the result of the Judging type being a less
reliable rater were substantiated in further research, the
question of theoretical prediction would need to be
studied. It was thought initially that the Judging types
would be more accurate than the Perceiving types because
their readiness to make judgments and their strong opinions would
keep them from making range restriction errors. It is also
possible that such strength of opinion could work in the
opposite direction by reducing their responsiveness to
differences in ratees. It would be possible to test such a
hypothesis by using a design which contained a large number
of predetermined correct ratings on a variety of rating
tasks.

Limitations of the Current Study

One limitation of this study is that the sample was
largely undergraduate females. The generalizability is
therefore limited to undergraduate social science majors who
are female. Another limitation is that while significant
differences were found in the ratings of several personality
types, the proportion of the variance accounted for was
small. This finding suggests that while there is support
for the theoretical links between personality and the nature
and degree of rating error, an insufficient proportion of
the variance is accounted for, so that a practical tool for
selection of raters has not been established. It is
possible, however, that future research could account for
sufficient amounts of the variance for personality
assessment to become a tool in decisions regarding rater
selection.

Recommendations for Further Research
The aspects of this study which relate to future
research are sample composition; design; personality type
considerations; and rating scales and rating tasks.

Sample Composition

The sample of this study was undergraduate students,
predominantly female, who rated counselors and public
speakers. Future research is needed with different
populations. It appears particularly important to repeat
this research with a predominantly male sample, and with a sample
whose profession is closer to the one to which one wishes to
generalize, i.e., managers, supervisors, teachers or other
people who are in positions where they are called upon to
rate the performance of others.

Design Considerations

Design difficulties resulted when rating error was
measured using methods involving a two-way analysis of
variance. The method used to measure Range Restriction
produced an assessment of the quality of the raters' ratings.
It assessed ratee main effect as a ratio to the variance
attributable to the rater-ratee interaction. While
rater-ratee interaction is a meaningful measure of the
quality of the ratings, it is difficult to develop powerful
methods of comparing groups on these dimensions. The sample
in this study was larger than in most studies of raters, and the
ANOVA statistic became cumbersome with the two-way
interactions needed. It is suggested that if researchers in
the future continue to use this measure of rating error, they
build into their design a larger number of small units of
analysis, and that they consider the limitations of their
computers when designing the study.
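
The ratio described above, ratee main effect against rater-ratee interaction, can be illustrated for one small unit of analysis. The sketch below is a simplified reading rather than the design actually used in the study: it assumes a single rating dimension with exactly one rating per rater-ratee cell, so the interaction term is also the residual, and the function name and both example tables are hypothetical. A small ratio indicates little ratee main effect, which is how Range Restriction was defined.

```python
from statistics import mean


def ratee_f_ratio(table):
    """Ratee mean square divided by rater-by-ratee interaction mean square.

    table[i][j] is the rating given by rater i to ratee j, with one rating
    per cell. Raises ZeroDivisionError if the interaction sum of squares
    is zero.
    """
    n_raters, n_ratees = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    rater_means = [mean(row) for row in table]
    ratee_means = [mean(row[j] for row in table) for j in range(n_ratees)]

    # Variation among ratees, averaged over raters.
    ss_ratee = n_raters * sum((m - grand) ** 2 for m in ratee_means)
    # What remains after removing rater and ratee main effects.
    ss_inter = sum(
        (table[i][j] - rater_means[i] - ratee_means[j] + grand) ** 2
        for i in range(n_raters)
        for j in range(n_ratees)
    )
    ms_ratee = ss_ratee / (n_ratees - 1)
    ms_inter = ss_inter / ((n_raters - 1) * (n_ratees - 1))
    return ms_ratee / ms_inter


# Hypothetical groups of three raters rating three ratees on a 1-7 scale.
spread = [[2, 4, 6], [1, 4, 7], [3, 4, 6]]      # raters separate the ratees
restricted = [[4, 4, 5], [3, 4, 4], [4, 5, 4]]  # ratings bunched together
print(ratee_f_ratio(spread) > ratee_f_ratio(restricted))
```

Computing the ratio separately for many small groups of raters, as the paragraph recommends, keeps each table small at the cost of producing one ratio per group that must then be compared across personality types.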

Personality Type Considerations

Certain MBTI personality types seem to be likely topics
for future research. In addition, some of the broader
implications of personality interaction with rating processes
also should be considered in designing future research.

The finding that Sensing/Judging types rated
consistently lower than Intuitive/Feeling types emphasized
two important considerations for future research. First,
personality characteristics which are clearly related to the
process of making judgments are more likely to be predictive
of rating error. Secondly, a typology which uses
combinations of two MBTI factors may be more useful than
categories which use only one factor. This is further
supported by the results of data used to test Hypotheses Six
and Seven where, although the S/J and J were both
significantly in error, the result is more marked for the
S/J than for the J alone. Thus it appears that further
research should use combination MBTI types which directly
relate to the evaluating and judging process.

Several implications for the use of MBTI categories
come from the results of the prediction that the Judging
type would make less range restriction error than the
Perceiving type, which turned out to be false. It appears
that this result is because of the lack of reliability and
accuracy of the Judging types' ratings. It seems that the
hesitancy of Perceiving types to make judgments does not
restrict the range of their responses. What emerges as an
area for future research is the possibility that the
Judging type may be a consistently less reliable and less
accurate rater; there is sufficient indication of this
tendency to merit further exploration.

Other implications for further research come out of the
data on the Thinking/Feeling factors. The study's findings
suggest that given the difficulties in the reliability of
this scale, and the difficulties with male-female
distribution in the population, the Thinking/Feeling scale
by itself is not the best area on which to focus research
effort in the future, especially in the area of
Severity/Leniency error. The result relating to Hypothesis
One showed that Feeling preference in conjunction with
Intuition is a good predictor when compared to the
Sensing/Judging combination, and it is in combination with
other personality dimensions that the Thinking/Feeling
dimension is most likely to be useful in further research
into the relationship of personality to rating error.

The personality dimension of Introversion/Extraversion
is another area for future research. A
response pattern emerged in this study which indicated that
Extraverts may be more resilient to poorly constructed
scales than Introverts. Further substantiation of the
Extravert's resiliency could have important implications.
It would be useful not only in selection of reliable raters
for low reliability rating tasks but also for high
ambiguity situations such as hiring or student selection
processes when the criteria are not clearly spelled out.

Rating Scales and Rating Tasks

The results of this study have shown that it is
important to consider the Interrater Reliability of the
rating scales when studying the relationship of personality
to rating error. It appears that in some instances the
effect of personality on rating error is increased by low
Reliability, as was the case for the Interrater Reliability
of Extraverts' and Introverts' ratings. In other cases, the
high Reliability of the scale may have facilitated finding
differences in rating error. This is possibly the case with
reactions to Scale Two, which rated Unconditional Positive
Regard. This scale had the highest Reliability of all the
scales and it was the one on which personality differences
most affected the accuracy of ratings. In designing future
research, depending on the objectives of the study, it may
be important to have high Reliability in some cases and low
Reliability in others.

In addition to considering Reliability of scale, it is
important to consider the nature of the task which is being
rated. The Sensing/Judging lack of accuracy in rating
Unconditional Positive Regard implies that certain
personality types may have difficulty recognizing certain
interaction patterns. Either this should be taken into
account in designing a study or it could be the focus of a
study itself.

Besides the implications described above, further study
might have implications for clinical supervisors of a
client-centered orientation. It could have immediate
relevance because the ability to accurately judge another
counselor's skill in giving unconditional positive regard is
an important part of selecting, training, and evaluating a
counselor's performance. Future studies exploring the
relationship of personality to a supervisor's proficiency
might focus on rating accuracy on a variety of scales
developed to measure counselor effectiveness. There is a
wealth of research studying the therapeutic process itself,
but little studying a person's ability to assess that
process. The present study produces results which indicate
that the ability to assess therapeutic conditions may be
related to personality.


Conclusion

This study directly analyzed the relationship of
personality to rating error. The results show that the
relationship of some MBTI types to certain rating errors can
be predicted. The relationships found provide a basis for
replication and a focus for further research. A number of
predicted relationships were not found; however, there were
indications as to where future research might find
significant results. There was sufficient evidence to
indicate that future research could be valuable.

APPENDIX A

RATING SCALES USED IN THE RESEARCH


A Scale for the Rating of Accurate Empathy

Note: You will make only one mark on this sheet for each counselor.

 

Counselor A   Counselor B   Counselor C

     1             1             1          Level 1
     2             2             2          Level 2
     3             3             3          Level 3
     4             4             4          Level 4
     5             5             5          Level 5
     6             6             6          Level 6
     7             7             7          Level 7

Level 1: Therapist seems completely unaware
of even the most conspicuous of the
client's feelings. His responses
are not appropriate to the mood
and content of the client's
statements and there is no
determinable quality of empathy,
hence, no accuracy whatsoever.

Level 4: Therapist accurately responds to
all of the client's more readily
discernible feelings. He shows
awareness of many feelings and
experiences which are not so
evident, too, but in these he tends
to be somewhat inaccurate in
his understanding.

Level 7: Therapist unerringly responds to
the client's full range of feeling
in their exact intensity. Without
hesitation he recognizes each
emotional nuance and communicates
an understanding of every deepest
feeling.

Scale for the Rating of Unconditional Positive Regard

Note: You will make only one mark on this sheet for each counselor.

Counselor A   Counselor B   Counselor C

     1             1             1          Level 1
     2             2             2          Level 2
     3             3             3          Level 3
     4             4             4          Level 4
     5             5             5          Level 5
     6             6             6          Level 6
     7             7             7          Level 7

Level 1: The therapist is actively offering
advice or giving clear negative
regard. He may be telling the
client what would be 'best' for him
or may be in other ways actively
either approving or disapproving
of his behavior.

Level 4: The therapist indicates a positive
caring for the client but it is a
semi-possessive caring in the sense
that he communicates to the client
that what the client does, or does
not do, matters to him.

Level 7: The therapist communicates
positive regard without
restriction. There is a deep
respect for the client's worth
as a person and his rights as a
free individual.


Rating Scale of Counselor Effectiveness

Counselor A

 

 

 

 

 

sensitive      ____:____:____:____:____:____:____  insensitive
skilled        ____:____:____:____:____:____:____  unskilled
nervous        ____:____:____:____:____:____:____  calm
confident      ____:____:____:____:____:____:____  hesitant
attentive      ____:____:____:____:____:____:____  unattentive
gloomy         ____:____:____:____:____:____:____  cheerful
intelligent    ____:____:____:____:____:____:____  unintelligent
irresponsible  ____:____:____:____:____:____:____  responsible
sincere        ____:____:____:____:____:____:____  insincere
apathetic      ____:____:____:____:____:____:____  enthusiastic
tense          ____:____:____:____:____:____:____  relaxed
sociable       ____:____:____:____:____:____:____  unsociable
shallow        ____:____:____:____:____:____:____  deep
careless       ____:____:____:____:____:____:____  careful
polite         ____:____:____:____:____:____:____  rude

 


Scale for the Evaluation of Speeches

 

 

 

 

Speaker A
                                           poor                excellent

Suitability of Subject: Is the subject      1   2   3   4   5   6   7
timely and worthwhile?

Thought Content: Does it have depth?        1   2   3   4   5   6   7
Is the approach fresh and challenging?

Organization: Is the introduction           1   2   3   4   5   6   7
adequate? Are points apparent? Are
transitions clear? Is the conclusion
adequate?

Development of Ideas: Is there adequate     1   2   3   4   5   6   7
use of repetition, example and
illustration etc.?

Use of Language: Does the wording have      1   2   3   4   5   6   7
simplicity, accurateness, vividness
and forcefulness?

Voice and Diction: Is the voice pleasant    1   2   3   4   5   6   7
and appealing? Is there adequate
pronunciation and enunciation?

Communication: Does the speaker make        1   2   3   4   5   6   7
contact with the audience? Is he
sincere, direct and persuasive?

APPENDIX B

VIGNETTES USED TO ESTABLISH CORRECT RATING FOR ACCURATE
EMPATHY AND UNCONDITIONAL POSITIVE REGARD


The vignettes used with Scales One and Two are
presented here. They are examples of interactions developed
by Truax to depict the various levels of Accurate Empathy
and Unconditional Positive Regard. Level One is the lowest
rating, Level Seven the highest.

Accurate Empathy

Level One:

"C: I wonder if it's my educational background or if it's
me.
T: M-hm.
C: You know what I mean.
T: Yeah.
C: [pause] I guess if I could just solve that I'd know just
about where to hit, huh?
T: M-hm, m-hm. Now that you know, a way, if you knew for
sure, that your lack, if that's what it is -- I can't be
sure of that yet [C: No] is really so, that it, it might
even feel as though it's something that you just
couldn't receive, that it, if, that would be it?
C: Well -- I -- I didn't, uh, I don't quite follow you --
clearly.
T: Well [pause], I guess, I was, I was thinking that --
that you perhaps thought that, that if you could be sure
that, the, uh, that there were tools that, that you
didn't have, that, perhaps that could mean that these --
uh -- tools that you had lacked -- way back there in,
um, high school [C: Yeah] and perhaps just couldn't
perceive now and, ah --
C: Eh, yes, or I might put it this way, um -- [pause] If I
knew that it was, um, let's just take it this way -- if
I knew that it was my educational background, there
would be a possibility of going back [T: Oh, so I missed
that now, I mean now, and, uh] and really getting myself
equipped.
T: I see, I was -- uh -- I thought you were saying in some
ways that um, um, you thought that if, if that was so,
you were just kind of doomed.
C: No, I mean --
T: I see -- [interrupts]
C: Uh, not doomed. Well, let's take it this way, um, as I
said, if, uh, it's my educational background, then I
could go back and, catch myself up [T: I see -- ] and
come up --
T: Um."

Level Four:

"C: I gave her her opportunity . . .
T: Mhm.
C: . . . and she kicked it over. [heatedly]
T: Mhm -- first time you ever gave her that chance, and --
she didn't take it? [inquiring gently]
C: No! She came back and stayed less than two weeks -- a
little more than a week -- and went right straight back
to it. [shrilly] So that within itself is indicative
that she didn't want it. [excitedly] [T answers "Mhm"
after each sentence.]
T: Mhm, mhm -- it feels like it's sort of thrown -- right
up in your face. [gently]
C: Yah -- and now I would really be -- crawling. . .
T: Mhm.
C: . . . if I didn't demand some kind of assurances -- that,
that things was over with. [firmly]
T: Mhm, mhm, it would be -- pretty stupid to -- put
yourself in that -- same position where it could be sort
of -- done to you all over again. [warmly]
C: Well, it could be -- yes! I would be very stupid!
[shrilly]
T: Mhm.
C: . . . because if it's not him -- it might be someone
else. [emphatically]"

Level Seven:

T: ...I s'pose, one of the things you were saying there
was, "I may seem pretty hard on the outside to other
people but I do have feelings."
C: Yeah, I've got feelings. But most of 'em I don't let
'em off.
T: M-hm. Kinda hide them. [C, faintly: Yeah.] [long pause]

 

1Truax, op. cit., p. 557.
2Ibid., p. 562.


C: I guess the only reason that I try to hide 'em, is,
seein' that I'm small, I guess I got to be a tough guy
or somethin'.
T: M-hm.
C: That's the way I, think, I think people might think
about me.
T: Mm. "Little afraid to show my feelings. They might
think I was weak, 'n' take advantage of me or something.
They might hurt me if they -- knew I could be hurt."
C: I think they'd try anyway.
T: "If they really knew I had feelings, they, they really
might try and hurt me." [long pause]
C: I guess I don't want 'em to know that I got 'em.
T: Mm.
C: 'Cause then they couldn't if they wanted to.
T: "So I'd be safe if I, if I seem like a, as though I was
real hard on the outside. If they thought I was real
hard, I'd be safe."

Unconditional Positive Regard

Level One:

"C: ....and I don't, I don't know what sort of a job will
be offered me, but -- eh ---

T: It might not be the best in the world.

C: I'm sure it won't. [T: And uh.] But --

T: But if you can make up your mind to stomach some of the
unpleasantness of things [C: M-hm] you have to go
through -- you'll get through it. [C: Yeah, I know I
will.] and , ah, you'll get out of here.

C: I certainly, uh, I just, I just know that I have to do
it, so I'm going to do it but -- it's awfully easy for
me, to -- [sighs] well, more than pull in my shell, I-I
just hibernate. I just, uh -- well, just don't do a darn
-- thing.

T: It's your own fault. [severely]

C: Sure it is. I know it is [pause] But it seems like
whenever I -- here -- here's the thing. Whenever I get
to the stage where I'm making active plans for myself,
then they say I'm high. An'

T: In other words they criticize you that --

C: Yeah.

3Ibid., p. 569.

T: So tender little lady is gonna really crawl into her
shell. [C: Well, I'll say 'okay.'] "If they're gonna
throw, if they're gonna shoot arrows at me, I'll just
crawl behind my shield and I won't come out of it."
[forcefully]

C: That's right. [sadly]

T: And that's worse. [quickly]"4

Level Four:

"C: It's gettin' so I can't even -- can't even sleep at
night anymore -- roll and toss all, toss all night long

T: Pretty upset?

C: Oh, well, just lay there and think of everything -- and
some of the guys that come in after I did. There,
there's some of them guys what of gone home, 'n' I'm
still in here.

T: It's sort of up to you when you, as to when you go.

C: You can't do anything?

T: Well, I said, I sort of feel you have been -- ah --
you've been holding down that job -- you still work in
the kitchen, don't ya?

C: Yeah -- [mumbled]

T: O.K., but you -- you been holding that job, and you have
your card, well, O.K. You fouled up somewhere, but
you'll have your card again. And, well, you, in a sense
showed the staff that you can handle these things,
without getting into difficulties, you are on your way
home.

C: That doggone kitchen detail, detail -- seven cents a day
-- just ta scribble bunch of junk. [mumbled]

T: Well, you're sure as hell not gonna get rich on it. --
What about this trouble, talking about money -- what
about this trouble you were raising the last time? About
borrowing some money from this gal, have you come to any
decision on that?

C: Well [pause] I'd rather not say, I ain't gonna say
nothin' as long as that tape recorder's on.

T: Want me to turn it off for a while? -- It's a part of
the project. That's why I sort of feel it's our
responsibility to -- to record these things."

4Ibid., p. 571.

5Ibid., p. 575.

Level Seven:

"T: And I can sort of sense -- and when you want to, when
you feel like it, I'd be glad if you shared some of
those --
C: What? [abruptly]
T: I said, when you want to, and when you feel like it, I'd
be glad if you shared some of those feelings with me --
[Client, breaking in and speaking with Therapist: Why,
why -- whoa, whoa, whoa --] I'd like to just sort of see
in --
C: Why, you gettin' rich off this silent character or
somep'n or what? [raucous laughing sound] Ten, fifteen,
twenty dollars an hour? [loudly] Then he just sits here
-- an' that's it, huh? Oh, I know -- [mumbling]
T: I'd say that's -- that's a good point -- what ya mean --
[softly]
C: Oh, I don't know -- [pause]
T: Well, that -- uh, makes me say something stupid -- uh
[laughs] -- I sometimes get paid fifteen, twenty dollars
an hour, but that, I'm not getting paid --
C: [interrupting loudly, overtalking Therapist] Why, the
state's paying ya that now, ain't they?
T: Not for you, no. I thought you might think that.
C: Who is, then? [insistently]
T: No, I get a salary from the University for doing
research. [calmly]
C: Oh -- research! [incredulously]
T: M-hm -- [pause]
C: I think that's just a -- roundabout way to put it --
th-that's what, that's what I think.
T: Well, let's put it this way: I get it, but -- I get
exactly the same salary whether -- I see you or not
[gently]
C: Oh, there, there probably is a -- there probably is a
-- that type doctors there, but -- uh, but I wouldn't
call it research! [scornfully] -- I, I, I, I, I, I, I
don't know, I don' know, I don' care -- I don' -- I --
[ending in angry confusion]
T: [speaking with conviction] Well, I'd like to know you --
that, that's not research."

 

6Ibid., p. 579.

APPENDIX C

RESEARCH PROCEDURES


Step 1: Handing out research package:

Hello, I'm Tom Holmes. I appreciate your
willingness to participate in this research. I
think you will find the study interesting and the
feedback after the study useful. I am going to
hand out the forms you will use in this research.
If you choose not to participate, please let me
know as I am handing out the material. Do not
open the envelope until I instruct you to do so.

Step 2: Introduction to the overall experiment:

Please open the envelope and remove the stapled
booklet. The directions for the study are on the
top page. Please read them to yourself as I read
them aloud. (read directions)

Step 3: Orientation to the accurate empathy scale:

Now turn to page 1 titled "A Scale for the Rating
of Accurate Empathy". You will notice that there
are seven possible ratings on this scale. Next to
rating levels 1, 4, and 7 are descriptions of
those rating levels of accurate empathy. Read
these to yourself as I read them to you, starting
with a level 1 response. (read levels 1, 4, and 7)

I will be playing tape recordings of counselors
working with clients. You will make your ratings
of the counselor's responses according to the
descriptions on the scale. Do the counselor's
responses represent level 1, 2, 3, 4, 5, 6, or 7
on the accurate empathy scale?

You will notice that on the left hand side of the
scale there are three columns labeled: Counselor
A, Counselor B, and Counselor C. In each column
are numbers corresponding to the level of accurate
empathy. If in your judgment Counselor A showed a
very high degree of accurate empathy then you
should circle 7 under Counselor A. If you feel
that Counselor B exhibited an almost complete
lack of accurate empathy then you should circle a
1 under Counselor B. If you believe the
counselor's response to be somewhere in between you
should circle the number which you feel best
describes your opinion of the counselor's
performance. You will circle only one number for
each counselor.

Step 4: Introduction to the counselor/client tapes:

The tapes you are about to hear are reenactments
of actual counselor/client interactions. Tape
recordings such as these are often used to assess
students who are being trained as therapists. The
dialogues you will hear are the result of a past
project at another university. The clients'
responses have been edited, and some of the clients
were in a residential treatment setting at the
time of the counseling sessions.

I will play a short sample of a counselor's work
with a client. Listen carefully to the counselor's
responses. It is the counselor's responses which
you are rating, not the client's. Make your
judgment as quickly as possible. When you have
decided which level of accurate empathy you feel
best describes the counselor's performance,
indicate that by circling the corresponding number
in that counselor's column. Remember 1 is the
lowest level, 7 the highest.

Step 5: Playing the tapes:

I am now going to play the tape for Counselor A.
Rate his level of accurate empathy. The first
person to speak on this recording is the
therapist. (play tape) 0 - 33. Rate the
counselor's responses, not the client's.

Now please record your rating of Counselor A's
level of accurate empathy in the proper column.
We will now repeat the process for Counselor B.
In this recording the client begins. (play tape
of Counselor B) 33 - 55.

Record your rating for Counselor B's level of
accurate empathy....Now here is the recording of
Counselor C. The client begins this recording.
(play tape) 55 - 98.

Please rate Counselor C's level of accurate
empathy....That concludes the accurate empathy
ratings.


Step 6: Orientation to the Unconditional Positive Regard
scale:

The next counselor characteristic you will rate is
the level of unconditional positive regard.
Please turn to page two where you will find the
scale for rating this dimension. When the
therapist is communicating a low level of
unconditional positive regard he appears as
described by the narrative next to level one on
the scale. (read level 1).... A description of a
mid-range response is found next to level 4.
(read level 4).... The highest level of
unconditional positive regard is described for
level seven. (read level 7)....

Your ratings are to be recorded in the same way
they were on the last scale. You will make one
rating for each counselor in the proper column for
that counselor.

Step 7: Presentation of the unconditional positive regard
tapes:

The same three counselors, A, B, and C, will be
presented again in the same order. Remember, you
are rating the counselors' responses, not the
client.

Here is the recording of Counselor A. Please judge
his level of unconditional positive regard. The
client will begin speaking first. (play tape) 95
- 122.

Now rate Counselor A's level of unconditional
positive regard from 1 to 7 and record your
rating in the appropriate column. Here is the
recording of Counselor B. In this dialogue the
client again begins. (play tape) 122 - 154.

Rate Counselor B. Now here is Counselor C. The
first voice you will hear is Counselor C. (play
tape) 155 - 186.

Mark your ratings for Counselor C.

That concludes the unconditional positive regard
rating.

Step 8: Introduction to the counselor effectiveness scale:

I would now like you to think of the counselors in
a more general sense. I want you to rate each
counselor on a number of characteristics.

Please turn to the next page. Here you will find a
list of counselor characteristics. Read down the
list with me. (read list)

You will be rating each counselor on these
characteristics. You are to place a checkmark at
the point on the line which corresponds to your
opinion as to how a counselor rates on each
characteristic.

For example: if you feel Counselor A is very
sensitive you would place a check right next to
sensitive. If, on the other hand, you felt he was
insensitive you would place a mark next to
insensitive, and of course if you felt he was
somewhere in between you would place a mark at the
point which most accurately described him in your
mind. You will do the same for each
characteristic, making one check on each line.

There are three copies of this scale, one for each
counselor. At the top of each page is an
indication as to which counselor the scale is for.

Step 9: Rating the counselors:

I will replay several samples of each counselor's
responses in order to remind you of each
counselor. While listening to the tapes please
put a checkmark indicating your assessment of that
counselor on each characteristic. Record your
first impression.

Begin by rating counselor A. The scale you are
using should have counselor A at the top. I will
play the excerpts from counselor A. Please mark
your scales for the characteristics listed. (play
Counselor A tape) 188 - 209. When you are
finished rating please look up.

Now turn to the next page and find the scale for
Counselor B. Please rate Counselor B as I play
the tape of several of his responses. (play
tape) 210 - 224.

Turn to the next page and find the scale for
Counselor C. Rate Counselor C as I play several
of his responses. (play tape) 225 - 244. Look up
when you are finished.

That concludes the counselor ratings.

Step 10: Introduction to the evaluation of speakers:

In this section you will be rating the performance
of individuals as they present short speeches.

The speakers will be rated on seven dimensions
which can be found on the rating scale for the
evaluation of speeches. This can be found on page
6.

Step 11: Presentation of the scale for the evaluation of
speeches:

The seven dimensions to be rated are listed here
with their explanations. Please read with me as I
go over the seven dimensions. (read dimensions)

The ratings are again on a seven-point scale.
Seven is the highest rating and one is the lowest
rating. After listening to the speaker you will
make a judgment as to their level of performance
on the various dimensions. For example: if you
feel speaker A chose a subject which was very
timely and worthwhile, then you would rate him at
7 on that dimension. If you felt that the
organization of his speech was very poor you would
circle 1 next to that dimension. You will circle
one number representing your rating for each
dimension shown on the scale. There is a separate
rating sheet for each speaker. It is noted at
the top of each sheet which speaker the scale is
for.

Step 12: Rating of the Speakers:

The page number should be 6 and the designation at
the top of the page should indicate speaker A. I
will now play the recording of speaker A. (play
tape) 333 - 355. Now please rate speaker A on
the dimensions listed on the scale. Remember, 7 is
the highest and 1 the lowest.

Now turn to page 7. There you should find the
rating scale for speaker B. Here is the tape of
speaker B. (play tape) 355 - 375. Now rate
speaker B.

Turn to page 8 where you will find the rating
scale for speaker C. Here is speaker C. (play
tape) 375 - 403. Now please rate speaker C on the
rating scale.

This concludes the research section. Thank you
very much for your assistance.

BIBLIOGRAPHY

Arthur, A.Z. "Response Bias in Semantic Differential."
British Journal of Social and Clinical
Psychology, 1966, Vol. 5, pp. 103-107.

Barrett, Gerald, Phillips, James, Alexander, Ralph.
"Concurrent and Predictive Validity Designs: A Critical
Reanalysis." Journal of Applied Psychology, 1967,
Vol. 51, No. 2, pp. 1-6.

Berg, I.A. (Ed.). Response Set in Personality Assessment.
Chicago: Aldine, 1966.

Bernardin, John. "A Recomparison of Behavioral Expectation
Scales to Summated Scales." Journal of Applied
Psychology, 1976, Vol. 61, No. 5, pp. 564-570.

Bernardin, H. John and Smith, Patricia Cain. "A Clarification
of Some Issues Regarding the Development and Use of
Behaviorally Anchored Rating Scales." Journal of
Applied Psychology, 1981, Vol. 66, No. 4, pp.
458-463.

Bernardin, H. John. "Effects of Rater Training on Leniency
and Halo Errors in Student Ratings of Instructors."
Journal of Applied Psychology, 1978, Vol. 63, No. 3,
pp. 301-308.

Blalock, Hubert M. Social Statistics, McGraw-Hill, Inc.,
1972.

Borgatta, Edgar F., and Glass, David C. "Personality
Concomitants of Extreme Response Set." The Journal of
Social Psychology, 1961, 55, pp. 213-221.

Borman, Walter C. "Consistency of Rating Accuracy and Rating
Errors in the Judgement of Human Performance."
Organizational Behavior and Human Performance, 1977,
Vol. 20, pp. 238-252.

Borman, Walter C. "Effects of Instructions to Avoid Halo
Error on Reliability and Validity of Performance
Evaluation Ratings." Journal of Applied Psychology,
1975, Vol. 60, No. 5, pp. 556-560.

Borman, Walter C. and Dunnette, Marvin D. "Behavior-Based
Versus Trait-Oriented Performance Ratings: An Empirical
Study." Journal of Applied Psychology, 1975, Vol. 60,
No. 5, pp. 561-565.

Braden, Waldo W. (Editor). Speech Methods and Resources,
Harper & Row, N.Y.

Bradway, K. "Jung's Psychological Types: Classification by
Test versus Classification by Self." Journal of
Analytical Psychology, 1964, 9, pp. 129-135.

Brim, Orville and Hoff, David B. "Individual and Situational
Differences in the Desire for Certainty." Journal of
Abnormal and Social Psychology, 1957, 54, pp.
225-228.

Broen, William E., Jr., and Wirt, Robert D. "Varieties of
Response Sets." Journal of Counseling Psychology,
1958, Vol. 22, No. 3, pp. 237-240.

Buckner, Donald N. "The Predictability of Ratings as a
Function of Interrater Agreement." Journal of Applied
Psychology, 1959, Vol. 43, No. 1, pp. 60-64.

Burnaska, Robert F. and Hollmann, Thomas D. "An Empirical
Comparison of the Relative Effects of Rater Response
Biases of Three Rating Scale Formats." Journal of
Applied Psychology, 1974, Vol. 59, No. 3, pp.
307-312.

Carlyn, Marcia. "An Assessment of the Myers-Briggs Type
Indicator." Journal of Personality Assessment, 1977,
Vol. 41, pp. 461-473.

Cascio, Wayne F. and Valenzi, Enzo R. "Behaviorally Anchored
Rating Scales: Effects of Education and Job Experience
of Raters and Ratees." Journal of Applied Psychology,
1977, Vol. 62, No. 3, pp. 378-382.

Couch, Arthur and Keniston, Kenneth. "Yeasayers and
Naysayers: Agreeing Response Set as a Personality
Variable." Journal of Abnormal and Social Psychology,
1960, Vol. 60, No. 2, pp. 151-173.

Damarin, E., and Messick, S. "Response Styles as Personality
Variables: A Theoretical Integration of Multivariate
Research." (Research Bulletin No. RB-65-10),
Princeton, N.J.: Educational Testing Service, 1965.

De Cotiis, Thomas A. "An Analysis of the External Validity
and Applied Relevance of Three Rating Formats."
Organizational Behavior and Human Performance, 1977,
Vol. 19, pp. 247-266.

Di Teverio, John Kesley. "The Strength of Sensing-Intuition
Preference on the Myers-Briggs Type Indicator as
Related to Empathetic Discrimination of Overt or Covert
Feeling Messages of Others." Unpublished Doctoral
dissertation, Michigan State University, 1976.

Doonan, Robert Joseph. "An Analysis of Rating Methodologies
of Empathy, Warmth, and Genuineness." Doctoral
Dissertation, Auburn University, 1978, Dissertation
Abstracts International, pp. 2978-B to 2979-B.

Ford, Adelbert. "Neutralizing Inequalities in Rating."
The Personnel Journal, 1930, Vol. IX, No. 6, pp.
466-489.

Freeberg, Norman E. "Relevance of Rater-Ratee Acquaintance
in the Validity and Reliability of Ratings." Journal
of Applied Psychology, 1969, Vol. 53, No. 6, pp.
518-524.

Goldshmidt, M.L. "Prediction of College Major by Personality
Type." Journal of Counseling Psychology, 1967, 14,
pp. 302-308.

Greenwood, John M. and McNamara, Walter J. "Interrater
Reliability in Situational Tests." Journal of
Applied Psychology, 1967, Vol. 51, No. 2, pp.
101-106.

Guilford, J.P. Psychometric Methods, McGraw-Hill, New
York, 1954.

Hamilton, David. "Personality Attributes Associated with
Extreme Response Style." Psychological Bulletin,
1968, Vol. 69, No. 3, pp. 192-203.

Harvey, O.J., Hunt, D.E., and Schroeder, H.M. Conceptual
Systems and Personality Organization, New York:
Wiley, 1961.

Ivey, A.E. Microcounseling: Innovations in Interviewing
Training, Springfield, Ill.: Charles C. Thomas,
1971.

Jung, C.G. Psychological Types. Rev. R.F.C. Hull. Translated
by H.G. Baynes, Princeton University Press, Princeton,
N.J., 1971.

Keirsey, David, and Bates, Marilyn. Please Understand Me,
Prometheus Nemesis Books, Del Mar, California, 1978.

Klimoski, Richard J. and London, Manuel. "Role of Rater in
Performance Appraisal." Journal of Applied
Psychology, 1974, Vol. 59, No. 4, pp. 445-451.

Kingsbury, F.A. "Analyzing Ratings and Training Raters."
Journal of Personnel Research, 1922, 1, pp. 377-383.

Kneeland, Natalie. "That Lenient Tendency in Rating."
Personnel Journal, 1929, 7, pp. 356-366.

Lahey, Mary Anne and Saal, Frank E. "Evidence Incompatible
with a Cognitive Theory of Rating Behavior." Journal
of Applied Psychology, 1981, Vol. 66, No. 6, pp.
706-715.

Lawlis, G. Frank and Lu, Elba. "Judgment of Counseling
Process: Reliability, Agreement, and Error."
Psychological Bulletin, 1972, Vol. 78, No. 1, pp.
17-20.

Lee, Raymond, Malone, Michael, and Greco, Susan.
"Multitrait-Multimethod-Multirater Analysis of
Performance Ratings for Law Enforcement Personnel."
Journal of Applied Psychology, 1981, Vol. 66, No. 5,
pp. 625-632.

McGee, Richard. "Response Style and Personality Variable: By
What Criterion?" Psychological Bulletin, 1962, Vol.
59, No. 4, pp. 284-295.

McGee, Richard. "The Relationship Between Response Style and
Personality Variables." Journal of Abnormal and Social
Psychology, 1962, Vol. 5, No. 5, pp. 347-357.

Mehrens, William A. and Lehmann, Irvin. Measurement and
Evaluation in Education and Psychology, Holt, Rinehart
and Winston, New York, 1978.

Myers, Isabel Briggs. The Myers-Briggs Type Indicator
Manual, Educational Testing Service, Princeton, N. J.,
1962.

Newcomb, Theodore. "An Experiment Designed to Test the
Validity of a Rating Technique." Journal of
Educational Psychology, 1931, Vol. 22, pp. 279-288.

Rogers, C. R., Gendlin, E. T., Kiesler, D. and Truax, C. B.
The Therapeutic Relationship and its Impact: A Study
of Psychotherapy with Schizophrenics. Madison:
University of Wisconsin Press, 1966.