This is to certify that the thesis entitled "The Use of Objective and Subjective Weights to Model a Medical School Admissions Task," presented by John B. Molidor, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

Major professor
Date: May 1, 1978

Copyright by John B. Molidor, 1979

THE USE OF OBJECTIVE AND SUBJECTIVE WEIGHTS TO MODEL A MEDICAL SCHOOL ADMISSIONS TASK

By John B. Molidor

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Counseling, Personnel Services and Educational Psychology, 1979

ABSTRACT

THE USE OF OBJECTIVE AND SUBJECTIVE WEIGHTS TO MODEL A MEDICAL SCHOOL ADMISSIONS TASK

By John B. Molidor

The purpose of this study was to model and compare how medical school admissions committee members say they weight information when making judgments regarding the acceptability of applicants with how mathematical representations weight the same information. Two data sets, one representative (correlated) and the other non-representative (orthogonal), were presented to fifteen admissions committee members who volunteered to participate in this study. Each data set contained information on an applicant's GPA, MCAT scores, personal statement scores, and interview scores. The committee member's task was: (1) to rate each of the applicants (40 total) on an acceptability scale and (2) to report the subjective importance attached to each of the four predictor variables. This information was used to test the following hypotheses:

1) No relation existed between objective and subjective weights;
2) A positive relation existed between actual judgments and judgments generated from objective weights;
3) A positive relation existed between actual judgments and judgments generated from subjective weights;
4) A positive relation existed between the judgments generated from both objective and subjective weights;
5) There was a greater relation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments.

Data were collected and analyzed using correlation techniques, multiple regression, paired t-tests, repeated measures one-way analysis of variance, and post hoc comparisons. Results showed that:

1) A significant positive relation existed between objective and subjective weights, for both data conditions;
2) A significant positive relation existed between actual judgments and judgments generated from objective weights, for both data conditions;
3) A significant positive relation existed between actual judgments and judgments generated from subjective weights, for both data conditions;
4) A significant positive relation existed between the judgments generated from both objective and subjective weights, for both data conditions;
5) For the correlated data, there was not a significantly greater relation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments. However, for the orthogonal data, there was a significant
difference between the correlation of actual judgments with objectively generated judgments and the correlation of actual judgments with subjectively generated judgments.

This study concluded that subjective weights were an effective weighting scheme in modeling how committee members said they utilized information when making judgments about the acceptability of medical school applicants. This conclusion resulted from many comparisons, ranging from the weights themselves to the outcomes arrived at from these weights. Boundary conditions were established from the two data sets: subjective weights were more effective for correlated data than for orthogonal data. Thus, subjective weights proved to be a valid measure to model a medical school admissions judgment task.

Once the comparisons between objective and subjective weights were made, additional concerns arose centering on alternative weighting models. Therefore, four additional weighting schemes were examined: (1) unit weights, (2) random ratings, (3) average weights, and (4) equal weights. Comparisons were made between these four weighting schemes and the objective and subjective weighting models. Analyses showed that:

1) There were significant differences between the six models;
2) The differential weighting models (i.e., objective, subjective and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);
3) There were no significant differences between the differential weighting models;
4) There were significant differences between the unit weighting models.

From these results, it was concluded that the differential weighting models were more effective than the unit weighting models in predicting committee members' judgments. All weighting models were more effective under the correlated data condition than under the orthogonal condition. These results point to the importance of examining the outcomes derived from using different weighting schemes rather than the weights themselves. Thus, under certain data conditions, different weights lead to similar outcomes.

DEDICATED to Mary Lou Kennedy Molidor and Otto B. Molidor, my mother and father.

ACKNOWLEDGEMENTS

I would like to express my appreciation to the members of my dissertation committee, Dr. Stephen L. Yelon, Dr. Sarah A. Sprafka, Dr. John F. Vinsonhaler, and, in particular, to my dissertation chairman, Dr. Arthur S. Elstein, for the able guidance, professional direction, assistance and encouragement which was offered throughout this research.

Special thanks are given to my family, whose support, love, and encouragement made this study possible; to Jeanne Marie, whose confidence, enthusiasm, and support sustained me in rough times; to OMERAD, whose warm atmosphere aided in my growth and knowledge; to Judy Carley for her typing of this dissertation; and to all who have made my stay at Michigan State University a most enjoyable, profitable, and memorable learning experience.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
I. INTRODUCTION
   Need
   Purpose
   Research Questions
   Theory
      Paradigms of Clinical Reasoning
      Problem
      Lens Model and Multiple Regression Analysis

II. REVIEW OF THE LITERATURE
   Medical School Admissions
   Judgment
      Policy Capturing
      Modeling Admissions Tasks
      Modeling Medical School Admissions Task
      Subjective Weights
   Summary

III. DESIGN OF THE STUDY
   Population and Sample
   Stimulus Materials
      Data Sets
   Procedures
      Intra-Judge and Inter-Judge Reliability
   Measures
   Hypotheses
   Analyses
   Summary

IV. RESULTS AND DISCUSSION
   Relation between Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Objective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Predicted Judgments from Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Four Additional Weighting Models
      Unit Weights
      Random Ratings
      Average Weights
      Equal Weights
      Comparison of Models
      Discussion
      Differential vs. Unit Weights
      Differential Weighting Models
      Unit vs. Equal Weights
   Summary

V. CONCLUSION
   Summary
   Limitations of the Study
   Implications
   Recommendations for Future Research

REFERENCES

APPENDICES
A. Introduction, Instructions, and Correlated Data Set
B. Introduction, Instructions, and Orthogonal Data Set
C. Ratings Given to Applicants in Orthogonal Data Set
D. Subjective Importance Weights for Orthogonal Data Set
E. Ratings Given to Applicants in Correlated Data Set
F. Subjective Importance Weights for Correlated Data Set
G. Correlation Between Judges' Subjective Weights for Correlated Data Set
H. Correlation Between Judges' Subjective Weights for Orthogonal Data Set

LIST OF TABLES

3.1 Correlation Matrix of Independent Variables - Correlated Data Set (N=30)
3.2 Correlation Matrix of Independent Variables - Orthogonal Data Set (N=30)
3.3 Means, Standard Deviations and Ranges of Independent Variables
3.4 Intra-Judge Reliability for 10 Replicated Cases
3.5 Inter-Judge Reliability for the Correlated Data Set
3.6 Inter-Judge Reliability for the Orthogonal Data Set
4.1 Correlations Between Objective and Subjective Weights (r_B,SW) and Between Objective and Subjective Rank Order of Importance (r_SRO,ORO)
4.2 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Objective Weights (r_YsŶobj)
4.3 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Subjective Weights (r_YsŶsub)
4.4 Correlations Between Committee Members' Judgments Generated from both Objective and Subjective Weights (r_ŶobjŶsub)
4.5 Correlations Comparing Committee Members' Objective Weighting Models with Subjective Weighting Models (r_YsŶobj with r_YsŶsub)
4.6 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Unit Weights (r_YsŶunit)
4.7 Correlations Between Committee Members' Actual Judgments and Randomly Generated Judgments (r_YsŶrand)
4.8 Mean Ratings Given to Each Applicant
4.9 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Average Weights (r_YsŶaverage)
4.10 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Equal Weights (r_YsŶequal)
4.11 Correlations Between Committee Members' Actual Judgments and Six Weighting Schemes
4.12 Correlations Between Committee Members' Actual Judgments and Six Weighting Schemes
4.13 Repeated Measures One-Way Analysis of Variance for Five Weighting Scheme Models for Correlated Data
4.14 Repeated Measures One-Way Analysis of Variance for Five Weighting Scheme Models for Orthogonal Data
4.15 Tukey's Post Hoc Comparisons Between Weighting Scheme Models

LIST OF FIGURES

1.1 The Lens Model
1.2 Right Hand Side of Lens Model
1.3 Modified Lens Model
4.1 Relation Between Subjective and Objective Weights
4.2 Unit Weighting, Random Ratings, Average Weighting and Equal Weighting Models

DEFINITION OF TERMS

Ys          the actual judgments or ratings given to applicants
Ŷobj        predicted judgments derived from objective (regression) weights
Ŷsub        predicted judgments derived from subjective weights
Bi          beta (objective) weights
SWi         subjective importance weights
SRO         subjective rank order of the four independent variables
ORO         objective (regression) rank order of the four independent variables
Ŷunit       predicted judgments derived from unit weights
UWi         unit weights
Ŷrand       judgments generated by randomly assigning a rating to an applicant
Ŷaverage    predicted judgments derived from the average rating given to an applicant
AWi         average objective weights
Ŷequal      predicted judgments derived from equal subjective weights

CHAPTER I
INTRODUCTION

Every year medical school admissions committees are required not only to define quality but also to make judgments regarding the acceptability of applicants based on a definition of quality. The task of defining quality is perplexing, for the term evokes many diverse thoughts. Morowitz (1976) wryly drew a parallel between the gifted scholar Phaedrus, who went insane trying to define quality, and admissions committees who must similarly try to define quality.
This anxiety-provoking task may lead schools to employ different meanings of quality, ranging from the very narrow, specific, and well-defined to the broader, looser, and more general. The fact remains, though, that medical schools are accepting students based on some inherent definition of quality.

Quality is often defined by examining certain admissions variables that are used to select applicants for medical school. For example, a school may believe that applicants who have high grade point averages (GPA) and Medical College Admissions Test (MCAT) scores will make quality physicians. This school might weight academic performance higher than it weights other selection variables, and so quality would be measured in terms of academic performance. Another school may feel that, given a certain level of academic skills, applicants who have high interpersonal skills make quality physicians. Quality would then be measured by interpersonal skills. Obviously schools do not employ such clear-cut dichotomies in their selection process, but the point is that certain admissions criteria reflect a school's definition of quality.

Admissions committees are charged with the task of examining various admissions criteria, determining their importance, and making judgments based on these criteria. A committee's conception of quality is reflected in its judgments about the acceptability of individual applicants. Quality thus involves the selection and weighting of predictor variables in order to make judgments.

Need

A Herculean task confronts admissions committees in their attempts to make judgments regarding the quality of medical school applicants. The need to examine quality is a pressing reality when one considers some of the pressures being brought to bear on the admissions process. Consider, for example, the pressures arising from the growing disparity between the number of applicants and the number of places available. In 1975-76, there were 45,000 applicants for 15,000 places. The number of qualified applicants exceeds the number of places available. An even more alarming figure is that these 45,000 applicants submitted over 350,000 applications (Dube and Johnson, 1976). Selecting qualified students given just the sheer number of applications poses many logistical problems.

Looking beyond the number of applicants, more problems await admissions committees. Pressures arise from: making medical schools representative of the socioeconomic and racial components of the general population; the increasing costs of selecting and educating medical students; the demands to meet society's health care needs; the consideration of the legal rights of applicants; and the need for predictive validity studies relating the selection criteria to physician performance. The task facing admissions committees is formidable indeed.

Therefore, it is all the more reasonable to attempt to model how committee members weight admissions information in making judgments about the quality of their applicants. The use of different models would shed light on how information might be combined to reproduce committee members' weights and judgments. This lays the groundwork needed for further communication among judges by providing a common ground to discuss weights and how these weights can be used to generate judgments. This communication is necessary for committee members to determine what they mean by quality and also for meaningful research to be done in the area of judgment and medical school admissions.
Purpose

The purpose of this study is to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information in arriving at judgments. The research literature on judgment has shown that a judgment policy can be represented by a linear model. This policy capturing has typically used objective (e.g., derived, regression, statistical, mathematical, beta) weights. It is important to know whether a judgment policy can be represented by subjective weights based on judges' reports.

When a judgment policy is represented both objectively and subjectively, the following research question can be considered: What is the relation between the objective and subjective modeling? To answer this question, the following performance measures will be examined: (1) the correlation between objective and subjective weights; (2) the correlation between actual judgments and judgments generated from objective weights; (3) the correlation between actual judgments and judgments generated from subjective weights; and (4) the correlation between objectively generated judgments and subjectively generated judgments. The use of different performance measures allows the examination of both the weights themselves and the outcomes, or predicted judgments, arrived at from these weights. Thus, the relation between the objective and subjective models is explored in greater detail.

The following steps are taken to achieve the purpose of this study:

1) To capture or represent judges' policies, subjectively;
2) To capture or represent judges' policies, mathematically;
3) To compare subjective weights with objective weights;
4) To compare actual judgments with judgments generated from objective weights;
5) To compare actual judgments with judgments generated from subjective weights;
6) To compare objectively generated judgments with subjectively generated judgments.

Research Questions

Since it is entirely possible for there to be discrepancies between objective and subjective weights, different performance measures are examined. Committee members' objective and subjective weights may differ yet yield predicted judgments that are correlated highly with their actual judgments. Thus, the comparison between policies depends on what criterion measures are chosen. Therefore, the following research questions are considered:

1) What is the relation between the statistical and subjective weights?
2) What is the agreement between actual judgments and predicted judgments arrived at through the use of objective weights?
3) What is the agreement between actual judgments and predicted judgments arrived at through the use of subjective weights?
4) What is the agreement between objectively predicted judgments and subjectively predicted judgments?
5) Is there greater agreement between actual judgments and objectively predicted judgments than there is between actual judgments and subjectively predicted judgments?

These questions may be stated in the form of the following broad research hypotheses:1

1) Statistical and subjective weights have no relation to each other;
2) A positive correlation exists between actual judgments and judgments generated through the use of statistical weights;

1The hypotheses are restated in testable form in Chapter 3.
3) A positive correlation exists between actual judgments and judgments generated through the use of subjective weights;
4) A positive correlation exists between objectively predicted judgments and subjectively predicted judgments;
5) There is a greater correlation between actual judgments and the objectively predicted judgments than there is between actual judgments and the subjectively predicted judgments.

Theory

The age-old saying that beauty is in the eye of the beholder applies to admissions committees' conceptions of quality. Not only do medical schools use different meanings of quality, but individual committee members within a school also employ different meanings. In talking with admissions committee members, one gets the impression that each knows how to select applicants. Some may tell of a feeling they have; others may tell of a formula they employ. Definitions of quality range from the emotional to the scientific. Committee members know, or think they know, how they weight information when making judgments regarding the acceptability of medical school applicants.

Paradigms of Clinical Reasoning

The means that are available for examining the issue of quality, and how people make and think they make judgments, emerge in part from the extensive psychological research in the areas of clinical judgment and decision making. This research has attempted to identify relevant invariants of human information processing. For example, research has been directed toward ascertaining memory capabilities, how judges weight information in importance, how negative information is processed, and how information is encoded. Basically, researchers have been concerned with how to model or characterize the judgments or decisions of clinicians. This modeling has attempted to explain how clinicians use information to reach judgments or decisions.

This area of research has been called variously problem solving, decision making, thinking, reasoning, policy capturing, process tracing, and judgment. The casual or loose employment of these terms has led to some confusion. To help alleviate this confusion, it is helpful to conceptualize this research within three major paradigms: (1) decision making, (2) problem solving, and (3) judgment (Slovic and Lichtenstein, 1971; Shulman and Elstein, 1975; Slovic et al., 1977; Bordage et al., 1977). Each of these paradigms addresses specific questions and areas of interest. They provide the necessary framework and guidelines needed to focus research.

The decision making paradigm is concerned with how one selects a specific action from a set of alternative actions. For example, applicants must decide which schools to apply to; admissions committees must decide whom to reject or invite to interview; interviewers must decide what to ask next in the interview; or admissions committees must determine who will comprise the entering class. In each example the decision maker is working with incomplete information. There is an uncertainty factor. Probabilities are associated with the incoming pieces of information as well as with the success or failure of the final action. The decision maker examines the different alternatives and then decides upon the final course of action. The major goal of this paradigm is to determine the ideal way to make decisions or to assess how naturally made decisions depart from the ideal. Analyses are directed to prescribing how one ought to go about making decisions.
The work of Edwards (1968), Kahneman and Tversky (1973), Raiffa (1968), and Fryback (1974) characterizes this research paradigm.

The problem solving approach views man as a processor of information operating under the constraints of limited processing capacities. This paradigm looks at the steps or sequences that are needed to achieve some goal, given some starting point. These steps lead to understanding the task environment, the problem solver's representation of the task environment, short- and long-term memory capabilities, and the strategies employed in the solution of a given problem. Once these components are clearly understood, they are often simulated by elaborate computer programs which attempt to reproduce the problem solver's sequences of behavior. The following example is an illustration of this paradigm: admissions folders are given to committee members who are asked to "think aloud" as they make decisions about the acceptability of various applicants. After a sufficient number of folders has been reviewed, one obtains a description of each individual's problem-solving processes. These descriptions are encoded into computer programs to simulate the problem-solving behaviors of each committee member. These programs are compared with the actual behavior of the problem solver. If the theory is adequate, there are no detectable meaningful differences between the simulation and the actual behavior. Thus, the problem-solving approach attempts to describe and explain the behaviors of the problem solver. It is not a prescriptive model. The work of de Groot (1965), Kleinmuntz (1968), Newell and Simon (1972), and Elstein et al. (1976, 1978) is representative of this paradigm.

The judgment paradigm grew out of a mistrust of the use of self-report data and introspection (as exhibited in the problem-solving approach). It examines how a judge puts together information to make a judgment. The concern is not with how a judge ought to use the information but rather with how the information is used. One looks at the relative weight (or importance) of each piece of information as perceived by the judge. This approach attempts to model an act of judgment. As an example, consider admissions committee members who are given a set of application folders and whose task is to rate each applicant on some scale of acceptability. The predictor variables used to make these ratings might be grade point average, MCAT scores, and interview scores. The weights of each predictor variable for each committee member are captured (Naylor and Wherry, 1965) or represented (Hoffman, 1960) by treating the ratings as the dependent variable and the predictor variables as the independent variables in a multiple regression analysis. The psychological processes of committee members in making judgments are not described by these weights, but the regression weights are paramorphic representations of committee members' judgments (Hoffman, 1960). That is, a model of a judge performs like the judge, but there is not a one-to-one correspondence with the internal process of judgment. This approach
Problem In a comprehensive review of this research, Slovic and Lichtenstein (1971) note that each of these three paradigms has become quite specialized and has taken paths that have little or no contact with each other. They recommend an integration of research efforts. Shulman and Elstein (1975), in their review of this reserach, show that researchers are starting to integrate their research with other areas. They cite the work of Tversky and Kahneman (1971, 1973, 1974), Brehmer (1974), Dawes and Corrigan (1974), and Sprafka and Elstein (1974) as examples. Shulman and Elstein (1975) state, "... mathe- matical, prescriptive decision theories appear to be moving toward greater simplicity as they focus on the task of information-processing theory: to provide an account of how people actually think and reach decisions, not how they ought_to." The work of Cook and Stewart (1975) and Schmitt and Levine (1977) is of particular interest for this study because these researchers have followed this movement toward integration and simplicity of paradigms. Their work has focussed on the use of both subjective and objective weights in making judgments or decisions. Thus, they' utilize information gained from the problem-solving and judgment paradigms. 11 However, research has shown that judges cannot estimate accurately their combination and weighting rules (Slovic and Lichtenstein, 1971). Serious discrepancies often exist between judges' subjective and objective weighting schemes. Bootstrapping, a phenomenon in which simulated judgments may be better than actual judgments in predicting some criterion, is cited as evidence that judges cannot describe ac- curately their weighting schemes. For if judges knew their rules (i.e. weighting schemes), how could a formula improve on their judgments? Yet, additional research (Newell and Simon, 1972; Shulman and Elstein, 1975) has shown that judges can tell you what they are doing. This study uses the judgment paradigm in its study of how admis- sions committee members weight information to make judgments. The conceptual framework is provided by Brunswik's lens model (1956) modified by Hammond, Hursch, and Todd (1964) and Hammond (1966) with the analysis based on the application of multiple regression tech- niques. The importance of studying judgment in this framework rests on the fact that the emphasis is placed on the manner in which judges code and quantify information, not on the relative accuracy of the statistician over the clinician. As Dawes (1977) puts it "... there have been a plethora of additional studies showing that the actuarial approach is superior, and the issue is now--or should be--fairly well settled." The focus of this research is on judges and how they weight information to reach judgments. Lens Model and Multiple Regression Analysis The lens model grew out of the work of Egon Brunswik, a German psychologist and philosopher who was interested in the psychology of 12 perception. In this framework a person perceives a set of cues which are combined to form a perception. One apprehends the cues and infers rapidly what lies beyond the cues. An object is not seen directly, but rather cues are seen. These cues are integrated to form judgments. The relationship between a person and his environment (which is probabilistic) is the object of study in the lens model. The important elements of the lens model are objects, cues, judgments, and the relation between any of these elements. The con- cern is with how a person interprets cues. 
The lens model represents the relationship between a perceiver (or judge) and the objects of perception (or judgments) as mediated by cues whose relationship to both the perceiver and the object is probabilistic (Elstein et al., 1978). This relationship is depicted in Figure 1.1. As can be seen in this figure, X1 to Xk are the cues or independent variables which are used to make judgments (Ys). r_X1X2 represents the correlation between cues 1 and 2, while r_X1Ys represents the correlation between cue 1 and the subject's judgments. The criterion values are represented by Ye, while r_X1Ye is the correlation between cue 1 and the criterion value. Relations between cues, criteria and judgments are expressed as correlation coefficients. For example, each cue (Xk) is related to both the criterion (Ye) and to the judge's response (Ys).

[Figure 1.1: The Lens Model. Cues X1 through Xk mediate between the criterion values (Ye) on the left and the subject's judgments (Ys) on the right; the relations (e.g., r_X1Ye, r_X1Ys) and the intercue correlations (e.g., r_X1X2) are expressed as correlation coefficients.]

Since committee members are making judgments on applicants for the first time, there is no criterion information available. Thus, the focus of this study is on the right hand side of the lens model (between cues and judgments). In addition, instead of having only one predicted outcome, there are two predicted outcomes resulting from using committee members' objective and subjective weights (Figure 1.2). As seen in this figure, a committee member's actual judgments (Ys) are compared with both the judgments predicted through the use of statistical weights (Ŷobj) and the judgments predicted through the use of subjective weights (Ŷsub). Predicted judgments are obtained in the following manner:

1) Using statistical weights,

      Ŷobj = Σ (i=1 to k) Bi Xi,

   where Bi = beta coefficients and Xi = predictor variables;

2) Using subjective weights,

      Ŷsub = Σ (i=1 to k) SWi Xi,

   where SWi = subjective weights and Xi = predictor variables.

[Figure 1.2: Right Hand Side of Lens Model. Cues X1 through Xk lead to two predicted outcomes: predicted judgments using statistical weights (Ŷobj) and predicted judgments using subjective weights (Ŷsub).]

The use of multiple regression techniques in the lens model is fairly straightforward. The relevance of each cue to each judgment is represented by the correlation between cues and judgments (r_XkYs). This relation is known as the utilization coefficient (Hammond et al., 1964). Once the utilization coefficient and the correlations between the individual cues are known, one model for capturing a committee member's objective policy involves an additive linear combination of cues. The following equation illustrates a judge's objective weighting strategy or policy:

      Ŷobj = B1 (GPA) + B2 (MCAT) + B3 (Personal Statement) + B4 (Interview Score)

The beta coefficients, the Bi, provide the objective measure of how important each cue (Xk) is for each committee member. The subjective weights, SWi, also provide this measure of importance.

In this study three correlation coefficients are of importance: (1) r_YsŶobj, (2) r_YsŶsub, and (3) r_ŶobjŶsub (Figure 1.3). The first correlation, r_YsŶobj, refers to the relation between actual judgments and the predicted judgments obtained through the use of statistical weights. Squaring this coefficient indicates how well actual judgments can be predicted by a weighted linear combination of cues. This is a measure of how well a judge's policy is captured objectively.
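To make these definitions concrete, the sketch below works through one judge's data in Python. It is an illustration of the modeling approach, not the analysis reported in this study: the applicants, ratings, and self-reported importance values are all simulated, and the variable names are hypothetical. Standardized regression coefficients play the role of the Bi, normalized importance ratings play the role of the SWi, and the three correlations above are then computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four cues for 40 hypothetical applicants (columns): GPA, MCAT,
# personal statement score, interview score.
X = rng.normal(size=(40, 4))

# Simulated actual judgments Ys: a noisy linear policy, so the example
# is self-contained. In the study these would be a judge's ratings.
Ys = X @ np.array([0.5, 0.3, 0.1, 0.4]) + rng.normal(scale=0.5, size=40)

# Objective weights Bi: standardize cues and ratings, then fit by least
# squares; the solution vector contains the beta coefficients.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
zYs = (Ys - Ys.mean()) / Ys.std()
B, *_ = np.linalg.lstsq(Z, zYs, rcond=None)

# Subjective weights SWi: hypothetical self-reported importance values
# (e.g., points out of 100), normalized to sum to one before use.
SW = np.array([40.0, 25.0, 10.0, 25.0])
SW = SW / SW.sum()

Y_obj = Z @ B    # predicted judgments from objective weights
Y_sub = Z @ SW   # predicted judgments from subjective weights

def r(a, b):
    """Pearson correlation between two vectors of judgments."""
    return np.corrcoef(a, b)[0, 1]

print("r_YsYobj   =", round(r(zYs, Y_obj), 3))
print("r_YsYsub   =", round(r(zYs, Y_sub), 3))
print("r_YobjYsub =", round(r(Y_obj, Y_sub), 3))
```

Squaring the first printed value gives the proportion of variance in the judge's ratings that the regression model reproduces, and the same reading applies to the subjective model. Note that with correlated cues, noticeably different weight vectors can still produce highly correlated predicted judgments, which is exactly the boundary condition examined in Chapter IV.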
Hammond and Summers (1972) refer to this correlation, r_YsŶobj, as measuring cognitive control, that is, the extent to which a judge controls the execution of his knowledge.

The second correlation, r_YsŶsub, is the relation between actual judgments and the predicted judgments obtained through the use of subjective weights. The use of these weights refers to the capturing of a judge's policy, subjectively. Squaring this coefficient also indicates how well actual judgments are predicted by a linear combination of cues. The main difference between r_YsŶobj and r_YsŶsub is that one correlation uses regression weights while the other uses subjective weights. Therefore, it seems reasonable to assume that r_YsŶsub can also be thought of as a measure of cognitive control.

The third correlation, r_ŶobjŶsub, concerns the relation between the predicted judgments obtained through the use of both statistical and subjective weights. A high correlation between these judgments indicates that predicted judgments generated from statistical weights are in high agreement with predicted judgments generated from subjective weights.

[Figure 1.3: Modified Lens Model. Cues X1 through Xk relate a committee member's actual judgments (Ys) to the predicted judgments using statistical weights (Ŷobj) and the predicted judgments using subjective weights (Ŷsub), via the correlations r_YsŶobj, r_YsŶsub, and r_ŶobjŶsub. Note: r_YsŶobj and r_YsŶsub represent cognitive control.]

These three correlations allow the comparison of actual judgments, predicted judgments obtained from statistical weights, and predicted judgments obtained from subjective weights. This modification of the lens model provides the conceptual framework to represent a judge's policy, both subjectively and objectively. To paraphrase Slovic and Lichtenstein (1971), subjective and objective linear models are capable of: (1) highlighting individual differences and misuse of information, (2) making explicit the causes of underlying disagreement among judges, and (3) providing alternative means to describe how a person makes judgments. Some of these alternative means are examined in this study to gain a better understanding of how best to represent judges' policies.

CHAPTER II
REVIEW OF THE LITERATURE

Problems concerning medical school admissions confront medical educators and administrators every year. Although these problems arise from numerous sources, they seem to point to the importance of exercising sound judgment in the selection of medical school applicants. Yet when the literature on medical school admissions is examined, there is little or no convergence with the literature on judgment and decision making.

This literature review attempts to unite two general areas of research: (1) medical school admissions and (2) judgment. The review briefly traces the history of medical school admissions in America from colonial times to the present and examines the judgment research that can impact on some of the problems facing medical school admissions. It will be shown that the time is opportune for research on judgment and research on medical school admissions to interact.

Medical School Admissions

Throughout the history of this country, medical school admissions procedures have changed drastically, from a modest beginning of almost no requirements to the strict requirements of the present day. In this progress, though, medical school admissions have become a hotbed of controversy.
Insight into these problems can be gained by looking at some of the roots of medical school admissions in the United States. Prior to the 1760's, none of the colonies had a medical school (Bordley and Harvey, 1976). Medical education consisted of going to Europe or becoming an apprentice to a practicing physician. These practitioners may have had a formal education, but there was no assurance. All one needed was a doctor to whom one could be apprenticed. In some instances a clergyman or a better educated farmer became the town doctor if the town was lacking a practicing physician.

Becoming a doctor was quite easy. Admissions consisted of who you knew or what you knew. For if one could not go to Europe or become an apprentice, one could take correspondence courses or just read a few books and call oneself a doctor. Admissions requirements were virtually non-existent at this time.

As America grew out of its infancy, the need to upgrade the medical profession was seen by many. John Morgan (1735-1789), who with William Shippen, Jr. founded the Medical College of Philadelphia, wrote "A Discourse upon the Institution of Medical Schools in America" in 1765. (The medical departments of the College of Philadelphia and of Kings College in New York City, founded in 1765 and 1768 respectively, were the only two medical schools of this period and thus formed the nexus of medical education.) John Morgan proposed that candidates applying for admission should have had (1) an apprenticeship, usually three years with a "reputable" physician, (2) an education in liberal arts, mathematics, and natural history, and (3) a working knowledge of Latin; French was also recommended (cf. Bordley and Harvey, 1976). It was at Philadelphia that admission requirements were instituted for the first time in a form that would guide the other medical schools in their selection of applicants.

These requirements remained in effect until the Revolutionary War, when a physician shortage was experienced. Schools were forced to lower their standards to turn out more physicians. They now required (1) a two-year apprenticeship instead of three and (2) no specific educational experience. Even with lower requirements, there were still many who did not bother to attend medical school. They called themselves doctors and practiced medicine without any formal schooling.

In the early nineteenth century, proprietary medical schools came into existence in America. Professors were paid directly by the students. In many instances it was to the professors' benefit to have lower admission standards, for more students meant more money. Many professors, who held no real claim to the title of professor, fought the raising of standards for medical schools.

The period from 1800-1850 saw the deterioration of admissions standards and the continuation of proprietary schools and apprenticeships. America continued to grow, and new medical schools opened to accommodate the need for physicians. The desire to produce physicians to meet society's needs resulted in admission requirements being ignored. Once again voices were raised in protest, complaining about the inadequacies of America's system of medical education.

In 1847, the American Medical Association (AMA) was founded, with one of its objectives being the reform of medical education. Most schools, though, ignored any attempt to upgrade their standards. In fact, when Charles W.
Eliot became president of Harvard University in 1869, he said: "There were no requirements for admission to our medical schools. To secure admission a young man had nothing to do but to register his name and pay a fee ..." (Sabin, 1934, cf. Bordley and Harvey, 1976).

In 1876, the Association of American Medical Colleges (AAMC) was formed to promote educational standards. Its impact on admissions was not felt until about 1900, when the AAMC felt confident enough to require that member medical schools admit only students who had at least a high school diploma or had passed an examination on the subjects taught in high school. This requirement resulted in a major advance in the standardization of the admissions process.

Johns Hopkins University, which had led many revolutions in the history of medical education, also left its mark on the admissions process. Most importantly, (1) a baccalaureate degree, or its equivalent, with emphasis on preliminary education in the sciences and modern languages, was required for admission to medical school, and (2) both men and women students were accepted (Bordley and Harvey, 1976). Thus, admissions requirements reached their highest level of sophistication in American history.

Another important milestone in the history of the admissions process occurred when Abraham Flexner wrote a report entitled "Medical Education in the United States and Canada." The Flexner report (Flexner, 1910) and the AMA set into motion such reform that 76 medical schools went out of existence. Many schools raised their requirements so that applicants had to have at least two years of college. More required a bachelor's degree. Admission requirements were steadily increasing. By 1925 most medical schools were using the Johns Hopkins requirements. These requirements would remain in effect until about World War II.

In the mid-1940's, the AAMC and the Graduate Record Office (GRO) created a Professional School Aptitude Test (Erdmann et al., 1971; Nash, 1977). This was the first required standardized entrance exam for medical schools. The GRO later merged with the Educational Testing Service, and the Medical College Admissions Test (MCAT) was created. The MCAT, along with a bachelor's degree or at least three years of college in the premedical sciences, now formed the basis for the admissions requirements.

The post-World War II period also saw an increased number of applicants to medical school. Whereas in 1929-30, 76 schools could accept approximately 6,000 students out of 13,000 applicants, in 1949-50, 79 schools could accept 7,000 students out of 24,000 applicants. After this spate of applications, the ratio of the number of applicants to places available returned to about two to one. It is currently about three to one (Dube and Johnson, 1976). The problem was that schools were turning away qualified applicants. They were under increasing pressure from society, the government, the courts, and applicants. These pressures created the need to examine the admissions process with the hope of improving the existing one or implementing a new one.

Current admissions processes rely on GPA's, MCAT scores, letters of recommendation, interviews, extracurricular activities, and personal (autobiographical) statements to make their decisions. Although there are movements to examine new areas (e.g., non-cognitive domains; see Kegel-Flom, 1975; Korman et al., 1968; Krupka et al., 1977), these variables form the foundation upon which decisions are based.
What should be borne in mind, though, is that when one looks at these variables one is usually assuming that the applicant has been to college. The standards of admissions have become quite exacting. For example, the state of Michigan has set the following requirements: (1) 6 semester hours of the biological sciences, of which 3 must be of laboratory work, (2) 8 semester hours of chemistry, (3) 6 semester hours of physics, (4) 6 semester hours of English composition and literature, (5) 6 hours of psychology and/or sociology, (6) 18 hours in nonscience areas, and (7) completion of 60 total semester hours exclusive of physical education and military science. Michigan State University's College of Human Medicine requires in addition that applicants take the MCAT and write a personal statement telling why the applicant is interested in medicine and why the applicant is applying to this particular school.

One can see that as medical schools grew in stature and prestige, admissions requirements also had to keep pace. However, in this movement, new problems arose. One obvious problem was how to deal with increasing applicant pools. The number of places available did not keep pace with the number of applicants. This necessitated various screening processes: premedical requirements were established; grades and MCAT scores became important admissions criteria; and new admissions variables were examined. New ideas concerning medical school admissions had to be developed.

A corollary of the above problem concerned rejected applicants. In colonial times, there was no concern for the rejected applicant because there were none. At present, rejected applicants represent a large loss to society. Becker et al. (1973) followed applicants who were not accepted to medical school and found that fifty-two percent (52%) of these unsuccessful applicants were lost to the health care field. These applicants constitute a unique manpower pool with important academic and social characteristics. These concerns must be dealt with in the admissions process.

The report of the Council of Deans' "Ad Hoc Committee to Consider Medical School Admissions Problems" (1972) noted:

The current situation presents a series of challenges to the medical schools:

1) To process applicants efficiently so that this function is not an undue drain on the institution's resources;
2) To process applications in a fair and equitable manner which ensures each applicant a full opportunity to have his credentials reviewed;
3) To select from the qualified applicants those who are most likely to contribute to the fulfillment of the objective of the educational program of the institution;
4) To minimize the financial, academic, and emotional cost to the applicant;
5) To assist potential applicants with a realistic assessment of their potential for success in gaining admission to medical school.

The committee made the following recommendations:

1) Define objectives;
2) Articulate and publish selection factors;
3) Carefully select and educate committee members;
4) Establish a uniform acceptance date;
5) Notify applicants promptly of your decision;
6) Design admissions policies in accord with public trust.

Char et al.
(1975) conducted a survey of medical schools, finding that (1) there is a general feeling of dissatisfaction with the admissions process, especially in the area of assessing personality traits and selecting for clinical competence; (2) there is an appreciation of inequities in selection; (3) schools uniformly rely on three parameters--GPA, MCAT scores, and personal interviews; and (4) there is a divergence of views on the usefulness of interviews.

Dissatisfaction with the process may arise from a host of reasons: (1) committee members may not know what they are supposed to do; (2) the philosophies of the school and of members of the committee may differ; (3) members do not agree with the criteria being used to select students; (4) members may want more or less input into the process; or (5) the criteria themselves may be invalid. For example, Wingard and Williamson (1973), reviewing the literature from 1955-1972, found little or no correlation between undergraduate grades and subsequent career performance. Rhoads et al. (1974) showed that there is little objective evidence available to make accurate predictions about student performance in the clinical courses. These authors believed that given certain standards of intelligence, premedical preparation, MCAT performance, acceptable recommendations, and a reasonable range of activities, motivation will determine medical school performance. Their data also showed an interesting finding: about half of the students who excelled in the basic science portion of the curriculum did so in the clinical portion, while roughly seventy percent of the students who excelled in the clinical sciences had not done so in the basic science area. When these two groups who excelled were examined with regard to admissions data, there were minimal differences. In other words, admissions data could not determine who would do well in the clinical years.

From this review of the history of admissions, the following conclusions are drawn:

1) Admissions has changed drastically throughout the history of this country, from minimum requirements to elaborate procedures;
2) Current admissions policies typically examine GPA, MCAT scores and personal interviews, with adjustments made for autobiographical (personal) statements, letters of evaluation and extracurricular activities;
3) The processing of these variables for such a diverse applicant pool has placed a severe strain on admissions committees;
4) Some problems surrounding admissions include: identifying, measuring and evaluating important admissions criteria; processing applicants efficiently; selecting applicants who are most suited to one's programs; minimizing the financial, academic and emotional costs of the process; and assisting rejected applicants in assessing their career goals;
5) Solutions to these problems must be able to articulate, define and measure relevant admissions criteria reliably and validly and to gain acceptance by admissions committee members;
6) Insight into possible solutions may arise from the research on judgment;
7) An important first step is to model how admissions committee members weight, and say they weight, admissions variables when making judgments about the quality of medical school applicants.

Admissions provides a rich content area, while the judgment paradigm supplies the needed methodology and conceptual framework.
Judgment

The psychology of judgment grew out of the work in the area of psychophysics and was concerned initially with determining sensory thresholds in order to study perception. This research typically turned up the following errors: series effects, anchor effects, and errors of central tendency. Johnson (1972) pointed out that these constant errors were, at first, treated as distorting influences to be eliminated by a more careful methodology. To some researchers these errors were to be investigated further. When the emphasis shifted from attempts to eliminate these effects to attempts to understand them, the psychology of judgment began to take shape (Johnson, 1972).

In this process the term judgment took on different meanings in certain situations. For example, the word rating appears when judgments are made on a scale of numbers; the term evaluation is used when judgments of value are made; decision is used when judgments are to be made using discrete categories; and preference refers to judgments about personal taste. The general usage of the term judgment is quite loose. Johnson (1972) clarified the term when he referred to judgment as the assignment of an object to a small number of specified categories. Its function is to settle an uncertain state of affairs, and its critical dimensions are determined by the situation in which judgment occurs. "Judgment begins with unordered objects, events or persons, assigns them to specified response categories so as to maximize the correspondence between the responses and the critical dimension of the stimulus objects, and thus ends with a more orderly situation" (Johnson, 1972).

Newell (1968) has defined judgment as a cognitive process with the following characteristics:

1) The main inputs to the process--that which is to be judged--are given and available; obtaining, discovering, or formulating them is not part of judgment;
2) The domain of the output--the set of admissible responses--is simple and well defined prior to the judgment. The response itself is variously called a selection, estimation, assertion, evaluation or classification, depending on the nature of the domain;
3) The process is not a simple transduction of information. Judgment adds information to the output;
4) The process is not simply a calculation, or the application of a given rule;
5) The process concludes, or occurs at the conclusion of, a more extended process;
6) The process is rather immediate, not being extended in time with phases, stages, subprocesses, etc.;
7) The process is to be distinguished from searching, discovering, or creating, on the one hand, and from musing, browsing, or idly observing, on the other.

Policy Capturing

Prior to 1960, there was little or no research on how information was processed in order to reach a judgment. That is, research on judgment as defined by either Johnson or Newell was sparse. Dawes and Corrigan (1974) cited Benjamin Franklin's letter to his friend Priestly, which described Franklin's method for processing information to reach a judgment. This method listed pros and cons, along with appropriate weights, which were then summed down both columns. This approach was called "moral or prudential algebra". Shulman and Elstein (1975) cited the study by Wallace (1923) that used linear equations to model corn judges. A relatively small number of variables accounted for the variance in a corn judge's judgments. Each of the judges' models was compared with a model of the environment.
The comparison is similar to the left-hand side of the lens model. This study predated the research that was to come in the area of judgment.

It was not until 1960, when Hoffman suggested the use of multiple regression equations to model the judgment policies of clinical psychologists, that statistical models of how judges weight and combine information came into prevalent use. Following this paper, a large number of studies were carried out in different task environments. These tasks involved the modeling of judgments of clinical psychologists (Goldberg, 1971), stockbrokers (Slovic, 1969), radiologists (Hoffman et al., 1968), draft boards (Gregory and Dawes, 1972), and admissions committees (Dawes, 1971; Goldberg, 1977; Dawes, 1977). In these studies the researchers were concerned with representing the objective weighting policy of each judge through the use of a linear model. The linear model predicted outcome judgments fairly accurately as well as making explicit each judge's weighting policy. Slovic and Lichtenstein (1971),¹ in a comprehensive review of this literature, stated that the linear model was a powerful device for predicting quantitative judgments made on the basis of specific cues. It was capable of highlighting individual differences and misuse of information as well as making explicit the causes of underlying disagreements among judges in both simple and complex tasks.

¹See also Shulman and Elstein (1975), Slovic et al. (1977), and Bordage et al. (1977) for excellent reviews of the literature in the area of judgment.

Both Newell (1968) and Johnson (1972) have analyzed the major questions that investigators have asked about judgment. For Johnson, since most judgments are complex (and objects vary in several dimensions), the important questions are:

1) What are the dimensions that influence judgment?
2) How much weight does the judge give to each dimension?
3) How are these effects combined by judges?

Newell enumerated the major scientific questions asked about judgment:

1) Upon what information is the judgment based?
2) What is the judgmental law?
3) What is the psychological process or processes which make possible the lawfully operating judgment?
4) What are the other conditions that influence the judgment and how do they work?
5) Why don't humans make optimal judgments?
6) Can a machine or algorithm make judgments as well as humans?

These questions served as guideposts to direct the next stages of the review of the judgment literature. Specifically, the following three areas were focused upon: (1) modeling admissions tasks; (2) modeling medical school admissions tasks; and (3) modeling judgment tasks with subjective weights. These areas shaped this study.

Modeling Admissions Tasks

Three studies bearing on the admissions process in general came from the work of Dawes (1971, 1977) and Goldberg (1977). Dawes (1971) focused on how admissions criteria were combined by the members of a graduate school admissions committee to predict an applicant's success in graduate school. Three admissions variables were analyzed: (1) overall undergraduate grade point average (GPA), (2) an index of the quality of the undergraduate institution (QI), and (3) the total raw score of the Graduate Record Examination (GRE). Applicants were rated on a six-point scale. The policy-capturing computation underlying this and the studies cited above is sketched below.
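A minimal sketch of that computation, in modern notation (Python): regress a committee's ratings on the cues and read the fitted coefficients as the objective weights. All data, variable names and coefficients below are invented for illustration; none of the numbers are those reported by Dawes (1971).

    import numpy as np

    # Hypothetical applicant file: GPA, undergraduate quality index (QI),
    # and GRE for 100 applicants, plus the committee's rating of each on
    # a six-point scale. All values are simulated for illustration.
    rng = np.random.default_rng(0)
    gpa = rng.uniform(2.0, 4.0, 100)
    qi = rng.uniform(1.0, 6.0, 100)
    gre = rng.uniform(800.0, 1600.0, 100)
    rating = 1.0 * gpa + 0.2 * qi + 0.002 * gre + rng.normal(0.0, 0.3, 100)

    # Policy capturing: the least-squares coefficients are the "objective"
    # weights, and R^2 is the variance in the ratings the model accounts for.
    X = np.column_stack([gpa, qi, gre, np.ones(100)])
    beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
    captured = X @ beta
    r_squared = np.corrcoef(rating, captured)[0, 1] ** 2
    print("captured weights:", np.round(beta[:3], 3), "R^2:", round(r_squared, 2))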
The results showed that a linear model could account for 78% of the variance of the committee's ratings and that 55% of the applicants the admissions committee considered could have been screened out by an equation without rejecting a single individual whom the admissions committee actually admitted. This screening could result in an estimated savings of approximately $18 million per year. These conclusions had important implications for researchers in the area of admissions:

1) A simple linear combination of the criteria of the admissions committee did a better job of predicting performance in graduate school than did the admissions committee itself;
2) The behavior of the admissions committee could be simulated by a linear combination of the criteria considered;
3) Under certain conditions, the paramorphic representation of the judge might be more valid than the judge himself. This principle, it will be recalled, is known as bootstrapping (Goldberg, 1970).

Dawes' research is a classic example of using admissions as a content area and the judgment paradigm to provide the needed methodology. A major difference between his research and this study is the use of different linear models to capture a judge's policy. This study includes both objective and subjective models.

Goldberg (1977) modified the regression equation developed by Dawes so that the probability of receiving an invitation to graduate school depended on the applicant's GPA and GRE scores. The QI index was dropped. The analyses showed that the new equation ranked the applicants in much the same manner as the old equation. Goldberg recommended that this equation be made available to all applicants. He also argued for the development of a centralized national application system.¹ The emotional and financial costs to applicants and institutions remain high, and such a movement could benefit both immeasurably. Goldberg hoped that his report would stimulate reports of similar analyses from other institutions. He also hoped that his report would provoke further thought on the fundamental issues relating to the graduate admissions process in psychology.

¹Such an application service exists for the medical schools. It is called the American Medical College Application Service (AMCAS).

Dawes (1977) touched on some of these fundamental issues when he examined case-by-case versus rule-generated procedures for the allocation of scarce resources. He argued that rule-generated procedures are superior to case-by-case procedures. Dawes' paper delineates the advantages of each procedure. The advantages of a case-by-case approach are that:

1) Some meritorious variables not considered in a rule would occur to the decision maker only after looking at a particular case;
2) A weighting system may appear to be inadequate or misleading only after knowledge of the distributional properties of the variables to be weighted;
3) The decision maker cannot be held accountable later for errors or for explicit policies that offend someone else's sensibilities.
The advantages of a rule-generated approach are that:

1) Because the variables are defined a priori, other variables cannot be used to bias a decision in one direction or another for political or personal reasons;
2) Members must decide upon a weighting or combination system, which becomes an important institutional responsibility;
3) Committee members are held accountable for their decisions;
4) It is uniform (all people are judged by the same standards). Dawes viewed this uniformity as a moral virtue.

After examining these advantages, Dawes cited two examples of rule-based allocation. The first example was the graduate admissions study; the second concerned the allocation of NDEA and NSF fellowships to various departments. Rather than have the various departments argue over who should get what, certain rules were instituted to allocate the fellowships. Various ratios were experimented with until all fellowships were allocated.

Dawes placed his finger on the heart of the matter when he observed the negative reactions many people experience in dealing with rule-governed or rule-based procedures. However, he admonished the reader that to conclude that unsystematic decision making is superior to rule-based decision making is to argue from a vacuum. The solution to injustice lies in changing unnecessary, unfair, or idiotic rules. The commitment to rule-based procedures involves the capacity to achieve satisfaction and joy from the general improvement of a social situation, even though the causative role in benefiting particular individuals is less evident than it would be in a case-by-case decision making situation (selected or paraphrased from Dawes, 1977).

Modeling Medical School Admissions Tasks

The studies done on the medical school admissions process are very similar to the three studies on the graduate school admissions process. In both graduate and medical school admissions, resources are scarce, the number of applicants is greater than the number of places available, competition is fierce, and applicants are showing a wider range of qualifications. This research may be divided into two general categories: (1) studies which predict success in medical school and (2) studies which predict committee members' judgments.

The studies of Ambrosino and Brading (1973), Hunka (1964), Mattson (1969), Milstein et al. (1976), Schofield (1970), and Simon et al. (1975) are representative of research attempting to predict success in medical school. These studies typically find that regression equations can predict quite well who will succeed in the first two years of medical school, but that accuracy of prediction declines as an applicant progresses through school. As in other judgment studies, a relatively small number of variables can account for most of the variance.

For example, Simon et al. (1975) compared 23 medical students from socio-economically disadvantaged backgrounds with 21 regularly admitted medical students with respect to MCAT scores, GPAs, college ratings, Part I of the National Boards, and performance in two clerkships. These two groups of students differed markedly on admissions data. At the end of the second year, average National Board Part I scores identified two distinct populations, but the average scores of both groups were clearly above the minimum passing level. This study has interesting implications, because if existing admissions criteria had been employed, only one of the groups would have been admitted.
As it turned out, though, the group from disadvantaged backgrounds was performing at the average or above-average level. Schofield (1970) showed that there was no significant difference in the achievement of students selected by full committee deliberation and those selected by a multiple regression equation (actuarial rankings). The author recommended that admissions committees' time might be better spent examining and judging borderline and/or special cases that are not differentiated meaningfully by an actuarial process.

These studies and others (Funkenstein, 1965; Howell and Vincent, 1967; Matarazzo and Goldstein, 1972; Turner et al., 1974) have shown that the existing admissions criteria predict success in the preclinical phases of medical school but correlate poorly with clinical performance. Faced with these findings, it is easy to see why committee members may experience dissatisfaction with the admissions process. Funkenstein (1970) felt that medical schools must select students on the basis of excellence. Different tracks should be set up to allow for the teaching and training of different kinds of physicians, each track having a representative subcommittee which selects its applicants. Half of the entering class would be made up of minorities and individuals who had chosen a specialty or general practice, while the other half would be made up of superior applicants and those chosen by a lottery.

Models of admissions judgment tasks have not typically examined how committee members make judgments. Little emphasis is placed on representing the judgment process, and even less emphasis is given to representing how committee members say they are making judgments. Little research has been done in this area.

A few studies of committee members' judgments have been done by Ambrosino and Brading (1973), Best et al. (1971) and Padgett et al. (1976). Best and his co-workers showed how preliminary prediction equations were used to help implement the admissions process for several years at Illinois. The use of these equations facilitated communication and comparison about various candidates. These equations were also used to predict who would succeed in the first year of medical school. As with other equations, the predictive power weakened as the student progressed through medical school. Ambrosino and Brading (1973) used similar techniques when they employed stepwise regression procedures to predict applicant averages. These averages were used as measures of whom to interview. These regression procedures had a high success rate. Padgett et al. (1976) used a matching system at the University of Texas. This system met stringent admissions, economic and behavioral objectives, resulting in an effective cost-benefit system. However, such matching on a national scale, while technically feasible, presented insurmountable problems.

A slightly different tactic, used by Teitelbaum et al. (1973), was to design a system in which the admissions committee set its policies and then selected predictors or variables that would discriminate in a meaningful manner. They felt that the use of these predictors was important for two reasons: (1) it permitted the committee to develop a formula that reflected its thinking as to what combination of characteristics a candidate should possess, and (2) it allowed the committee to explain to applicants the grounds for rejection or acceptance.
Teitelbaum and his co-workers considered two approaches in developing their system: (1) an empirical approach and (2) a rational approach. The empirical approach attempts to capture what has already taken place. It usually employs such techniques as regression analysis, discriminant analysis, factor analysis, and principal components analysis on decisions already reached by the committee. In a rational approach, variables are selected before a decision is made about the applicants, and the committee agrees on how these variables are to be weighted and combined.

Various mathematical models using admissions variables have been tested which successfully predict judgments and performance, and yet these models are not utilized. A possible reason may be that these models rarely considered how individual committee members make, and say they make, their judgments. The modeling of committee members' judgments has not been emphasized sufficiently. Another possible reason for non-acceptance of these solutions may rest with resistance to attempts to use rule-generated procedures. Dawes (1977) noted that decision makers in selection situations may feel much prouder of having chosen an individual who did very well than of having established a rule that benefited the whole institution. An examination of how committee members make and say they make judgments should shed light on these concerns.

Subjective Weights

The major focus of the previous studies has been on predicting success in school or modeling judges' policies. Emphasis has also been on objective (i.e., statistical, mathematical, regression) weights. Dawes and Corrigan (1974) showed that the linear model was an adequate representation of human judgment in a large number of instances. In a series of tests examining the robustness of the linear model, they found that unit weights performed as well as differential (objective) weights in predicting criterion values. They stated that "the whole trick is to decide what variables to look at and then to know how to add". Since the data for this study reflect the right-hand side of the lens model (i.e., the correlation between cues and judgments), the unit weighting schemes will be compared with the differential weighting schemes. This comparison is similar to the Dawes and Corrigan study but on the opposite side of the lens model. Of importance is whether unit weights can predict actual judgments as well as objective weights or subjective weights can.

Subjective weights are relatively new to the judgment paradigm. The work of Cook and Stewart (1975), Martin (1957), Schmitt and Levine (1977) and Summers et al. (1970) has shown that much interesting work can be done with subjective weights. The research question of interest here was whether there were any differences between objective and subjective models of committee members. The subjective model would consist of committee members' self-reports of the importance they attach to the variables used in rating applicants. The objective model would consist of mathematical weights attached to the admissions variables, derived from multiple regression techniques. The relation between committee members' subjective impression of how they weight information and an objective measure of how they weight information can then be determined. A mistrust of self-report studies and of the use of introspection has been a prevalent theme throughout the history of psychology.
If there is a strong positive relation between the two weighting schemes, support is lent to the hypothesis that judges can report what they are doing. If there is little or no relation, support is lent to the hypothesis that judges cannot accurately estimate their weighting scheme.

Another area to be examined when comparing subjective and objective policies is the predicted judgments that are generated through the use of subjective and objective weights. A possibility exists that judges may differ in their subjective and statistical weights and yet the predicted judgments generated from these weights may be highly related. The emphasis would then shift from the weights themselves to the outcomes arrived at using such weights (both comparison criteria are sketched in code at the end of this section).

Martin (1957) found that a linear model based on the use of subjective weights was successful in predicting evaluations of student sociability. Summers et al. (1970) found that although subjective weights were successful, a linear model based on regression weights accounted for 20% more variance. Cook and Stewart (1975) showed that subjective policy descriptions corresponded fairly closely to the statistical policy descriptions. For a three-cue task, the subjective policy accounted for 91% of the maximum linear variance, while for a seven-cue task, the subjective policy accounted for 74% of the variance. Summers et al. (1970) compared subjects' actual judgments with predicted judgments arrived at through the use of subjective weights and found that the median correlation was .60. This method was unique in that it offered an alternative approach to measuring the accuracy of subjective weights. Typically, the accuracy of subjective weights was measured by correlating objective weights with subjective weights, and these correlations have tended to be low (Hoffman, 1960; Slovic, 1969; Slovic et al., 1972).

Cook and Stewart (1975) noted that although there were different ways to obtain objective weights, usually only one method had been used to obtain subjective weights: having each judge divide 100 points among the predictor variables (Hoffman, 1960). Cook and Stewart compared seven different methods of arriving at subjective weights and found no significant differences among the weighting schemes. This was surprising, because the methods ranged from dividing 100 points to complex configural rating schemes. Although there were no significant differences, it should be recalled that the subjective policies corresponded fairly closely to the objective policies.

Schmitt and Levine (1977) felt that more research should be directed toward understanding the use of subjective weights. They suggested a study comparing predicted judgments arrived at through the use of both subjective and objective weights. They questioned whether the focus of research should be on subjective rather than objective weights and suggested that much interesting and important research can be done with subjective weights. It is important to investigate the use of subjective weights to shed some light on the controversy that exists between the different paradigms: the judgment paradigm has usually ignored subjective weights, while the problem solving paradigm has utilized them. The solution to this controversy may lie somewhere between the stated extremes. The task environment may prove to be the important determinant in the analysis of this problem.
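The two comparison criteria discussed above--correlating the weights themselves, and correlating the predicted judgments the weights generate--can be sketched as follows. The numbers are invented; the comparisons in this study use committee members' elicited weights and their captured regression weights.

    import numpy as np

    # One judge's weights for a four-cue task. The subjective weights come
    # from dividing 100 points among the cues; the objective weights are
    # the standardized betas captured by regression. Values are invented.
    subjective = np.array([40.0, 25.0, 20.0, 15.0])
    objective = np.array([0.45, 0.30, 0.15, 0.10])

    # Criterion 1: agreement between the weights themselves (n = 4, so
    # any such correlation must be interpreted cautiously).
    r_weights = np.corrcoef(subjective, objective)[0, 1]

    # Criterion 2: agreement between the judgments the weights generate.
    # Subjective weights are applied exactly as if they were regression
    # weights; their overall scale does not affect the correlation.
    Z = np.random.default_rng(1).standard_normal((30, 4))  # standardized cues
    y_sub = Z @ subjective
    y_obj = Z @ objective
    r_predictions = np.corrcoef(y_sub, y_obj)[0, 1]
    print(round(r_weights, 2), round(r_predictions, 2))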
From these judgment studies, the following conclusions are drawn:

1) A simple linear combination of the admissions criteria considered by committee members did a better job of predicting graduate performance than did the committee members;
2) The behavior of admissions committee members can be simulated by a linear combination of the admissions criteria;
3) A paramorphic representation may be used as a preliminary screening device;
4) Under certain circumstances, paramorphic representations may be more valid than the committee members themselves;
5) Rule-generated procedures are clearly superior to case-by-case procedures;
6) Use of rule-generated procedures does not imply that decisions have to be dehumanizing;
7) The few studies that have been done on medical school admissions can be divided into those which predicted success in medical school and those which predicted committee members' judgments;
8) GPA and MCAT scores predicted success in the first two years of medical school but lose their predictive power as time progresses;
9) Regression procedures have been used successfully as measures of whom to invite to interview;
10) Few of the models used to predict success or judgments have been adopted;
11) The few studies done on modeling a judgment policy with subjective weights have shown that much promising and interesting work can be done.

Summary

The purpose of this study is to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information.

The judgment research has shown that a variety of judgment tasks have been modeled by mathematical representations. These models have typically used a linear combination of information. Both the prediction of success and actual judgments have been modeled successfully. In fact, under certain circumstances, the paramorphic representations may be more valid than the judgments themselves. This modeling has been useful because it can highlight individual differences and the misuse of information by judges.

A new area in judgment is subjective weights. The use of subjective weights in modeling a judgment task allows insight into the question of how judges say they weight information. How information is weighted mathematically and how it is weighted subjectively may be two different things. Understanding these differences expands our knowledge about judgment. Specifically, it may allow us to discern some of the problems of judgment in medical school admissions.

There are many problems with existing medical school admissions processes. Some of these problems have historical roots while others are just emerging. Admissions has grown from informal processes to formal procedures. Through the years three major admissions criteria have emerged: GPA, MCAT scores and personal interviews. Additional criteria, for which adjustments are made, have included autobiographical (personal) statements, letters of evaluation and extracurricular activities. However, these criteria predict success in the first two years but lose their predictive power in the clinical years. Given the sheer number of applicants, the diversity of the pool, and the lack of predictive power, a severe strain is placed on admissions committees.

Admissions committees must identify, measure and evaluate important admissions criteria. Yet there is little research on how committees weight these admissions variables.
The few available studies have shown that weighting schemes can be devised that predict success or judgments, but these schemes have not been adopted. Insight into these concerns may arise from research in the judgment paradigm. Thus, medical school admissions provides a rich content area in which to explore the judgments of committee members, while the judgment paradigm provides the tools and methods of study. Different measures are used to analyze the success of the different weighting schemes. Which weights are better, how much agreement there is among committee members, and how best to use the weights are some of the major issues addressed in this study.

CHAPTER III
DESIGN OF THE STUDY

In this chapter the method and design of this study are described in seven major sections: (1) Population and Sample, (2) Stimulus Materials, (3) Procedures, (4) Measures, (5) Hypotheses, (6) Analyses and (7) Summary.

Population and Sample

The subjects for this study were drawn from the current members of the Admissions Committee of Michigan State University's College of Human Medicine. Members are elected by their peers from their various departments for three-year terms. Fifteen of the sixteen members agreed to participate in this study, of whom four were students. Seven were males and eight were females. Their ages ranged from 23 to 61; the mean age was 35. Four M.D.'s, three Ph.D.'s, two with Master's degrees, five with Bachelor's degrees, and one medical student who had no college degree comprised the degree status of the committee members. The average tenure on the committee was 2 years, with a range from one-half year to 4 years.

Stimulus Materials

Two different sets of stimulus materials were created for presentation to committee members. Each set contained an introduction, instructions, a description of the variables to be used, and the data (Appendices A and B). Data were presented on four independent variables: (1) total grade point average (GPA), (2) Medical College Admission Test (MCAT) scores, (3) personal statement scores and (4) interview scores. These are the major variables used in the admissions process to select medical school applicants (Gee and Cowles, 1957; Char et al., 1975). The following descriptions were provided to each committee member:

Total GPA. This represented the cumulative grade point average of the applicant's undergraduate years. The GPA ranged from a low of 2.00 (C) to a high of 4.00 (A). The average was 3.19 for both the sample and the actual applicant pool of 1977.

MCAT Score. In 1977, the revised MCAT was given for the first time to medical school applicants. Scoring is much different from that of the old MCAT, as is the material on which the applicant is tested. There are four science-related subtests: content knowledge in biology, chemistry, and physics, and problem-solving ability in the sciences. There are two additional tests: quantitative reasoning ability and reading comprehension. Ordinarily the subtests are reported as separate scores, but because they are highly correlated, an average score was used in this study to simplify the task. Scores ranged from a low of 1 to a high of 14. The average MCAT score was 8 for both the sample and the actual applicant pool.

Personal Statement Score.
The personal statement score represented the evaluation given the applicant by two raters who read the two pages of autobiographical information found in the applicant's formal application and the personal statement submitted by the applicant describing his/her reasons for choosing medicine as a career and for choosing Michigan State University's College of Human Medicine. The evaluation was described by one of five labels ranging from "well above average" (5.0) to "well below average" (1.0). The average score was 3.0 for both the sample and the actual applicant pool, based on a five-point scale.

Interview Score. This score represented the combined recommendation given by two interviewers who had each conducted a fifty- to sixty-minute interview with the applicant. Questions in the interview focused on personal qualities considered important to the student's successful functioning at this school as well as qualities felt to be critical to effectiveness as a physician. These included such areas as problem-solving, maturity, motivation, interpersonal skills, and self-understanding. Five different labels described the recommendation given to an applicant, ranging from "outstanding candidate" (5.0) to "express reservations" (1.0). The average score was 3.0, based on a five-point scale.

The instructions emphasized that these four variables were a sample of the information that is available to committee members concerning an applicant. Committee members were asked to accept this limitation and to make their ratings on this information alone.

Data Sets

The examination of various models in the judgment paradigm has usually occurred under two data conditions, representative and orthogonal. The variables of interest are correlated in the representative condition to the extent believed to prevail in reality, and are uncorrelated in the orthogonal condition. Brunswik (1955) and Hammond (1972) have argued for the study of judgment in real situations. They felt that experimental designs using representative data are to be preferred over designs employing orthogonal data; the task validity is greater with a representative design. On the other hand, it is known that the beta weights obtained from a linear model are unstable: when the predictor variables are inter-correlated (multicollinearity), the weights assigned to these variables will differ according to the methods used in computing a regression equation. When the predictor variables are orthogonal, the beta weights are more stable. Thus, researchers have used orthogonal data to arrive at cleaner statistical results (Darlington, 1968).

For this study, two data sets, correlated and orthogonal, were used to examine the boundary conditions of both the objective and subjective weighting schemes. For the correlated data set, the relationships between the four admissions variables were moderate to high; the correlations ranged from .53 to .69 (Table 3.1). For the orthogonal data set, the relationships between the variables were essentially zero; no correlation exceeded |.22| (Table 3.2). However, each data set had the same means and standard deviations (Table 3.3). Only the correlations between the variables were changed. The use of these data sets allowed the examination of each judgment policy under two conditions. The construction of such stimulus sets is sketched below.
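This study does not describe the exact procedure by which the two data sets were generated, so the following sketch offers only one plausible method: multivariate-normal sampling with the target means and standard deviations (Table 3.3) and the target correlation matrices (Tables 3.1 and 3.2). Rounding the draws to the scales of the four variables is omitted.

    import numpy as np

    # Target moments for GPA, MCAT, Personal Statement and Interview Score.
    means = np.array([3.19, 8.40, 3.00, 3.00])
    sds = np.array([0.53, 2.87, 1.11, 1.11])

    # Correlated (representative) structure from Table 3.1; the orthogonal
    # condition is approximated by an identity correlation matrix.
    R_corr = np.array([[1.00, 0.69, 0.61, 0.59],
                       [0.69, 1.00, 0.60, 0.53],
                       [0.61, 0.60, 1.00, 0.63],
                       [0.59, 0.53, 0.63, 1.00]])
    R_orth = np.eye(4)

    def make_profiles(R, n=30, seed=2):
        # Convert the correlation matrix to a covariance matrix and sample.
        cov = np.outer(sds, sds) * R
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(means, cov, size=n)

    correlated_set = make_profiles(R_corr)
    orthogonal_set = make_profiles(R_orth)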
Procedures

Two testing sessions were required for each committee member. In one session, orthogonal data (in the form of the four variables) on forty applicants to medical school were presented to the committee member. The committee member's task was twofold: (1) rate each applicant on the basis of overall quality by assigning a score from 1 to 7 (one being a low rating) and (2) verbally report subjective importance weights for each of the four variables used in evaluating the applicants (Total GPA, MCAT Score, Personal Statement Score, and Interview Score), first by rank ordering the four variables in importance and second by distributing 100 points among them; a high number represented a relatively important variable (Appendices C and D). In the second session, correlated data were presented. The committee member's task was again twofold: (1) rate each applicant and (2) report subjective importance weights (Appendices E and F). A counter-balanced design was used: eight committee members received orthogonal data in the first session and correlated data in the second session, while the remaining seven committee members received the data sets in the opposite order. Testing took place in committee members' offices or conference rooms. Each testing session lasted from three-quarters of an hour to an hour. At the end of each session, a short debriefing was held.

Table 3.1
CORRELATION MATRIX OF INDEPENDENT VARIABLES - CORRELATED DATA SET (N=30)

Independent Variables   Total GPA   MCAT Score   Personal Statement   Interview Score
Total GPA                 1.00
MCAT Score                 .69*        1.00
Personal Statement         .61*         .60*           1.00
Interview Score            .59*         .53*            .63*               1.00

* p < .001

Table 3.2
CORRELATION MATRIX OF INDEPENDENT VARIABLES - ORTHOGONAL DATA SET (N=30)

Independent Variables   Total GPA   MCAT Score   Personal Statement   Interview Score
Total GPA                 1.00
MCAT Score                 .22         1.00
Personal Statement        -.19         -.02            1.00
Interview Score            .05         -.09            -.08               1.00

Table 3.3
MEANS, STANDARD DEVIATIONS AND RANGES OF INDEPENDENT VARIABLES

Independent Variables   Mean   Std.Dev.   Range
Total GPA               3.19     .53      2.04 to 4.00
MCAT Score              8.40    2.87      4 to 14
Personal Statement      3.00    1.11      1 to 5
Interview Score         3.00    1.11      1 to 5

Intra-Judge and Inter-Judge Reliability

Data on ten applicants were randomly selected to estimate intra-judge reliability. By correlating the ratings given to an applicant in the first instance with those given to its replication, a measure of intra-judge reliability was obtained (Table 3.4). The median correlation for the correlated data set was .94, with a range of .86 to 1.00 and a mean of .95. The correlations for the orthogonal data set were lower, with a median of .83, a range of .42 to .96, and a mean of .80. Once the reliability coefficients were calculated for each judge, the ten replications were removed from further analysis.

The reliability coefficients reported for the correlated data set were slightly higher than what has been reported in the literature. Hoffman et al. (1968) reported correlations ranging from .60 to .92 with a median of .80. Hoffman (1960) reported intra-judge correlations ranging from .83 to .88.
However, the Hoffman et al. (1968) research used orthogonal data and thus was closer to the results shown for the orthogonal data set in this study. Hoffman (1960) used representative data, but his results were based on a sample of four judges.

Table 3.4
INTRA-JUDGE RELIABILITY FOR 10 REPLICATED CASES

           Correlated Data   Orthogonal Data
Judge          r_xx              r_xx
# 1           1.00**             .88*
# 2           1.00**             .85*
# 3            .92*              .93*
# 4            .96*              .88*
# 5            .92*              .57***
# 6            .91*              .67***
# 7            .95*              .42
# 8            .85*              .83*
# 9            .88*              .96*
#10            .95*              .88*
#11            .92*              .45
#12            .86*              .65***
#13            .96*              .67***
#14            .91*              .90*
#15            .94*              .67***

Median         .94               .83
Mean           .95               .80
Range      .86 to 1.00       .42 to .96

Goldberg (1968) stated that while the relatively few investigations of judgmental stability (intra-judge reliability) have concluded that judges may show substantial consistency in their judgments over time, the vast majority of reliability studies have focused upon judgmental consensus (inter-judge reliability) and have come to widely disparate conclusions. Goldberg cited some findings of extremely high agreement on some judgment tasks (e.g., Bryon, Hunt and Walker, 1966; Goldberg, 1966; Winslow and Rapersand, 1964) and other results of virtually no consensus (e.g., Brodie, 1964; Gunderson, 1965; Watson, 1967).

It is interesting to note that in this study there was high agreement among the fifteen judges in the correlated data set (Table 3.5); correlations between judges ranged from .72 to .96. In the orthogonal data, the inter-judge correlations were lower (Table 3.6), ranging from .10 to .89. Hoffman et al. (1968) showed similar reliabilities with orthogonal data; their correlations ranged from -.11 to .83. The findings of this study showed high consensus on the judgment task for both the correlated and orthogonal data conditions. Coefficient alpha (α), developed by Cronbach (1951), was used to estimate the reliability of the multiple ratings for the thirty applicants. Alpha was .98 for the correlated data and .95 for the orthogonal data. These computations are sketched below.

[Tables 3.5 and 3.6, the matrices of inter-judge correlations for the correlated and orthogonal data sets, are not legible in this copy.]
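The reliability computations reported above can be sketched as follows. The function names are mine, and ratings stands for the 15-judge-by-30-applicant matrix of acceptability ratings.

    import numpy as np

    def intra_judge_reliability(first, second):
        # Correlation between a judge's ratings of the ten replicated
        # applicants on their first and second appearances.
        return np.corrcoef(first, second)[0, 1]

    def cronbach_alpha(ratings):
        # Coefficient alpha with judges treated as items (rows) and
        # applicants treated as cases (columns).
        k = ratings.shape[0]
        item_variances = ratings.var(axis=1, ddof=1).sum()
        total_variance = ratings.sum(axis=0).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)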
This information was used to study the relation between objective and subjective weights by examining the following correlations: 1) 2) 3) 4) 5) between objective and subjective weights; between actual judgments and judgments generated from objective weights; between actual judgments and judgments generated from subjective weights; between objectively generated judgments and subjectively generated judgments; between the subjective rank order of the four independent variables and the regression rank order of the same vari— ables. Each correlation examined one aspect of the relation between objective and subjective weights. 59 Once these correlations were examined, additional concerns arose centering on alternative weighting schemes. For example, how might objective and subjective weights compare with other weighting schemes? Specifically, how might a unit weighting or random rating model com- pare to objective and subjective models? To address this, the fol- lowing data were developed: 1) 2) ); predicted judgments generated from unit weights (Iunit judgments which were generated by randomly assigning an applicant a rating based on a judge's frequency distribution (Yrand)' To compare the different weighting schemes, the following correlations were examined: 1) 2) 3) 4) between actual judgments and objectively generated judgments; between actual judgments and subjectively generated judgments; between actual judgments and judgments generated from unit weights; between actual judgments and random judgments based on judges' frequency distributions of actual judgments. Having examined committee members as individuals, it was decided to look at models which represented the committee as a group. There- fore, the following data were developed: 1) 2) mean judgments which were the average ratings given to each applicant; predicted judgments generated from average objective ); (regreSSTOD) weightS (Yaverage 60 3) predicted judgments generated from equal subjective importance weights (Yaqual). The following correlations were examined: 1) between actual judgments and judgments generated from average objective weights; 2) between actual judgments and judgments generated from equal subjective weights. These correlations were then compared to the previously mentioned cor- relations to examine which weighting scheme best accounted for committee members' judgment policies. To summarize, the following models were developed for each com- mittee member: objective weights, subjective weights, unit weights, random ratings, mean objective weights and equal subjective weights. Each of these models yielded predictions that were correlated with committee members' actual judgments. This was how the success of each model was determined. Predicted judgments were obtained in five ways: 1) Objective weights. Judgments were obtained by multi- plying the objective (regression) weights by the standardized values of the four predictor variables. The objective (regression) weights were obtained from multiple regression analysis. The following equation represented the objective weighting scheme: iobj = a] (z GPA) + 32 (z MCAT) + 53 (2 Personal Statement) + 34 (2 Interview Score) Where (obj - predicted judgments Bi Z standardized regression weights standard score 2) 4) 61 Subjective weights. Judgments were obtained by multi- plying the subjective weights by the standardized values of the four independent variables. Subjective weights were used as if they were regression weights. 
The following equation represented the subjective weighting scheme: 1? sub = SW1 (2 GPA) + SW2 (Z MCAT) + SW3 (Z Personal Statement) + SW4 (2 Interview Score) Where (sub predicted judgments SW1 subjective weights Z standard score Unit weights. Ratings were obtained by multiplying the four predictor variables by unit weights (i.e., +l's or -l's). For example, an applicant with a GPA of 3.00, a MCAT of 8, an average personal statement score and an average interview score would have a rating of seventeen (l (3.00) + l (8) + 1 (3) + l (3) = 17). The signs of the unit weights were determined by the multiple regression analyses. The following equation represented the unit weighting scheme: A Y = UW1 (GPA) + UW2 (MCAT) + UW3 (Personal Statement) + unit UW4 (Interview Score) Where Y predicted judgments unit UWi unit weights Random ratings. Each judge had a frequency distribu- tion associated with the number of times applicants received a rating of one, two, three, etc. Based on 6) 62 each committee member's frequency distribution, ratings were assigned randomly to each applicant. So instead of just randomly assigning a rating from 1 to 7 to an applicant, ratings were assigned ran- domly to correspond to a frequency distribution. This model, which might be termed marginal ran- domness, allowed a determination of whether a random model might predict actual judgments. Mean objective weights. Judgments were obtained by multiplying the average weights by the standardized values of the four independent variables. The average objective weights were arrived at by re- gressing the four variables on to the average rating given each applicant. The average rating was the sum of each applicant's rating divided by the number of judges. The following equation represented the mean weighting scheme: Y = M31 (2 GPA) + M82 (2 MCAT) + M83 (2 Personal Statement) average + M84 (2 Interview Score) Where Yaverage = predicted judgments Mei = average objective weights Z = standard scores Equal subjective weights. Judgments were obtained by multiplying equal subjective weights by the standardized values of the four predictor variables. The following equation represented the equal weighting scheme: 63 iequa] = 25 (z GPA) + 25 (z MCAT) + 25 (2 Personal Statement) + 25 (2 Interview Score) Where Y = predicted judgments equal 2 = standard score Hypotheses The research hypotheses, originally stated in general terms in Chapter I, are stated operationally as: 1) No relation exists between statistical and sub- jective weights. 2) A positive relation exists between actual judgments and predicted judgments obtained through the use of statistical weights. 3) A positive relation exists between actual judgments and predicted judgments obtained through the use of subjective weights. 4) A positive relation exists between predicted judg- ments obtained through the use of both objective and subjective weights. 5) There is a greater relation between actual judg- ments and objectively predicted judgments than between actual judgments and subjectively pre- dicted judgments. The corresponding null hypotheses are stated symbolically in the following terms: 1) Ho: r = 0 -8iSWi Where Bi = objective weights SW1 = subjective weights 64 2) HO: EYsYobj = 0 Where YS = actual judgments A Yobj = predicted judgments obtained through the use of statistical weights. 3) H0: rYsYsub = 0 Where YS = actual judgments A Ysub = predicted judgments obtained through the use of subjective weights. 
Hypotheses

The research hypotheses, originally stated in general terms in Chapter I, are stated operationally as:

1) No relation exists between statistical and subjective weights.
2) A positive relation exists between actual judgments and predicted judgments obtained through the use of statistical weights.
3) A positive relation exists between actual judgments and predicted judgments obtained through the use of subjective weights.
4) A positive relation exists between predicted judgments obtained through the use of both objective and subjective weights.
5) There is a greater relation between actual judgments and objectively predicted judgments than between actual judgments and subjectively predicted judgments.

The corresponding null hypotheses are stated symbolically in the following terms:

1) H0: r(βi, SWi) = 0, where βi = objective weights and SWi = subjective weights.
2) H0: r(Ys, Ŷobj) = 0, where Ys = actual judgments and Ŷobj = predicted judgments obtained through the use of statistical weights.
3) H0: r(Ys, Ŷsub) = 0, where Ys = actual judgments and Ŷsub = predicted judgments obtained through the use of subjective weights.
4) H0: r(Ŷobj, Ŷsub) = 0, where Ŷobj = predicted judgments obtained through the use of statistical weights and Ŷsub = predicted judgments obtained through the use of subjective weights.
5) H0: r(Ys, Ŷobj) = r(Ys, Ŷsub), where Ys = actual judgments, Ŷobj = judgments generated from statistical weights, and Ŷsub = judgments generated from subjective weights.

Analyses

Each committee member's judgment policy was modeled in four ways: (1) objective weights, (2) subjective weights, (3) unit weights and (4) random ratings. The committee as a group was modeled by (1) average weights and (2) equal weights. These schemes allowed a policy to be captured so that the weights, and the predicted judgments arrived at from these weights, could be compared. For each committee member a five-by-five correlation matrix was constructed with the following elements: (1) actual judgments, (2) objectively predicted judgments, (3) subjectively predicted judgments, (4) unit-weight predicted judgments and (5) random judgments. The highest correlation between actual judgments and predicted judgments indicated which weighting scheme best captured a committee member's policy. This matrix allowed the examination of the effectiveness of the different models in accounting for committee members' judgments. For the committee as a group, a three-by-three correlation matrix was constructed with the following elements: (1) actual judgments, (2) predicted judgments arrived at from average weights and (3) predicted judgments arrived at from equal weights. The highest correlation indicated which weighting scheme best captured the committee's policy. The effectiveness of the different models in accounting for the committee's judgments was also examined.

Multiple regression was the statistical technique employed to measure the relation between a dependent or criterion variable and a set of independent or predictor variables. The major assumptions of this model are that the variables are measured on at least an interval scale and that the relations among the variables are linear and additive. It should be noted, though, that non-interval variables and nonlinear and nonadditive relations can be handled through the use of transformations. When multiple regression is used as a descriptive tool, the linear dependence of one variable on the other variables is summarized and decomposed. The regression analysis finds the best linear prediction equation and then evaluates the accuracy of this prediction equation.

In this study, GPA, MCAT scores, personal statement scores and interview scores served as the independent (predictor) variables, while committee members' ratings or judgments of acceptability served as the dependent (criterion) variable. The multiple regression technique analyzed the relations of the four independent variables to the one dependent variable. This analysis yielded four objective (regression) weights, which in turn were used to generate predicted judgments. The other weights (i.e., subjective, unit, average and equal) were also used like regression weights to generate predicted judgments. The model-comparison matrix described above is sketched below.
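A sketch of the five-by-five comparison matrix, with argument names of my own choosing:

    import numpy as np

    def comparison_matrix(actual, y_obj, y_sub, y_unit, y_rand):
        # Five-by-five correlation matrix: the first row holds the
        # correlations of actual judgments with each model's predictions,
        # and the largest entry in that row marks the best-fitting scheme.
        M = np.vstack([actual, y_obj, y_sub, y_unit, y_rand])
        return np.corrcoef(M)

The three-by-three committee-level matrix is built the same way from the actual judgments and the average- and equal-weight predictions.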
The analyses of the data can be summarized as follows. For each committee member,

1) subjective importance weights were elicited;
2) multiple regression weights were computed;
3) the objective and subjective weights were correlated;
4) predicted judgments were generated through the use of regression weights;
5) predicted judgments were generated through the use of subjective weights;
6) predicted judgments were generated through the use of unit weights;
7) judgments corresponding to a frequency distribution were randomly generated;
8) actual judgments, objective judgments, subjective judgments, unit judgments, and random judgments were correlated.

For the committee at large,

1) a set of predicted judgments was generated based on the regression weights derived from the mean rating given each applicant;
2) a set of judgments was generated based on equal subjective weights;
3) actual judgments, predicted judgments derived from average weights, and judgments derived from equal weights were correlated.

Summary

The sample for this study was composed of 15 volunteers, all members of the Admissions Committee of the College of Human Medicine, Michigan State University. Seven were males and eight were females, with faculty, staff and medical students represented. Most committee members were experienced judges, having served for an average of two years.

Two different data sets, one correlated and one orthogonal, were employed to achieve the purpose of this study. Each data set contained the same introduction, instructions and description of independent variables. Only the data on the four independent variables were changed. These variables were total GPA, MCAT scores, personal statement scores and interview scores.

Two testing sessions were required, one for each data set. The task was twofold in each session: (1) rate each applicant (40 total) on the basis of the four independent variables and (2) verbally report subjective importance weights for each variable. Testing took place in committee members' offices or conference rooms and lasted approximately fifty minutes.

Data on ten applicants were used as replications to estimate intra-judge reliability. These reliability estimates were very high for the correlated data set (.95) and lower for the orthogonal data (.80). Inter-judge reliability estimates were also very high: for the correlated data, Cronbach's alpha was .98; for the orthogonal data, it was .95. The results showed that the committee members not only were reliable but also were consistent among themselves.

Once a committee member rated all applicants and reported subjective importance weights, the following pieces of information were collected:

1) actual judgments and judgments generated from objective weights, subjective weights, unit weights and random ratings;
2) objective, subjective and unit weights.

For the committee as a group, the following pieces of information were collected:

1) mean judgments and judgments generated from average weights and equal weights;
2) average and equal weights.
It was hypothesized that:

1) No relation existed between objective and subjective weights;
2) A positive relation existed between actual judgments and judgments generated from objective weights;
3) A positive relation existed between actual judgments and judgments generated from subjective weights;
4) A positive relation existed between objectively generated judgments and subjectively generated judgments;
5) There was a greater relation between actual judgments and objectively generated judgments than between actual judgments and subjectively generated judgments.

To test these hypotheses, multiple regression and correlation analyses were used. Additional analyses examined whether alternative weighting schemes (e.g., unit, random, average and equal) might perform as well as the objective or subjective weighting schemes.

CHAPTER IV
RESULTS AND DISCUSSION

In this chapter, descriptive and statistical analyses of the data are presented and discussed. Data were analyzed primarily by programs contained in the Statistical Package for the Social Sciences (SPSS), version 6.5. Procedures included descriptive statistics, correlations, multiple regressions, t-tests, and repeated measures one-way analyses of variance. The following research questions were specifically considered (refer to Figure 4.1):

1. What was the relation between objective and subjective weights (r(βi, SWi))?
2. What was the agreement between actual judgments and predicted judgments arrived at through the use of objective weights (r(Ys, Ŷobj))?
3. What was the agreement between actual judgments and predicted judgments arrived at through the use of subjective weights (r(Ys, Ŷsub))?
4. What was the agreement between objectively predicted judgments and subjectively predicted judgments (r(Ŷsub, Ŷobj))?
5. Was there greater agreement between actual judgments and objectively predicted judgments than between actual judgments and subjectively predicted judgments (r(Ys, Ŷobj) > r(Ys, Ŷsub))?

[Figure 4.1, RELATION BETWEEN SUBJECTIVE AND OBJECTIVE WEIGHTS: a lens-model diagram in which the cues X1 through Xk are combined through the statistical weights to yield the predicted judgments Ŷobj and through the subjective weights to yield the predicted judgments Ŷsub.]

In the sections of this chapter, each research question is restated as a hypothesis, relevant data are presented, a statement is made about whether the hypothesis was rejected or accepted, and the findings are discussed. The results for each hypothesis are presented for both correlated and orthogonal data unless otherwise noted. After the data pertaining to these five hypotheses were analyzed, four additional judgmental models were examined: the unit weighting model, the random rating model, the average weighting model and the equal weighting model. Data are first presented for each model and then compared and discussed for all models.

Relation between Objective and Subjective Weights

The first research question concerned the relation between objective and subjective weights. Objective weights have typically been used to model and highlight judgment policies (Slovic and Lichtenstein, 1971). Work on modeling policies by using subjective weights has shown serious discrepancies between the two weighting schemes (Hoffman, 1960; Slovic et al., 1972). When examining the use of subjective weights, the typical performance criterion has been the correlation between objective and subjective weights.
Subjective weights were elicited from the judges so that they could be compared with the objective weights derived from multiple regression analysis. The two sets of weights were correlated with each other. The use of these correlations was the first step in examining the question: how well do subjective weights work?

Research Hypothesis

Based on previous research findings, it was hypothesized that no relationship existed between objective weights generated by multiple regression analysis and subjective weights elicited from judges.

Statistical Hypothesis

H0: r(βi, SWi) = 0
H1: r(βi, SWi) > 0

where r = correlation coefficient, βi = objective weights, and SWi = subjective weights.

Results

For each judge, two correlations were computed. The first was a product-moment correlation between objective and subjective weights (rβSW). The second was a rank-order correlation (Spearman rho) between the order in which the independent variables were entered into the stepwise regression equation and the judge's ranking of the importance of the independent variables (rSoOo). These correlations are presented in Table 4.1.

For the correlated data set, the median correlation between objective and subjective weights was .86, a strong positive correlation. The range was from -.12 to .99, with a mean correlation of .84. Five of the fifteen correlations were significant (p < .05). The median correlation between the objective and subjective rank orderings was .40. The range was from -.20 to 1.00, and the mean correlation was .86. Four of the fifteen correlations were significant (p < .05).

For the orthogonal data, the median correlation between the objective and subjective weights was .88. The range was from .18 to .99, with a mean correlation of .90. Eight of the fifteen correlations were significant (p < .05). The median correlation between the objective and subjective rank orderings was .80. The range was from .20 to 1.00, and the mean correlation was .87. Three of the fifteen correlations were significant (p < .05).

Table 4.1
CORRELATIONS BETWEEN OBJECTIVE AND SUBJECTIVE WEIGHTS (rβSW) AND BETWEEN OBJECTIVE AND SUBJECTIVE RANK ORDER OF IMPORTANCE (rSoOo)

            Correlated Data             Orthogonal Data
Judge      rβSW        rSoOo          rβSW        rSoOo
# 1         .86         .40            .98**        .80
# 2         .98**      1.00**          .93          .20
# 3         .61         .40            .77          .80
# 4         .87        1.00**          .97**        .80
# 5         .64         .40            .96***       .80
# 6         .99**      1.00**          .94***       .80
# 7        -.12        -.20            .18          .00
# 8         .88         .80            .51          .40
# 9         .97**       .80            .99**       1.00**
#10         .95***     1.00**          .88          .80
#11         .43         .80            .20          .40
#12         .63         .40            .80          .80
#13         .91***      .20            .98**       1.00**
#14         .86         .40            .97**       1.00**
#15         .34         .20            .69          .80

Median      .86         .40            .88          .80
Mean        .84*        .86**          .90*         .87**
Range    -.12 to .99  -.20 to 1.00   .18 to .99   .20 to 1.00

* p < .001
** p < .01
*** p < .05

NOTE: The mean correlation is the average over the 15 judges and is based on an n of 15. The correlations between objective and subjective weights are based on an n of four. Thus, a mean r of .84 is significant at the .001 level, while an individual r of .91 is significant only at the .05 level.

All of the correlations were transformed to Z scores using Fisher's r-to-Z transformation so that the mean correlations (in terms of Z scores) could be tested for significance from zero. All mean correlations were significantly different from zero. For the correlated data set, the mean correlation between objective and subjective weights was .84 (t = 6.24; p < .01). The mean correlation between the objective and subjective rank orderings of the independent variables was .86 (t = 3.60; p < .01). This test is sketched below.
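In sketch form (Python, with scipy assumed available), the test transforms each judge's correlation with Fisher's r-to-Z, runs a one-sample t-test on the transformed values, and back-transforms the mean.

    import numpy as np
    from scipy import stats

    def mean_correlation_test(rs):
        # Fisher's r-to-Z transformation (arctanh), a one-sample t-test of
        # the transformed values against zero, and the back-transformed mean.
        z = np.arctanh(np.asarray(rs))
        t, p = stats.ttest_1samp(z, 0.0)
        return np.tanh(z.mean()), t, p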
In the orthogonal data set, the mean correlation between objective and subjective weights was .90 (t = 7.60; p < .01). The mean correlation between the objective and subjective rank orderings of the independent variables was .87 (t = 4.52; p < .01).

Confidence intervals were established around the mean correlations. In the correlated data set, the 95% confidence interval for the mean correlation between objective and subjective weights ranged from .67 to .93. The 95% confidence interval for the mean rank-order correlation between the regression order and the subjective order of the importance of the independent variables was from .48 to .97. In the orthogonal data set, the 95% confidence interval for the mean correlation between objective and subjective weights was from .79 to .96. The 95% confidence interval for the mean correlation between the regression order and subjective order of the importance of the independent variables was from .61 to .96.

Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_βiSWi > 0, was accepted. On the average, there were significant positive correlations between objective weights and subjective weights elicited from judges. However, note that there were substantial individual differences. There was at least one consistent outlier, judge #7, and others who were marginal.

Discussion

The correlations between objective and subjective weights have usually tended to be low. Schmitt and Levine (1977) pointed out that the typical comparison of objective and subjective weights has involved either an "eyeball" comparison or at best a rank-order correlation between the sets of weights. This study used two correlation measures: a Pearson product moment and a Spearman rank order.

In interpreting these significant correlations, a few words of caution are needed. First, the correlations between the sets of weights were based on only four variables, an extremely small sample size. An r of .90 was needed to reach significance at the .05 level. Correlations based on such a small n should be interpreted cautiously.

Second, the significant mean correlations masked individual differences. Whereas the individual correlations were based on an n of four, the mean correlations were based on an n of fifteen (across all judges). A mean r of only .51 was needed for significance at the .05 level. When individual correlations were examined, only five out of fifteen correlations were significant in the correlated data set. The number of significant individual correlations increased from five to eight in the orthogonal data set. This increase pointed to another problem, that of multicollinearity. As Kerlinger and Pedhazur (1973) have pointed out, substantive interpretation of regression coefficients is difficult and dangerous, and it becomes more difficult and dangerous as predictors are more highly correlated with each other. The regression (objective) weights derived from correlated data are unstable. To correlate them with another set of weights may invite problems of interpretation and generalizability.

One is faced with deciding how to examine and interpret each data set. The correlated data were representative of actual admissions data but yielded unstable objective weights. The orthogonal data set allowed the interpretation of objective weights to be greatly simplified, but under unrealistic data conditions. One way to examine and interpret each data set arises from judges' perceptions of the data.
The question of representativeness was presented to each judge. It dealt with whether judges perceived differences between the correlated and orthogonal data sets. In a debriefing session held at the end of each exercise, the judges were asked to rate the data sets on a seven-point representativeness scale (seven being highly representative). To test for perceived differences, the mean ratings were compared. The mean ratings were identical (x̄ = 5.6). The judges perceived both data sets as being representative of actual admissions data.

At first glance this was disturbing in light of the fact that the orthogonal data set was not really representative of admissions data. What appeared to be happening with the orthogonal data set was what Abelson (1976) has termed script processing. Judges commented that certain applicants reminded them of students they knew or of previous candidates who had applied. In script theory, it is hypothesized that judges, when faced with a decision, create or employ relevant scripts. These scripts are based on previous learning or experience. On the basis of the debriefing session, this script hypothesis was a plausible explanation of why the orthogonal data set was perceived as representative of admissions data.

A second way to examine and interpret the data concerned the confidence intervals placed around the mean correlations. Using the correlated data set, the 95% confidence interval for the mean correlation between the objective and subjective weights was between .67 and .93. This was quite a large range of values. The 95% confidence interval for the mean correlation between the objective and subjective rank order of importance was even larger, between .48 and .97. With the orthogonal data set the intervals were smaller. Recall that a confidence interval is constructed so that it has a known probability (.95) of including the value of a parameter between its limits. Since these intervals are quite wide, caution should be exercised in interpreting these sample mean correlations.

Thus, research question #1 examined the first step in determining how well subjective weights worked. Subjective weights were moderately to highly correlated with objective weights; the correlations were higher with the orthogonal data than with the correlated data. When the performance criterion was the correlation between objective and subjective weights, the use of subjective weights was appropriate in modeling a four-cue medical school admissions task. Although this first step was taken cautiously due to the small sample size, the small number of significant individual correlations, and the large confidence intervals, it laid the groundwork for the examination of other criteria. These criteria were concerned with how objective and subjective weights were used in the prediction of actual judgments. The next two hypotheses addressed this issue.

Relation between Actual Judgments and Judgments Generated from Objective Weights

The second research question involved the relation between actual judgments and predicted judgments generated from objective weights. This research question involved a different performance criterion than the one examined by research question #1. Concern shifted from the weights themselves to the predicted judgments derived from these weights. Emphasis was not placed on whether objective weights were representative of judges' psychological weights.
Rather, it was placed on whether judges' actual ratings could be predicted from weights derived from multiple regression analysis.

The correlations between actual judgments and predicted judgments derived from objective weights indicated how well a weighted linear combination of cue values could predict judges' actual ratings. The magnitude of these correlations assessed the adequacy of a linear model using objective weights. These correlations captured the judges' policies, to the extent that they were linear. Hammond and Summers (1972) referred to this term as cognitive control, the extent to which judges control the execution of their knowledge. The squared values of these correlations represented the variance accounted for by a linear model based on objective weights.

Research Hypothesis

It was hypothesized that there was a positive relation between actual judgments and judgments generated from objective weights.

Statistical Hypothesis

    H0: r_YsŶobj = 0
    H1: r_YsŶobj > 0

Where r = multiple correlation coefficient
      Ys = actual judgments
      Ŷobj = judgments generated from objective weights

Results

The correlations between judges' actual ratings and ratings generated from objective weights (r_YsŶobj) are shown in Table 4.2. The squared values for each of these correlations (r²), the amount of variance that can be accounted for in committee members' judgments, are also presented. For the correlated data set, the median correlation between actual judgments and judgments generated from objective weights was .94. The range was from .85 to .97. The mean correlation was .94. All individual correlations were significant (p < .001). The r² values ranged from .72 to .94 with a median of .88 and a mean of .88. On the average, then, a linear model based on objective weights accounted for 88% of the variance of the actual judgments.

The median correlation between actual judgments and judgments derived from objective weights was .89 for the orthogonal data set. The range was from .82 to .97 with a mean correlation of .91. All individual correlations were significant (p < .001). The r² values ranged from .66 to .95 with a median of .79 and a mean of .83. On the average, therefore, a linear model based on objective weights accounted for 83% of the variance of the actual judgments.

All correlations were transformed to z scores so the mean correlations could be tested for significance from zero. Both mean correlations were significantly different from zero (for the correlated data set, t = 30.18, p < .001; for the orthogonal data set, t = 24.07, p < .001).

Table 4.2

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM OBJECTIVE WEIGHTS (r_YsŶobj)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶobj     r²             r_YsŶobj     r²
# 1         .97*         .94            .94*         .88
# 2         .94*         .88            .82*         .66
# 3         .96*         .92            .91*         .84
# 4         .97*         .93            .95*         .90
# 5         .94*         .88            .90*         .81
# 6         .95*         .90            .89*         .79
# 7         .96*         .91            .87*         .76
# 8         .93*         .86            .87*         .76
# 9         .95*         .91            .97*         .95
#10         .90*         .81            .92*         .85
#11         .92*         .85            .89*         .79
#12         .85*         .72            .86*         .75
#13         .93*         .87            .85*         .73
#14         .94*         .88            .91*         .83
#15         .96*         .92            .90*         .81

Median      .94          .88            .89          .79
Mean        .94*         .88            .91*         .83
Range       .85 to .97   .72 to .94     .82 to .97   .66 to .95

* p < .001

The 95% confidence intervals established for the mean correlations were from .93 to .96 (correlated data set) and from .88 to .93 (orthogonal data set), very narrow and high intervals. Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_YsŶobj > 0, was accepted.
There were significant positive correlations between actual judgments and judgments generated from objective (regression) weights.

It was suggested earlier that although the correlation between objective and subjective weights might be low, objective and subjective weights might yield predicted judgments that would correspond highly with actual judgments. This performance criterion dealt with predicted judgments, not weights. To test for any relation between judgments and weights, a rank-order correlation was run. This analysis examined how the correlations between objective and subjective weights were related to the correlations between actual judgments and judgments generated from objective weights. For example, does a low correlation between objective and subjective weights ensure a low correlation between actual judgments and judgments generated from objective weights? A Spearman rank-order correlation showed that there was no significant relation between the correlations of objective and subjective weights and the correlations between actual judgments and judgments generated from objective weights (rho = -.14).

Knowing the correlation between a judge's objective and subjective weights tells little about the relation of that judge's actual ratings to ratings predicted from objective weights. To draw conclusions based solely on the correlations between objective and subjective weights would be to ignore an important relation.

Discussion

The result that there were significant positive correlations between actual judgments and objectively generated judgments was consonant with the judgment research that has shown linear models to be good approximations in many decision-making situations (Dawes and Corrigan, 1974). Correlations between actual judgments and judgments generated from objective weights are typically quite high, ranging from values in the .70's to the .90's (Cook and Stewart, 1975; Hoffman, 1960; Slovic and Lichtenstein, 1971).

Probably of more importance than the mean correlation being significantly different from zero was the confidence interval placed around the mean correlation. This interval had a known probability (.95) of including the population parameter within its limits. One can be 95% confident that the mean correlation is between its limits. A small interval with a high degree of confidence was a preferred state. For the correlated data set, the mean correlation between actual judgments and judgments generated from objective weights was quite high (.94), with a very narrow 95% confidence interval of .93 to .96 (between 86% and 90% of the variance accounted for). For the orthogonal data set, the mean correlation was again quite high (.91), with a 95% confidence interval of .88 to .93 (between 77% and 86% of the variance accounted for).

The confidence limits showed that the correlations between actual judgments and judgments generated from objective weights were quite high. Although they do fall within the ranges reported from earlier research, they are slightly higher than those reported by Cook and Stewart (1975), who had their judges perform a similar task. The mean correlations in their study ranged from .89 to .91 (between 79% and 82% of the variance accounted for).

The reasons why the correlations of this study were so high may be answered by Dawes and Corrigan (1974).
They stated that linear models work because the situations in which they have been investigated are those in which: (a) the predictor (independent) variables have conditionally monotone relationships to criteria; (b) there is error in the dependent variable; (c) there is error in the independent variables; and (d) deviations from optimal weighting do not make much practical difference. The task environment using the correlated data closely paralleled the above conditions. Each of the predictor variables had a conditionally monotone relationship with the criterion. Presumably, "more is better" in the admissions process: the higher one's GPA, MCAT scores, personal statement score, and interview score, the better one's chances of receiving a higher rating. Two other elements, errors in the dependent and independent variables, were also present in this study. The last feature, that deviations from optimal weighting do not make much practical difference, involved the comparison of the various weighting schemes, and is addressed later in the chapter.

It may be concluded that a linear model using objective weights is a good approximation in predicting a committee member's actual judgments for both data conditions. Another example is added to the growing number of tasks modeled by a linear model. This model also provided the upper limit for examining the effectiveness of the use of differential weights in predicting actual judgments.

Relation between Actual Judgments and Judgments Generated from Subjective Weights

The third research question dealt with the relation between actual judgments and judgments generated from subjective weights. Again, concern was not with the weights themselves but with the correlation between actual judgments and predicted judgments. The performance criterion dealt with how well judges' actual ratings could be predicted by using subjective weights. The correlations between actual judgments and predicted judgments derived from subjective weights indicated how well a subjectively weighted linear combination of cue values could predict judges' actual ratings. The magnitude of these correlations assessed the adequacy of the linear model using subjective weights. The squared values of these correlations represented the variance accounted for by a linear model based on subjective weights.

Research Hypothesis

It was hypothesized that there was a positive relationship between a committee member's actual judgments and judgments generated from subjective weights.

Statistical Hypothesis

    H0: r_YsŶsub = 0
    H1: r_YsŶsub > 0

Where r = product moment correlation
      Ys = actual judgments
      Ŷsub = predicted judgments generated from subjective weights

Results

The correlations between judges' actual ratings and ratings generated from subjective weights (r_YsŶsub) are presented in Table 4.3. Also shown are the squared values for each of the correlations (r²). These values are a measure of how much variance can be accounted for by a linear model employing subjective weights.
Table 4.3

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM SUBJECTIVE WEIGHTS (r_YsŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶsub     r²             r_YsŶsub     r²
# 1         .97*         .94            .93*         .86
# 2         .94*         .88            .80*         .64
# 3         .94*         .88            .82*         .67
# 4         .96*         .93            .94*         .88
# 5         .92*         .85            .78*         .61
# 6         .95*         .89            .87*         .76
# 7         .94*         .89            .71*         .51
# 8         .92*         .85            .80*         .64
# 9         .95*         .90            .95*         .89
#10         .90*         .81            .86*         .74
#11         .91*         .82            .70*         .49
#12         .84*         .70            .80*         .65
#13         .93*         .87            .83*         .68
#14         .93*         .86            .89*         .78
#15         .95*         .90            .85*         .72

Median      .94          .88            .83          .68
Mean        .93*         .86            .85*         .73
Range       .84 to .97   .70 to .94     .70 to .95   .49 to .89

* p < .001

For the correlated data set, the median correlation between actual judgments and judgments generated from subjective weights was .94. The range was from .84 to .97. The mean was .93. All correlations were significant (p < .001). The r² values ranged from .70 to .94 with a median of .88 and a mean of .86. On the average, a linear model based on subjective weights accounted for 86% of the variance in the prediction of actual judgments.

The median correlation between actual judgments and judgments generated from subjective weights was .83 for the orthogonal data set. The range was from .70 to .95 with a mean correlation of .85. All correlations were significant (p < .001). The r² values ranged from .49 to .89 with a median of .68 and a mean of .73. On the average, a linear model based on subjective weights accounted for 73% of the variance in the prediction of actual judgments.

All of the correlations were transformed to z scores. Both mean correlations were significantly different from zero (for the correlated data set, t = 31.47, p < .001; for the orthogonal data set, t = 16.89, p < .001). The 95% confidence intervals established for the mean correlations were from .92 to .95 (correlated data) and from .80 to .89 (orthogonal data).

Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_YsŶsub > 0, was accepted. There were significant positive correlations between actual judgments and judgments generated from subjective weights.

A Spearman rank-order correlation was run to examine any relationship between judgments and weights. This analysis examined how correlations between objective and subjective weights were related to correlations between actual judgments and judgments derived from subjective weights. For example, does a low correlation between objective and subjective weights mean that there will be a low correlation between actual judgments and judgments derived from subjective weights? A rank-order correlation showed that there was no significant relation between the correlations of objective and subjective weights and the correlations between actual judgments and subjectively derived judgments (rho = .07). Knowing the correlation between a judge's objective and subjective weights tells little about the relation between that judge's actual ratings and ratings derived from subjective weights. Thus, to draw conclusions based on one performance criterion (i.e., the correlation between objective and subjective weights) would again ignore other conclusions that may arise from another performance criterion (i.e., the relationship between actual judgments and judgments derived from subjective weights).

Discussion

These findings of significant positive correlations between actual judgments and subjectively generated judgments were consistent with the research examining the use of linear models. However, these correlations were slightly higher than the correlations that have been reported in modeling a judgment task with subjective weights.
The following mean correlations have been reported: from .84 to .87 (Cook and Stewart, 1975); .77 (Martin, 1957); and .60 (Summers et al., 1970). Again, determining whether the mean correlation was significantly different from zero was not as important as determining the confidence intervals around the mean correlations. For the correlated data set, the 95% confidence interval of .92 to .95 (between 85% and 90% of the variance accounted for) was established around the mean correlation. For the orthogonal data set, the 95% confidence interval of .80 to .89 (between 64% and 79% of the variance accounted for) was established around the mean correlation.

By establishing these confidence intervals it was possible to see whether the previously reported mean correlations fell within them. The mean correlations reported by Cook and Stewart fell within the interval established for the orthogonal data. The problem was that the task employed by Cook and Stewart did not involve the use of orthogonal data.

The reasons why the correlations were so high in this study may again be the four conditions that applied to a linear model using objective weights. The only difference between a linear model using objective weights and a linear model using subjective weights was the weights themselves. The generated judgments were computed in an identical manner; only the weights were changed. The subjective weights worked so well because the predictor variables were monotonically related to the criterion and because the entire task environment was familiar to all judges.

In this study, experienced judges were making ratings based on familiar data in a non-threatening environment. They were not asked to make any predictions of success. There were no right or wrong answers. Their tasks were to rate each applicant on some scale of acceptability and to report their subjective importance weights. These tasks were not demanding or novel for the judges. When the performance criterion was the relation between actual judgments and a set of judgments arrived at by using subjective weights, the use of subjective weights was highly effective.

Relation between Predicted Judgments Generated from Objective and Subjective Weights

The fourth research question concerned the relation between predicted judgments generated from objective weights and predicted judgments derived from subjective weights. This research question dealt with still another performance criterion. It was not concerned with the weights themselves or with the correlations between actual ratings and predicted ratings. Rather, the analysis compared the predicted values derived from both weighting schemes. How do judgments derived from subjective weights compare with judgments generated from objective weights? The correlations between judgments generated from objective weights and judgments derived from subjective weights indicated the agreement between the two sets of predicted judgments. This performance criterion examined and compared the output (i.e., the predicted judgments) of two linear models.

Research Hypothesis

It was hypothesized that there was a positive relation between judgments generated from objective weights and judgments derived from subjective weights.
Statistical Hypothesis

    H0: r_ŶobjŶsub = 0
    H1: r_ŶobjŶsub > 0

Where r = product moment correlation
      Ŷobj = judgments generated from objective weights
      Ŷsub = judgments generated from subjective weights

Results

The correlations between judgments generated from objective weights and judgments generated from subjective weights (r_ŶobjŶsub) are shown in Table 4.4. Also shown are the squared values of these correlations. For the correlated data set, the median correlation between judgments generated from objective weights and judgments generated from subjective weights was .99. The range was from .98 to .99. The mean correlation was .99. All correlations were significant (p < .001). The r² values ranged from .96 to .98 with a median of .98 and a mean of .98. On the average, judgments based on subjective weights accounted for 98% of the variance of judgments generated from objective weights.

The median correlation between judgments generated from objective weights and judgments generated from subjective weights was .95 for the orthogonal data set. The range was from .79 to .99. The mean correlation was .95. All correlations were significant (p < .001). The r² values ranged from .62 to .98 with a median of .90 and a mean of .90. On the average, judgments based on subjective weights accounted for 90% of the variance of judgments generated from objective weights.

All correlations were transformed to z scores so that the mean correlations could be tested for significance. Both mean correlations were significantly different from zero (for the correlated data, t = 58.65, p < .001; for the orthogonal data, t = 14.27, p < .001). The 95% confidence intervals established around these mean correlations were from .98 to .99 (correlated data) and from .92 to .97 (orthogonal data).

Table 4.4

CORRELATIONS BETWEEN COMMITTEE MEMBERS' JUDGMENTS GENERATED FROM BOTH OBJECTIVE AND SUBJECTIVE WEIGHTS (r_ŶobjŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_ŶobjŶsub   r²             r_ŶobjŶsub   r²
# 1         .99*         .98            .99*         .98
# 2         .99*         .98            .98*         .96
# 3         .98*         .96            .89*         .79
# 4         .99*         .98            .99*         .98
# 5         .98*         .96            .87*         .76
# 6         .99*         .98            .98*         .96
# 7         .99*         .98            .81*         .66
# 8         .99*         .98            .92*         .85
# 9         .99*         .98            .97*         .94
#10         .99*         .98            .93*         .86
#11         .98*         .96            .79*         .62
#12         .98*         .96            .93*         .86
#13         .99*         .98            .97*         .94
#14         .99*         .98            .97*         .94
#15         .98*         .96            .95*         .90

Median      .99          .98            .95          .90
Mean        .99*         .96            .95*         .90
Range       .98 to .99   .96 to .98     .79 to .99   .62 to .98

* p < .001

Based upon these analyses, the null hypothesis was rejected and the alternate hypothesis, r_ŶobjŶsub > 0, was accepted. There were significant positive correlations between judgments generated from objective weights and judgments derived from subjective weights.

Discussion

The finding of significant positive correlations between objectively generated judgments and subjectively derived judgments is new to judgment research. As Schmitt and Levine (1977) pointed out, no published studies comparing the two types of systems (objective and subjective weights) have correlated the predicted values. The focus of this research question was the correlation of these predicted values. When accuracy was defined in terms of this performance criterion, the subjective weights were extremely effective. For the correlated data, there was almost a perfect correlation between the two sets of predicted judgments. A full 98% of the variance was accounted for by subjectively predicted judgments when compared with objectively predicted judgments.
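A minimal sketch of this fourth criterion, under hypothetical weights and cues: the same cue values are combined under two different weight vectors, and the two resulting sets of predicted judgments are correlated with each other rather than with the actual ratings.

```python
# A sketch comparing the outputs of two linear models that differ only
# in their weights; all values are hypothetical stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cues = rng.normal(size=(30, 4))                  # roughly orthogonal cue values

beta_obj = np.array([0.52, 0.31, 0.08, 0.36])    # regression (objective) weights
w_sub = np.array([0.45, 0.25, 0.05, 0.25])       # elicited subjective weights

y_obj = cues @ beta_obj                          # objectively predicted judgments
y_sub = cues @ w_sub                             # subjectively predicted judgments

r, _ = stats.pearsonr(y_obj, y_sub)
print(f"r between the two sets of predictions = {r:.2f} (r^2 = {r*r:.2f})")
```

With orthogonal cues, as sketched here, the agreement between the two prediction streams directly reflects the similarity of the weight vectors; with correlated cues, even rather different weight vectors can yield nearly identical predictions, which is consistent with the correlated-data values in Table 4.4 clustering at .98 to .99.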
The chief value of this performance criterion was that it allowed another comparison of the two weighting schemes. It demonstrated the robustness of the linear model in terms of predicted judgments. The only thing that differed in the predictions of judgments was the weights themselves. Even though different weights were used, similar predicted judgments resulted. The conclusion that subjectively derived judgments could account for a significant amount of variance in the prediction of objectively generated judgments, under two data conditions, was accepted.

Relation between Actual Judgments and Judgments Generated from Objective and Subjective Weights

The fifth research question asked whether there was a greater relationship between actual judgments and judgments generated from objective weights than there was between actual judgments and judgments generated from subjective weights. This research question concerned the choice of one weighting scheme over the other. The performance criterion dealt with assessing which weighting scheme predicted actual judgments more accurately. The correlations between actual judgments and judgments generated from objective weights were compared with the correlations between actual judgments and judgments generated from subjective weights. These comparisons were concerned with finding the better of the two models.

Research Hypothesis

It was hypothesized that there was a greater correlation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments.

Statistical Hypothesis

    H0: r_YsŶobj = r_YsŶsub
    H1: r_YsŶobj > r_YsŶsub

Where r = product moment correlation
      Ys = actual judgments
      Ŷobj = judgments generated from objective weights
      Ŷsub = judgments generated from subjective weights

Results

The relations between the correlations of actual judgments with judgments generated from objective weights (r_YsŶobj) and the correlations of actual judgments with judgments generated from subjective weights (r_YsŶsub) are shown in Table 4.5 (these values have been taken from Tables 4.2 and 4.3). For the correlated data set, the correlations between actual judgments and objectively predicted judgments were greater than the correlations between actual judgments and subjectively predicted judgments in nine out of fifteen cases. In the remaining six cases the correlations were identical. For the orthogonal data set, the correlations between actual judgments and objectively predicted judgments were greater than the correlations between actual judgments and judgments generated from subjective weights in all cases.

All correlations were converted to z scores using Fisher's r-to-z transformation so the mean correlations of the two weighting schemes could be compared. A paired t-test was run to see if there was a difference between the mean correlations. For the correlated data set, there was a significant mean difference (t = 3.59; p < .001). For the orthogonal data set, there was also a significant mean difference (t = 5.71; p < .001).

It was decided to compare the two models after the objective model was corrected for shrinkage. Since a linear model using regression (objective) weights capitalizes on chance (Cook and Stewart, 1975), a more meaningful comparison was between the subjective weighting model and the objective weighting model corrected for shrinkage.
The correlations between actual judgments and judgments generated from objective weights would show some shrinkage if the objective (regression) weights were applied to a new sample. This amount of shrinkage would depend upon the new number of predictor variables and the number of applicants to be rated. On the other hand, the correlations between actual judgments and judgments derived from subjective weights would not show any shrinkage, because the subjective weights were not estimated from the data. Recall that this research question used the performance criterion of correlations between actual judgments and predicted judgments.

Table 4.5

CORRELATIONS COMPARING COMMITTEE MEMBERS' OBJECTIVE WEIGHTING MODELS WITH SUBJECTIVE WEIGHTING MODELS (r_YsŶobj with r_YsŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶobj     r_YsŶsub       r_YsŶobj     r_YsŶsub
# 1         .97     =    .97            .94     >    .93
# 2         .94     =    .94            .82     >    .80
# 3         .96     >    .94            .91     >    .82
# 4         .97     >    .96            .95     >    .94
# 5         .94     >    .92            .90     >    .78
# 6         .95     =    .95            .89     >    .87
# 7         .96     >    .94            .87     >    .71
# 8         .93     >    .92            .87     >    .80
# 9         .95     =    .95            .97     >    .95
#10         .90     =    .90            .92     >    .86
#11         .92     >    .91            .89     >    .70
#12         .85     >    .84            .86     >    .80
#13         .93     =    .93            .85     >    .83
#14         .94     >    .93            .91     >    .89
#15         .96     >    .95            .90     >    .85

Median      .94          .93            .89          .83
Mean        .94     >    .93            .91     >    .85
Range       .85 to .97   .84 to .97     .82 to .97   .70 to .95

The following modified formula provided an estimate of the shrunken squared correlation between actual judgments and judgments generated from objective weights (Cohen and Cohen, 1975):

    r̂²_YsŶobj = 1 − (1 − r²_YsŶobj) × (n − 1) / (n − k − 1)

Where r̂²_YsŶobj = the new (shrunken) squared correlation
      r²_YsŶobj = the squared correlation between actual judgments and judgments generated from objective weights (the multiple r squared)
      n = the new number of applicants
      k = the new number of predictor variables

Fisher's r-to-z transformations were again employed so that paired t-tests could be run. These tests examined the differences between the subjective weighting scheme and the objective weighting scheme corrected for shrinkage. For the correlated data, there was no difference between the models (t = -1.23). For the orthogonal data set, there was a significant difference (t = 3.61; p < .01). For the correlated data, subjective weights performed as well as objective weights, but the objective weights performed better than the subjective weights for orthogonal data.

Based upon these findings, the null hypothesis was not rejected for the correlated data. There were no significantly greater relations between actual judgments and judgments generated from objective weights than there were between actual judgments and judgments generated from subjective weights. However, for the orthogonal data, the null hypothesis was rejected, and the alternate hypothesis, r_YsŶobj > r_YsŶsub, was accepted. There were significantly greater relations between actual judgments and judgments generated from objective weights than there were between actual judgments and judgments derived from subjective weights.
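The shrinkage correction quoted above is straightforward to apply. The sketch below (a reconstruction, not the original analysis code) corrects one illustrative squared correlation from Table 4.2, using the task's 30 applicants and 4 predictor variables for n and k.

```python
# A sketch of the shrinkage formula from Cohen and Cohen (1975).
def shrunken_r2(r2: float, n: int, k: int) -> float:
    """Estimated shrunken (cross-sample) squared multiple correlation."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

r2_obj = 0.88    # e.g., judge #2, correlated data (Table 4.2)
print(f"corrected r^2 = {shrunken_r2(r2_obj, n=30, k=4):.2f}")
# 1 - (1 - .88) * 29/25 = .86: a small drop from the fitted value, which is
# why the corrected objective model and the subjective model end up so
# close for the correlated data.
```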
Discussion

The result that there were significant differences between objective and subjective weights in the orthogonal data set was congruent with previous research. However, the differences typically reported have not been as small as the differences found in this study. In fact, the differences found here, although statistically significant, were not practically significant. For the correlated data set, subjective weights performed exactly as well as the objective weights in six out of fifteen instances. For any one judge, the difference between the models was minimal. For all practical purposes, the two models were identical. This was not the case, though, in the orthogonal data. There the objective weighting model was clearly superior to the subjective weighting model.

Similar results were found by Cook and Stewart (1975) when they compared the proportion of the variance accounted for by using subjective weights with the maximum proportion accounted for by using regression weights. Their comparison was the mean r² between actual judgments and judgments generated from subjective weights divided by the mean r² between actual judgments and judgments generated from objective weights. This ratio represented the proportion of the variance accounted for by using subjective weights compared to the maximum proportion of the variance that was accounted for by a linear model based on objective weights. They reported values ranging from .68 to .95 (i.e., between 68% and 95% of the variance was accounted for by using subjective weights). Hence, the subjective weights were highly accurate.

This study reaffirms that when the performance criterion was the correlation between actual judgments and predicted judgments, the use of subjective weights was highly effective. Subjective weights performed as well as objective weights for correlated data but dropped off in effectiveness in the orthogonal data set.

Four Additional Weighting Models

After each committee member's judgment policy had been modeled by using objective and subjective weights, four additional models were examined. They were: (1) unit weights, (2) random ratings, (3) average weights and (4) equal weights. These models were constructed to examine the robustness of the linear model in predicting judges' actual ratings. The use of these models paralleled the study done by Dawes and Corrigan (1974), but with one major difference: they constructed models that were used to predict criterion values, not ratings. This study examined the prediction of judgments, since criterion values of applicant quality were unavailable.

Since one of their conclusions was that deviations from optimal weighting do not make much practical difference, it was decided to examine whether different weighting schemes made any difference in the prediction of actual judgments. The additional models were chosen to examine this question. The unit weights and random ratings were based on some information about each individual judge. The average and equal weights were based on some information about the committee as a group. No weighting scheme required that any additional information be collected from the judges. Rather, all were based on existing data and information, and so it was possible to examine which model was most effective in predicting judges' actual ratings. Thus, the performance criterion was the correlation between actual judgments and judgments derived from the respective weights of the models (Figure 4.2). The results of each analysis are presented separately and then discussed jointly.

Unit Weights

A popular model with judgment researchers has been unit weights. A variety of contexts have been examined in which unit weights have done well. The work of Dawes and Corrigan (1974) and Einhorn and Hogarth (1975) typifies this research. These researchers concluded that unit weights may be superior to optimal weights.
Other investigators have found that unit weights perform as well as optimal weights when the weights were cross-validated (Trattner, 1963; Schmidt, 1971). The major conclusion drawn from this research was that a simple additive model represented how judges operated. However, this conclusion has typically resulted from studying the relationship between cues and criterion values. Since this study was concerned with the relationship between cues and judgments, it was decided to test whether this conclusion was valid.

[Figure 4.2, UNIT WEIGHTING, RANDOM RATINGS, AVERAGE WEIGHTING AND EQUAL WEIGHTING MODELS, diagrams the four baseline models: the cues X1 through X4 yield a committee member's actual judgments Ys, which are correlated in turn with predicted judgments using unit weights (Ŷunit, giving r_YsŶunit), average weights (Ŷaverage, giving r_YsŶaverage), and equal weights (Ŷequal, giving r_YsŶequal), and with random ratings for each committee member (Ŷrand, giving r_YsŶrand).]

The unit weighting scheme consisted of ratings predicted for each judge that were computed by using +1 or -1 as weights. The unit weights were used as if they were beta weights, and the predicted ratings were obtained by multiplying unit weights by the non-standardized values of the cues. These predictions were correlated with the actual judgments of the committee members (r_YsŶunit). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on unit weights (Table 4.6).

For the correlated data set, the median correlation between actual judgments and judgments generated from unit weights was quite high (.88). The range was from .80 to .95 with a mean correlation of .89. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .64. The range was from .30 to .81. The mean correlation was .62. Fourteen out of fifteen correlations were significant (p < .01).

All correlations were transformed to z scores so that the mean correlations could be tested for significance. Both mean correlations were significantly different from zero (for the correlated data, t = 26.39, p < .001; for the orthogonal data, t = 11.93, p < .001). The 95% confidence intervals established around the mean correlations were from .87 to .92 (correlated data) and from .54 to .70 (orthogonal data). An obvious conclusion from Table 4.6 is that the unit weight model is better with correlated data than with orthogonal data.

Table 4.6

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM UNIT WEIGHTS (r_YsŶunit)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶunit    r²             r_YsŶunit    r²
# 1         .95*         .90            .78*         .61
# 2         .88*         .77            .53*         .28
# 3         .87*         .76            .48**        .23
# 4         .92*         .85            .68*         .46
# 5         .91*         .83            .72*         .52
# 6         .93*         .86            .81*         .66
# 7         .95*         .90            .64*         .41
# 8         .87*         .76            .70*         .49
# 9         .84*         .71            .30          .09
#10         .88*         .77            .71*         .50
#11         .88*         .77            .60*         .36
#12         .80*         .64            .39**        .15
#13         .90*         .81            .56*         .31
#14         .86*         .74            .45**        .20
#15         .93*         .86            .73*         .53

Median      .88          .77            .64          .41
Mean        .89*         .79            .62*         .38
Range       .80 to .95   .64 to .90     .30 to .81   .09 to .66

* p < .001   ** p < .01

Random Ratings

A set of random ratings was used to determine if actual judgments could be predicted from knowledge about a judge's frequency distributions. That is, each judge had a frequency distribution corresponding to the number of times each rating was used. This distribution provided very limited information about each judge.
If actual judgments could be predicted from this limited information, the use of more elaborate models would be questionable. These ratings provided a baseline model for each judge.

The random rating scheme consisted of a set of randomly assigned ratings (Ŷrand) that was based on each judge's frequency distribution of actual judgments. Instead of simply randomly assigning a rating from 1 to 7 to an applicant, a set of ratings that corresponded to each judge's own frequency distribution was assigned randomly. These random ratings were correlated with the actual ratings of the judges (r_YsŶrand). The squared values of these correlations (r²) represented the variance accounted for by the use of random ratings (Table 4.7).

For the correlated data set, the median correlation between actual judgments and random judgments was -.08. The range was from -.35 to .18. The mean correlation was -.08. Only one out of fifteen correlations was significant (p < .01). For the orthogonal data set, the median correlation was -.05. The range was from -.30 to .39. The mean correlation was -.01. Again, only one out of fifteen correlations was significant (p < .01).

All correlations were converted to z scores so that the mean correlations could be tested for significance. Neither mean correlation was significant (for the correlated data, t = -2.37; for the orthogonal data, t = -.23).

Table 4.7

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND RANDOMLY GENERATED JUDGMENTS (r_YsŶrand)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶrand    r²             r_YsŶrand    r²
# 1        -.16          .03             .03         .00
# 2        -.19          .04             .05         .00
# 3         .09          .01             .10         .01
# 4        -.17          .03             .39**       .15
# 5         .18          .03            -.00         .00
# 6        -.18          .03            -.21         .04
# 7        -.08          .01            -.14         .02
# 8        -.08          .01             .08         .01
# 9        -.06          .00             .20         .04
#10        -.12          .01             .29         .08
#11         .08          .01             .01         .00
#12         .03          .00             .01         .00
#13        -.05          .00            -.10         .01
#14        -.35**        .12            -.07         .00
#15        -.21          .04            -.30         .09

Median     -.08          .01             .05         .00
Mean       -.08          .01             .01         .00
Range      -.35 to .18   .00 to .12     -.30 to .39  .00 to .15

** p < .01

Average Weights

An average weighting scheme allowed the comparison of an individual judge with a composite or average judge. This average weighting scheme reflected the judgment policy of the committee. If an average weighting model performed quite well, then the use of individual weighting models might be called into question.

The average weighting scheme consisted of predicted judgments obtained by using objective (regression) weights computed from the average rating given to each applicant. The average ratings given the thirty applicants are shown in Table 4.8. These average ratings were treated as the dependent variable in a multiple regression analysis, which yielded a set of average objective (regression) weights. These in turn were used to compute a set of predicted judgments. These predictions were correlated with the actual judgments of individual committee members (r_YsŶaverage). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on average weights (Table 4.9).

For the correlated data set, the median correlation between actual judgments and judgments generated from average weights was .92. The range was from .84 to .97. The mean correlation was .93. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .81. The range was from .62 to .95. The mean correlation was .83. All correlations were significant (p < .001). The mean correlations (in terms of z scores) were tested for significance.
Both mean correlations were statistically significant (for the correlated data, t = 29.37, p < .001; for the orthogonal data, t = 15.99, p < .001). The 95% confidence intervals for the mean correlations were from .91 to .94 (correlated data) and from .78 to .87 (orthogonal data).

[Table 4.8, MEAN RATINGS GIVEN TO EACH APPLICANT, listed the mean rating given to each of the thirty applicants under both the correlated and orthogonal data conditions; the individual entries are not legibly recoverable from the scan.]

Table 4.9

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM AVERAGE WEIGHTS (r_YsŶaverage)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶaverage r²             r_YsŶaverage r²
# 1         .96*         .92            .91*         .83
# 2         .93*         .86            .77*         .59
# 3         .94*         .88            .86*         .74
# 4         .97*         .94            .95*         .90
# 5         .91*         .83            .74*         .55
# 6         .94*         .88            .84*         .71
# 7         .94*         .88            .86*         .74
# 8         .92*         .85            .85*         .72
# 9         .92*         .85            .81*         .66
#10         .88*         .77            .62*         .38
#11         .90*         .81            .70*         .49
#12         .84*         .71            .71*         .50
#13         .92*         .85            .75*         .56
#14         .92*         .85            .84*         .71
#15         .95*         .90            .88*         .77

Median      .92          .85            .81          .66
Mean        .93*         .86            .83*         .69
Range       .84 to .97   .71 to .94     .62 to .95   .38 to .90

* p < .001

Equal Weights

An equal weighting scheme served as another baseline policy. If an outside observer wanted to predict the actual judgments of a randomly chosen judge, an equal weighting scheme would suffice if no other information were known about the judges. If an equal weighting model predicted actual judgments with high accuracy, then the use of differential weighting models would be questionable.

The equal weighting scheme consisted of predicted ratings for each judge that were computed by using equal subjective importance weights. The equal weights were used as if they were regression weights, and the predicted ratings were obtained by multiplying equal weights by the standardized values of the cues. This weighting scheme is, in effect, a standardized unit weighting scheme. While the unit weighting scheme used non-standardized cue values, the equal weighting scheme used standardized cue values. The predicted ratings that resulted from equal weights were correlated with the actual ratings of the judges (r_YsŶequal). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on equal weights (Table 4.10).

For the correlated data set, the median correlation between actual judgments and judgments generated from equal weights was .92. The range of correlations was from .83 to .97. The mean correlation was .92. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .72.
The range was from .55 to .88. The mean correlation was .74. All correlations were significant (p < .001).

Table 4.10

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM EQUAL WEIGHTS (r_YsŶequal)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶequal   r²             r_YsŶequal   r²
# 1         .97*         .94            .88*         .77
# 2         .92*         .85            .67*         .45
# 3         .92*         .85            .69*         .48
# 4         .96*         .92            .83*         .69
# 5         .91*         .83            .63*         .40
# 6         .94*         .88            .80*         .64
# 7         .95*         .90            .77*         .59
# 8         .91*         .83            .79*         .62
# 9         .90*         .81            .55*         .30
#10         .88*         .77            .67*         .45
#11         .91*         .83            .69*         .48
#12         .83*         .69            .65*         .42
#13         .93*         .86            .79*         .62
#14         .90*         .81            .72*         .52
#15         .95*         .90            .82*         .67

Median      .92          .85            .72          .52
Mean        .92*         .85            .74*         .55
Range       .83 to .97   .69 to .94     .55 to .88   .30 to .77

* p < .001

The mean correlations (in terms of z scores) were tested for significance. Both mean correlations were significant (for the correlated data, t = 29.38, p < .001; for the orthogonal data, t = 18.08, p < .001). The 95% confidence intervals for the mean correlations were from .90 to .94 (correlated data) and from .69 to .79 (orthogonal data).

Comparison of Models

Since in both data conditions the mean correlations of the objective, subjective, unit, average, and equal weighting schemes were significantly different from zero, it was decided to test for differences between the models. The correlations between each judge's actual ratings and the six weighting schemes are shown in Table 4.11 for the correlated data and in Table 4.12 for the orthogonal data.

All correlations were transformed to z scores so that a repeated measures one-way analysis of variance (ANOVA) could be run. Since the random model was not significantly different from zero, it was not included in the repeated measures analysis, to avoid introducing obvious but trivial statistical significance. For the correlated data set, the differences between the models were significant (F(4,11) = 30.64, p < .001). The objective model had the highest mean correlation, followed by the subjective, average, equal, and unit weighting models. For the orthogonal data set, the same order of magnitude was exhibited, and the differences between the models were also significant (F(4,11) = 30.91, p < .001). See Tables 4.13 and 4.14 for the ANOVA results.
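The repeated measures analysis reported in Tables 4.13 and 4.14 can be sketched as follows (a reconstruction with random stand-in data in place of the judges' actual Fisher-z values): judges are treated as subjects and the five weighting schemes as levels of the repeated factor, which yields the 4 and 56 degrees of freedom shown in the tables.

```python
# A sketch of a repeated measures one-way ANOVA on z-transformed correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.normal(loc=1.5, scale=0.3, size=(15, 5))   # judges x models (stand-ins)

n_judges, n_models = z.shape
grand = z.mean()
ss_total = ((z - grand) ** 2).sum()
ss_judges = n_models * ((z.mean(axis=1) - grand) ** 2).sum()   # between judges
ss_models = n_judges * ((z.mean(axis=0) - grand) ** 2).sum()   # between models
ss_resid = ss_total - ss_judges - ss_models                    # residual

df_models = n_models - 1                     # 4
df_resid = (n_judges - 1) * (n_models - 1)   # 56
F = (ss_models / df_models) / (ss_resid / df_resid)
p = stats.f.sf(F, df_models, df_resid)
print(f"F({df_models},{df_resid}) = {F:.2f}, p = {p:.4f}")
```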
Table 4.11

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND SIX WEIGHTING SCHEMES

Correlated Data (N=30)

Judge    r_YsŶobj  r_YsŶsub  r_YsŶunit  r_YsŶaverage  r_YsŶequal  r_YsŶrand
# 1      .97       .97       .95        .96           .97         -.16
# 2      .94       .94       .88        .93           .92         -.19
# 3      .96       .94       .87        .94           .92          .09
# 4      .97       .96       .92        .97           .96         -.17
# 5      .94       .92       .91        .91           .91          .18
# 6      .95       .95       .93        .94           .94         -.18
# 7      .96       .94       .95        .94           .95         -.08
# 8      .93       .92       .87        .92           .91         -.08
# 9      .95       .95       .84        .92           .90         -.06
#10      .90       .90       .88        .88           .88         -.12
#11      .92       .91       .88        .90           .91          .08
#12      .85       .84       .80        .84           .83          .03
#13      .93       .93       .90        .92           .93         -.05
#14      .94       .93       .86        .92           .90         -.35
#15      .96       .95       .93        .95           .95         -.21

Median   .94       .93       .88        .92           .92         -.08
Mean     .94*      .93*      .89*       .93*          .92*        -.08
Range    .85 to    .84 to    .80 to     .84 to        .83 to      -.35 to
         .97       .97       .95        .97           .97          .18

* p < .001

Table 4.12

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND SIX WEIGHTING SCHEMES

Orthogonal Data (N=30)

Judge    r_YsŶobj  r_YsŶsub  r_YsŶunit  r_YsŶaverage  r_YsŶequal  r_YsŶrand
# 1      .94       .93       .78        .91           .88          .03
# 2      .82       .80       .53        .77           .67         -.05
# 3      .91       .82       .48        .86           .69          .10
# 4      .95       .94       .68        .95           .83          .39
# 5      .90       .78       .72        .74           .63         -.00
# 6      .89       .87       .81        .84           .80         -.21
# 7      .87       .71       .64        .86           .77         -.14
# 8      .87       .80       .70        .85           .79          .08
# 9      .97       .95       .30        .81           .55         -.20
#10      .92       .86       .71        .62           .67          .29
#11      .89       .70       .60        .70           .69         -.01
#12      .86       .80       .39        .71           .65          .01
#13      .85       .83       .56        .75           .79         -.10
#14      .91       .89       .45        .84           .72         -.07
#15      .90       .85       .73        .88           .82          .30

Median   .89       .83       .64        .81           .72         -.05
Mean     .91*      .85*      .62*       .83*          .74*        -.01
Range    .82 to    .70 to    .30 to     .62 to        .55 to      -.30 to
         .97       .95       .81        .95           .88          .39

* p < .001

Table 4.13

REPEATED MEASURES ONE-WAY ANALYSIS OF VARIANCE FOR FIVE WEIGHTING SCHEME MODELS FOR CORRELATED DATA

Source of Variation    df    Sum of Squares    Mean Square    F        p
Between Judges         14    2.98              .205
Within Judges          60    1.08              .019
  Between Models        4     .74              .191           27.34    .001
  Residual             56     .34              .007
Total                  74    4.07              .054

Table 4.14

REPEATED MEASURES ONE-WAY ANALYSIS OF VARIANCE FOR FIVE WEIGHTING SCHEME MODELS FOR ORTHOGONAL DATA

Source of Variation    df    Sum of Squares    Mean Square    F        p
Between Judges         14    2.12              .151
Within Judges          60    7.47              .125
  Between Models        4    5.14              1.286          30.911   .001
  Residual             56    2.33              .041
Total                  74    9.59              .129

From a significant F ratio, one may conclude that the mean correlations are not identical, but one cannot determine the location or magnitude of the differences. Therefore, post hoc comparisons were calculated. Tukey's confidence interval for a one-way ANOVA was the technique employed to examine differences between pairs of mean correlations (Glass and Stanley, 1970).¹

The following comparisons were examined: 1) the objective and unit weight models; 2) the objective and average weight models; 3) the objective and equal weight models; 4) the subjective and unit weight models; 5) the subjective and average weight models; 6) the subjective and equal weight models; 7) the unit and equal weight models; and 8) the unit and average weight models. The following contrasts were significant: 1) the objective and unit models (ψ = .30, p < .05); 2) the objective and equal models (ψ = .14, p < .05); 3) the subjective and unit models (ψ = .24, p < .05); 4) the unit and average models (ψ = -.15, p < .05); and 5) the unit and equal models (ψ = -.18, p < .05). See Table 4.15 for the confidence intervals that were placed around these comparisons. If 0 fell within a confidence interval, the result of that comparison was not significant. If 0 did not fall within the interval, the comparison was significant.
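A sketch of the Tukey interval just described, under assumed values: q is taken from the studentized range distribution (scipy.stats.studentized_range, available in SciPy 1.7 and later), and a contrast between two model means is significant when the interval ψ̂ ± q·√(MS_w/n) excludes zero. The two mean z values below are hypothetical; MS_w echoes the residual mean square of Table 4.13.

```python
# A sketch of a Tukey post hoc confidence interval for one contrast.
import numpy as np
from scipy.stats import studentized_range

n_judges, n_models = 15, 5
ms_within = 0.007                      # residual mean square (cf. Table 4.13)
mean_obj, mean_unit = 1.74, 1.44       # hypothetical mean z values, two models

df = (n_judges - 1) * (n_models - 1)   # 56, as in the ANOVA tables
q = studentized_range.ppf(0.95, n_models, df)
half_width = q * np.sqrt(ms_within / n_judges)

psi = mean_obj - mean_unit             # the contrast between the two means
lo, hi = psi - half_width, psi + half_width
verdict = "significant" if lo > 0 or hi < 0 else "not significant"
print(f"psi = {psi:.2f}, interval: {lo:.2f} to {hi:.2f} ({verdict})")
```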
The results of these analyses showed that the objective weighting model was significantly different from the unit and equal weighting models, but not from the subjective and average weighting models. The subjective weighting model was significantly different from the unit weighting model, but not from the objective, average, and equal weighting models. The unit weighting model was significantly different from the average and equal weighting models.

¹Comparisons were constructed only for the correlated data set.

Table 4.15

TUKEY'S POST HOC COMPARISONS BETWEEN WEIGHTING SCHEME MODELS

Comparison                 ψ̂      ψ̂ ± q_J,N-J(1-α)·√(MS_w/n)·c    SIG
Objective and Unit         .30     .16 to .44                      <.05
Objective and Average      .11    -.02 to .25
Objective and Equal        .14     .01 to .28                      <.05
Subjective and Unit        .24     .10 to .37
Subjective and Average     .05    -.09 to .18
Subjective and Equal       .08    -.06 to .21
Unit and Equal            -.18    -.32 to -.05                     <.05
Unit and Average          -.15    -.29 to -.02                     <.05

(The interval for Subjective and Unit, .10 to .37, excludes zero and was significant at the .05 level.)

Where ψ̂ = the contrast between two means
      q_J,N-J(1-α) = the value at the (1-α) percentile of the studentized range distribution
      MS_w = the mean square within
      n = the sample size for each group
      c = the absolute value of the coefficient of the means being compared

Discussion

The finding of significant differences between the various weighting schemes paralleled previous research findings. The important conclusions to be drawn are that: 1) the differential weighting models accounted for significantly more variance than did the unit weighting models, 2) there were no significant differences among the differential weighting schemes, and 3) there were significant differences between the two unit weighting schemes. These findings should be qualified by the nature of the judgment task (i.e., a four-cue medical school admissions task) and the use of group averages to assess differences between models.

Differential vs. Unit Weights

The differential weighting models (i.e., the objective, subjective, and average) accounted for significantly more variance than did the unit weight or equal weight models. At first glance, this finding appeared to be at odds with previous research findings, specifically the Dawes and Corrigan (1974) and Schmidt (1971) research. These researchers found that unit and equal weighting both did extremely well in predicting criterion values. A linear model based on these weights performed as well as or better than the differential weights. Recall that Dawes and Corrigan stated that the whole trick was to decide what variables to look at and then to know how to add. Other researchers in the field have stated that these unit weighting results suggested that in many decision settings, all the judge needed to know was what variables to throw into the equation, which direction (+ or -) to weight them, and how to add (Slovic et al., 1977); or that one need not even go through the laborious process of differential weighting, but just identify the big variables and add (Shulman and Elstein, 1975).

The problem may be that researchers have generalized the Dawes and Corrigan research beyond what the authors stated or the results warranted. In their study, unit weighting did extremely well in predicting criterion values, not actual judgments. The important relationship was between the cues and criterion values (i.e., the left-hand side of the lens model). Their research found that unit weights do well on the left-hand side of the lens model.
This study and its conclusions, however, dealt with the relationship between the cues and actual judgments (i.e., the right-hand side of the lens model). Unit weights did not perform as well as differential weights on this side of the lens. Cook and Stewart (1975) found a similar result when they compared unit weights to subjective weights: the use of subjective (i.e., differential) weights resulted in a 12% to 14% increase in variance accounted for. The differences in this study were not as large (between 4% and 9%) for the correlated data. However, the differences were greater for the orthogonal data. That is, unit weights performed much less effectively than the differential weights when the cues were uncorrelated.

Differential Weighting Models

The finding that there were no significant differences between the differential weighting models has already been discussed in part. Recall that there were no significant differences between the objective and subjective weights. Based on additional comparisons of models, it was found that there were no differences between either of these two models and an average weighting model. There may be several reasons for this. First, there was extremely high agreement among the judges in their ratings of the applicants. This was surprising in light of the fact that the judges reported using different weighting schemes. In fact, when the subjective weights were examined for agreement among judges, three groups or types were identified. One group weighted MCAT scores, interview scores, and GPA fairly high, while giving the personal statement scores less weight; Judges 5, 1, 3, 6, 15, and 4 represented this group. A second group might be termed the personal qualities weighters: they weighted interview and personal statement scores quite high. Judges 8, 12, 14, 9, and 2 comprised this group. A third group might be termed the academic qualities weighters: they valued GPA and MCAT scores. Judges 13, 10, and 7 are representative of this group (see Appendices G and H). These three distinct groups of judges employed different weighting schemes, yet nevertheless showed high agreement in their ratings of applicants.

Part of this high inter-judge agreement is explained by the data. For the correlated data, different weightings can lead to the same ratings because of collinearity. Since the data were correlated, different weighting schemes mattered little as long as each variable received some weight. For the orthogonal data, collinearity was not present. What might have happened there is that the judges used only some of the information presented to them; their ratings were not based on all the data. It is possible that if an applicant had a high score on at least one of the independent variables, he or she received a high rating. In any event, the judges exhibited high agreement in their ratings of the applicants.

A second reason why the average weighting model did not differ from the objective or subjective models is that the average model struck a balance between judges. No one judge rated all the independent variables equally. (Hence, the equal weighting model did not perform as well.) Each judge used differential weights: each variable received some weight, but not in equal portions. The average weighting model reflected this.
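The collinearity point can be illustrated directly. The sketch below compares two deliberately different positive weighting policies on correlated and on orthogonal cues; the cue intercorrelation of .6 and both weight vectors are hypothetical values chosen for illustration, not the study's data.

```python
# Sketch: with intercorrelated cues, two different positive weighting
# schemes yield nearly identical predictions; with orthogonal cues
# they diverge. All values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def prediction_agreement(cues):
    w1 = np.array([0.50, 0.30, 0.15, 0.05])   # one judge's policy
    w2 = np.array([0.10, 0.20, 0.30, 0.40])   # a very different policy
    return np.corrcoef(cues @ w1, cues @ w2)[0, 1]

r = 0.6   # assumed cue intercorrelation for the "representative" case
cov = np.full((4, 4), r) + (1 - r) * np.eye(4)
correlated = rng.multivariate_normal(np.zeros(4), cov, size=40)
orthogonal = rng.normal(size=(40, 4))   # near-zero intercorrelations

print(f"correlated cues: r = {prediction_agreement(correlated):.2f}")
print(f"orthogonal cues: r = {prediction_agreement(orthogonal):.2f}")
```

With correlated cues the two policies agree almost perfectly; with orthogonal cues their agreement drops to roughly the cosine between the two weight vectors, so different weighting schemes begin to matter.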
Recall also that the average weighting model was based on the regression (objective) weights generated from the average rating given to each applicant. Had there been less agreement among judges, there would have been more variability in the average ratings, and the average weighting model would have become less effective. Thus, high inter-judge agreement explains why the differential weighting models were so effective in modeling the judges' policies.

Unit vs. Equal Weights

There was a significant difference between the unit weighting model and the equal weighting model. Recall that the former used raw scores while the latter used standard scores in computing predicted judgments. If one literally added the variables, one would be using a unit weighting model. If one standardized the variables and then added them, one would be using an equal (or standardized unit) weighting model. The equal weighting model controlled each variable's variance through the use of standard scores.

The results of this study imply that the differences between the unit and equal weight models are a result of measurement procedure rather than a true difference between the models. That is, the way the weights were computed made a significant difference in this study: weights applied to standard scores performed better than weights applied to non-standard scores. Past research has used unit weights with standard scores. A research question that has not been addressed is whether judges have a concept of standard scores and use this concept in processing cues. Conclusions based on unit weights with standard scores are not only drawn on the wrong side of the lens model but may also rest on a measurement artifact. Unit weights used as simple addition did not perform as well as the differential weights; even when the unit weights were applied to standardized scores, they fell short of the differential weights.

Summary

The purpose of this study was to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information. Each research question was restated as a hypothesis, data were presented, a statement was made about whether the hypothesis was rejected or accepted, and findings were discussed. Data were analyzed primarily by multiple regression and correlation techniques.

Each research question took a step further in determining how well subjective and objective weights worked in modeling judges' admissions policies. The first step was to compare (correlate) the objective and subjective weights. The next step examined the correlation between actual judgments and judgments arrived at through the use of both kinds of weights; this step looked at the outcomes of these weights (i.e., how well they predicted actual judgments). The third step compared these predicted outcomes (i.e., how much agreement there was between them). The fourth step assessed which weighting scheme predicted the actual judgments most accurately. The results of the tested hypotheses were:

1) No relationship existed between objective and subjective weights. Rejected for both data conditions;

2) A positive relationship existed between actual judgments and judgments generated from objective weights. Accepted for both data conditions;

3) A positive relationship existed between actual judgments and judgments derived from subjective weights.
Accepted for both data conditions;

4) A positive relationship existed between the judgments generated from both objective and subjective weights. Accepted for both data conditions;

5) There was a greater relationship between actual judgments and judgments generated from objective weights than there was between actual judgments and judgments derived from subjective weights. Rejected for the correlated data; accepted for the orthogonal data.

When comparisons were made between the objective and subjective weights, there was a high correlation between the two weighting schemes. This result lent support to the hypothesis that judges can report their subjective importance weights and that these weights are related to objective weights. However, this correlation was based on an n of four (cues) and should be interpreted cautiously. Additional comparisons were made to examine further the relation between these two weighting schemes. These results were consonant with the research that has shown linear models to be good approximations in many decision-making situations. In addition, this study showed that subjective weights were as effective as objective weights in predicting actual judgments with correlated data. Note, however, that the effectiveness of subjective weights decreased when the data were orthogonal. When the outcomes or predicted judgments generated from these weights were compared, there was extremely high agreement: judgments derived from the two weighting schemes were highly correlated.

This study concluded that subjective weights were an effective model of how committee members say they weight information in making judgments about medical school applicants. This conclusion resulted from many comparisons. However, boundary conditions were established from the two data sets: the conclusion was valid for correlated data but weakened for orthogonal data. Subjective weights lost their effectiveness when applied to orthogonal data.

Having established the comparisons between objective and subjective weights, attention turned to alternative weighting schemes. Additional analyses therefore examined the effectiveness of four alternative weighting models (i.e., unit weights, random ratings, average weights, and equal weights) in capturing the judges' policies. The results showed that:

1) There were significant differences between the various weighting scheme models;

2) The differential weighting models (i.e., objective, subjective, and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);

3) There were no significant differences between the differential weighting models;

4) There were significant differences between the unit weighting models.

The finding of significant differences between differential and unit weighting schemes seemed at first blush to be at odds with previous research. However, that research examined a different judgment task than the one in this study: this research studied the relationship between cues and judgments, not between cues and criteria. Unit weights were not as effective as differential weights in predicting judgments. This difference was greatest for the orthogonal data and smaller for the correlated data; the unit weights lost their effectiveness under orthogonal conditions. Another result showed that there were no differences between the differential weights. It was shown previously that there were no significant differences between the objective and subjective weights.
Since there was such high inter-judge agreement, a differential weighting model based on this agreement was quite successful. Thus, all three differential models were quite successful in predicting actual judgments.

The finding of significant differences between the unit weighting models pointed to the importance of examining how judgments were computed. A simple unit weighting model just added the four independent variables to generate predicted judgments; a more advanced model standardized the independent variables and then added them. The simple model was less effective than the standardized model. If a judge used a simple additive model, judgments could be predicted from the independent variable with the greatest variance. Therefore, how judgments were computed made a difference in the success of predicting actual judgments.

CHAPTER V

CONCLUSION

In this chapter, 1) a summary of the study and its findings is presented; 2) limitations are examined; 3) implications for researchers are presented; and 4) future research is recommended.

Summary

Medical school admissions committees are charged with the task of selecting applicants for their entering classes. This task involves examining various admissions criteria, determining their importance, and making judgments based on these criteria. Committees' definitions of quality reflect how they weight information in making judgments about the acceptability of applicants. Thus, quality involves the selection and weighting of variables in order to make judgments. When the issue of quality is examined, it becomes apparent that there is little or no consensus. Committee members have different conceptions of quality, yet they must make decisions about the acceptability of applicants based on some conception of it.

This decision-making process takes place in an environment riddled with controversy. Problems range from making medical schools representative of the socioeconomic and racial components of the general population to meeting society's health care needs. Understanding how judgments are made (specifically, how information is weighted) allows one to infer what is meant by quality. This understanding lays the needed groundwork for communication among admissions committee members.

The means available to examine the issue of quality and how admissions committees weight information emerge in part from psychological research in the areas of clinical judgment and decision making, which has been concerned with how to model or characterize the judgments or decisions of clinicians. This modeling attempts to explain how clinicians use information to reach judgments or decisions. A problem is that some judgment research has shown that judges cannot accurately estimate their combination and weighting rules. Serious discrepancies often exist between judges' subjective and objective (mathematical) weighting schemes. Thus, what judges report about their weighting schemes is often regarded as invalid. However, another body of this research implies that judges can relate what they are doing when making decisions. This research accepts the use of self-report and introspection as measures to assess the judgment process. The use of subjective (self-report) weights is a valid area to be investigated.
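To make the subjective/objective comparison concrete, the sketch below correlates a hypothetical judge's self-reported importance weights with regression weights fitted to simulated ratings. None of the numbers come from the study, and, as noted earlier, a correlation computed over only four cues must be read cautiously.

```python
# Sketch: compare a judge's self-reported (subjective) weights with
# regression (objective) weights fitted to the same ratings. All
# values are hypothetical, not the study's data.
import numpy as np

rng = np.random.default_rng(1)
cues = rng.normal(size=(40, 4))            # GPA, MCAT, statement, interview
subj = np.array([0.35, 0.30, 0.15, 0.20])  # self-reported importances (sum to 1)
ratings = cues @ subj + rng.normal(scale=0.3, size=40)

# Objective weights: least-squares fit of ratings on cues (drop intercept)
X = np.column_stack([np.ones(40), cues])
obj = np.linalg.lstsq(X, ratings, rcond=None)[0][1:]

r = np.corrcoef(subj, obj)[0, 1]           # n = 4 cues: interpret cautiously
print(f"subjective vs. objective weight correlation: {r:.2f}")
```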
Therefore, the purpose of this study was to model and compare how medical school admissions committee members say they weight information in making judgments regarding the acceptability of applicants with how mathematical representations weight the same information. The importance of exercising sound judgment in the selection of medical school applicants was apparent. Yet when the literature on medical school admissions was examined, there was little or no convergence with the literature on judgment and decision making. The time was ripe for these two bodies of research to interact.

A review of medical school admissions showed that policies and procedures have changed drastically throughout the history of this country. Requirements have gone from minimal to elaborate. Current requirements are quite stringent, with grade point average, MCAT scores, and personal interviews being the primary admissions variables. Other important admissions variables include autobiographical (personal) statements, letters of evaluation, and extracurricular activities. A severe strain is placed on the admissions committee as it attempts to process these admissions variables for a diverse applicant pool. Problems arise from identifying, measuring, and evaluating important admissions criteria; processing applicants efficiently; selecting the most qualified applicants; minimizing the financial, academic, and emotional costs of the process; and assisting rejected applicants in assessing their career goals. A key first step in addressing potential solutions to these problems is to examine how committee members say they weight admissions variables when making judgments about the quality of medical school applicants. Admissions thus provides a rich content area in which to explore the judgments of committee members.

The judgment research has shown that tasks requiring the integration and combination of information to reach a judgment are best performed actuarially (i.e., by routine application of explicit rules); a rule-based procedure is superior to a case-by-case procedure for such tasks. Another finding of this research is that many kinds of decision makers (e.g., psychologists, stock brokers, radiologists) have been modeled successfully by linear models. These models have performed as well as or even better than more complex non-linear (e.g., configural) models. This research has also shown that a linear model of a judge is often a better predictor of actual judgments than the judge from whom the model was derived; this has been termed the bootstrapping effect.

The few studies on modeling a judgment policy with subjective weights have shown that promising work lies ahead. These weights were shown to be effective models and warranted further research. Subjective weights provide a means of examining how judges say they weight information when making judgments. Questions of interest are whether these weights are related to the typical weights of the linear model (i.e., regression weights), whether they are effective in a linear model, and whether they are useful in predicting judgments. With these questions in mind, this study examined the relations between subjective (self-report) and objective (regression) weights.

Two testing sessions, one using correlated (representative) admissions data and the other using orthogonal (non-representative) admissions data, were required to achieve the purpose of this study.
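As an illustration of what the two data conditions involve, the sketch below constructs one correlated and one orthogonal set of cue profiles. The intercorrelation of .5, the sample sizes, and the QR-based decorrelation are assumptions for illustration rather than the study's actual materials.

```python
# Sketch: constructing a correlated (representative) and an orthogonal
# (non-representative) set of cue profiles. The correlation structure
# is hypothetical; the study's actual matrices are not reproduced here.
import numpy as np

rng = np.random.default_rng(2)
n_applicants, n_cues = 20, 4

# Correlated condition: impose a common positive intercorrelation
r = 0.5
cov = np.full((n_cues, n_cues), r) + (1 - r) * np.eye(n_cues)
correlated = rng.multivariate_normal(np.zeros(n_cues), cov, size=n_applicants)

# Orthogonal condition: decorrelate columns with a QR decomposition
raw = rng.normal(size=(n_applicants, n_cues))
q, _ = np.linalg.qr(raw - raw.mean(axis=0))
orthogonal = q * np.sqrt(n_applicants)   # rescale toward unit variance

print(np.corrcoef(correlated.T).round(2))   # positive off-diagonals
print(np.corrcoef(orthogonal.T).round(2))   # zero off-diagonals
```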
Each data set contained the same introduction, instructions, and description of the independent variables (i.e., GPA, MCAT scores, personal statement scores, and interview scores). The committee members' task was twofold: 1) rate each of the applicants (40 total) on an acceptability scale, and 2) report the subjective importance attached to each of the four independent variables.

Once these tasks were completed, the following data were collected or developed for each committee member:

1) Objective and subjective weights;

2) Actual judgments and judgments generated from objective and subjective weights.

This information was used to test the following hypotheses:

1) No relation existed between objective and subjective weights;

2) A positive relation existed between actual judgments and judgments generated from objective weights;

3) A positive relation existed between actual judgments and judgments generated from subjective weights;

4) A positive relation existed between the judgments generated from both objective and subjective weights;

5) There was a stronger relation between actual judgments and judgments generated from objective weights than between actual judgments and judgments generated from subjective weights.

Data were collected and analyzed using correlation techniques, multiple regression, paired t-tests, repeated measures one-way analysis of variance, and post hoc comparisons. The results of the tested hypotheses showed that:

1) A significant positive relationship existed between objective and subjective weights, for both data conditions;

2) A significant positive relationship existed between actual judgments and judgments generated from objective weights, for both data conditions;

3) A significant positive relationship existed between actual judgments and judgments generated from subjective weights, for both data conditions;

4) A significant positive relationship existed between the judgments generated from both objective and subjective weights, for both data conditions;

5) For the correlated data, there was not a significantly greater relation between actual judgments and objectively generated judgments than between actual judgments and subjectively generated judgments. For the orthogonal data, however, there was a significant difference between the correlation of actual judgments with objectively generated judgments and the correlation of actual judgments with subjectively generated judgments.

This study concluded that subjective weights were an effective weighting scheme for modeling how committee members said they utilized information when making judgments about the acceptability of medical school applicants. This conclusion resulted from many comparisons, from the weights themselves to the outcomes arrived at from those weights. However, boundary conditions were established from the two data sets: subjective weights were more effective for correlated data than for orthogonal data. Subjective weights proved to be a valid measure for modeling an admissions judgment task under correlated data conditions.

Once the comparisons between objective and subjective weights were made, additional concerns arose centering on the use of alternative weighting models. Four additional weighting schemes were examined: (1) unit weights, (2) random ratings, (3) average weights, and (4) equal weights.
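A minimal sketch of how predictions under these four schemes might be generated follows; all data, the rating scale, and the reading of "random ratings" as uniform draws are hypothetical stand-ins, and the study's own computations are summarized next.

```python
# Sketch of the four alternative schemes, with hypothetical cue values
# and a hypothetical mean rating per applicant.
import numpy as np

rng = np.random.default_rng(3)

cues = rng.uniform(1.0, 15.0, size=(40, 4))   # hypothetical raw cue values
z = (cues - cues.mean(0)) / cues.std(0)       # standardized cue values

# Hypothetical mean rating each applicant received from the judges
mean_ratings = z @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(scale=0.2, size=40)

unit_pred = cues.sum(axis=1)                  # unit weights: add raw values
equal_pred = z.sum(axis=1)                    # equal weights: add standardized values
random_pred = rng.uniform(1.0, 7.0, size=40)  # random ratings baseline (assumed form)

# Average weights: regression weights fitted to the judges' mean ratings
X = np.column_stack([np.ones(40), z])
average_pred = X @ np.linalg.lstsq(X, mean_ratings, rcond=None)[0]

# Each prediction vector can then be correlated with a judge's actual
# ratings, as in Tables 4.11 and 4.12.
```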
This necessitated developing the following data:

1) Unit weights, random ratings, average weights, and equal weights;

2) Judgments generated from unit weights, random ratings, average weights, and equal weights.

Comparisons were made between these four weighting schemes and the objective and subjective models. Analyses showed that:

1) There were significant differences between the six models;

2) The differential weighting models (i.e., objective, subjective, and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);

3) There were no significant differences between the differential weighting models;

4) There were significant differences between the unit weighting models.

From these results, it was concluded that the differential weighting models were more effective than the unit weighting models in predicting committee members' judgments. Differential weighting schemes were the most effective in modeling admissions judgment policies, but there were no significant differences among them. It will be recalled that there were no significant differences between objective and subjective weights for the correlated data. Since there was such high inter-judge agreement, a weighting scheme (average weights) based on this agreement was highly accurate in predicting actual judgments; had there been less inter-judge agreement, the average weighting model would have been less effective. Also, previous research has shown that when the predictor (independent) variables are correlated, different sets of positive weights tend to yield similar correlations. The results obtained with the three differential weighting models confirm this conclusion.

Another interesting result was that how the unit weight judgments were computed made a difference. Simple unit weights were less effective than standardized unit weights. The unit weighting scheme used non-standardized cue values, while the equal weighting scheme used standardized cue values. A judgment policy of simply adding the independent variables (unit weighting) would not be very effective in predicting committee members' judgments. Thus, it was important to see how the predicted judgments were computed.

Boundary conditions were established from the two data sets. The weighting models were most effective for the correlated data; accuracy decreased for the orthogonal data. Therefore, the intercorrelation of the data should be examined before certain weighting schemes are used, because it makes a difference in their effectiveness.

Limitations of the Study

The limitations of this study fall within two general categories: one related to external validity, the other to internal validity. Limits on external validity are as follows. First, the generalizability of the study is limited to admissions committees similar to the tested subjects. Second, other independent variables could have been presented to the judges. Third, there were additional ways to measure the success or goodness of fit of the various weighting models. Fourth, different restrictions could have been placed on the judgment task.

The results are restricted to subjects who are similar to the group tested in this study. These committee members were elected to three-year terms and were trained through workshops to become familiar with the existing admissions policy. Members sit on the committee for one year before they are allowed to participate in setting new policy each year.
In this manner, the members have at least one year of experience upon which to base any changes in the process. Since the subjects of this study had an average of two years' experience, these results may generalize to committees constructed in a similar fashion.

Four independent variables were selected for study; additional or different variables could have been chosen. Although the selected variables are the most widely used, other valuable information is examined in the admissions process. The results might be restricted by the number and type of variables; increasing the number of variables or changing their type might change the results of this study.

Ratings of two variables (i.e., personal statement and interview scores) were presented to the judges. The judges did not have to rate the personal statements or interview the applicants; the ratings were done in advance and presented to them. Had the judges done the ratings or interviews themselves, the results might have been altered.

This study relied primarily on the weights themselves and how these weights were used to generate judgments. Regression and correlation techniques were the method of analysis. Other techniques could have been used: for example, decision trees, computer simulations, or thinking-aloud protocols could be used to test the efficacy of different weighting models. These types of analyses would further the understanding of how judges weight information when making decisions.

Finally, judges were not limited in the number of times they could give a particular rating to an applicant. That is, there was no pressure to decide on a set number of applicants who would be accepted; if judges wanted to give all applicants a high rating, they could do so. This freedom is not possible in the actual admissions process, where committees are restricted by the number of places available. This restriction was not placed on the subjects in this study and thus may have reduced the generalizability of the research.

Two factors might limit the internal validity of the study. First, some of the subjects have worked together as a committee. Second, the subjects volunteered to participate in this study. That the subjects worked together has been discussed in part: it will be recalled that there was high inter-judge agreement. This agreement may have resulted from the fairly extensive training members received. Members may have been aware of how other members might be judging applicants and adjusted their judgments accordingly, which would contaminate some of the results.

The subjects who participated in this study were unpaid volunteers. In any study with volunteers, there is a risk of self-selection biasing the results. However, the fifteen committee members who served as judges represented all but one member of the sixteen-member committee. Thus, the entire committee was essentially represented; but, again, the results are limited to similar committees.

Implications

Based upon the findings of this study and some of the questions raised in the literature, a number of implications are suggested. They relate to (1) the use of linear models, (2) the use of subjective weights, (3) the use of differential vs. unit weights, and (4) the use of different data sets.

The linear model proved successful in representing another judgment task. Admissions committee members were successfully simulated by such a model.
Another kind of decision maker (i.e., medical school admissions committee members) and another judgment task (i.e., medical school admissions) have been added to a growing list in the judgment paradigm.

The use of subjective weights to model judgments is a relatively new area in the judgment paradigm. Research has shown discrepancies between subjective and objective weights, but it has typically used a single performance criterion: the correlation between objective and subjective weights. The results of this study showed that the subjective weights performed quite admirably in terms of three different performance criteria: (1) the subjective weights correlated positively with the objective weights; (2) there were no significant differences in the prediction of actual judgments when subjective and objective weights were used; and (3) subjective weights yielded predicted judgments that correlated highly with the predicted judgments generated from objective weights. By each performance criterion, the subjective weights worked quite well. These findings are also consonant with the research showing that when predictor variables are positively correlated, different sets of positive weights tend to yield similar correlations. Thus, the subjective weights performed as well as the objective weights.

A major implication for medical school admissions is to use subjective weights in a linear model. Weights could be elicited from committee members and then used in a linear equation, thus bootstrapping the committee member's judgment policy. Factors such as fatigue, boredom, daydreaming, and malaise may cause a judge to be inconsistent in making decisions; a linear model is resilient to such sources of error and is consistent with the judge's policy. An even stronger case can be made for the use of subjective weights when the predictor variables are correlated.

When examining the issue of differential and unit weights, two concerns arise. The first cautions readers not to overgeneralize researchers' findings. The lens model has two sides: one deals with a criterion, the other with actual judgments. Weights used to predict actual judgments are not necessarily the ones used to predict a criterion. Therefore, conclusions about weights derived from one side should not be generalized to the other side.

The second concern alerts the reader to note how researchers compute various scores. For example, this study found that the use of standardized vs. non-standardized cue values could lead to different conclusions. An equal-weighting model using standardized unit cue values was found to be superior to a unit-weighting model using non-standardized cue values. Although this finding is consistent with previous research that found an equal-weighting model to be effective in predicting a performance criterion, it points to the need not to ignore the measurement components of modeling a judgment task.

A final implication concerns judges' perceptions of the data sets. The use of two data sets established boundary conditions for the various weighting schemes and incidentally produced an interesting finding: experienced committee members could not differentiate representative from non-representative data. This may be partially explained by script theory, or by the fact that the judges had no reason to believe that non-representative data were being used in one of the testing sessions.
However the finding is explained, the judges were not able to recognize the independence of the four variables in the orthogonal data set. These findings only scratch the surface in identifying judges' perceptions of data and how these perceptions affect various weighting schemes.

Recommendations for Future Research

This study is a first step toward a better understanding of how admissions committee members make judgments about the acceptability of applicants. The next steps must not only expand the findings of this study but also overcome its limitations. Several possibilities present themselves.

The first would be to examine how committee members make judgments when they are presented with an applicant's entire admissions folder. Obviously, more variables are contained in this folder than the ones presented in this study, and it would be interesting to note which variables account for the most variance in the judgments. These variables could then be compared to the ones typically used in the admissions process. The task generalizability would be greatly increased if the judgments were made on the entire folder.

A second possibility would be to design a study in which the criterion values are known; that is, a study involving both sides of the lens model. Committee members would be asked to make judgments about applicants for whom some criterion information is known. This criterion could be a rating of performance during the first year (e.g., GPA, or some combination of scores).

APPENDIX G
CORRELATION BETWEEN JUDGES' SUBJECTIVE WEIGHTS FOR CORRELATED DATA SET
[Correlation matrix illegible in the source scan.]

APPENDIX H
CORRELATION BETWEEN JUDGES' SUBJECTIVE WEIGHTS FOR ORTHOGONAL DATA SET
[Correlation matrix illegible in the source scan.]