This is to certify that the thesis entitled "The Use of Objective and Subjective Weights to Model a Medical School Admissions Task," presented by John B. Molidor, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

Major professor
Date: May 1, 1978

Copyright by John B. Molidor, 1979

THE USE OF OBJECTIVE AND SUBJECTIVE WEIGHTS TO MODEL A MEDICAL SCHOOL ADMISSIONS TASK

By John B. Molidor

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Counseling, Personnel Services and Educational Psychology, 1979

ABSTRACT

THE USE OF OBJECTIVE AND SUBJECTIVE WEIGHTS TO MODEL A MEDICAL SCHOOL ADMISSIONS TASK

By John B. Molidor

The purpose of this study was to model and compare how medical school admissions committee members say they weight information when making judgments regarding the acceptability of applicants with how mathematical representations weight the same information. Two data sets, one representative (correlated) and the other non-representative (orthogonal), were presented to fifteen admissions committee members who volunteered to participate in this study. Each data set contained information on an applicant's GPA, MCAT scores, personal statement scores, and interview scores. The committee member's task was: (1) to rate each of the applicants (40 total) on an acceptability scale and (2) to report the subjective importance attached to each of the four predictor variables. This information was used to test the following hypotheses:

1) No relation existed between objective and subjective weights;
2) A positive relation existed between actual judgments and judgments generated from objective weights;
3) A positive relation existed between actual judgments and judgments generated from subjective weights;
4) A positive relation existed between the judgments generated from both objective and subjective weights;
5) There was a greater relation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments.

Data were collected and analyzed using correlation techniques, multiple regression, paired t-tests, repeated measures one-way analysis of variance, and post hoc comparisons. Results showed that:

1) A significant positive relation existed between objective and subjective weights, for both data conditions;
2) A significant positive relation existed between actual judgments and judgments generated from objective weights, for both data conditions;
3) A significant positive relation existed between actual judgments and judgments generated from subjective weights, for both data conditions;
4) A significant positive relation existed between the judgments generated from both objective and subjective weights, for both data conditions;
5) For the correlated data, there was not a significantly greater relation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments. However, for the orthogonal data, there was a significant
difference between the correlation of actual judgments with objectively generated judgments and the correlation of actual judgments with subjectively generated judgments.

This study concluded that subjective weights were an effective weighting scheme in modeling how committee members said they utilized information when making judgments about the acceptability of medical school applicants. This conclusion resulted from many comparisons, ranging from the weights themselves to the outcomes arrived at from these weights. Boundary conditions were established from the two data sets: subjective weights were more effective for correlated data than for orthogonal data. Thus, subjective weights proved to be a valid measure to model a medical school admissions judgment task.

Once the comparisons between objective and subjective weights were made, additional concerns arose centering on alternative weighting models. Therefore, four additional weighting schemes were examined: (1) unit weights, (2) random ratings, (3) average weights, and (4) equal weights. Comparisons were made between these four weighting schemes and the objective and subjective weighting models. Analyses showed that:

1) There were significant differences between the six models;
2) The differential weighting models (i.e., objective, subjective and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);
3) There were no significant differences between the differential weighting models;
4) There were significant differences between the unit weighting models.

From these results, it was concluded that the differential weighting models were more effective than the unit weighting models in predicting committee members' judgments. All weighting models were more effective under the correlated data condition than under the orthogonal condition. These results point to the importance of examining the outcomes derived from using different weighting schemes rather than the weights themselves. Thus, under certain data conditions, different weights lead to similar outcomes.

DEDICATED to Mary Lou Kennedy Molidor and Otto B. Molidor, my mother and father.

ACKNOWLEDGEMENTS

I would like to express my appreciation to the members of my dissertation committee, Dr. Stephen L. Yelon, Dr. Sarah A. Sprafka, Dr. John F. Vinsonhaler, and, in particular, to my dissertation chairman, Dr. Arthur S. Elstein, for the able guidance, professional direction, assistance and encouragement which was offered throughout this research.

Special thanks are given to my family, whose support, love, and encouragement made this study possible; to Jeanne Marie, whose confidence, enthusiasm, and support sustained me in rough times; to OMERAD, whose warm atmosphere aided in my growth and knowledge; to Judy Carley for her typing of this dissertation; and to all who have made my stay at Michigan State University a most enjoyable, profitable, and memorable learning experience.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
I. INTRODUCTION
   Need
   Purpose
   Research Questions
   Theory
      Paradigms of Clinical Reasoning
      Problem
      Lens Model and Multiple Regression Analysis

II. REVIEW OF THE LITERATURE
   Medical School Admissions
   Judgment
      Policy Capturing
      Modeling Admissions Tasks
      Modeling Medical School Admissions Task
      Subjective Weights
   Summary

III. DESIGN OF THE STUDY
   Population and Sample
   Stimulus Materials
      Data Sets
   Procedures
      Intra-Judge and Inter-Judge Reliability
   Measures
   Hypotheses
   Analyses
   Summary

IV. RESULTS AND DISCUSSION
   Relation between Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Objective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Predicted Judgments from Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Relation between Actual Judgments and Judgments Generated from Objective and Subjective Weights
      Research Hypothesis; Statistical Hypothesis; Results; Discussion
   Four Additional Weighting Models
      Unit Weights
      Random Ratings
      Average Weights
      Equal Weights
      Comparison of Models
      Discussion
      Differential vs. Unit Weights
      Differential Weighting Models
      Unit vs. Equal Weights
   Summary

V. CONCLUSION
   Summary
   Limitations of the Study
   Implications
   Recommendations for Future Research

REFERENCES

APPENDICES
A. Introduction, Instructions, and Correlated Data Set
B. Introduction, Instructions, and Orthogonal Data Set
C. Ratings Given to Applicants in Orthogonal Data Set
D. Subjective Importance Weights for Orthogonal Data Set
E. Ratings Given to Applicants in Correlated Data Set
F. Subjective Importance Weights for Correlated Data Set
G. Correlation Between Judges' Subjective Weights for Correlated Data Set
H. Correlation Between Judges' Subjective Weights for Orthogonal Data Set

LIST OF TABLES

3.1 Correlation Matrix of Independent Variables - Correlated Data Set (N=30)
3.2 Correlation Matrix of Independent Variables - Orthogonal Data Set (N=30)
3.3 Means, Standard Deviations and Ranges of Independent Variables
3.4 Intra-Judge Reliability for 10 Replicated Cases
3.5 Inter-Judge Reliability for the Correlated Data Set
3.6 Inter-Judge Reliability for the Orthogonal Data Set
4.1 Correlations Between Objective and Subjective Weights (r_B,SW) and Between Objective and Subjective Rank Order of Importance (r_SRO,ORO)
4.2 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Objective Weights (r_YsŶobj)
4.3 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Subjective Weights (r_YsŶsub)
4.4 Correlations Between Committee Members' Judgments Generated from both Objective and Subjective Weights (r_ŶobjŶsub)
4.5 Correlations Comparing Committee Members' Objective Weighting Models with Subjective Weighting Models (r_YsŶobj with r_YsŶsub)
4.6 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Unit Weights (r_YsŶunit)
4.7 Correlations Between Committee Members' Actual Judgments and Randomly Generated Judgments (r_YsŶrand)
4.8 Mean Ratings Given to Each Applicant
4.9 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Average Weights (r_YsŶaverage)
4.10 Correlations Between Committee Members' Actual Judgments and Judgments Generated from Equal Weights (r_YsŶequal)
4.11 Correlations Between Committee Members' Actual Judgments and Six Weighting Schemes
4.12 Correlations Between Committee Members' Actual Judgments and Six Weighting Schemes
4.13 Repeated Measures One-Way Analysis of Variance for Five Weighting Scheme Models for Correlated Data
4.14 Repeated Measures One-Way Analysis of Variance for Five Weighting Scheme Models for Orthogonal Data
4.15 Tukey's Post Hoc Comparisons Between Weighting Scheme Models

LIST OF FIGURES

1.1 The Lens Model
1.2 Right Hand Side of Lens Model
1.3 Modified Lens Model
4.1 Relation Between Subjective and Objective Weights
4.2 Unit Weighting, Random Ratings, Average Weighting and Equal Weighting Models

DEFINITION OF TERMS

Ys          the actual judgments or ratings given to applicants
Ŷobj        predicted judgments derived from objective (regression) weights
Ŷsub        predicted judgments derived from subjective weights
Bi          beta (objective) weights
SWi         subjective importance weights
SRO         subjective rank order of the four independent variables
ORO         objective (regression) rank order of the four independent variables
Ŷunit       predicted judgments derived from unit weights
UWi         unit weights
Ŷrand       judgments generated by randomly assigning a rating to an applicant
Ŷaverage    predicted judgments derived from the average rating given to an applicant
AWi         average objective weights
Ŷequal      predicted judgments derived from equal subjective weights

CHAPTER I
INTRODUCTION

Every year medical school admissions committees are required not only to define quality but also to make judgments regarding the acceptability of applicants based on a definition of quality. The task of defining quality is perplexing, for the term evokes many diverse thoughts. Morowitz (1976) wryly drew a parallel between the gifted scholar Phaedrus, who went insane trying to define quality, and admissions committees who must similarly try to define quality.
This anxiety-provoking task may lead schools to employ different meanings of quality, ranging from the very narrow, specific, and well-defined to the broader, looser, and more general. The fact remains, though, that medical schools are accepting students based on some inherent definition of quality.

Quality is often defined by examining certain admissions variables that are used to select applicants for medical school. For example, a school may believe that applicants who have high grade point averages (GPA) and Medical College Admissions Test (MCAT) scores will make quality physicians. This school might weight academic performance higher than it weights other selection variables, and so quality would be measured in terms of academic performance. Another school may feel that, given a certain level of academic skills, applicants who have high interpersonal skills make quality physicians. Quality would then be measured by interpersonal skills. Obviously schools do not employ such clear-cut dichotomies in their selection process, but the point is that certain admissions criteria reflect a school's definition of quality.

Admissions committees are charged with the task of examining various admissions criteria, determining their importance, and making judgments based on these criteria. A committee's conception of quality is reflected in its judgments about the acceptability of individual applicants. Quality thus involves the selection and weighting of predictor variables in order to make judgments.

Need

A Herculean task confronts admissions committees in their attempts to make judgments regarding the quality of medical school applicants. The need to examine quality is a pressing reality when one considers some of the pressures being brought to bear on the admissions process. Consider, for example, the pressures arising from the growing disparity between the number of applicants and the number of places available. In 1975-76, there were 45,000 applicants for 15,000 places. The number of qualified applicants exceeds the number of places available. An even more alarming figure is that these 45,000 applicants submitted over 350,000 applications (Dube and Johnson, 1976). Selecting qualified students given just the sheer number of applications poses many logistical problems.

Looking beyond the number of applicants, more problems await admissions committees. Pressures arise from: making medical schools representative of the socioeconomic and racial components of the general population; the increasing costs of selecting and educating medical students; the demands to meet society's health care needs; the consideration of the legal rights of applicants; and the need for predictive validity studies relating the selection criteria to physician performance. The task facing admissions committees is formidable indeed.

Therefore, it is all the more reasonable to attempt to model how committee members weight admissions information in making judgments about the quality of their applicants. The use of different models would shed light on how information might be combined to reproduce committee members' weights and judgments. This lays the groundwork needed for further communication among judges by providing a common ground to discuss weights and how these weights can be used to generate judgments. This communication is necessary for committee members to determine what they mean by quality and also for meaningful research to be done in the area of judgment and medical school admissions.
Purpose

The purpose of this study is to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information in arriving at judgments. The research literature on judgment has shown that a judgment policy can be represented by a linear model. This policy capturing has typically used objective (e.g., derived, regression, statistical, mathematical, beta) weights. It is important to know whether a judgment policy can be represented by subjective weights based on judges' reports.

When a judgment policy is represented both objectively and subjectively, the following research question can be considered: What is the relation between the objective and subjective modeling? To answer this question, the following performance measures will be examined: (1) the correlation between objective and subjective weights; (2) the correlation between actual judgments and judgments generated from objective weights; (3) the correlation between actual judgments and judgments generated from subjective weights; and (4) the correlation between objectively generated judgments and subjectively generated judgments. The use of different performance measures allows the examination of both the weights themselves and the outcomes, or predicted judgments, arrived at from these weights. Thus, the relation between the objective and subjective models is explored in greater detail.

The following steps are taken to achieve the purpose of this study:

1) To capture or represent judges' policies, subjectively;
2) To capture or represent judges' policies, mathematically;
3) To compare subjective weights with objective weights;
4) To compare actual judgments with judgments generated from objective weights;
5) To compare actual judgments with judgments generated from subjective weights;
6) To compare objectively generated judgments with subjectively generated judgments.

Research Questions

Since it is entirely possible for there to be discrepancies between objective and subjective weights, different performance measures are examined. Committee members' objective and subjective weights may differ yet yield predicted judgments that are correlated highly with their actual judgments. Thus, the comparison between policies depends on what criterion measures are chosen. Therefore, the following research questions are considered:

1) What is the relation between the statistical and subjective weights?
2) What is the agreement between actual judgments and predicted judgments arrived at through the use of objective weights?
3) What is the agreement between actual judgments and predicted judgments arrived at through the use of subjective weights?
4) What is the agreement between objectively predicted judgments and subjectively predicted judgments?
5) Is there greater agreement between actual judgments and objectively predicted judgments than there is between actual judgments and subjectively predicted judgments?

These questions may be stated in the form of the following broad research hypotheses:1

1) Statistical and subjective weights have no relation to each other;
2) A positive correlation exists between actual judgments and judgments generated through the use of statistical weights;

1The hypotheses are restated in testable form in Chapter 3.
3) A positive correlation exists between actual judgments and judgments generated through the use of subjective weights;
4) A positive correlation exists between objectively predicted judgments and subjectively predicted judgments;
5) There is a greater correlation between actual judgments and the objectively predicted judgments than there is between actual judgments and the subjectively predicted judgments.

Theory

The age-old saying that beauty is in the eye of the beholder applies to admissions committees' conceptions of quality. Not only do medical schools use different meanings of quality, but individual committee members within a school also employ different meanings. In talking with admissions committee members, one gets the impression that each knows how to select applicants. Some may tell of a feeling they have; others may tell of a formula they employ. Definitions of quality range from the emotional to the scientific. Committee members know, or think they know, how they weight information when making judgments regarding the acceptability of medical school applicants.

Paradigms of Clinical Reasoning

The means that are available for examining the issue of quality, and how people make and think they make judgments, emerge in part from the extensive psychological research in the areas of clinical judgment and decision making. This research has attempted to identify relevant invariants of human information processing. For example, research has been directed toward ascertaining memory capabilities, how judges weight information in importance, how negative information is processed, and how information is encoded. Basically, researchers have been concerned with how to model or characterize the judgments or decisions of clinicians. This modeling has attempted to explain how clinicians use information to reach judgments or decisions.

This area of research has been called variously problem solving, decision making, thinking, reasoning, policy capturing, process tracing, and judgment. The casual or loose employment of these terms has led to some confusion. To help alleviate this confusion, it is helpful to conceptualize this research within three major paradigms: (1) decision making, (2) problem solving, and (3) judgment (Slovic and Lichtenstein, 1971; Shulman and Elstein, 1975; Slovic et al., 1977; Bordage et al., 1977). Each of these paradigms addresses specific questions and areas of interest. They provide the necessary framework and guidelines needed to focus research.

The decision making paradigm is concerned with how one selects a specific action from a set of alternative actions. For example, applicants must decide which schools to apply to; admissions committees must decide whom to reject or invite to interview; interviewers must decide what to ask next in the interview; or admissions committees must determine who will comprise the entering class. In each example the decision maker is working with incomplete information. There is an uncertainty factor. Probabilities are associated with the incoming pieces of information as well as with the success or failure of the final action. The decision maker examines the different alternatives and then decides upon the final course of action. The major goal of this paradigm is to determine the ideal way to make decisions or to assess how naturally made decisions depart from the ideal. Analyses are directed to prescribing how one ought to go about making decisions.
The work of Edwards (1968), Kahneman and Tversky (1973), Raiffa (1968), and Fryback (1974) characterizes this research paradigm.

The problem solving approach views man as a processor of information operating under the constraints of limited processing capacities. This paradigm looks at the steps or sequences that are needed to achieve some goal, given some starting point. These steps lead to understanding the task environment, the problem solver's representation of the task environment, short- and long-term memory capabilities, and the strategies employed in the solution of a given problem. Once these components are clearly understood, they are often simulated by elaborate computer programs which attempt to reproduce the problem solver's sequences of behavior. The following example is an illustration of this paradigm: admissions folders are given to committee members who are asked to "think aloud" as they make decisions about the acceptability of various applicants. After a sufficient number of folders has been reviewed, one obtains a description of each individual's problem-solving processes. These descriptions are encoded into computer programs to simulate the problem-solving behaviors of each committee member. These programs are compared with the actual behavior of the problem solver. If the theory is adequate, there are no detectable meaningful differences between the simulation and the actual behavior. Thus, the problem-solving approach attempts to describe and explain the behaviors of the problem solver. It is not a prescriptive model. The work of de Groot (1965), Kleinmuntz (1968), Newell and Simon (1972), and Elstein et al. (1976, 1978) is representative of this paradigm.

The judgment paradigm grew out of a mistrust of the use of self-report data and introspection (as exhibited in the problem-solving approach). It examines how a judge puts together information to make a judgment. The concern is not with how a judge ought to use the information but rather with how the information is used. One looks at the relative weight (or importance) of each piece of information as perceived by the judge. This approach attempts to model an act of judgment. As an example, consider admissions committee members who are given a set of application folders and whose task is to rate each applicant on some scale of acceptability. The predictor variables used to make these ratings might be grade point average, MCAT scores, and interview scores. The weights of each predictor variable for each committee member are captured (Naylor and Wherry, 1965) or represented (Hoffman, 1960) by treating the ratings as the dependent variable and the predictor variables as the independent variables in a multiple regression analysis. The psychological processes of committee members in making judgments are not described by these weights, but the regression weights are paramorphic representations of committee members' judgments (Hoffman, 1960). That is, a model of a judge performs like the judge, but there is not a one-to-one correspondence with the internal process of judgment. This approach
Problem In a comprehensive review of this research, Slovic and Lichtenstein (1971) note that each of these three paradigms has become quite specialized and has taken paths that have little or no contact with each other. They recommend an integration of research efforts. Shulman and Elstein (1975), in their review of this reserach, show that researchers are starting to integrate their research with other areas. They cite the work of Tversky and Kahneman (1971, 1973, 1974), Brehmer (1974), Dawes and Corrigan (1974), and Sprafka and Elstein (1974) as examples. Shulman and Elstein (1975) state, "... mathe- matical, prescriptive decision theories appear to be moving toward greater simplicity as they focus on the task of information-processing theory: to provide an account of how people actually think and reach decisions, not how they ought_to." The work of Cook and Stewart (1975) and Schmitt and Levine (1977) is of particular interest for this study because these researchers have followed this movement toward integration and simplicity of paradigms. Their work has focussed on the use of both subjective and objective weights in making judgments or decisions. Thus, they' utilize information gained from the problem-solving and judgment paradigms. 11 However, research has shown that judges cannot estimate accurately their combination and weighting rules (Slovic and Lichtenstein, 1971). Serious discrepancies often exist between judges' subjective and objective weighting schemes. Bootstrapping, a phenomenon in which simulated judgments may be better than actual judgments in predicting some criterion, is cited as evidence that judges cannot describe ac- curately their weighting schemes. For if judges knew their rules (i.e. weighting schemes), how could a formula improve on their judgments? Yet, additional research (Newell and Simon, 1972; Shulman and Elstein, 1975) has shown that judges can tell you what they are doing. This study uses the judgment paradigm in its study of how admis- sions committee members weight information to make judgments. The conceptual framework is provided by Brunswik's lens model (1956) modified by Hammond, Hursch, and Todd (1964) and Hammond (1966) with the analysis based on the application of multiple regression tech- niques. The importance of studying judgment in this framework rests on the fact that the emphasis is placed on the manner in which judges code and quantify information, not on the relative accuracy of the statistician over the clinician. As Dawes (1977) puts it "... there have been a plethora of additional studies showing that the actuarial approach is superior, and the issue is now--or should be--fairly well settled." The focus of this research is on judges and how they weight information to reach judgments. Lens Model and Multiple Regression Analysis The lens model grew out of the work of Egon Brunswik, a German psychologist and philosopher who was interested in the psychology of 12 perception. In this framework a person perceives a set of cues which are combined to form a perception. One apprehends the cues and infers rapidly what lies beyond the cues. An object is not seen directly, but rather cues are seen. These cues are integrated to form judgments. The relationship between a person and his environment (which is probabilistic) is the object of study in the lens model. The important elements of the lens model are objects, cues, judgments, and the relation between any of these elements. The con- cern is with how a person interprets cues. 
The lens model represents the relationship between a perceiver (or judge) and the objects of perception (or judgments) as mediated by cues whose relationship to both the perceiver and the object is probabilistic (Elstein et al., 1978). This relationship is depicted in Figure 1.1. As can be seen in this figure, X1 to Xk are the cues or independent variables which are used to make judgments (Ys). r_X1X2 represents the correlation between cues 1 and 2, while r_X1Ys represents the correlation between cue 1 and the subject's judgments. The criterion values are represented by Ye, while r_X1Ye is the correlation between cue 1 and the criterion value. Relations between cues, criteria and judgments are expressed as correlation coefficients. For example, each cue (Xk) is related to both the criterion (Ye) and to the judge's response (Ys).

[Figure 1.1: The Lens Model. Cues X1 through Xk mediate between the criterion values (Ye) on the left and the subject's judgments (Ys) on the right; the relations (e.g., r_X1Ye, r_X1Ys) and the intercue correlations (e.g., r_X1X2) are expressed as correlation coefficients.]

Since committee members are making judgments on applicants for the first time, there is no criterion information available. Thus, the focus of this study is on the right hand side of the lens model (between cues and judgments). In addition, instead of having only one predicted outcome, there are two predicted outcomes resulting from using committee members' objective and subjective weights (Figure 1.2). As seen in this figure, a committee member's actual judgments (Ys) are compared with both the judgments predicted through the use of statistical weights (Ŷobj) and the judgments predicted through the use of subjective weights (Ŷsub). Predicted judgments are obtained in the following manner:

1) Using statistical weights,

      Ŷobj = Σ (i=1 to k) Bi Xi,

   where Bi = beta coefficients and Xi = predictor variables;

2) Using subjective weights,

      Ŷsub = Σ (i=1 to k) SWi Xi,

   where SWi = subjective weights and Xi = predictor variables.

[Figure 1.2: Right Hand Side of Lens Model. Cues X1 through Xk lead to two predicted outcomes: predicted judgments using statistical weights (Ŷobj) and predicted judgments using subjective weights (Ŷsub).]

The use of multiple regression techniques in the lens model is fairly straightforward. The relevance of each cue to each judgment is represented by the correlation between cues and judgments (r_XkYs). This relation is known as the utilization coefficient (Hammond et al., 1964). Once the utilization coefficient and the correlations between the individual cues are known, one model for capturing a committee member's objective policy involves an additive linear combination of cues. The following equation illustrates a judge's objective weighting strategy or policy:

      Ŷobj = B1 (GPA) + B2 (MCAT) + B3 (Personal Statement) + B4 (Interview Score)

The beta coefficients, the Bi, provide the objective measure of how important each cue (Xk) is for each committee member. The subjective weights, SWi, also provide this measure of importance.

In this study three correlation coefficients are of importance: (1) r_YsŶobj, (2) r_YsŶsub, and (3) r_ŶobjŶsub (Figure 1.3). The first correlation, r_YsŶobj, refers to the relation between actual judgments and the predicted judgments obtained through the use of statistical weights. Squaring this coefficient indicates how well actual judgments can be predicted by a weighted linear combination of cues. This is a measure of how well a judge's policy is captured objectively.
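To make these definitions concrete, the sketch below works through one judge's data in Python. It is an illustration of the modeling approach, not the analysis reported in this study: the applicants, ratings, and self-reported importance values are all simulated, and the variable names are hypothetical. Standardized regression coefficients play the role of the Bi, normalized importance ratings play the role of the SWi, and the three correlations above are then computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four cues for 40 hypothetical applicants (columns): GPA, MCAT,
# personal statement score, interview score.
X = rng.normal(size=(40, 4))

# Simulated actual judgments Ys: a noisy linear policy, so the example
# is self-contained. In the study these would be a judge's ratings.
Ys = X @ np.array([0.5, 0.3, 0.1, 0.4]) + rng.normal(scale=0.5, size=40)

# Objective weights Bi: standardize cues and ratings, then fit by least
# squares; the solution vector contains the beta coefficients.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
zYs = (Ys - Ys.mean()) / Ys.std()
B, *_ = np.linalg.lstsq(Z, zYs, rcond=None)

# Subjective weights SWi: hypothetical self-reported importance values
# (e.g., points out of 100), normalized to sum to one before use.
SW = np.array([40.0, 25.0, 10.0, 25.0])
SW = SW / SW.sum()

Y_obj = Z @ B    # predicted judgments from objective weights
Y_sub = Z @ SW   # predicted judgments from subjective weights

def r(a, b):
    """Pearson correlation between two vectors of judgments."""
    return np.corrcoef(a, b)[0, 1]

print("r_YsYobj   =", round(r(zYs, Y_obj), 3))
print("r_YsYsub   =", round(r(zYs, Y_sub), 3))
print("r_YobjYsub =", round(r(Y_obj, Y_sub), 3))
```

Squaring the first printed value gives the proportion of variance in the judge's ratings that the regression model reproduces, and the same reading applies to the subjective model. Note that with correlated cues, noticeably different weight vectors can still produce highly correlated predicted judgments, which is exactly the boundary condition examined in Chapter IV.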
Hammond and Summers (1972) refer to this correlation, r_YsŶobj, as measuring cognitive control, that is, the extent to which a judge controls the execution of his knowledge.

The second correlation, r_YsŶsub, is the relation between actual judgments and the predicted judgments obtained through the use of subjective weights. The use of these weights refers to the capturing of a judge's policy, subjectively. Squaring this coefficient also indicates how well actual judgments are predicted by a linear combination of cues. The main difference between r_YsŶobj and r_YsŶsub is that one correlation uses regression weights while the other uses subjective weights. Therefore, it seems reasonable to assume that r_YsŶsub can also be thought of as a measure of cognitive control.

The third correlation, r_ŶobjŶsub, concerns the relation between the predicted judgments obtained through the use of both statistical and subjective weights. A high correlation between these judgments indicates that predicted judgments generated from statistical weights are in high agreement with predicted judgments generated from subjective weights.

[Figure 1.3: Modified Lens Model. Cues X1 through Xk relate a committee member's actual judgments (Ys) to the predicted judgments using statistical weights (Ŷobj) and the predicted judgments using subjective weights (Ŷsub), via the correlations r_YsŶobj, r_YsŶsub, and r_ŶobjŶsub. Note: r_YsŶobj and r_YsŶsub represent cognitive control.]

These three correlations allow the comparison of actual judgments, predicted judgments obtained from statistical weights, and predicted judgments obtained from subjective weights. This modification of the lens model provides the conceptual framework to represent a judge's policy, both subjectively and objectively. To paraphrase Slovic and Lichtenstein (1971), subjective and objective linear models are capable of: (1) highlighting individual differences and misuse of information, (2) making explicit the causes of underlying disagreement among judges, and (3) providing alternative means to describe how a person makes judgments. Some of these alternative means are examined in this study to gain a better understanding of how best to represent judges' policies.

CHAPTER II
REVIEW OF THE LITERATURE

Problems concerning medical school admissions confront medical educators and administrators every year. Although these problems arise from numerous sources, they seem to point to the importance of exercising sound judgment in the selection of medical school applicants. Yet when the literature on medical school admissions is examined, there is little or no convergence with the literature on judgment and decision making.

This literature review attempts to unite two general areas of research: (1) medical school admissions and (2) judgment. The review briefly traces the history of medical school admissions in America from colonial times to the present and examines the judgment research that can impact on some of the problems facing medical school admissions. It will be shown that the time is opportune for research on judgment and research on medical school admissions to interact.

Medical School Admissions

Throughout the history of this country, medical school admissions procedures have changed drastically, from a modest beginning of almost no requirements to the strict requirements of the present day. In this progress, though, medical school admissions have become a hotbed of controversy.
Insight into these problems can be gained by looking at some of the roots of medical school admissions in the United States. Prior to the 1760's, none of the colonies had a medical school (Bordley and Harvey, 1976). Medical education consisted of going to Europe or becoming an apprentice to a practicing physician. These practitioners may have had a formal education, but there was no assurance. All one needed was a doctor to whom one could be apprenticed. In some instances a clergyman or a better educated farmer became the town doctor if the town was lacking a practicing physician.

Becoming a doctor was quite easy. Admissions consisted of who you knew or what you knew. For if one could not go to Europe or become an apprentice, one could take correspondence courses or just read a few books and call oneself a doctor. Admissions requirements were virtually non-existent at this time.

As America grew out of its infancy, the need to upgrade the medical profession was seen by many. John Morgan (1735-1789), who with William Shippen, Jr. founded the Medical College of Philadelphia, wrote "A Discourse upon the Institution of Medical Schools in America" in 1765. (The medical departments of the College of Philadelphia and of Kings College in New York City, founded in 1765 and 1768 respectively, were the only two medical schools of this period and thus formed the nexus of medical education.) John Morgan proposed that candidates applying for admission should have had (1) an apprenticeship, usually three years with a "reputable" physician, (2) an education in liberal arts, mathematics, and natural history, and (3) a working knowledge of Latin; French was also recommended (cf. Bordley and Harvey, 1976). It was at Philadelphia that admission requirements were instituted for the first time in a form that would guide the other medical schools in their selection of applicants.

These requirements remained in effect until the Revolutionary War, when a physician shortage was experienced. Schools were forced to lower their standards to turn out more physicians. They now required (1) a two-year apprenticeship instead of three and (2) no specific educational experience. Even with lower requirements, there were still many who did not bother to attend medical school. They called themselves doctors and practiced medicine without any formal schooling.

In the early nineteenth century, proprietary medical schools came into existence in America. Professors were paid directly by the students. In many instances it was to the professors' benefit to have lower admission standards, for more students meant more money. Many professors, who held no real claim to the title of professor, fought the raising of standards for medical schools.

The period from 1800-1850 saw the deterioration of admissions standards and the continuation of proprietary schools and apprenticeships. America continued to grow, and new medical schools opened to accommodate the need for physicians. The desire to produce physicians to meet society's needs resulted in admission requirements being ignored. Once again voices were raised in protest, complaining about the inadequacies of America's system of medical education.

In 1847, the American Medical Association (AMA) was founded, with one of its objectives being the reform of medical education. Most schools, though, ignored any attempt to upgrade their standards. In fact, when Charles W.
Eliot became president of Harvard University in 1869, he said: "There were no requirements for admission to our medical schools. To secure admission a young man had nothing to do but to register his name and pay a fee ..." (Sabin, 1934, cf. Bordley and Harvey, 1976).

In 1876, the Association of American Medical Colleges (AAMC) was formed to promote educational standards. Its impact on admissions was not felt until about 1900, when the AAMC felt confident enough to require that member medical schools admit only students who had at least a high school diploma or had passed an examination on the subjects taught in high school. This requirement resulted in a major advance in the standardization of the admissions process.

Johns Hopkins University, which had led many revolutions in the history of medical education, also left its mark on the admissions process. Most importantly, (1) a baccalaureate degree, or its equivalent, with emphasis on preliminary education in the sciences and modern languages, was required for admission to medical school, and (2) both men and women students were accepted (Bordley and Harvey, 1976). Thus, admissions requirements reached their highest level of sophistication in American history.

Another important milestone in the history of the admissions process occurred when Abraham Flexner wrote a report entitled "Medical Education in the United States and Canada." The Flexner report (Flexner, 1910) and the AMA set into motion such reform that 76 medical schools went out of existence. Many schools raised their requirements so that applicants had to have at least two years of college. More required a bachelor's degree. Admission requirements were steadily increasing. By 1925 most medical schools were using the Johns Hopkins requirements. These requirements would remain in effect until about World War II.

In the mid-1940's, the AAMC and the Graduate Record Office (GRO) created a Professional School Aptitude Test (Erdmann et al., 1971; Nash, 1977). This was the first required standardized entrance exam for medical schools. The GRO later merged with the Educational Testing Service, and the Medical College Admissions Test (MCAT) was created. The MCAT, along with a bachelor's degree or at least three years of college in the premedical sciences, now formed the basis for the admissions requirements.

The post-World War II period also saw an increased number of applicants to medical school. Whereas in 1929-30, 76 schools could accept approximately 6,000 students out of 13,000 applicants, in 1949-50, 79 schools could accept 7,000 students out of 24,000 applicants. After this spate of applications, the ratio of the number of applicants to places available returned to about two to one. It is currently about three to one (Dube and Johnson, 1976). The problem was that schools were turning away qualified applicants. They were under increasing pressure from society, the government, the courts, and applicants. These pressures created the need to examine the admissions process with the hope of improving the existing one or implementing a new one.

Current admissions processes rely on GPA's, MCAT scores, letters of recommendation, interviews, extracurricular activities, and personal (autobiographical) statements to make their decisions. Although there are movements to examine new areas (e.g., non-cognitive domains; see Kegel-Flom, 1975; Korman et al., 1968; Krupka et al., 1977), these variables form the foundation upon which decisions are based.
What should be borne in mind, though, is that when one looks at these variables one is usually assuming that the applicant has been to college. The standards of admissions have become quite exacting. For example, the state of Michigan has set the following requirements: (1) 6 semester hours of the biological sciences, of which 3 must be of laboratory work, (2) 8 semester hours of chemistry, (3) 6 semester hours of physics, (4) 6 semester hours of English composition and literature, (5) 6 hours of psychology and/or sociology, (6) 18 hours in nonscience areas, and (7) completion of 60 total semester hours exclusive of physical education and military science. Michigan State University's College of Human Medicine requires in addition that applicants take the MCAT and write a personal statement telling why the applicant is interested in medicine and why the applicant is applying to this particular school.

One can see that as medical schools grew in stature and prestige, admissions requirements also had to keep pace. However, in this movement, new problems arose. One obvious problem was how to deal with increasing applicant pools. The number of places available did not keep pace with the number of applicants. This necessitated various screening processes: premedical requirements were established; grades and MCAT scores became important admissions criteria; and new admissions variables were examined. New ideas concerning medical school admissions had to be developed.

A corollary of the above problem concerned rejected applicants. In colonial times, there was no concern for the rejected applicant because there were none. At present, rejected applicants represent a large loss to society. Becker et al. (1973) followed applicants who were not accepted to medical school and found that fifty-two percent (52%) of these unsuccessful applicants were lost to the health care field. These applicants constitute a unique manpower pool with important academic and social characteristics. These concerns must be dealt with in the admissions process.

The report of the Council of Deans' "Ad Hoc Committee to Consider Medical School Admissions Problems" (1972) noted:

The current situation presents a series of challenges to the medical schools:

1) To process applicants efficiently so that this function is not an undue drain on the institution's resources;
2) To process applications in a fair and equitable manner which ensures each applicant a full opportunity to have his credentials reviewed;
3) To select from the qualified applicants those who are most likely to contribute to the fulfillment of the objective of the educational program of the institution;
4) To minimize the financial, academic, and emotional cost to the applicant;
5) To assist potential applicants with a realistic assessment of their potential for success in gaining admission to medical school.

The committee made the following recommendations:

1) Define objectives;
2) Articulate and publish selection factors;
3) Carefully select and educate committee members;
4) Establish a uniform acceptance date;
5) Notify applicants promptly of your decision;
6) Design admissions policies in accord with public trust.

Char et al.
(1975) conducted a survey of medical schools, finding that (1) there is a general feeling of dissatisfaction with the admissions process, especially in the area of assessing personality traits and selecting for clinical competence; (2) there is an appreciation of inequities in selection; (3) schools uniformly rely on three parameters--GPA, MCAT scores, and personal interviews; and (4) there is a divergence of views on the usefulness of interviews.

Dissatisfaction with the process may arise from a host of reasons: (1) committee members may not know what they are supposed to do; (2) the philosophies of the school and of members of the committee may differ; (3) members do not agree with the criteria being used to select students; (4) members may want more or less input into the process; or (5) the criteria themselves may be invalid. For example, Wingard and Williamson (1973), reviewing the literature from 1955-1972, found little or no correlation between undergraduate grades and subsequent career performance. Rhoads et al. (1974) showed that there is little objective evidence available to make accurate predictions about student performance in the clinical courses. These authors believed that given certain standards of intelligence, premedical preparation, MCAT performance, acceptable recommendations, and a reasonable range of activities, motivation will determine medical school performance. Their data also showed an interesting finding: about half of the students who excelled in the basic science portion of the curriculum did so in the clinical portion, while roughly seventy percent of the students who excelled in the clinical sciences had not done so in the basic science area. When these two groups who excelled were examined with regard to admissions data, there were minimal differences. In other words, admissions data could not determine who would do well in the clinical years.

From this review of the history of admissions, the following conclusions are drawn:

1) Admissions has changed drastically throughout the history of this country, from minimum requirements to elaborate procedures;
2) Current admissions policies typically examine GPA, MCAT scores and personal interviews, with adjustments made for autobiographical (personal) statements, letters of evaluation and extracurricular activities;
3) The processing of these variables for such a diverse applicant pool has placed a severe strain on admissions committees;
4) Some problems surrounding admissions include: identifying, measuring and evaluating important admissions criteria; processing applicants efficiently; selecting applicants who are most suited to one's programs; minimizing the financial, academic and emotional costs of the process; and assisting rejected applicants in assessing their career goals;
5) Solutions to these problems must be able to articulate, define and measure relevant admissions criteria reliably and validly and to gain acceptance by admissions committee members;
6) Insight into possible solutions may arise from the research on judgment;
7) An important first step is to model how admissions committee members weight, and say they weight, admissions variables when making judgments about the quality of medical school applicants.

Admissions provides a rich content area, while the judgment paradigm supplies the needed methodology and conceptual framework.
Judgment

The psychology of judgment grew out of the work in the area of psychophysics and was concerned initially with determining sensory thresholds in order to study perception. This research typically turned up the following errors: series effects, anchor effects, and errors of central tendency. Johnson (1972) pointed out that these constant errors were, at first, treated as distorting influences to be eliminated by a more careful methodology. To some researchers these errors were to be investigated further. When the emphasis shifted from attempts to eliminate these effects to attempts to understand them, the psychology of judgment began to take shape (Johnson, 1972).

In this process the term judgment took on different meanings in certain situations. For example, the word rating appears when judgments are made on a scale of numbers; the term evaluation is used when judgments of value are made; decision is used when judgments are to be made using discrete categories; and preference refers to judgments about personal taste. The general usage of the term judgment is quite loose. Johnson (1972) clarified the term when he referred to judgment as the assignment of an object to a small number of specified categories. Its function is to settle an uncertain state of affairs, and its critical dimensions are determined by the situation in which judgment occurs. "Judgment begins with unordered objects, events or persons, assigns them to specified response categories so as to maximize the correspondence between the responses and the critical dimension of the stimulus objects, and thus ends with a more orderly situation" (Johnson, 1972).

Newell (1968) has defined judgment as a cognitive process with the following characteristics:

1) The main inputs to the process--that which is to be judged--are given and available; obtaining, discovering, or formulating them is not part of judgment;
2) The domain of the output--the set of admissible responses--is simple and well defined prior to the judgment. The response itself is variously called a selection, estimation, assertion, evaluation or classification, depending on the nature of the domain;
3) The process is not a simple transduction of information. Judgment adds information to the output;
4) The process is not simply a calculation, or the application of a given rule;
5) The process concludes, or occurs at the conclusion of, a more extended process;
6) The process is rather immediate, not being extended in time with phases, stages, subprocesses, etc.;
7) The process is to be distinguished from searching, discovering, or creating, on the one hand, and from musing, browsing, or idly observing, on the other.

Policy Capturing

Prior to 1960, there was little or no research on how information was processed in order to reach a judgment. That is, research on judgment as defined by either Johnson or Newell was sparse. Dawes and Corrigan (1974) cited Benjamin Franklin's letter to his friend Priestly, which described Franklin's method for processing information to reach a judgment. This method listed pros and cons, along with appropriate weights, which were then summed down both columns. This approach was called "moral or prudential algebra". Shulman and Elstein (1975) cited the study by Wallace (1923) that used linear equations to model corn judges. A relatively small number of variables accounted for the variance in a corn judge's judgments. Each of the judges' models was compared with a model of the environment.
The comparison is similar to the left-hand side of the lens model. This study predated the research that was to come in the area of judgment.

It was not until 1960, when Hoffman suggested the use of multiple regression equations to model the judgment policies of clinical psychologists, that statistical models of how judges weight and combine information came into prevalent use. Following this paper, a large number of studies were carried out in different task environments. These tasks involved the modeling of judgments of clinical psychologists (Goldberg, 1971), stockbrokers (Slovic, 1969), radiologists (Hoffman et al., 1968), draft boards (Gregory and Dawes, 1972), and admissions committees (Dawes, 1971; Goldberg, 1977; Dawes, 1977). In these studies the researchers were concerned with representing the objective weighting policy of each judge through the use of a linear model. The linear model predicted outcome judgments fairly accurately as well as making explicit each judge's weighting policy. Slovic and Lichtenstein (1971),¹ in a comprehensive review of this literature, stated that the linear model was a powerful device for predicting quantitative judgments made on the basis of specific cues. It was capable of highlighting individual differences and misuse of information as well as making explicit the causes of underlying disagreements among judges in both simple and complex tasks.

¹See also Shulman and Elstein (1975), Slovic et al. (1977), and Bordage et al. (1977) for excellent reviews of the literature in the area of judgment.

Both Newell (1968) and Johnson (1972) have analyzed the major questions that investigators have asked about judgment. For Johnson, since most judgments are complex (and objects vary in several dimensions), the important questions are:

1) What are the dimensions that influence judgment?
2) How much weight does the judge give to each dimension?
3) How are these effects combined by judges?

Newell enumerated the major scientific questions asked about judgment:

1) Upon what information is the judgment based?
2) What is the judgmental law?
3) What is the psychological process or processes which make possible the lawfully operating judgment?
4) What are the other conditions that influence the judgment and how do they work?
5) Why don't humans make optimal judgments?
6) Can a machine or algorithm make judgments as well as humans?

These questions served as guideposts to direct the next stages of the review of the judgment literature. Specifically, the following three areas were focused upon: (1) modeling admissions tasks; (2) modeling medical school admissions tasks; and (3) modeling judgment tasks with subjective weights. These areas shaped this study.

Modeling Admissions Tasks

Three studies bearing on the admissions process in general came from the work of Dawes (1971, 1977) and Goldberg (1977). Dawes (1971) focused on how admissions criteria were combined by the members of a graduate school admissions committee to predict an applicant's success in graduate school. Three admissions variables were analyzed: (1) overall undergraduate grade point average (GPA), (2) an index of the quality of the undergraduate institution (QI), and (3) the total raw score of the Graduate Record Examination (GRE). Applicants were rated on a six-point scale. The policy-capturing computation underlying this and the studies cited above is sketched below.
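A minimal sketch of that computation, in modern notation (Python): regress a committee's ratings on the cues and read the fitted coefficients as the objective weights. All data, variable names and coefficients below are invented for illustration; none of the numbers are those reported by Dawes (1971).

    import numpy as np

    # Hypothetical applicant file: GPA, undergraduate quality index (QI),
    # and GRE for 100 applicants, plus the committee's rating of each on
    # a six-point scale. All values are simulated for illustration.
    rng = np.random.default_rng(0)
    gpa = rng.uniform(2.0, 4.0, 100)
    qi = rng.uniform(1.0, 6.0, 100)
    gre = rng.uniform(800.0, 1600.0, 100)
    rating = 1.0 * gpa + 0.2 * qi + 0.002 * gre + rng.normal(0.0, 0.3, 100)

    # Policy capturing: the least-squares coefficients are the "objective"
    # weights, and R^2 is the variance in the ratings the model accounts for.
    X = np.column_stack([gpa, qi, gre, np.ones(100)])
    beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
    captured = X @ beta
    r_squared = np.corrcoef(rating, captured)[0, 1] ** 2
    print("captured weights:", np.round(beta[:3], 3), "R^2:", round(r_squared, 2))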
The results showed that a linear model could account for 78% of the variance of the committee's ratings and that 55% of the applicants the admissions committee considered could have been screened out by an equation without rejecting a single individual whom the admissions committee actually admitted. This screening could result in an estimated savings of approximately $18 million per year. These conclusions had important implications for researchers in the area of admissions:

1) A simple linear combination of the criteria of the admissions committee did a better job of predicting performance in graduate school than did the admissions committee itself;
2) The behavior of the admissions committee could be simulated by a linear combination of the criteria considered;
3) Under certain conditions, the paramorphic representation of the judge might be more valid than the judge himself. This principle, it will be recalled, is known as bootstrapping (Goldberg, 1970).

Dawes' research is a classic example of using admissions as a content area and the judgment paradigm to provide the needed methodology. A major difference between his research and this study is the use of different linear models to capture a judge's policy. This study includes both objective and subjective models.

Goldberg (1977) modified the regression equation developed by Dawes so that the probability of receiving an invitation to graduate school depended on the applicant's GPA and GRE scores. The QI index was dropped. The analyses showed that the new equation ranked the applicants in much the same manner as the old equation. Goldberg recommended that this equation be made available to all applicants. He also argued for the development of a centralized national application system.¹ The emotional and financial costs to applicants and institutions remain high, and such a movement could benefit both immeasurably. Goldberg hoped that his report would stimulate reports of similar analyses from other institutions. He also hoped that his report would provoke further thought on the fundamental issues relating to the graduate admissions process in psychology.

¹Such an application service exists for the medical schools. It is called the American Medical College Application Service (AMCAS).

Dawes (1977) touched on some of these fundamental issues when he examined case-by-case versus rule-generated procedures for the allocation of scarce resources. He argued that rule-generated procedures are superior to case-by-case procedures. Dawes' paper delineates the advantages of each procedure. The advantages of a case-by-case approach are that:

1) Some meritorious variables not considered in a rule would occur to the decision maker only after looking at a particular case;
2) A weighting system may appear to be inadequate or misleading only after knowledge of the distributional properties of the variables to be weighted;
3) The decision maker cannot be held accountable later for errors or for explicit policies that offend someone else's sensibilities.
The advantages of a rule-generated approach are that:

1) Because the variables are defined a priori, other variables cannot be used to bias a decision in one direction or another for political or personal reasons;
2) Members must decide upon a weighting or combination system, which becomes an important institutional responsibility;
3) Committee members are held accountable for their decisions;
4) It is uniform (all people are judged by the same standards). Dawes viewed this uniformity as a moral virtue.

After examining these advantages, Dawes cited two examples of rule-based allocation. The first example was the graduate admissions study; the second concerned the allocation of NDEA and NSF fellowships to various departments. Rather than have the various departments argue over who should get what, certain rules were instituted to allocate the fellowships. Various ratios were experimented with until all fellowships were allocated.

Dawes placed his finger on the heart of the matter when he observed the negative reactions many people experience in dealing with rule-governed or rule-based procedures. However, he admonished the reader that to conclude that unsystematic decision making is superior to rule-based decision making is to argue from a vacuum. The solution to injustice lies in changing unnecessary, unfair, or idiotic rules. The commitment to rule-based procedures involves the capacity to achieve satisfaction and joy from the general improvement of a social situation, even though the causative role in benefiting particular individuals is less evident than it would be in a case-by-case decision making situation (selected or paraphrased from Dawes, 1977).

Modeling Medical School Admissions Tasks

The studies done on the medical school admissions process are very similar to the three studies on the graduate school admissions process. In both graduate and medical school admissions, resources are scarce, the number of applicants is greater than the number of places available, competition is fierce, and applicants are showing a wider range of qualifications. This research may be divided into two general categories: (1) studies which predict success in medical school and (2) studies which predict committee members' judgments.

The studies of Ambrosino and Brading (1973), Hunka (1964), Mattson (1969), Milstein et al. (1976), Schofield (1970), and Simon et al. (1975) are representative of research attempting to predict success in medical school. These studies typically find that regression equations can predict quite well who will succeed in the first two years of medical school, but that accuracy of prediction declines as an applicant progresses through school. As in other judgment studies, a relatively small number of variables can account for most of the variance.

For example, Simon et al. (1975) compared 23 medical students from socio-economically disadvantaged backgrounds with 21 regularly admitted medical students with respect to MCAT scores, GPAs, college ratings, Part I of the National Boards, and performance in two clerkships. These two groups of students differed markedly on admissions data. At the end of the second year, average National Board Part I scores identified two distinct populations, but the average scores of both groups were clearly above the minimum passing level. This study has interesting implications, because if existing admissions criteria had been employed, only one of the groups would have been admitted.
As it turned out, though, the group from disadvantaged backgrounds was performing at the average or above-average level. Schofield (1970) showed that there was no significant difference in the achievement of students selected by full committee deliberation and those selected by a multiple regression equation (actuarial rankings). The author recommended that admissions committees' time might be better spent examining and judging borderline and/or special cases that are not differentiated meaningfully by an actuarial process.

These studies and others (Funkenstein, 1965; Howell and Vincent, 1967; Matarazzo and Goldstein, 1972; Turner et al., 1974) have shown that the existing admissions criteria predict success in the preclinical phases of medical school but correlate poorly with clinical performance. Faced with these findings, it is easy to see why committee members may experience dissatisfaction with the admissions process. Funkenstein (1970) felt that medical schools must select students on the basis of excellence. Different tracks should be set up to allow for the teaching and training of different kinds of physicians, each track having a representative subcommittee which selects its applicants. Half of the entering class would be made up of minorities and individuals who had chosen a specialty or general practice, while the other half would be made up of superior applicants and those chosen by a lottery.

Models of admissions judgment tasks have not typically examined how committee members make judgments. Little emphasis is placed on representing the judgment process, and even less emphasis is given to representing how committee members say they are making judgments. Little research has been done in this area.

A few studies of committee members' judgments have been done by Ambrosino and Brading (1973), Best et al. (1971) and Padgett et al. (1976). Best and his co-workers showed how preliminary prediction equations were used to help implement the admissions process for several years at Illinois. The use of these equations facilitated communication and comparison about various candidates. These equations were also used to predict who would succeed in the first year of medical school. As with other equations, the predictive power weakened as the student progressed through medical school. Ambrosino and Brading (1973) used similar techniques when they employed stepwise regression procedures to predict applicant averages. These averages were used as measures of whom to interview. These regression procedures had a high success rate. Padgett et al. (1976) used a matching system at the University of Texas. This system met stringent admissions, economic and behavioral objectives, resulting in an effective cost-benefit system. However, such matching on a national scale, while technically feasible, presented insurmountable problems.

A slightly different tactic, used by Teitelbaum et al. (1973), was to design a system in which the admissions committee set its policies and then selected predictors or variables that would discriminate in a meaningful manner. They felt that the use of these predictors was important for two reasons: (1) it permitted the committee to develop a formula that reflected its thinking as to what combination of characteristics a candidate should possess, and (2) it allowed the committee to explain to applicants the grounds for rejection or acceptance.
Teitelbaum and his co-workers considered two approaches in developing their system: (1) an empirical approach and (2) a rational approach. The empirical approach attempts to capture what has already taken place. It usually employs such techniques as regression analysis, discriminant analysis, factor analysis, and principal components analysis on decisions already reached by the committee. In a rational approach, variables are selected before a decision is made about the applicants, and the committee agrees on how these variables are to be weighted and combined.

Various mathematical models using admissions variables have been tested which successfully predict judgments and performance, and yet these models are not utilized. A possible reason may be that these models rarely considered how individual committee members make, and say they make, their judgments. The modeling of committee members' judgments has not been emphasized sufficiently. Another possible reason for non-acceptance of these solutions may rest with resistance to attempts to use rule-generated procedures. Dawes (1977) noted that decision makers in selection situations may feel much prouder of having chosen an individual who did very well than of having established a rule that benefited the whole institution. An examination of how committee members make and say they make judgments should shed light on these concerns.

Subjective Weights

The major focus of the previous studies has been on predicting success in school or modeling judges' policies. Emphasis has also been on objective (i.e., statistical, mathematical, regression) weights. Dawes and Corrigan (1974) showed that the linear model was an adequate representation of human judgment in a large number of instances. In a series of tests examining the robustness of the linear model, they found that unit weights performed as well as differential (objective) weights in predicting criterion values. They stated that "the whole trick is to decide what variables to look at and then to know how to add". Since the data for this study reflect the right-hand side of the lens model (i.e., the correlation between cues and judgments), the unit weighting schemes will be compared with the differential weighting schemes. This comparison is similar to the Dawes and Corrigan study but on the opposite side of the lens model. Of importance is whether unit weights can predict actual judgments as well as objective weights or subjective weights can.

Subjective weights are relatively new to the judgment paradigm. The work of Cook and Stewart (1975), Martin (1957), Schmitt and Levine (1977) and Summers et al. (1970) has shown that much interesting work can be done with subjective weights. The research question of interest here was whether there were any differences between objective and subjective models of committee members. The subjective model would consist of committee members' self-reports of the importance they attach to the variables used in rating applicants. The objective model would consist of mathematical weights attached to the admissions variables, derived from multiple regression techniques. The relation between committee members' subjective impression of how they weight information and an objective measure of how they weight information can then be determined. A mistrust of self-report studies and of the use of introspection has been a prevalent theme throughout the history of psychology.
If there is a strong positive relation between the two weighting schemes, support is lent to the hypothesis that judges can report what they are doing. If there is little or no relation, support is lent to the hypothesis that judges cannot accurately estimate their weighting scheme.

Another area to be examined when comparing subjective and objective policies is the predicted judgments that are generated through the use of subjective and objective weights. A possibility exists that judges may differ in their subjective and statistical weights and yet the predicted judgments generated from these weights may be highly related. The emphasis would then shift from the weights themselves to the outcomes arrived at using such weights (both comparison criteria are sketched in code at the end of this section).

Martin (1957) found that a linear model based on the use of subjective weights was successful in predicting evaluations of student sociability. Summers et al. (1970) found that although subjective weights were successful, a linear model based on regression weights accounted for 20% more variance. Cook and Stewart (1975) showed that subjective policy descriptions corresponded fairly closely to the statistical policy descriptions. For a three-cue task, the subjective policy accounted for 91% of the maximum linear variance, while for a seven-cue task, the subjective policy accounted for 74% of the variance. Summers et al. (1970) compared subjects' actual judgments with predicted judgments arrived at through the use of subjective weights and found that the median correlation was .60. This method was unique in that it offered an alternative approach to measuring the accuracy of subjective weights. Typically, the accuracy of subjective weights was measured by correlating objective weights with subjective weights, and these correlations have tended to be low (Hoffman, 1960; Slovic, 1969; Slovic et al., 1972).

Cook and Stewart (1975) noted that although there were different ways to obtain objective weights, usually only one method had been used to obtain subjective weights: having each judge divide 100 points among the predictor variables (Hoffman, 1960). Cook and Stewart compared seven different methods of arriving at subjective weights and found no significant differences among the weighting schemes. This was surprising, because the methods ranged from dividing 100 points to complex configural rating schemes. Although there were no significant differences, it should be recalled that the subjective policies corresponded fairly closely to the objective policies.

Schmitt and Levine (1977) felt that more research should be directed toward understanding the use of subjective weights. They suggested a study comparing predicted judgments arrived at through the use of both subjective and objective weights. They questioned whether the focus of research should be on subjective rather than objective weights and suggested that much interesting and important research can be done with subjective weights. It is important to investigate the use of subjective weights to shed some light on the controversy that exists between the different paradigms: the judgment paradigm has usually ignored subjective weights, while the problem solving paradigm has utilized them. The solution to this controversy may lie somewhere between the stated extremes. The task environment may prove to be the important determinant in the analysis of this problem.
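The two comparison criteria discussed above--correlating the weights themselves, and correlating the predicted judgments the weights generate--can be sketched as follows. The numbers are invented; the comparisons in this study use committee members' elicited weights and their captured regression weights.

    import numpy as np

    # One judge's weights for a four-cue task. The subjective weights come
    # from dividing 100 points among the cues; the objective weights are
    # the standardized betas captured by regression. Values are invented.
    subjective = np.array([40.0, 25.0, 20.0, 15.0])
    objective = np.array([0.45, 0.30, 0.15, 0.10])

    # Criterion 1: agreement between the weights themselves (n = 4, so
    # any such correlation must be interpreted cautiously).
    r_weights = np.corrcoef(subjective, objective)[0, 1]

    # Criterion 2: agreement between the judgments the weights generate.
    # Subjective weights are applied exactly as if they were regression
    # weights; their overall scale does not affect the correlation.
    Z = np.random.default_rng(1).standard_normal((30, 4))  # standardized cues
    y_sub = Z @ subjective
    y_obj = Z @ objective
    r_predictions = np.corrcoef(y_sub, y_obj)[0, 1]
    print(round(r_weights, 2), round(r_predictions, 2))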
From these judgment studies, the following conclusions are drawn:

1) A simple linear combination of the admissions criteria considered by committee members did a better job of predicting graduate performance than did the committee members;
2) The behavior of admissions committee members can be simulated by a linear combination of the admissions criteria;
3) A paramorphic representation may be used as a preliminary screening device;
4) Under certain circumstances, paramorphic representations may be more valid than the committee members themselves;
5) Rule-generated procedures are clearly superior to case-by-case procedures;
6) Use of rule-generated procedures does not imply that decisions have to be dehumanizing;
7) The few studies that have been done on medical school admissions can be divided into those which predicted success in medical school and those which predicted committee members' judgments;
8) GPA and MCAT scores predicted success in the first two years of medical school but lose their predictive power as time progresses;
9) Regression procedures have been used successfully as measures of whom to invite to interview;
10) Few of the models used to predict success or judgments have been adopted;
11) The few studies done on modeling a judgment policy with subjective weights have shown that much promising and interesting work can be done.

Summary

The purpose of this study is to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information.

The judgment research has shown that a variety of judgment tasks have been modeled by mathematical representations. These models have typically used a linear combination of information. Both the prediction of success and actual judgments have been modeled successfully. In fact, under certain circumstances, the paramorphic representations may be more valid than the judgments themselves. This modeling has been useful because it can highlight individual differences and the misuse of information by judges.

A new area in judgment is subjective weights. The use of subjective weights in modeling a judgment task allows insight into the question of how judges say they weight information. How information is weighted mathematically and how it is weighted subjectively may be two different things. Understanding these differences expands our knowledge about judgment. Specifically, it may allow us to discern some of the problems of judgment in medical school admissions.

There are many problems with existing medical school admissions processes. Some of these problems have historical roots while others are just emerging. Admissions has grown from informal processes to formal procedures. Through the years three major admissions criteria have emerged: GPA, MCAT scores and personal interviews. Additional criteria, for which adjustments are made, have included autobiographical (personal) statements, letters of evaluation and extracurricular activities. However, these criteria predict success in the first two years but lose their predictive power in the clinical years. Given the sheer number of applicants, the diversity of the pool, and the lack of predictive power, a severe strain is placed on admissions committees.

Admissions committees must identify, measure and evaluate important admissions criteria. Yet there is little research on how committees weight these admissions variables.
The few available studies have shown that weighting schemes can be devised that predict success or judgments, but these schemes have not been adopted. Insight into these concerns may arise from research in the judgment paradigm. Thus, medical school admissions provides a rich content area in which to explore the judgments of committee members, while the judgment paradigm provides the tools and methods of study. Different measures are used to analyze the success of the different weighting schemes. Which weights are better, how much agreement there is among committee members, and how best to use the weights are some of the major issues addressed in this study.

CHAPTER III
DESIGN OF THE STUDY

In this chapter the method and design of this study are described in seven major sections: (1) Population and Sample, (2) Stimulus Materials, (3) Procedures, (4) Measures, (5) Hypotheses, (6) Analyses and (7) Summary.

Population and Sample

The subjects for this study were drawn from the current members of the Admissions Committee of Michigan State University's College of Human Medicine. Members are elected by their peers from their various departments for three-year terms. Fifteen of the sixteen members agreed to participate in this study, of whom four were students. Seven were males and eight were females. Their ages ranged from 23 to 61; the mean age was 35. Four M.D.'s, three Ph.D.'s, two with Master's degrees, five with Bachelor's degrees, and one medical student who had no college degree comprised the degree status of the committee members. The average tenure on the committee was 2 years, with a range from one-half year to 4 years.

Stimulus Materials

Two different sets of stimulus materials were created for presentation to committee members. Each set contained an introduction, instructions, a description of the variables to be used, and the data (Appendices A and B). Data were presented on four independent variables: (1) total grade point average (GPA), (2) Medical College Admission Test (MCAT) scores, (3) personal statement scores and (4) interview scores. These are the major variables used in the admissions process to select medical school applicants (Gee and Cowles, 1957; Char et al., 1975). The following descriptions were provided to each committee member:

Total GPA. This represented the cumulative grade point average of the applicant's undergraduate years. The GPA ranged from a low of 2.00 (C) to a high of 4.00 (A). The average was 3.19 for both the sample and the actual applicant pool of 1977.

MCAT Score. In 1977, the revised MCAT was given for the first time to medical school applicants. Scoring is much different from that of the old MCAT, as is the material on which the applicant is tested. There are four science-related subtests: content knowledge in biology, chemistry, and physics, and problem-solving ability in the sciences. There are two additional tests: quantitative reasoning ability and reading comprehension. Ordinarily the subtests are reported as separate scores, but because they are highly correlated, an average score was used in this study to simplify the task. Scores ranged from a low of 1 to a high of 14. The average MCAT score was 8 for both the sample and the actual applicant pool.

Personal Statement Score.
The personal statement score represented the evaluation given the applicant by two raters who read the two pages of autobiographical information found in the applicant's formal application and the personal statement submitted by the applicant describing his/her reasons for choosing medicine as a career and for choosing Michigan State University's College of Human Medicine. The evaluation was described by one of five labels ranging from "well above average" (5.0) to "well below average" (1.0). The average score was 3.0 for both the sample and the actual applicant pool, based on a five-point scale.

Interview Score. This score represented the combined recommendation given by two interviewers who had each conducted a fifty- to sixty-minute interview with the applicant. Questions in the interview focused on personal qualities considered important to the student's successful functioning at this school as well as qualities felt to be critical to effectiveness as a physician. These included such areas as problem-solving, maturity, motivation, interpersonal skills, and self-understanding. Five different labels described the recommendation given to an applicant, ranging from "outstanding candidate" (5.0) to "express reservations" (1.0). The average score was 3.0, based on a five-point scale.

The instructions emphasized that these four variables were a sample of the information that is available to committee members concerning an applicant. Committee members were asked to accept this limitation and to make their ratings on this information alone.

Data Sets

The examination of various models in the judgment paradigm has usually occurred under two data conditions, representative and orthogonal. The variables of interest are correlated in the representative condition to the extent believed to prevail in reality, and are uncorrelated in the orthogonal condition. Brunswik (1955) and Hammond (1972) have argued for the study of judgment in real situations. They felt that experimental designs using representative data are to be preferred over designs employing orthogonal data; the task validity is greater with a representative design. On the other hand, it is known that the beta weights obtained from a linear model are unstable: when the predictor variables are inter-correlated (multicollinearity), the weights assigned to these variables will differ according to the methods used in computing a regression equation. When the predictor variables are orthogonal, the beta weights are more stable. Thus, researchers have used orthogonal data to arrive at cleaner statistical results (Darlington, 1968).

For this study, two data sets, correlated and orthogonal, were used to examine the boundary conditions of both the objective and subjective weighting schemes. For the correlated data set, the relationships between the four admissions variables were moderate to high; the correlations ranged from .53 to .69 (Table 3.1). For the orthogonal data set, the relationships between the variables were essentially zero; no correlation exceeded |.22| (Table 3.2). However, each data set had the same means and standard deviations (Table 3.3). Only the correlations between the variables were changed. The use of these data sets allowed the examination of each judgment policy under two conditions. The construction of such stimulus sets is sketched below.
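This study does not describe the exact procedure by which the two data sets were generated, so the following sketch offers only one plausible method: multivariate-normal sampling with the target means and standard deviations (Table 3.3) and the target correlation matrices (Tables 3.1 and 3.2). Rounding the draws to the scales of the four variables is omitted.

    import numpy as np

    # Target moments for GPA, MCAT, Personal Statement and Interview Score.
    means = np.array([3.19, 8.40, 3.00, 3.00])
    sds = np.array([0.53, 2.87, 1.11, 1.11])

    # Correlated (representative) structure from Table 3.1; the orthogonal
    # condition is approximated by an identity correlation matrix.
    R_corr = np.array([[1.00, 0.69, 0.61, 0.59],
                       [0.69, 1.00, 0.60, 0.53],
                       [0.61, 0.60, 1.00, 0.63],
                       [0.59, 0.53, 0.63, 1.00]])
    R_orth = np.eye(4)

    def make_profiles(R, n=30, seed=2):
        # Convert the correlation matrix to a covariance matrix and sample.
        cov = np.outer(sds, sds) * R
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(means, cov, size=n)

    correlated_set = make_profiles(R_corr)
    orthogonal_set = make_profiles(R_orth)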
Procedures

Two testing sessions were required for each committee member. In one session, orthogonal data (in the form of the four variables) on forty applicants to medical school were presented to the committee member. The committee member's task was twofold: (1) rate each applicant on the basis of overall quality by assigning a score from 1 to 7 (one being a low rating) and (2) verbally report subjective importance weights for each of the four variables used in evaluating the applicants (Total GPA, MCAT Score, Personal Statement Score, and Interview Score), first by rank ordering the four variables in importance and second by distributing 100 points among them; a high number represented a relatively important variable (Appendices C and D). In the second session, correlated data were presented. The committee member's task was again twofold: (1) rate each applicant and (2) report subjective importance weights (Appendices E and F). A counter-balanced design was used: eight committee members received orthogonal data in the first session and correlated data in the second session, while the remaining seven committee members received the data sets in the opposite order. Testing took place in committee members' offices or conference rooms. Each testing session lasted from three-quarters of an hour to an hour. At the end of each session, a short debriefing was held.

Table 3.1
CORRELATION MATRIX OF INDEPENDENT VARIABLES - CORRELATED DATA SET (N=30)

Independent Variables   Total GPA   MCAT Score   Personal Statement   Interview Score
Total GPA                 1.00
MCAT Score                 .69*        1.00
Personal Statement         .61*         .60*           1.00
Interview Score            .59*         .53*            .63*               1.00

* p < .001

Table 3.2
CORRELATION MATRIX OF INDEPENDENT VARIABLES - ORTHOGONAL DATA SET (N=30)

Independent Variables   Total GPA   MCAT Score   Personal Statement   Interview Score
Total GPA                 1.00
MCAT Score                 .22         1.00
Personal Statement        -.19         -.02            1.00
Interview Score            .05         -.09            -.08               1.00

Table 3.3
MEANS, STANDARD DEVIATIONS AND RANGES OF INDEPENDENT VARIABLES

Independent Variables   Mean   Std.Dev.   Range
Total GPA               3.19     .53      2.04 to 4.00
MCAT Score              8.40    2.87      4 to 14
Personal Statement      3.00    1.11      1 to 5
Interview Score         3.00    1.11      1 to 5

Intra-Judge and Inter-Judge Reliability

Data on ten applicants were randomly selected to estimate intra-judge reliability. By correlating the ratings given to an applicant in the first instance with those given to its replication, a measure of intra-judge reliability was obtained (Table 3.4). The median correlation for the correlated data set was .94, with a range of .86 to 1.00 and a mean of .95. The correlations for the orthogonal data set were lower, with a median of .83, a range of .42 to .96, and a mean of .80. Once the reliability coefficients were calculated for each judge, the ten replications were removed from further analysis.

The reliability coefficients reported for the correlated data set were slightly higher than what has been reported in the literature. Hoffman et al. (1968) reported correlations ranging from .60 to .92 with a median of .80. Hoffman (1960) reported intra-judge correlations ranging from .83 to .88.
However, the Hoffman et al. (1968) research used orthogonal data and thus was closer to the results shown for the orthogonal data set in this study. Hoffman (1960) used representative data, but his results were based on a sample of four judges.

Table 3.4
INTRA-JUDGE RELIABILITY FOR 10 REPLICATED CASES

           Correlated Data   Orthogonal Data
Judge          r_xx              r_xx
# 1           1.00**             .88*
# 2           1.00**             .85*
# 3            .92*              .93*
# 4            .96*              .88*
# 5            .92*              .57***
# 6            .91*              .67***
# 7            .95*              .42
# 8            .85*              .83*
# 9            .88*              .96*
#10            .95*              .88*
#11            .92*              .45
#12            .86*              .65***
#13            .96*              .67***
#14            .91*              .90*
#15            .94*              .67***

Median         .94               .83
Mean           .95               .80
Range      .86 to 1.00       .42 to .96

Goldberg (1968) stated that while the relatively few investigations of judgmental stability (intra-judge reliability) have concluded that judges may show substantial consistency in their judgments over time, the vast majority of reliability studies have focused upon judgmental consensus (inter-judge reliability) and have come to widely disparate conclusions. Goldberg cited some findings of extremely high agreement on some judgment tasks (e.g., Bryon, Hunt and Walker, 1966; Goldberg, 1966; Winslow and Rapersand, 1964) and other results of virtually no consensus (e.g., Brodie, 1964; Gunderson, 1965; Watson, 1967).

It is interesting to note that in this study there was high agreement among the fifteen judges in the correlated data set (Table 3.5); correlations between judges ranged from .72 to .96. In the orthogonal data, the inter-judge correlations were lower (Table 3.6), ranging from .10 to .89. Hoffman et al. (1968) showed similar reliabilities with orthogonal data; their correlations ranged from -.11 to .83. The findings of this study showed high consensus on the judgment task for both the correlated and orthogonal data conditions. Coefficient alpha (α), developed by Cronbach (1951), was used to estimate the reliability of the multiple ratings for the thirty applicants. Alpha was .98 for the correlated data and .95 for the orthogonal data. These computations are sketched below.

[Tables 3.5 and 3.6, the matrices of inter-judge correlations for the correlated and orthogonal data sets, are not legible in this copy.]
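The reliability computations reported above can be sketched as follows. The function names are mine, and ratings stands for the 15-judge-by-30-applicant matrix of acceptability ratings.

    import numpy as np

    def intra_judge_reliability(first, second):
        # Correlation between a judge's ratings of the ten replicated
        # applicants on their first and second appearances.
        return np.corrcoef(first, second)[0, 1]

    def cronbach_alpha(ratings):
        # Coefficient alpha with judges treated as items (rows) and
        # applicants treated as cases (columns).
        k = ratings.shape[0]
        item_variances = ratings.var(axis=1, ddof=1).sum()
        total_variance = ratings.sum(axis=0).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)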
This information was used to study the relation between objective and subjective weights by examining the following correlations: 1) 2) 3) 4) 5) between objective and subjective weights; between actual judgments and judgments generated from objective weights; between actual judgments and judgments generated from subjective weights; between objectively generated judgments and subjectively generated judgments; between the subjective rank order of the four independent variables and the regression rank order of the same vari— ables. Each correlation examined one aspect of the relation between objective and subjective weights. 59 Once these correlations were examined, additional concerns arose centering on alternative weighting schemes. For example, how might objective and subjective weights compare with other weighting schemes? Specifically, how might a unit weighting or random rating model com- pare to objective and subjective models? To address this, the fol- lowing data were developed: 1) 2) ); predicted judgments generated from unit weights (Iunit judgments which were generated by randomly assigning an applicant a rating based on a judge's frequency distribution (Yrand)' To compare the different weighting schemes, the following correlations were examined: 1) 2) 3) 4) between actual judgments and objectively generated judgments; between actual judgments and subjectively generated judgments; between actual judgments and judgments generated from unit weights; between actual judgments and random judgments based on judges' frequency distributions of actual judgments. Having examined committee members as individuals, it was decided to look at models which represented the committee as a group. There- fore, the following data were developed: 1) 2) mean judgments which were the average ratings given to each applicant; predicted judgments generated from average objective ); (regreSSTOD) weightS (Yaverage 60 3) predicted judgments generated from equal subjective importance weights (Yaqual). The following correlations were examined: 1) between actual judgments and judgments generated from average objective weights; 2) between actual judgments and judgments generated from equal subjective weights. These correlations were then compared to the previously mentioned cor- relations to examine which weighting scheme best accounted for committee members' judgment policies. To summarize, the following models were developed for each com- mittee member: objective weights, subjective weights, unit weights, random ratings, mean objective weights and equal subjective weights. Each of these models yielded predictions that were correlated with committee members' actual judgments. This was how the success of each model was determined. Predicted judgments were obtained in five ways: 1) Objective weights. Judgments were obtained by multi- plying the objective (regression) weights by the standardized values of the four predictor variables. The objective (regression) weights were obtained from multiple regression analysis. The following equation represented the objective weighting scheme: iobj = a] (z GPA) + 32 (z MCAT) + 53 (2 Personal Statement) + 34 (2 Interview Score) Where (obj - predicted judgments Bi Z standardized regression weights standard score 2) 4) 61 Subjective weights. Judgments were obtained by multi- plying the subjective weights by the standardized values of the four independent variables. Subjective weights were used as if they were regression weights. 
The following equation represented the subjective weighting scheme: 1? sub = SW1 (2 GPA) + SW2 (Z MCAT) + SW3 (Z Personal Statement) + SW4 (2 Interview Score) Where (sub predicted judgments SW1 subjective weights Z standard score Unit weights. Ratings were obtained by multiplying the four predictor variables by unit weights (i.e., +l's or -l's). For example, an applicant with a GPA of 3.00, a MCAT of 8, an average personal statement score and an average interview score would have a rating of seventeen (l (3.00) + l (8) + 1 (3) + l (3) = 17). The signs of the unit weights were determined by the multiple regression analyses. The following equation represented the unit weighting scheme: A Y = UW1 (GPA) + UW2 (MCAT) + UW3 (Personal Statement) + unit UW4 (Interview Score) Where Y predicted judgments unit UWi unit weights Random ratings. Each judge had a frequency distribu- tion associated with the number of times applicants received a rating of one, two, three, etc. Based on 6) 62 each committee member's frequency distribution, ratings were assigned randomly to each applicant. So instead of just randomly assigning a rating from 1 to 7 to an applicant, ratings were assigned ran- domly to correspond to a frequency distribution. This model, which might be termed marginal ran- domness, allowed a determination of whether a random model might predict actual judgments. Mean objective weights. Judgments were obtained by multiplying the average weights by the standardized values of the four independent variables. The average objective weights were arrived at by re- gressing the four variables on to the average rating given each applicant. The average rating was the sum of each applicant's rating divided by the number of judges. The following equation represented the mean weighting scheme: Y = M31 (2 GPA) + M82 (2 MCAT) + M83 (2 Personal Statement) average + M84 (2 Interview Score) Where Yaverage = predicted judgments Mei = average objective weights Z = standard scores Equal subjective weights. Judgments were obtained by multiplying equal subjective weights by the standardized values of the four predictor variables. The following equation represented the equal weighting scheme: 63 iequa] = 25 (z GPA) + 25 (z MCAT) + 25 (2 Personal Statement) + 25 (2 Interview Score) Where Y = predicted judgments equal 2 = standard score Hypotheses The research hypotheses, originally stated in general terms in Chapter I, are stated operationally as: 1) No relation exists between statistical and sub- jective weights. 2) A positive relation exists between actual judgments and predicted judgments obtained through the use of statistical weights. 3) A positive relation exists between actual judgments and predicted judgments obtained through the use of subjective weights. 4) A positive relation exists between predicted judg- ments obtained through the use of both objective and subjective weights. 5) There is a greater relation between actual judg- ments and objectively predicted judgments than between actual judgments and subjectively pre- dicted judgments. The corresponding null hypotheses are stated symbolically in the following terms: 1) Ho: r = 0 -8iSWi Where Bi = objective weights SW1 = subjective weights 64 2) HO: EYsYobj = 0 Where YS = actual judgments A Yobj = predicted judgments obtained through the use of statistical weights. 3) H0: rYsYsub = 0 Where YS = actual judgments A Ysub = predicted judgments obtained through the use of subjective weights. 
Hypotheses

The research hypotheses, originally stated in general terms in Chapter I, are stated operationally as:

1) No relation exists between statistical and subjective weights.
2) A positive relation exists between actual judgments and predicted judgments obtained through the use of statistical weights.
3) A positive relation exists between actual judgments and predicted judgments obtained through the use of subjective weights.
4) A positive relation exists between predicted judgments obtained through the use of both objective and subjective weights.
5) There is a greater relation between actual judgments and objectively predicted judgments than between actual judgments and subjectively predicted judgments.

The corresponding null hypotheses are stated symbolically in the following terms:

1) H0: r(βi, SWi) = 0, where βi = objective weights and SWi = subjective weights.
2) H0: r(Ys, Ŷobj) = 0, where Ys = actual judgments and Ŷobj = predicted judgments obtained through the use of statistical weights.
3) H0: r(Ys, Ŷsub) = 0, where Ys = actual judgments and Ŷsub = predicted judgments obtained through the use of subjective weights.
4) H0: r(Ŷobj, Ŷsub) = 0, where Ŷobj = predicted judgments obtained through the use of statistical weights and Ŷsub = predicted judgments obtained through the use of subjective weights.
5) H0: r(Ys, Ŷobj) = r(Ys, Ŷsub), where Ys = actual judgments, Ŷobj = judgments generated from statistical weights, and Ŷsub = judgments generated from subjective weights.

Analyses

Each committee member's judgment policy was modeled in four ways: (1) objective weights, (2) subjective weights, (3) unit weights and (4) random ratings. The committee as a group was modeled by (1) average weights and (2) equal weights. These schemes allowed a policy to be captured so that the weights, and the predicted judgments arrived at from these weights, could be compared. For each committee member a five-by-five correlation matrix was constructed with the following elements: (1) actual judgments, (2) objectively predicted judgments, (3) subjectively predicted judgments, (4) unit-weight predicted judgments and (5) random judgments. The highest correlation between actual judgments and predicted judgments indicated which weighting scheme best captured a committee member's policy. This matrix allowed the examination of the effectiveness of the different models in accounting for committee members' judgments. For the committee as a group, a three-by-three correlation matrix was constructed with the following elements: (1) actual judgments, (2) predicted judgments arrived at from average weights and (3) predicted judgments arrived at from equal weights. The highest correlation indicated which weighting scheme best captured the committee's policy. The effectiveness of the different models in accounting for the committee's judgments was also examined.

Multiple regression was the statistical technique employed to measure the relation between a dependent or criterion variable and a set of independent or predictor variables. The major assumptions of this model are that the variables are measured on at least an interval scale and that the relations among the variables are linear and additive. It should be noted, though, that non-interval variables and nonlinear and nonadditive relations can be handled through the use of transformations. When multiple regression is used as a descriptive tool, the linear dependence of one variable on the other variables is summarized and decomposed. The regression analysis finds the best linear prediction equation and then evaluates the accuracy of this prediction equation.

In this study, GPA, MCAT scores, personal statement scores and interview scores served as the independent (predictor) variables, while committee members' ratings or judgments of acceptability served as the dependent (criterion) variable. The multiple regression technique analyzed the relations of the four independent variables to the one dependent variable. This analysis yielded four objective (regression) weights, which in turn were used to generate predicted judgments. The other weights (i.e., subjective, unit, average and equal) were also used like regression weights to generate predicted judgments. The model-comparison matrix described above is sketched below.
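A sketch of the five-by-five comparison matrix, with argument names of my own choosing:

    import numpy as np

    def comparison_matrix(actual, y_obj, y_sub, y_unit, y_rand):
        # Five-by-five correlation matrix: the first row holds the
        # correlations of actual judgments with each model's predictions,
        # and the largest entry in that row marks the best-fitting scheme.
        M = np.vstack([actual, y_obj, y_sub, y_unit, y_rand])
        return np.corrcoef(M)

The three-by-three committee-level matrix is built the same way from the actual judgments and the average- and equal-weight predictions.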
The analyses of the data can be summarized as follows. For each committee member,

1) subjective importance weights were elicited;
2) multiple regression weights were computed;
3) the objective and subjective weights were correlated;
4) predicted judgments were generated through the use of regression weights;
5) predicted judgments were generated through the use of subjective weights;
6) predicted judgments were generated through the use of unit weights;
7) judgments corresponding to a frequency distribution were randomly generated;
8) actual judgments, objective judgments, subjective judgments, unit judgments, and random judgments were correlated.

For the committee at large,

1) a set of predicted judgments was generated based on the regression weights derived from the mean rating given each applicant;
2) a set of judgments was generated based on equal subjective weights;
3) actual judgments, predicted judgments derived from average weights, and judgments derived from equal weights were correlated.

Summary

The sample for this study was composed of 15 volunteers, all members of the Admissions Committee of the College of Human Medicine, Michigan State University. Seven were males and eight were females, with faculty, staff and medical students represented. Most committee members were experienced judges, having served for an average of two years.

Two different data sets, one correlated and one orthogonal, were employed to achieve the purpose of this study. Each data set contained the same introduction, instructions and description of independent variables. Only the data on the four independent variables were changed. These variables were total GPA, MCAT scores, personal statement scores and interview scores.

Two testing sessions were required, one for each data set. The task was twofold in each session: (1) rate each applicant (40 total) on the basis of the four independent variables and (2) verbally report subjective importance weights for each variable. Testing took place in committee members' offices or conference rooms and lasted approximately fifty minutes.

Data on ten applicants were used as replications to estimate intra-judge reliability. These reliability estimates were very high for the correlated data set (.95) and lower for the orthogonal data (.80). Inter-judge reliability estimates were also very high: for the correlated data, Cronbach's alpha was .98; for the orthogonal data, it was .95. The results showed that the committee members not only were reliable but also were consistent among themselves.

Once a committee member rated all applicants and reported subjective importance weights, the following pieces of information were collected:

1) actual judgments and judgments generated from objective weights, subjective weights, unit weights and random ratings;
2) objective, subjective and unit weights.

For the committee as a group, the following pieces of information were collected:

1) mean judgments and judgments generated from average weights and equal weights;
2) average and equal weights.
It was hypothesized that:

1) No relation existed between objective and subjective weights;
2) A positive relation existed between actual judgments and judgments generated from objective weights;
3) A positive relation existed between actual judgments and judgments generated from subjective weights;
4) A positive relation existed between objectively generated judgments and subjectively generated judgments;
5) There was a greater relation between actual judgments and objectively generated judgments than between actual judgments and subjectively generated judgments.

To test these hypotheses, multiple regression and correlation analyses were used. Additional analyses examined whether alternative weighting schemes (e.g., unit, random, average and equal) might perform as well as the objective or subjective weighting schemes.

CHAPTER IV
RESULTS AND DISCUSSION

In this chapter, descriptive and statistical analyses of the data are presented and discussed. Data were analyzed primarily by programs contained in the Statistical Package for the Social Sciences (SPSS), version 6.5. Procedures included descriptive statistics, correlations, multiple regressions, t-tests, and repeated measures one-way analyses of variance. The following research questions were specifically considered (refer to Figure 4.1):

1. What was the relation between objective and subjective weights (r(βi, SWi))?
2. What was the agreement between actual judgments and predicted judgments arrived at through the use of objective weights (r(Ys, Ŷobj))?
3. What was the agreement between actual judgments and predicted judgments arrived at through the use of subjective weights (r(Ys, Ŷsub))?
4. What was the agreement between objectively predicted judgments and subjectively predicted judgments (r(Ŷsub, Ŷobj))?
5. Was there greater agreement between actual judgments and objectively predicted judgments than between actual judgments and subjectively predicted judgments (r(Ys, Ŷobj) > r(Ys, Ŷsub))?

[Figure 4.1, RELATION BETWEEN SUBJECTIVE AND OBJECTIVE WEIGHTS: a lens-model diagram in which the cues X1 through Xk are combined through the statistical weights to yield the predicted judgments Ŷobj and through the subjective weights to yield the predicted judgments Ŷsub.]

In the sections of this chapter, each research question is restated as a hypothesis, relevant data are presented, a statement is made about whether the hypothesis was rejected or accepted, and the findings are discussed. The results for each hypothesis are presented for both correlated and orthogonal data unless otherwise noted. After the data pertaining to these five hypotheses were analyzed, four additional judgmental models were examined: the unit weighting model, the random rating model, the average weighting model and the equal weighting model. Data are first presented for each model and then compared and discussed for all models.

Relation between Objective and Subjective Weights

The first research question concerned the relation between objective and subjective weights. Objective weights have typically been used to model and highlight judgment policies (Slovic and Lichtenstein, 1971). Work on modeling policies by using subjective weights has shown serious discrepancies between the two weighting schemes (Hoffman, 1960; Slovic et al., 1972). When examining the use of subjective weights, the typical performance criterion has been the correlation between objective and subjective weights.
Subjective weights were elicited from the judges so that they could be compared with the objective weights derived from multiple regression analysis. The two sets of weights were correlated with each other. The use of these correlations was the first step in examining the question: how well do subjective weights work?

Research Hypothesis

Based on previous research findings, it was hypothesized that no relationship existed between objective weights generated by multiple regression analysis and subjective weights elicited from judges.

Statistical Hypothesis

H0: r(βi, SWi) = 0
H1: r(βi, SWi) > 0

where r = correlation coefficient, βi = objective weights, and SWi = subjective weights.

Results

For each judge, two correlations were computed. The first was a product-moment correlation between objective and subjective weights (rβSW). The second was a rank-order correlation (Spearman rho) between the order in which the independent variables were entered into the stepwise regression equation and the judge's ranking of the importance of the independent variables (rSoOo). These correlations are presented in Table 4.1.

For the correlated data set, the median correlation between objective and subjective weights was .86, a strong positive correlation. The range was from -.12 to .99, with a mean correlation of .84. Five of the fifteen correlations were significant (p < .05). The median correlation between the objective and subjective rank orderings was .40. The range was from -.20 to 1.00, and the mean correlation was .86. Four of the fifteen correlations were significant (p < .05).

For the orthogonal data, the median correlation between the objective and subjective weights was .88. The range was from .18 to .99, with a mean correlation of .90. Eight of the fifteen correlations were significant (p < .05). The median correlation between the objective and subjective rank orderings was .80. The range was from .20 to 1.00, and the mean correlation was .87. Three of the fifteen correlations were significant (p < .05).

Table 4.1
CORRELATIONS BETWEEN OBJECTIVE AND SUBJECTIVE WEIGHTS (rβSW) AND BETWEEN OBJECTIVE AND SUBJECTIVE RANK ORDER OF IMPORTANCE (rSoOo)

            Correlated Data             Orthogonal Data
Judge      rβSW        rSoOo          rβSW        rSoOo
# 1         .86         .40            .98**        .80
# 2         .98**      1.00**          .93          .20
# 3         .61         .40            .77          .80
# 4         .87        1.00**          .97**        .80
# 5         .64         .40            .96***       .80
# 6         .99**      1.00**          .94***       .80
# 7        -.12        -.20            .18          .00
# 8         .88         .80            .51          .40
# 9         .97**       .80            .99**       1.00**
#10         .95***     1.00**          .88          .80
#11         .43         .80            .20          .40
#12         .63         .40            .80          .80
#13         .91***      .20            .98**       1.00**
#14         .86         .40            .97**       1.00**
#15         .34         .20            .69          .80

Median      .86         .40            .88          .80
Mean        .84*        .86**          .90*         .87**
Range    -.12 to .99  -.20 to 1.00   .18 to .99   .20 to 1.00

* p < .001
** p < .01
*** p < .05

NOTE: The mean correlation is the average over the 15 judges and is based on an n of 15. The correlations between objective and subjective weights are based on an n of four. Thus, a mean r of .84 is significant at the .001 level, while an individual r of .91 is significant only at the .05 level.

All of the correlations were transformed to Z scores using Fisher's r-to-Z transformation so that the mean correlations (in terms of Z scores) could be tested for significance from zero. All mean correlations were significantly different from zero. For the correlated data set, the mean correlation between objective and subjective weights was .84 (t = 6.24; p < .01). The mean correlation between the objective and subjective rank orderings of the independent variables was .86 (t = 3.60; p < .01). This test is sketched below.
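In sketch form (Python, with scipy assumed available), the test transforms each judge's correlation with Fisher's r-to-Z, runs a one-sample t-test on the transformed values, and back-transforms the mean.

    import numpy as np
    from scipy import stats

    def mean_correlation_test(rs):
        # Fisher's r-to-Z transformation (arctanh), a one-sample t-test of
        # the transformed values against zero, and the back-transformed mean.
        z = np.arctanh(np.asarray(rs))
        t, p = stats.ttest_1samp(z, 0.0)
        return np.tanh(z.mean()), t, p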
In the orthogonal data set, the mean correlation between objective and subjective weights was .90 (t = 7.60; p < .01). The mean correlation between the objective and subjective rank orderings of the independent variables was .87 (t = 4.52; p < .01).

Confidence intervals were established around the mean correlations. In the correlated data set, the 95% confidence interval for the mean correlation between objective and subjective weights ranged from .67 to .93. The 95% confidence interval for the mean rank-order correlation between the regression order and the subjective order of the importance of the independent variables was from .48 to .97. In the orthogonal data set, the 95% confidence interval for the mean correlation between objective and subjective weights was from .79 to .96. The 95% confidence interval for the mean correlation between the regression order and subjective order of the importance of the independent variables was from .61 to .96.

Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_βiSWi > 0, was accepted. On the average, there were significant positive correlations between objective weights and subjective weights elicited from judges. However, note that there were substantial individual differences. There was at least one consistent outlier, judge #7, and others who were marginal.

Discussion

The correlations between objective and subjective weights have usually tended to be low. Schmitt and Levine (1977) pointed out that the typical comparison of objective and subjective weights has involved either an "eyeball" comparison or at best a rank-order correlation between the sets of weights. This study used two correlation measures: a Pearson product moment and a Spearman rank order.

In interpreting these significant correlations, a few words of caution are needed. First, the correlations between the sets of weights were based on only four variables, an extremely small sample size. An r of .90 was needed to reach significance at the .05 level. Correlations based on such a small n should be interpreted cautiously.

Second, the significant mean correlations masked individual differences. Whereas the individual correlations were based on an n of four, the mean correlations were based on an n of fifteen (across all judges). A mean r of only .51 was needed for significance at the .05 level. When individual correlations were examined, only five out of fifteen correlations were significant in the correlated data set. The number of significant individual correlations increased from five to eight in the orthogonal data set. This increase pointed to another problem, that of multicollinearity. As Kerlinger and Pedhazur (1973) have pointed out, substantive interpretation of regression coefficients is difficult and dangerous, and it becomes more difficult and dangerous as predictors are more highly correlated with each other. The regression (objective) weights derived from correlated data are unstable. To correlate them with another set of weights may invite problems of interpretation and generalizability.

One is faced with deciding how to examine and interpret each data set. The correlated data were representative of actual admissions data but yielded unstable objective weights. The orthogonal data set allowed the interpretation of objective weights to be greatly simplified, but under unrealistic data conditions. One way to examine and interpret each data set arises from judges' perceptions of the data.
The question of representativeness was presented to each judge. It dealt with whether judges perceived differences between the correlated and orthogonal data sets. In a debriefing session held at the end of each exercise, the judges were asked to rate the data sets on a seven-point representativeness scale (seven being highly representative). To test for perceived differences, the mean ratings were compared. The mean ratings were identical (x̄ = 5.6). The judges perceived both data sets as being representative of actual admissions data.

At first glance this was disturbing in light of the fact that the orthogonal data set was not really representative of admissions data. What appeared to be happening with the orthogonal data set was what Abelson (1976) has termed script processing. Judges commented that certain applicants reminded them of students they knew or of previous candidates who had applied. In script theory, it is hypothesized that judges, when faced with a decision, create or employ relevant scripts. These scripts are based on previous learning or experience. On the basis of the debriefing session, this script hypothesis was a plausible explanation of why the orthogonal data set was perceived as representative of admissions data.

A second way to examine and interpret the data concerned the confidence intervals placed around the mean correlations. Using the correlated data set, the 95% confidence interval for the mean correlation between the objective and subjective weights was between .67 and .93. This was quite a large range of values. The 95% confidence interval for the mean correlation between the objective and subjective rank order of importance was even larger, between .48 and .97. With the orthogonal data set the intervals were smaller. Recall that a confidence interval is constructed so that it has a known probability (.95) of including the value of a parameter between its limits. Since these intervals are quite wide, caution should be exercised in interpreting these sample mean correlations.

Thus, research question #1 examined the first step in determining how well subjective weights worked. Subjective weights were moderately to highly correlated with objective weights; the correlations were higher with the orthogonal data than with the correlated data. When the performance criterion was the correlation between objective and subjective weights, the use of subjective weights was appropriate in modeling a four-cue medical school admissions task. Although this first step was taken cautiously due to the small sample size, the small number of significant individual correlations, and the large confidence intervals, it laid the groundwork for the examination of other criteria. These criteria were concerned with how objective and subjective weights were used in the prediction of actual judgments. The next two hypotheses addressed this issue.

Relation between Actual Judgments and Judgments Generated from Objective Weights

The second research question involved the relation between actual judgments and predicted judgments generated from objective weights. This research question involved a different performance criterion than the one examined by research question #1. Concern shifted from the weights themselves to the predicted judgments derived from these weights. Emphasis was not placed on whether objective weights were representative of judges' psychological weights.
Rather, it was placed on whether judges' actual ratings could be predicted from weights derived from multiple regression analysis.

The correlations between actual judgments and predicted judgments derived from objective weights indicated how well a weighted linear combination of cue values could predict judges' actual ratings. The magnitude of these correlations assessed the adequacy of a linear model using objective weights. These correlations captured the judges' policies, to the extent that they were linear. Hammond and Summers (1972) referred to this term as cognitive control, the extent to which judges control the execution of their knowledge. The squared values of these correlations represented the variance accounted for by a linear model based on objective weights.

Research Hypothesis

It was hypothesized that there was a positive relation between actual judgments and judgments generated from objective weights.

Statistical Hypothesis

    H0: r_YsŶobj = 0
    H1: r_YsŶobj > 0

Where r = multiple correlation coefficient
      Ys = actual judgments
      Ŷobj = judgments generated from objective weights

Results

The correlations between judges' actual ratings and ratings generated from objective weights (r_YsŶobj) are shown in Table 4.2. The squared values for each of these correlations (r²), the amount of variance that can be accounted for in committee members' judgments, are also presented. For the correlated data set, the median correlation between actual judgments and judgments generated from objective weights was .94. The range was from .85 to .97. The mean correlation was .94. All individual correlations were significant (p < .001). The r² values ranged from .72 to .94 with a median of .88 and a mean of .88. On the average, then, a linear model based on objective weights accounted for 88% of the variance of the actual judgments.

The median correlation between actual judgments and judgments derived from objective weights was .89 for the orthogonal data set. The range was from .82 to .97 with a mean correlation of .91. All individual correlations were significant (p < .001). The r² values ranged from .66 to .95 with a median of .79 and a mean of .83. On the average, therefore, a linear model based on objective weights accounted for 83% of the variance of the actual judgments.

All correlations were transformed to z scores so the mean correlations could be tested for significance from zero. Both mean correlations were significantly different from zero (for the correlated data set, t = 30.18, p < .001; for the orthogonal data set, t = 24.07, p < .001).

Table 4.2

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM OBJECTIVE WEIGHTS (r_YsŶobj)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶobj     r²             r_YsŶobj     r²
# 1         .97*         .94            .94*         .88
# 2         .94*         .88            .82*         .66
# 3         .96*         .92            .91*         .84
# 4         .97*         .93            .95*         .90
# 5         .94*         .88            .90*         .81
# 6         .95*         .90            .89*         .79
# 7         .96*         .91            .87*         .76
# 8         .93*         .86            .87*         .76
# 9         .95*         .91            .97*         .95
#10         .90*         .81            .92*         .85
#11         .92*         .85            .89*         .79
#12         .85*         .72            .86*         .75
#13         .93*         .87            .85*         .73
#14         .94*         .88            .91*         .83
#15         .96*         .92            .90*         .81

Median      .94          .88            .89          .79
Mean        .94*         .88            .91*         .83
Range       .85 to .97   .72 to .94     .82 to .97   .66 to .95

* p < .001

The 95% confidence intervals established for the mean correlations were from .93 to .96 (correlated data set) and from .88 to .93 (orthogonal data set), very narrow and high intervals. Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_YsŶobj > 0, was accepted.
There were significant positive correlations between actual judgments and judgments generated from objective (regression) weights.

It was suggested earlier that although the correlation between objective and subjective weights might be low, objective and subjective weights might yield predicted judgments that would correspond highly with actual judgments. This performance criterion dealt with predicted judgments, not weights. To test for any relation between judgments and weights, a rank-order correlation was run. This analysis examined how the correlations between objective and subjective weights were related to the correlations between actual judgments and judgments generated from objective weights. For example, does a low correlation between objective and subjective weights ensure a low correlation between actual judgments and judgments generated from objective weights? A Spearman rank-order correlation showed that there was no significant relation between the correlations of objective and subjective weights and the correlations between actual judgments and judgments generated from objective weights (rho = -.14).

Knowing the correlation between a judge's objective and subjective weights tells little about the relation of that judge's actual ratings to ratings predicted from objective weights. To draw conclusions based solely on the correlations between objective and subjective weights would be to ignore an important relation.

Discussion

The result that there were significant positive correlations between actual judgments and objectively generated judgments was consonant with the judgment research that has shown linear models to be good approximations in many decision-making situations (Dawes and Corrigan, 1974). Correlations between actual judgments and judgments generated from objective weights are typically quite high, ranging from values in the .70's to the .90's (Cook and Stewart, 1975; Hoffman, 1960; Slovic and Lichtenstein, 1971).

Probably of more importance than the mean correlation being significantly different from zero was the confidence interval placed around the mean correlation. This interval had a known probability (.95) of including the population parameter within its limits. One can be 95% confident that the mean correlation is between its limits. A small interval with a high degree of confidence was a preferred state. For the correlated data set, the mean correlation between actual judgments and judgments generated from objective weights was quite high (.94), with a very narrow 95% confidence interval of .93 to .96 (between 86% and 90% of the variance accounted for). For the orthogonal data set, the mean correlation was again quite high (.91), with a 95% confidence interval of .88 to .93 (between 77% and 86% of the variance accounted for).

The confidence limits showed that the correlations between actual judgments and judgments generated from objective weights were quite high. Although they do fall within the ranges reported from earlier research, they are slightly higher than those reported by Cook and Stewart (1975), who had their judges perform a similar task. The mean correlations in their study ranged from .89 to .91 (between 79% and 82% of the variance accounted for).

The reasons why the correlations of this study were so high may be answered by Dawes and Corrigan (1974).
They stated that linear models work because the situations in which they have been investigated are those in which: (a) the predictor (independent) variables have conditionally monotone relationships to criteria; (b) there is error in the dependent variable; (c) there is error in the independent variables; and (d) deviations from optimal weighting do not make much practical difference. The task environment using the correlated data closely paralleled the above conditions. Each of the predictor variables had a conditionally monotone relationship with the criterion. Presumably, "more is better" in the admissions process: the higher one's GPA, MCAT scores, personal statement score, and interview score, the better one's chances of receiving a higher rating. Two other elements, errors in the dependent and independent variables, were also present in this study. The last feature, that deviations from optimal weighting do not make much practical difference, involved the comparison of the various weighting schemes, and is addressed later in the chapter.

It may be concluded that a linear model using objective weights is a good approximation in predicting a committee member's actual judgments for both data conditions. Another example is added to the growing number of tasks modeled by a linear model. This model also provided the upper limit for examining the effectiveness of the use of differential weights in predicting actual judgments.

Relation between Actual Judgments and Judgments Generated from Subjective Weights

The third research question dealt with the relation between actual judgments and judgments generated from subjective weights. Again, concern was not with the weights themselves but with the correlation between actual judgments and predicted judgments. The performance criterion dealt with how well judges' actual ratings could be predicted by using subjective weights. The correlations between actual judgments and predicted judgments derived from subjective weights indicated how well a subjectively weighted linear combination of cue values could predict judges' actual ratings. The magnitude of these correlations assessed the adequacy of the linear model using subjective weights. The squared values of these correlations represented the variance accounted for by a linear model based on subjective weights.

Research Hypothesis

It was hypothesized that there was a positive relationship between a committee member's actual judgments and judgments generated from subjective weights.

Statistical Hypothesis

    H0: r_YsŶsub = 0
    H1: r_YsŶsub > 0

Where r = product moment correlation
      Ys = actual judgments
      Ŷsub = predicted judgments generated from subjective weights

Results

The correlations between judges' actual ratings and ratings generated from subjective weights (r_YsŶsub) are presented in Table 4.3. Also shown are the squared values for each of the correlations (r²). These values are a measure of how much variance can be accounted for by a linear model employing subjective weights.
Table 4.3

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM SUBJECTIVE WEIGHTS (r_YsŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶsub     r²             r_YsŶsub     r²
# 1         .97*         .94            .93*         .86
# 2         .94*         .88            .80*         .64
# 3         .94*         .88            .82*         .67
# 4         .96*         .93            .94*         .88
# 5         .92*         .85            .78*         .61
# 6         .95*         .89            .87*         .76
# 7         .94*         .89            .71*         .51
# 8         .92*         .85            .80*         .64
# 9         .95*         .90            .95*         .89
#10         .90*         .81            .86*         .74
#11         .91*         .82            .70*         .49
#12         .84*         .70            .80*         .65
#13         .93*         .87            .83*         .68
#14         .93*         .86            .89*         .78
#15         .95*         .90            .85*         .72

Median      .94          .88            .83          .68
Mean        .93*         .86            .85*         .73
Range       .84 to .97   .70 to .94     .70 to .95   .49 to .89

* p < .001

For the correlated data set, the median correlation between actual judgments and judgments generated from subjective weights was .94. The range was from .84 to .97. The mean was .93. All correlations were significant (p < .001). The r² values ranged from .70 to .94 with a median of .88 and a mean of .86. On the average, a linear model based on subjective weights accounted for 86% of the variance in the prediction of actual judgments.

The median correlation between actual judgments and judgments generated from subjective weights was .83 for the orthogonal data set. The range was from .70 to .95 with a mean correlation of .85. All correlations were significant (p < .001). The r² values ranged from .49 to .89 with a median of .68 and a mean of .73. On the average, a linear model based on subjective weights accounted for 73% of the variance in the prediction of actual judgments.

All of the correlations were transformed to z scores. Both mean correlations were significantly different from zero (for the correlated data set, t = 31.47, p < .001; for the orthogonal data set, t = 16.89, p < .001). The 95% confidence intervals established for the mean correlations were from .92 to .95 (correlated data) and from .80 to .89 (orthogonal data).

Based upon these findings, the null hypothesis was rejected and the alternate hypothesis, r_YsŶsub > 0, was accepted. There were significant positive correlations between actual judgments and judgments generated from subjective weights.

A Spearman rank-order correlation was run to examine any relationship between judgments and weights. This analysis examined how correlations between objective and subjective weights were related to correlations between actual judgments and judgments derived from subjective weights. For example, does a low correlation between objective and subjective weights mean that there will be a low correlation between actual judgments and judgments derived from subjective weights? A rank-order correlation showed that there was no significant relation between the correlations of objective and subjective weights and the correlations between actual judgments and subjectively derived judgments (rho = .07). Knowing the correlation between a judge's objective and subjective weights tells little about the relation between that judge's actual ratings and ratings derived from subjective weights. Thus, to draw conclusions based on one performance criterion (i.e., the correlation between objective and subjective weights) would again ignore other conclusions that may arise from another performance criterion (i.e., the relationship between actual judgments and judgments derived from subjective weights).

Discussion

These findings of significant positive correlations between actual judgments and subjectively generated judgments were consistent with the research examining the use of linear models. However, these correlations were slightly higher than the correlations that have been reported in modeling a judgment task with subjective weights.
The following mean correlations have been reported: from .84 to .87 (Cook and Stewart, 1975); .77 (Martin, 1957); and .60 (Summers et al., 1970). Again, determining whether the mean correlation was significantly different from zero was not as important as determining the confidence intervals around the mean correlations. For the correlated data set, the 95% confidence interval of .92 to .95 (between 85% and 90% of the variance accounted for) was established around the mean correlation. For the orthogonal data set, the 95% confidence interval of .80 to .89 (between 64% and 79% of the variance accounted for) was established around the mean correlation.

By establishing these confidence intervals it was possible to see whether the previously reported mean correlations fell within them. The mean correlations reported by Cook and Stewart fell within the interval established for the orthogonal data. The problem was that the task employed by Cook and Stewart did not involve the use of orthogonal data.

The reasons why the correlations were so high in this study may again be the four conditions that applied to a linear model using objective weights. The only difference between a linear model using objective weights and a linear model using subjective weights was the weights themselves. The generated judgments were computed in an identical manner; only the weights were changed. The subjective weights worked so well because the predictor variables were monotonically related to the criterion and because the entire task environment was familiar to all judges.

In this study, experienced judges were making ratings based on familiar data in a non-threatening environment. They were not asked to make any predictions of success. There were no right or wrong answers. Their tasks were to rate each applicant on some scale of acceptability and to report their subjective importance weights. These tasks were not demanding or novel for the judges. When the performance criterion was the relation between actual judgments and a set of judgments arrived at by using subjective weights, the use of subjective weights was highly effective.

Relation between Predicted Judgments Generated from Objective and Subjective Weights

The fourth research question concerned the relation between predicted judgments generated from objective weights and predicted judgments derived from subjective weights. This research question dealt with still another performance criterion. It was not concerned with the weights themselves or with the correlations between actual ratings and predicted ratings. Rather, the analysis compared the predicted values derived from both weighting schemes. How do judgments derived from subjective weights compare with judgments generated from objective weights? The correlations between judgments generated from objective weights and judgments derived from subjective weights indicated the agreement between the two sets of predicted judgments. This performance criterion examined and compared the output (i.e., the predicted judgments) of two linear models.

Research Hypothesis

It was hypothesized that there was a positive relation between judgments generated from objective weights and judgments derived from subjective weights.
Statistical Hypothesis

    H0: r_ŶobjŶsub = 0
    H1: r_ŶobjŶsub > 0

Where r = product moment correlation
      Ŷobj = judgments generated from objective weights
      Ŷsub = judgments generated from subjective weights

Results

The correlations between judgments generated from objective weights and judgments generated from subjective weights (r_ŶobjŶsub) are shown in Table 4.4. Also shown are the squared values of these correlations. For the correlated data set, the median correlation between judgments generated from objective weights and judgments generated from subjective weights was .99. The range was from .98 to .99. The mean correlation was .99. All correlations were significant (p < .001). The r² values ranged from .96 to .98 with a median of .98 and a mean of .98. On the average, judgments based on subjective weights accounted for 98% of the variance of judgments generated from objective weights.

The median correlation between judgments generated from objective weights and judgments generated from subjective weights was .95 for the orthogonal data set. The range was from .79 to .99. The mean correlation was .95. All correlations were significant (p < .001). The r² values ranged from .62 to .98 with a median of .90 and a mean of .90. On the average, judgments based on subjective weights accounted for 90% of the variance of judgments generated from objective weights.

All correlations were transformed to z scores so that the mean correlations could be tested for significance. Both mean correlations were significantly different from zero (for the correlated data, t = 58.65, p < .001; for the orthogonal data, t = 14.27, p < .001). The 95% confidence intervals established around these mean correlations were from .98 to .99 (correlated data) and from .92 to .97 (orthogonal data).

Table 4.4

CORRELATIONS BETWEEN COMMITTEE MEMBERS' JUDGMENTS GENERATED FROM BOTH OBJECTIVE AND SUBJECTIVE WEIGHTS (r_ŶobjŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_ŶobjŶsub   r²             r_ŶobjŶsub   r²
# 1         .99*         .98            .99*         .98
# 2         .99*         .98            .98*         .96
# 3         .98*         .96            .89*         .79
# 4         .99*         .98            .99*         .98
# 5         .98*         .96            .87*         .76
# 6         .99*         .98            .98*         .96
# 7         .99*         .98            .81*         .66
# 8         .99*         .98            .92*         .85
# 9         .99*         .98            .97*         .94
#10         .99*         .98            .93*         .86
#11         .98*         .96            .79*         .62
#12         .98*         .96            .93*         .86
#13         .99*         .98            .97*         .94
#14         .99*         .98            .97*         .94
#15         .98*         .96            .95*         .90

Median      .99          .98            .95          .90
Mean        .99*         .96            .95*         .90
Range       .98 to .99   .96 to .98     .79 to .99   .62 to .98

* p < .001

Based upon these analyses, the null hypothesis was rejected and the alternate hypothesis, r_ŶobjŶsub > 0, was accepted. There were significant positive correlations between judgments generated from objective weights and judgments derived from subjective weights.

Discussion

The finding of significant positive correlations between objectively generated judgments and subjectively derived judgments is new to judgment research. As Schmitt and Levine (1977) pointed out, no published studies comparing the two types of systems (objective and subjective weights) have correlated the predicted values. The focus of this research question was the correlation of these predicted values. When accuracy was defined in terms of this performance criterion, the subjective weights were extremely effective. For the correlated data, there was almost a perfect correlation between the two sets of predicted judgments. A full 98% of the variance was accounted for by subjectively predicted judgments when compared with objectively predicted judgments.
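A minimal sketch of this fourth criterion, under hypothetical weights and cues: the same cue values are combined under two different weight vectors, and the two resulting sets of predicted judgments are correlated with each other rather than with the actual ratings.

```python
# A sketch comparing the outputs of two linear models that differ only
# in their weights; all values are hypothetical stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cues = rng.normal(size=(30, 4))                  # roughly orthogonal cue values

beta_obj = np.array([0.52, 0.31, 0.08, 0.36])    # regression (objective) weights
w_sub = np.array([0.45, 0.25, 0.05, 0.25])       # elicited subjective weights

y_obj = cues @ beta_obj                          # objectively predicted judgments
y_sub = cues @ w_sub                             # subjectively predicted judgments

r, _ = stats.pearsonr(y_obj, y_sub)
print(f"r between the two sets of predictions = {r:.2f} (r^2 = {r*r:.2f})")
```

With orthogonal cues, as sketched here, the agreement between the two prediction streams directly reflects the similarity of the weight vectors; with correlated cues, even rather different weight vectors can yield nearly identical predictions, which is consistent with the correlated-data values in Table 4.4 clustering at .98 to .99.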
The chief value of this performance criterion was that it allowed another comparison of the two weighting schemes. It demonstrated the robustness of the linear model in terms of predicted judgments. The only thing that differed in the predictions of judgments was the weights themselves. Even though different weights were used, similar predicted judgments resulted. The conclusion that subjectively derived judgments could account for a significant amount of variance in the prediction of objectively generated judgments, under two data conditions, was accepted.

Relation between Actual Judgments and Judgments Generated from Objective and Subjective Weights

The fifth research question asked whether there was a greater relationship between actual judgments and judgments generated from objective weights than there was between actual judgments and judgments generated from subjective weights. This research question concerned the choice of one weighting scheme over the other. The performance criterion dealt with assessing which weighting scheme predicted actual judgments more accurately. The correlations between actual judgments and judgments generated from objective weights were compared with the correlations between actual judgments and judgments generated from subjective weights. These comparisons were concerned with finding the better of the two models.

Research Hypothesis

It was hypothesized that there was a greater correlation between actual judgments and objectively generated judgments than there was between actual judgments and subjectively generated judgments.

Statistical Hypothesis

    H0: r_YsŶobj = r_YsŶsub
    H1: r_YsŶobj > r_YsŶsub

Where r = product moment correlation
      Ys = actual judgments
      Ŷobj = judgments generated from objective weights
      Ŷsub = judgments generated from subjective weights

Results

The relations between the correlations of actual judgments with judgments generated from objective weights (r_YsŶobj) and the correlations of actual judgments with judgments generated from subjective weights (r_YsŶsub) are shown in Table 4.5 (these values have been taken from Tables 4.2 and 4.3). For the correlated data set, the correlations between actual judgments and objectively predicted judgments were greater than the correlations between actual judgments and subjectively predicted judgments in nine out of fifteen cases. In the remaining six cases the correlations were identical. For the orthogonal data set, the correlations between actual judgments and objectively predicted judgments were greater than the correlations between actual judgments and judgments generated from subjective weights in all cases.

All correlations were converted to z scores using Fisher's r-to-z transformation so the mean correlations of the two weighting schemes could be compared. A paired t-test was run to see if there was a difference between the mean correlations. For the correlated data set, there was a significant mean difference (t = 3.59; p < .001). For the orthogonal data set, there was also a significant mean difference (t = 5.71; p < .001).

It was decided to compare the two models after the objective model was corrected for shrinkage. Since a linear model using regression (objective) weights capitalizes on chance (Cook and Stewart, 1975), a more meaningful comparison was between the subjective weighting model and the objective weighting model corrected for shrinkage.
The correlations between actual judgments and judgments generated from objective weights would show some shrinkage if the objective (regression) weights were applied to a new sample. This amount of shrinkage would depend upon the new number of predictor variables and the number of applicants to be rated. On the other hand, the correlations between actual judgments and judgments derived from subjective weights would not show any shrinkage, because the subjective weights were not estimated from the data. Recall that this research question used the performance criterion of correlations between actual judgments and predicted judgments.

Table 4.5

CORRELATIONS COMPARING COMMITTEE MEMBERS' OBJECTIVE WEIGHTING MODELS WITH SUBJECTIVE WEIGHTING MODELS (r_YsŶobj with r_YsŶsub)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶobj     r_YsŶsub       r_YsŶobj     r_YsŶsub
# 1         .97     =    .97            .94     >    .93
# 2         .94     =    .94            .82     >    .80
# 3         .96     >    .94            .91     >    .82
# 4         .97     >    .96            .95     >    .94
# 5         .94     >    .92            .90     >    .78
# 6         .95     =    .95            .89     >    .87
# 7         .96     >    .94            .87     >    .71
# 8         .93     >    .92            .87     >    .80
# 9         .95     =    .95            .97     >    .95
#10         .90     =    .90            .92     >    .86
#11         .92     >    .91            .89     >    .70
#12         .85     >    .84            .86     >    .80
#13         .93     =    .93            .85     >    .83
#14         .94     >    .93            .91     >    .89
#15         .96     >    .95            .90     >    .85

Median      .94          .93            .89          .83
Mean        .94     >    .93            .91     >    .85
Range       .85 to .97   .84 to .97     .82 to .97   .70 to .95

The following modified formula provided an estimate of the shrunken squared correlation between actual judgments and judgments generated from objective weights (Cohen and Cohen, 1975):

    r̂²_YsŶobj = 1 − (1 − r²_YsŶobj) × (n − 1) / (n − k − 1)

Where r̂²_YsŶobj = the new (shrunken) squared correlation
      r²_YsŶobj = the squared correlation between actual judgments and judgments generated from objective weights (the multiple r squared)
      n = the new number of applicants
      k = the new number of predictor variables

Fisher's r-to-z transformations were again employed so that paired t-tests could be run. These tests examined the differences between the subjective weighting scheme and the objective weighting scheme corrected for shrinkage. For the correlated data, there was no difference between the models (t = -1.23). For the orthogonal data set, there was a significant difference (t = 3.61; p < .01). For the correlated data, subjective weights performed as well as objective weights, but the objective weights performed better than the subjective weights for orthogonal data.

Based upon these findings, the null hypothesis was not rejected for the correlated data. There were no significantly greater relations between actual judgments and judgments generated from objective weights than there were between actual judgments and judgments generated from subjective weights. However, for the orthogonal data, the null hypothesis was rejected, and the alternate hypothesis, r_YsŶobj > r_YsŶsub, was accepted. There were significantly greater relations between actual judgments and judgments generated from objective weights than there were between actual judgments and judgments derived from subjective weights.
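The shrinkage correction quoted above is straightforward to apply. The sketch below (a reconstruction, not the original analysis code) corrects one illustrative squared correlation from Table 4.2, using the task's 30 applicants and 4 predictor variables for n and k.

```python
# A sketch of the shrinkage formula from Cohen and Cohen (1975).
def shrunken_r2(r2: float, n: int, k: int) -> float:
    """Estimated shrunken (cross-sample) squared multiple correlation."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

r2_obj = 0.88    # e.g., judge #2, correlated data (Table 4.2)
print(f"corrected r^2 = {shrunken_r2(r2_obj, n=30, k=4):.2f}")
# 1 - (1 - .88) * 29/25 = .86: a small drop from the fitted value, which is
# why the corrected objective model and the subjective model end up so
# close for the correlated data.
```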
Discussion

The result that there were significant differences between objective and subjective weights in the orthogonal data set was congruent with previous research. However, the differences typically reported have not been as small as the differences found in this study. In fact, the differences found here, although statistically significant, were not practically significant. For the correlated data set, subjective weights performed exactly as well as the objective weights in six out of fifteen instances. For any one judge, the difference between the models was minimal. For all practical purposes, the two models were identical. This was not the case, though, in the orthogonal data. There the objective weighting model was clearly superior to the subjective weighting model.

Similar results were found by Cook and Stewart (1975) when they compared the proportion of the variance accounted for by using subjective weights with the maximum proportion accounted for by using regression weights. Their comparison was the mean r² between actual judgments and judgments generated from subjective weights divided by the mean r² between actual judgments and judgments generated from objective weights. This ratio represented the proportion of the variance accounted for by using subjective weights compared to the maximum proportion of the variance that was accounted for by a linear model based on objective weights. They reported values ranging from .68 to .95 (i.e., between 68% and 95% of the variance was accounted for by using subjective weights). Hence, the subjective weights were highly accurate.

This study reaffirms that when the performance criterion was the correlation between actual judgments and predicted judgments, the use of subjective weights was highly effective. Subjective weights performed as well as objective weights for correlated data but dropped off in effectiveness in the orthogonal data set.

Four Additional Weighting Models

After each committee member's judgment policy had been modeled by using objective and subjective weights, four additional models were examined. They were: (1) unit weights, (2) random ratings, (3) average weights and (4) equal weights. These models were constructed to examine the robustness of the linear model in predicting judges' actual ratings. The use of these models paralleled the study done by Dawes and Corrigan (1974), but with one major difference: they constructed models that were used to predict criterion values, not ratings. This study examined the prediction of judgments, since criterion values of applicant quality were unavailable.

Since one of their conclusions was that deviations from optimal weighting do not make much practical difference, it was decided to examine whether different weighting schemes made any difference in the prediction of actual judgments. The additional models were chosen to examine this question. The unit weights and random ratings were based on some information about each individual judge. The average and equal weights were based on some information about the committee as a group. No weighting scheme required that any additional information be collected from the judges. Rather, all were based on existing data and information, and so it was possible to examine which model was most effective in predicting judges' actual ratings. Thus, the performance criterion was the correlation between actual judgments and judgments derived from the respective weights of the models (Figure 4.2). The results of each analysis are presented separately and then discussed jointly.

Unit Weights

A popular model with judgment researchers has been unit weights. A variety of contexts have been examined in which unit weights have done well. The work of Dawes and Corrigan (1974) and Einhorn and Hogarth (1975) typifies this research. These researchers concluded that unit weights may be superior to optimal weights.
Other investigators have found that unit weights perform as well as optimal weights when the weights were cross-validated (Trattner, 1963; Schmidt, 1971). The major conclusion drawn from this research was that a simple additive model represented how judges operated. However, this conclusion has typically resulted from studying the relationship between cues and criterion values. Since this study was concerned with the relationship between cues and judgments, it was decided to test whether this conclusion was valid.

[Figure 4.2, UNIT WEIGHTING, RANDOM RATINGS, AVERAGE WEIGHTING AND EQUAL WEIGHTING MODELS, diagrams the four baseline models: the cues X1 through X4 yield a committee member's actual judgments Ys, which are correlated in turn with predicted judgments using unit weights (Ŷunit, giving r_YsŶunit), average weights (Ŷaverage, giving r_YsŶaverage), and equal weights (Ŷequal, giving r_YsŶequal), and with random ratings for each committee member (Ŷrand, giving r_YsŶrand).]

The unit weighting scheme consisted of ratings predicted for each judge that were computed by using +1 or -1 as weights. The unit weights were used as if they were beta weights, and the predicted ratings were obtained by multiplying unit weights by the non-standardized values of the cues. These predictions were correlated with the actual judgments of the committee members (r_YsŶunit). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on unit weights (Table 4.6).

For the correlated data set, the median correlation between actual judgments and judgments generated from unit weights was quite high (.88). The range was from .80 to .95 with a mean correlation of .89. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .64. The range was from .30 to .81. The mean correlation was .62. Fourteen out of fifteen correlations were significant (p < .01).

All correlations were transformed to z scores so that the mean correlations could be tested for significance. Both mean correlations were significantly different from zero (for the correlated data, t = 26.39, p < .001; for the orthogonal data, t = 11.93, p < .001). The 95% confidence intervals established around the mean correlations were from .87 to .92 (correlated data) and from .54 to .70 (orthogonal data). An obvious conclusion from Table 4.6 is that the unit weight model is better with correlated data than with orthogonal data.

Table 4.6

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM UNIT WEIGHTS (r_YsŶunit)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶunit    r²             r_YsŶunit    r²
# 1         .95*         .90            .78*         .61
# 2         .88*         .77            .53*         .28
# 3         .87*         .76            .48**        .23
# 4         .92*         .85            .68*         .46
# 5         .91*         .83            .72*         .52
# 6         .93*         .86            .81*         .66
# 7         .95*         .90            .64*         .41
# 8         .87*         .76            .70*         .49
# 9         .84*         .71            .30          .09
#10         .88*         .77            .71*         .50
#11         .88*         .77            .60*         .36
#12         .80*         .64            .39**        .15
#13         .90*         .81            .56*         .31
#14         .86*         .74            .45**        .20
#15         .93*         .86            .73*         .53

Median      .88          .77            .64          .41
Mean        .89*         .79            .62*         .38
Range       .80 to .95   .64 to .90     .30 to .81   .09 to .66

* p < .001   ** p < .01

Random Ratings

A set of random ratings was used to determine if actual judgments could be predicted from knowledge about a judge's frequency distributions. That is, each judge had a frequency distribution corresponding to the number of times each rating was used. This distribution provided very limited information about each judge.
If actual judgments could be predicted from this limited information, the use of more elaborate models would be questionable. These ratings provided a baseline model for each judge.

The random rating scheme consisted of a set of randomly assigned ratings (Ŷrand) that was based on each judge's frequency distribution of actual judgments. Instead of simply randomly assigning a rating from 1 to 7 to an applicant, a set of ratings that corresponded to each judge's own frequency distribution was assigned randomly. These random ratings were correlated with the actual ratings of the judges (r_YsŶrand). The squared values of these correlations (r²) represented the variance accounted for by the use of random ratings (Table 4.7).

For the correlated data set, the median correlation between actual judgments and random judgments was -.08. The range was from -.35 to .18. The mean correlation was -.08. Only one out of fifteen correlations was significant (p < .01). For the orthogonal data set, the median correlation was -.05. The range was from -.30 to .39. The mean correlation was -.01. Again, only one out of fifteen correlations was significant (p < .01).

All correlations were converted to z scores so that the mean correlations could be tested for significance. Neither mean correlation was significant (for the correlated data, t = -2.37; for the orthogonal data, t = -.23).

Table 4.7

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND RANDOMLY GENERATED JUDGMENTS (r_YsŶrand)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶrand    r²             r_YsŶrand    r²
# 1        -.16          .03             .03         .00
# 2        -.19          .04             .05         .00
# 3         .09          .01             .10         .01
# 4        -.17          .03             .39**       .15
# 5         .18          .03            -.00         .00
# 6        -.18          .03            -.21         .04
# 7        -.08          .01            -.14         .02
# 8        -.08          .01             .08         .01
# 9        -.06          .00             .20         .04
#10        -.12          .01             .29         .08
#11         .08          .01             .01         .00
#12         .03          .00             .01         .00
#13        -.05          .00            -.10         .01
#14        -.35**        .12            -.07         .00
#15        -.21          .04            -.30         .09

Median     -.08          .01             .05         .00
Mean       -.08          .01             .01         .00
Range      -.35 to .18   .00 to .12     -.30 to .39  .00 to .15

** p < .01

Average Weights

An average weighting scheme allowed the comparison of an individual judge with a composite or average judge. This average weighting scheme reflected the judgment policy of the committee. If an average weighting model performed quite well, then the use of individual weighting models might be called into question.

The average weighting scheme consisted of predicted judgments obtained by using objective (regression) weights computed from the average rating given to each applicant. The average ratings given the thirty applicants are shown in Table 4.8. These average ratings were treated as the dependent variable in a multiple regression analysis, which yielded a set of average objective (regression) weights. These in turn were used to compute a set of predicted judgments. These predictions were correlated with the actual judgments of individual committee members (r_YsŶaverage). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on average weights (Table 4.9).

For the correlated data set, the median correlation between actual judgments and judgments generated from average weights was .92. The range was from .84 to .97. The mean correlation was .93. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .81. The range was from .62 to .95. The mean correlation was .83. All correlations were significant (p < .001). The mean correlations (in terms of z scores) were tested for significance.
Both mean correlations were statistically significant (for the correlated data, t = 29.37, p < .001; for the orthogonal data, t = 15.99, p < .001). The 95% confidence intervals for the mean correlations were from .91 to .94 (correlated data) and from .78 to .87 (orthogonal data).

[Table 4.8, MEAN RATINGS GIVEN TO EACH APPLICANT, listed the mean rating given to each of the thirty applicants under both the correlated and orthogonal data conditions; the individual entries are not legibly recoverable from the scan.]

Table 4.9

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM AVERAGE WEIGHTS (r_YsŶaverage)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶaverage r²             r_YsŶaverage r²
# 1         .96*         .92            .91*         .83
# 2         .93*         .86            .77*         .59
# 3         .94*         .88            .86*         .74
# 4         .97*         .94            .95*         .90
# 5         .91*         .83            .74*         .55
# 6         .94*         .88            .84*         .71
# 7         .94*         .88            .86*         .74
# 8         .92*         .85            .85*         .72
# 9         .92*         .85            .81*         .66
#10         .88*         .77            .62*         .38
#11         .90*         .81            .70*         .49
#12         .84*         .71            .71*         .50
#13         .92*         .85            .75*         .56
#14         .92*         .85            .84*         .71
#15         .95*         .90            .88*         .77

Median      .92          .85            .81          .66
Mean        .93*         .86            .83*         .69
Range       .84 to .97   .71 to .94     .62 to .95   .38 to .90

* p < .001

Equal Weights

An equal weighting scheme served as another baseline policy. If an outside observer wanted to predict the actual judgments of a randomly chosen judge, an equal weighting scheme would suffice if no other information were known about the judges. If an equal weighting model predicted actual judgments with high accuracy, then the use of differential weighting models would be questionable.

The equal weighting scheme consisted of predicted ratings for each judge that were computed by using equal subjective importance weights. The equal weights were used as if they were regression weights, and the predicted ratings were obtained by multiplying equal weights by the standardized values of the cues. This weighting scheme is, in effect, a standardized unit weighting scheme. While the unit weighting scheme used non-standardized cue values, the equal weighting scheme used standardized cue values. The predicted ratings that resulted from equal weights were correlated with the actual ratings of the judges (r_YsŶequal). The squared values of these correlations (r²) represented the variance accounted for by a linear model based on equal weights (Table 4.10).

For the correlated data set, the median correlation between actual judgments and judgments generated from equal weights was .92. The range of correlations was from .83 to .97. The mean correlation was .92. All correlations were significant (p < .001). For the orthogonal data set, the median correlation was .72.
The range was from .55 to .88. The mean correlation was .74. All correlations were significant (p < .001).

Table 4.10

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND JUDGMENTS GENERATED FROM EQUAL WEIGHTS (r_YsŶequal)

            Correlated Data (N=30)      Orthogonal Data (N=30)
Judge       r_YsŶequal   r²             r_YsŶequal   r²
# 1         .97*         .94            .88*         .77
# 2         .92*         .85            .67*         .45
# 3         .92*         .85            .69*         .48
# 4         .96*         .92            .83*         .69
# 5         .91*         .83            .63*         .40
# 6         .94*         .88            .80*         .64
# 7         .95*         .90            .77*         .59
# 8         .91*         .83            .79*         .62
# 9         .90*         .81            .55*         .30
#10         .88*         .77            .67*         .45
#11         .91*         .83            .69*         .48
#12         .83*         .69            .65*         .42
#13         .93*         .86            .79*         .62
#14         .90*         .81            .72*         .52
#15         .95*         .90            .82*         .67

Median      .92          .85            .72          .52
Mean        .92*         .85            .74*         .55
Range       .83 to .97   .69 to .94     .55 to .88   .30 to .77

* p < .001

The mean correlations (in terms of z scores) were tested for significance. Both mean correlations were significant (for the correlated data, t = 29.38, p < .001; for the orthogonal data, t = 18.08, p < .001). The 95% confidence intervals for the mean correlations were from .90 to .94 (correlated data) and from .69 to .79 (orthogonal data).

Comparison of Models

Since in both data conditions the mean correlations of the objective, subjective, unit, average, and equal weighting schemes were significantly different from zero, it was decided to test for differences between the models. The correlations between each judge's actual ratings and the six weighting schemes are shown in Table 4.11 for the correlated data and in Table 4.12 for the orthogonal data.

All correlations were transformed to z scores so that a repeated measures one-way analysis of variance (ANOVA) could be run. Since the random model was not significantly different from zero, it was not included in the repeated measures analysis, to avoid introducing obvious but trivial statistical significance. For the correlated data set, the differences between the models were significant (F(4,11) = 30.64, p < .001). The objective model had the highest mean correlation, followed by the subjective, average, equal, and unit weighting models. For the orthogonal data set, the same order of magnitude was exhibited, and the differences between the models were also significant (F(4,11) = 30.91, p < .001). See Tables 4.13 and 4.14 for the ANOVA results.
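The repeated measures analysis reported in Tables 4.13 and 4.14 can be sketched as follows (a reconstruction with random stand-in data in place of the judges' actual Fisher-z values): judges are treated as subjects and the five weighting schemes as levels of the repeated factor, which yields the 4 and 56 degrees of freedom shown in the tables.

```python
# A sketch of a repeated measures one-way ANOVA on z-transformed correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.normal(loc=1.5, scale=0.3, size=(15, 5))   # judges x models (stand-ins)

n_judges, n_models = z.shape
grand = z.mean()
ss_total = ((z - grand) ** 2).sum()
ss_judges = n_models * ((z.mean(axis=1) - grand) ** 2).sum()   # between judges
ss_models = n_judges * ((z.mean(axis=0) - grand) ** 2).sum()   # between models
ss_resid = ss_total - ss_judges - ss_models                    # residual

df_models = n_models - 1                     # 4
df_resid = (n_judges - 1) * (n_models - 1)   # 56
F = (ss_models / df_models) / (ss_resid / df_resid)
p = stats.f.sf(F, df_models, df_resid)
print(f"F({df_models},{df_resid}) = {F:.2f}, p = {p:.4f}")
```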
Table 4.11

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND SIX WEIGHTING SCHEMES

Correlated Data (N=30)

Judge    r_YsŶobj  r_YsŶsub  r_YsŶunit  r_YsŶaverage  r_YsŶequal  r_YsŶrand
# 1      .97       .97       .95        .96           .97         -.16
# 2      .94       .94       .88        .93           .92         -.19
# 3      .96       .94       .87        .94           .92          .09
# 4      .97       .96       .92        .97           .96         -.17
# 5      .94       .92       .91        .91           .91          .18
# 6      .95       .95       .93        .94           .94         -.18
# 7      .96       .94       .95        .94           .95         -.08
# 8      .93       .92       .87        .92           .91         -.08
# 9      .95       .95       .84        .92           .90         -.06
#10      .90       .90       .88        .88           .88         -.12
#11      .92       .91       .88        .90           .91          .08
#12      .85       .84       .80        .84           .83          .03
#13      .93       .93       .90        .92           .93         -.05
#14      .94       .93       .86        .92           .90         -.35
#15      .96       .95       .93        .95           .95         -.21

Median   .94       .93       .88        .92           .92         -.08
Mean     .94*      .93*      .89*       .93*          .92*        -.08
Range    .85 to    .84 to    .80 to     .84 to        .83 to      -.35 to
         .97       .97       .95        .97           .97          .18

* p < .001

Table 4.12

CORRELATIONS BETWEEN COMMITTEE MEMBERS' ACTUAL JUDGMENTS AND SIX WEIGHTING SCHEMES

Orthogonal Data (N=30)

Judge    r_YsŶobj  r_YsŶsub  r_YsŶunit  r_YsŶaverage  r_YsŶequal  r_YsŶrand
# 1      .94       .93       .78        .91           .88          .03
# 2      .82       .80       .53        .77           .67         -.05
# 3      .91       .82       .48        .86           .69          .10
# 4      .95       .94       .68        .95           .83          .39
# 5      .90       .78       .72        .74           .63         -.00
# 6      .89       .87       .81        .84           .80         -.21
# 7      .87       .71       .64        .86           .77         -.14
# 8      .87       .80       .70        .85           .79          .08
# 9      .97       .95       .30        .81           .55         -.20
#10      .92       .86       .71        .62           .67          .29
#11      .89       .70       .60        .70           .69         -.01
#12      .86       .80       .39        .71           .65          .01
#13      .85       .83       .56        .75           .79         -.10
#14      .91       .89       .45        .84           .72         -.07
#15      .90       .85       .73        .88           .82          .30

Median   .89       .83       .64        .81           .72         -.05
Mean     .91*      .85*      .62*       .83*          .74*        -.01
Range    .82 to    .70 to    .30 to     .62 to        .55 to      -.30 to
         .97       .95       .81        .95           .88          .39

* p < .001

Table 4.13

REPEATED MEASURES ONE-WAY ANALYSIS OF VARIANCE FOR FIVE WEIGHTING SCHEME MODELS FOR CORRELATED DATA

Source of Variation    df    Sum of Squares    Mean Square    F        p
Between Judges         14    2.98              .205
Within Judges          60    1.08              .019
  Between Models        4     .74              .191           27.34    .001
  Residual             56     .34              .007
Total                  74    4.07              .054

Table 4.14

REPEATED MEASURES ONE-WAY ANALYSIS OF VARIANCE FOR FIVE WEIGHTING SCHEME MODELS FOR ORTHOGONAL DATA

Source of Variation    df    Sum of Squares    Mean Square    F        p
Between Judges         14    2.12              .151
Within Judges          60    7.47              .125
  Between Models        4    5.14              1.286          30.911   .001
  Residual             56    2.33              .041
Total                  74    9.59              .129

From a significant F ratio, one may conclude that the mean correlations are not identical, but one cannot determine the location or magnitude of the differences. Therefore, post hoc comparisons were calculated. Tukey's confidence interval for a one-way ANOVA was the technique employed to examine differences between pairs of mean correlations (Glass and Stanley, 1970).¹

The following comparisons were examined: 1) the objective and unit weight models; 2) the objective and average weight models; 3) the objective and equal weight models; 4) the subjective and unit weight models; 5) the subjective and average weight models; 6) the subjective and equal weight models; 7) the unit and equal weight models; and 8) the unit and average weight models. The following contrasts were significant: 1) the objective and unit models (ψ = .30, p < .05); 2) the objective and equal models (ψ = .14, p < .05); 3) the subjective and unit models (ψ = .24, p < .05); 4) the unit and average models (ψ = -.15, p < .05); and 5) the unit and equal models (ψ = -.18, p < .05). See Table 4.15 for the confidence intervals that were placed around these comparisons. If 0 fell within a confidence interval, the result of that comparison was not significant. If 0 did not fall within the interval, the comparison was significant.
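A sketch of the Tukey interval just described, under assumed values: q is taken from the studentized range distribution (scipy.stats.studentized_range, available in SciPy 1.7 and later), and a contrast between two model means is significant when the interval ψ̂ ± q·√(MS_w/n) excludes zero. The two mean z values below are hypothetical; MS_w echoes the residual mean square of Table 4.13.

```python
# A sketch of a Tukey post hoc confidence interval for one contrast.
import numpy as np
from scipy.stats import studentized_range

n_judges, n_models = 15, 5
ms_within = 0.007                      # residual mean square (cf. Table 4.13)
mean_obj, mean_unit = 1.74, 1.44       # hypothetical mean z values, two models

df = (n_judges - 1) * (n_models - 1)   # 56, as in the ANOVA tables
q = studentized_range.ppf(0.95, n_models, df)
half_width = q * np.sqrt(ms_within / n_judges)

psi = mean_obj - mean_unit             # the contrast between the two means
lo, hi = psi - half_width, psi + half_width
verdict = "significant" if lo > 0 or hi < 0 else "not significant"
print(f"psi = {psi:.2f}, interval: {lo:.2f} to {hi:.2f} ({verdict})")
```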
The results of these analyses showed that the objective weighting model was significantly different from the unit and equal weighting models, but not from the subjective and average weighting models. The subjective weighting model was significantly different from the unit weighting model, but not from the objective, average, and equal weighting models. The unit weighting model was significantly different from the average and equal weighting models.

¹Comparisons were constructed only for the correlated data set.

Table 4.15

TUKEY'S POST HOC COMPARISONS BETWEEN WEIGHTING SCHEME MODELS

Comparison                 ψ̂      ψ̂ ± q_J,N-J(1-α)·√(MS_w/n)·c    SIG
Objective and Unit         .30     .16 to .44                      <.05
Objective and Average      .11    -.02 to .25
Objective and Equal        .14     .01 to .28                      <.05
Subjective and Unit        .24     .10 to .37
Subjective and Average     .05    -.09 to .18
Subjective and Equal       .08    -.06 to .21
Unit and Equal            -.18    -.32 to -.05                     <.05
Unit and Average          -.15    -.29 to -.02                     <.05

(The interval for Subjective and Unit, .10 to .37, excludes zero and was significant at the .05 level.)

Where ψ̂ = the contrast between two means
      q_J,N-J(1-α) = the value at the (1-α) percentile of the studentized range distribution
      MS_w = the mean square within
      n = the sample size for each group
      c = the absolute value of the coefficient of the means being compared

Discussion

The finding of significant differences between the various weighting schemes paralleled previous research findings. The important conclusions to be drawn are that: 1) the differential weighting models accounted for significantly more variance than did the unit weighting models, 2) there were no significant differences among the differential weighting schemes, and 3) there were significant differences between the two unit weighting schemes. These findings should be qualified by the nature of the judgment task (i.e., a four-cue medical school admissions task) and the use of group averages to assess differences between models.

Differential vs. Unit Weights

The differential weighting models (i.e., the objective, subjective, and average) accounted for significantly more variance than did the unit weight or equal weight models. At first glance, this finding appeared to be at odds with previous research findings, specifically the Dawes and Corrigan (1974) and Schmidt (1971) research. These researchers found that unit and equal weighting both did extremely well in predicting criterion values. A linear model based on these weights performed as well as or better than the differential weights. Recall that Dawes and Corrigan stated that the whole trick was to decide what variables to look at and then to know how to add. Other researchers in the field have stated that these unit weighting results suggested that in many decision settings, all the judge needed to know was what variables to throw into the equation, which direction (+ or -) to weight them, and how to add (Slovic et al., 1977); or that one need not even go through the laborious process of differential weighting, but just identify the big variables and add (Shulman and Elstein, 1975).

The problem may be that researchers have generalized the Dawes and Corrigan research beyond what the authors stated or the results warranted. In their study, unit weighting did extremely well in predicting criterion values, not actual judgments. The important relationship was between the cues and criterion values (i.e., the left-hand side of the lens model). Their research found that unit weights do well on the left-hand side of the lens model.
This study and its conclusions, however, dealt with the relationship between the cues and actual judgments (i.e., the right-hand side of the lens model). Unit weights did not perform as well as differential weights on this side of the lens. Cook and Stewart (1975) found a similar result when they compared unit weights to subjective weights: the use of subjective (i.e., differential) weights resulted in a 12% to 14% increase in variance accounted for. The differences in this study were not as large (between 4% and 9%) for the correlated data. However, the differences were greater for the orthogonal data. That is, unit weights performed much less effectively than the differential weights when the cues were uncorrelated.

Differential Weighting Models

The finding that there were no significant differences between the differential weighting models has already been discussed in part. Recall that there were no significant differences between the objective and subjective weights. Based on additional comparisons of models, it was found that there were no differences between either of these two models and an average weighting model. There may be several reasons for this. First, there was extremely high agreement among the judges in their ratings of the applicants. This was surprising in light of the fact that the judges reported using different weighting schemes. In fact, when the subjective weights were examined for agreement among judges, three groups or types were identified. One group weighted MCAT scores, interview scores, and GPA fairly high, while giving the personal statement scores less weight; Judges 5, 1, 3, 6, 15, and 4 represented this group. A second group might be termed the personal qualities weighters: they weighted interview and personal statement scores quite high. Judges 8, 12, 14, 9, and 2 comprised this group. A third group might be termed the academic qualities weighters: they valued GPA and MCAT scores. Judges 13, 10, and 7 are representative of this group (see Appendices G and H). These three distinct groups of judges employed different weighting schemes, yet nevertheless showed high agreement in their ratings of applicants.

Part of this high inter-judge agreement is explained by the data. For the correlated data, different weightings can lead to the same ratings because of collinearity. Since the data were correlated, different weighting schemes mattered little as long as each variable received some weight. For the orthogonal data, collinearity was not present. What might have happened there is that the judges used only some of the information presented to them; their ratings were not based on all the data. It is possible that if an applicant had a high score on at least one of the independent variables, he or she received a high rating. In any event, the judges exhibited high agreement in their ratings of the applicants.

A second reason why the average weighting model did not differ from the objective or subjective models is that the average model struck a balance between judges. No one judge rated all the independent variables equally. (Hence, the equal weighting model did not perform as well.) Each judge used differential weights: each variable received some weight, but not in equal portions. The average weighting model reflected this.
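The collinearity point can be illustrated directly. The sketch below compares two deliberately different positive weighting policies on correlated and on orthogonal cues; the cue intercorrelation of .6 and both weight vectors are hypothetical values chosen for illustration, not the study's data.

```python
# Sketch: with intercorrelated cues, two different positive weighting
# schemes yield nearly identical predictions; with orthogonal cues
# they diverge. All values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def prediction_agreement(cues):
    w1 = np.array([0.50, 0.30, 0.15, 0.05])   # one judge's policy
    w2 = np.array([0.10, 0.20, 0.30, 0.40])   # a very different policy
    return np.corrcoef(cues @ w1, cues @ w2)[0, 1]

r = 0.6   # assumed cue intercorrelation for the "representative" case
cov = np.full((4, 4), r) + (1 - r) * np.eye(4)
correlated = rng.multivariate_normal(np.zeros(4), cov, size=40)
orthogonal = rng.normal(size=(40, 4))   # near-zero intercorrelations

print(f"correlated cues: r = {prediction_agreement(correlated):.2f}")
print(f"orthogonal cues: r = {prediction_agreement(orthogonal):.2f}")
```

With correlated cues the two policies agree almost perfectly; with orthogonal cues their agreement drops to roughly the cosine between the two weight vectors, so different weighting schemes begin to matter.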
Recall also that the average weighting model was based on the regression (objective) weights generated from the average rating given to each applicant. Had there been less agreement among judges, there would have been more variability in the average ratings, and the average weighting model would have become less effective. Thus, high inter-judge agreement explains why the differential weighting models were so effective in modeling the judges' policies.

Unit vs. Equal Weights

There was a significant difference between the unit weighting model and the equal weighting model. Recall that the former used raw scores while the latter used standard scores in computing predicted judgments. If one literally added the variables, one would be using a unit weighting model. If one standardized the variables and then added them, one would be using an equal (or standardized unit) weighting model. The equal weighting model controlled each variable's variance through the use of standard scores.

The results of this study imply that the differences between the unit and equal weight models are a result of measurement procedure rather than a true difference between the models. That is, the way the weights were computed made a significant difference in this study: weights applied to standard scores performed better than weights applied to non-standard scores. Past research has used unit weights with standard scores. A research question that has not been addressed is whether judges have a concept of standard scores and use this concept in processing cues. Conclusions based on unit weights with standard scores are not only drawn on the wrong side of the lens model but may also rest on a measurement artifact. Unit weights used as simple addition did not perform as well as the differential weights; even when the unit weights were applied to standardized scores, they fell short of the differential weights.

Summary

The purpose of this study was to model and compare how admissions committee members say they weight information in making judgments regarding the acceptability of medical school applicants with how mathematical representations weight the same information. Each research question was restated as a hypothesis, data were presented, a statement was made about whether the hypothesis was rejected or accepted, and findings were discussed. Data were analyzed primarily by multiple regression and correlation techniques.

Each research question took a step further in determining how well subjective and objective weights worked in modeling judges' admissions policies. The first step was to compare (correlate) the objective and subjective weights. The next step examined the correlation between actual judgments and judgments arrived at through the use of both kinds of weights; this step looked at the outcomes of these weights (i.e., how well they predicted actual judgments). The third step compared these predicted outcomes (i.e., how much agreement there was between them). The fourth step assessed which weighting scheme predicted the actual judgments most accurately. The results of the tested hypotheses were:

1) No relationship existed between objective and subjective weights. Rejected for both data conditions;

2) A positive relationship existed between actual judgments and judgments generated from objective weights. Accepted for both data conditions;

3) A positive relationship existed between actual judgments and judgments derived from subjective weights.
Accepted for both data conditions;

4) A positive relationship existed between the judgments generated from both objective and subjective weights. Accepted for both data conditions;

5) There was a greater relationship between actual judgments and judgments generated from objective weights than there was between actual judgments and judgments derived from subjective weights. Rejected for the correlated data; accepted for the orthogonal data.

When comparisons were made between the objective and subjective weights, there was a high correlation between the two weighting schemes. This result lent support to the hypothesis that judges can report their subjective importance weights and that these weights are related to objective weights. However, this correlation was based on an n of four (cues) and should be interpreted cautiously. Additional comparisons were made to examine further the relation between these two weighting schemes. These results were consonant with the research that has shown linear models to be good approximations in many decision-making situations. In addition, this study showed that subjective weights were as effective as objective weights in predicting actual judgments with correlated data. Note, however, that the effectiveness of subjective weights decreased when the data were orthogonal. When the outcomes or predicted judgments generated from these weights were compared, there was extremely high agreement: judgments derived from the two weighting schemes were highly correlated.

This study concluded that subjective weights were an effective model of how committee members say they weight information in making judgments about medical school applicants. This conclusion resulted from many comparisons. However, boundary conditions were established from the two data sets: the conclusion was valid for correlated data but weakened for orthogonal data. Subjective weights lost their effectiveness when applied to orthogonal data.

Having established the comparisons between objective and subjective weights, attention turned to alternative weighting schemes. Additional analyses therefore examined the effectiveness of four alternative weighting models (i.e., unit weights, random ratings, average weights, and equal weights) in capturing the judges' policies. The results showed that:

1) There were significant differences between the various weighting scheme models;

2) The differential weighting models (i.e., objective, subjective, and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);

3) There were no significant differences between the differential weighting models;

4) There were significant differences between the unit weighting models.

The finding of significant differences between differential and unit weighting schemes seemed at first blush to be at odds with previous research. However, that research examined a different judgment task than the one in this study: this research studied the relationship between cues and judgments, not between cues and criteria. Unit weights were not as effective as differential weights in predicting judgments. This difference was greatest for the orthogonal data and smaller for the correlated data; the unit weights lost their effectiveness under orthogonal conditions. Another result showed that there were no differences between the differential weights. It was shown previously that there were no significant differences between the objective and subjective weights.
Since there was such high inter-judge agreement, a differential weighting model based on this agreement was quite successful. Thus, all three differential models were quite successful in predicting actual judgments.

The finding of significant differences between the unit weighting models pointed to the importance of examining how judgments were computed. A simple unit weighting model just added the four independent variables to generate predicted judgments; a more advanced model standardized the independent variables and then added them. The simple model was less effective than the standardized model. If a judge used a simple additive model, judgments could be predicted from the independent variable with the greatest variance. Therefore, how judgments were computed made a difference in the success of predicting actual judgments.

CHAPTER V

CONCLUSION

In this chapter, 1) a summary of the study and its findings is presented; 2) limitations are examined; 3) implications for researchers are presented; and 4) future research is recommended.

Summary

Medical school admissions committees are charged with the task of selecting applicants for their entering classes. This task involves examining various admissions criteria, determining their importance, and making judgments based on these criteria. Committees' definitions of quality reflect how they weight information in making judgments about the acceptability of applicants. Thus, quality involves the selection and weighting of variables in order to make judgments. When the issue of quality is examined, it becomes apparent that there is little or no consensus. Committee members have different conceptions of quality, yet they must make decisions about the acceptability of applicants based on some conception of it.

This decision-making process takes place in an environment riddled with controversy. Problems range from making medical schools representative of the socioeconomic and racial components of the general population to meeting society's health care needs. Understanding how judgments are made (specifically, how information is weighted) allows one to infer what is meant by quality. This understanding lays the needed groundwork for communication among admissions committee members.

The means available to examine the issue of quality and how admissions committees weight information emerge in part from psychological research in the areas of clinical judgment and decision making, which has been concerned with how to model or characterize the judgments or decisions of clinicians. This modeling attempts to explain how clinicians use information to reach judgments or decisions. A problem is that some judgment research has shown that judges cannot accurately estimate their combination and weighting rules. Serious discrepancies often exist between judges' subjective and objective (mathematical) weighting schemes. Thus, what judges report about their weighting schemes is often regarded as invalid. However, another body of this research implies that judges can relate what they are doing when making decisions. This research accepts the use of self-report and introspection as measures to assess the judgment process. The use of subjective (self-report) weights is a valid area to be investigated.
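To make the subjective/objective comparison concrete, the sketch below correlates a hypothetical judge's self-reported importance weights with regression weights fitted to simulated ratings. None of the numbers come from the study, and, as noted earlier, a correlation computed over only four cues must be read cautiously.

```python
# Sketch: compare a judge's self-reported (subjective) weights with
# regression (objective) weights fitted to the same ratings. All
# values are hypothetical, not the study's data.
import numpy as np

rng = np.random.default_rng(1)
cues = rng.normal(size=(40, 4))            # GPA, MCAT, statement, interview
subj = np.array([0.35, 0.30, 0.15, 0.20])  # self-reported importances (sum to 1)
ratings = cues @ subj + rng.normal(scale=0.3, size=40)

# Objective weights: least-squares fit of ratings on cues (drop intercept)
X = np.column_stack([np.ones(40), cues])
obj = np.linalg.lstsq(X, ratings, rcond=None)[0][1:]

r = np.corrcoef(subj, obj)[0, 1]           # n = 4 cues: interpret cautiously
print(f"subjective vs. objective weight correlation: {r:.2f}")
```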
Therefore, the purpose of this study was to model and compare how medical school admissions committee members say they weight information in making judgments regarding the acceptability of applicants with how mathematical representations weight the same information. The importance of exercising sound judgment in the selection of medical school applicants was apparent. Yet when the literature on medical school admissions was examined, there was little or no convergence with the literature on judgment and decision making. The time was ripe for these two bodies of research to interact.

A review of medical school admissions showed that policies and procedures have changed drastically throughout the history of this country. Requirements have gone from minimal to elaborate. Current requirements are quite stringent, with grade point average, MCAT scores, and personal interviews being the primary admissions variables. Other important admissions variables include autobiographical (personal) statements, letters of evaluation, and extracurricular activities. A severe strain is placed on the admissions committee as it attempts to process these admissions variables for a diverse applicant pool. Problems arise from identifying, measuring, and evaluating important admissions criteria; processing applicants efficiently; selecting the most qualified applicants; minimizing the financial, academic, and emotional costs of the process; and assisting rejected applicants in assessing their career goals. A key first step in addressing potential solutions to these problems is to examine how committee members say they weight admissions variables when making judgments about the quality of medical school applicants. Admissions thus provides a rich content area in which to explore the judgments of committee members.

The judgment research has shown that tasks requiring the integration and combination of information to reach a judgment are best performed actuarially (i.e., by routine application of explicit rules); a rule-based procedure is superior to a case-by-case procedure for such tasks. Another finding of this research is that many kinds of decision makers (e.g., psychologists, stock brokers, radiologists) have been modeled successfully by linear models. These models have performed as well as or even better than more complex non-linear (e.g., configural) models. This research has also shown that a linear model of a judge is often a better predictor of actual judgments than the judge from whom the model was derived; this has been termed the bootstrapping effect.

The few studies on modeling a judgment policy with subjective weights have shown that promising work lies ahead. These weights were shown to be effective models and warranted further research. Subjective weights provide a means of examining how judges say they weight information when making judgments. Questions of interest are whether these weights are related to the typical weights of the linear model (i.e., regression weights), whether they are effective in a linear model, and whether they are useful in predicting judgments. With these questions in mind, this study examined the relations between subjective (self-report) and objective (regression) weights.

Two testing sessions, one using correlated (representative) admissions data and the other using orthogonal (non-representative) admissions data, were required to achieve the purpose of this study.
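As an illustration of what the two data conditions involve, the sketch below constructs one correlated and one orthogonal set of cue profiles. The intercorrelation of .5, the sample sizes, and the QR-based decorrelation are assumptions for illustration rather than the study's actual materials.

```python
# Sketch: constructing a correlated (representative) and an orthogonal
# (non-representative) set of cue profiles. The correlation structure
# is hypothetical; the study's actual matrices are not reproduced here.
import numpy as np

rng = np.random.default_rng(2)
n_applicants, n_cues = 20, 4

# Correlated condition: impose a common positive intercorrelation
r = 0.5
cov = np.full((n_cues, n_cues), r) + (1 - r) * np.eye(n_cues)
correlated = rng.multivariate_normal(np.zeros(n_cues), cov, size=n_applicants)

# Orthogonal condition: decorrelate columns with a QR decomposition
raw = rng.normal(size=(n_applicants, n_cues))
q, _ = np.linalg.qr(raw - raw.mean(axis=0))
orthogonal = q * np.sqrt(n_applicants)   # rescale toward unit variance

print(np.corrcoef(correlated.T).round(2))   # positive off-diagonals
print(np.corrcoef(orthogonal.T).round(2))   # zero off-diagonals
```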
Each data set contained the same introduction, instructions, and description of the independent variables (i.e., GPA, MCAT scores, personal statement scores, and interview scores). The committee members' task was twofold: 1) rate each of the applicants (40 total) on an acceptability scale, and 2) report the subjective importance attached to each of the four independent variables.

Once these tasks were completed, the following data were collected or developed for each committee member:

1) Objective and subjective weights;

2) Actual judgments and judgments generated from objective and subjective weights.

This information was used to test the following hypotheses:

1) No relation existed between objective and subjective weights;

2) A positive relation existed between actual judgments and judgments generated from objective weights;

3) A positive relation existed between actual judgments and judgments generated from subjective weights;

4) A positive relation existed between the judgments generated from both objective and subjective weights;

5) There was a stronger relation between actual judgments and judgments generated from objective weights than between actual judgments and judgments generated from subjective weights.

Data were collected and analyzed using correlation techniques, multiple regression, paired t-tests, repeated measures one-way analysis of variance, and post hoc comparisons. The results of the tested hypotheses showed that:

1) A significant positive relationship existed between objective and subjective weights, for both data conditions;

2) A significant positive relationship existed between actual judgments and judgments generated from objective weights, for both data conditions;

3) A significant positive relationship existed between actual judgments and judgments generated from subjective weights, for both data conditions;

4) A significant positive relationship existed between the judgments generated from both objective and subjective weights, for both data conditions;

5) For the correlated data, there was not a significantly greater relation between actual judgments and objectively generated judgments than between actual judgments and subjectively generated judgments. For the orthogonal data, however, there was a significant difference between the correlation of actual judgments with objectively generated judgments and the correlation of actual judgments with subjectively generated judgments.

This study concluded that subjective weights were an effective weighting scheme for modeling how committee members said they utilized information when making judgments about the acceptability of medical school applicants. This conclusion resulted from many comparisons, from the weights themselves to the outcomes arrived at from those weights. However, boundary conditions were established from the two data sets: subjective weights were more effective for correlated data than for orthogonal data. Subjective weights proved to be a valid measure for modeling an admissions judgment task under correlated data conditions.

Once the comparisons between objective and subjective weights were made, additional concerns arose centering on the use of alternative weighting models. Four additional weighting schemes were examined: (1) unit weights, (2) random ratings, (3) average weights, and (4) equal weights.
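A minimal sketch of how predictions under these four schemes might be generated follows; all data, the rating scale, and the reading of "random ratings" as uniform draws are hypothetical stand-ins, and the study's own computations are summarized next.

```python
# Sketch of the four alternative schemes, with hypothetical cue values
# and a hypothetical mean rating per applicant.
import numpy as np

rng = np.random.default_rng(3)

cues = rng.uniform(1.0, 15.0, size=(40, 4))   # hypothetical raw cue values
z = (cues - cues.mean(0)) / cues.std(0)       # standardized cue values

# Hypothetical mean rating each applicant received from the judges
mean_ratings = z @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(scale=0.2, size=40)

unit_pred = cues.sum(axis=1)                  # unit weights: add raw values
equal_pred = z.sum(axis=1)                    # equal weights: add standardized values
random_pred = rng.uniform(1.0, 7.0, size=40)  # random ratings baseline (assumed form)

# Average weights: regression weights fitted to the judges' mean ratings
X = np.column_stack([np.ones(40), z])
average_pred = X @ np.linalg.lstsq(X, mean_ratings, rcond=None)[0]

# Each prediction vector can then be correlated with a judge's actual
# ratings, as in Tables 4.11 and 4.12.
```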
This necessitated developing the following data:

1) Unit weights, random ratings, average weights, and equal weights;

2) Judgments generated from unit weights, random ratings, average weights, and equal weights.

Comparisons were made between these four weighting schemes and the objective and subjective models. Analyses showed that:

1) There were significant differences between the six models;

2) The differential weighting models (i.e., objective, subjective, and average) accounted for significantly more variance than did the unit weighting models (i.e., unit and equal);

3) There were no significant differences between the differential weighting models;

4) There were significant differences between the unit weighting models.

From these results, it was concluded that the differential weighting models were more effective than the unit weighting models in predicting committee members' judgments. Differential weighting schemes were the most effective in modeling admissions judgment policies, but there were no significant differences among them. It will be recalled that there were no significant differences between objective and subjective weights for the correlated data. Since there was such high inter-judge agreement, a weighting scheme (average weights) based on this agreement was highly accurate in predicting actual judgments; had there been less inter-judge agreement, the average weighting model would have been less effective. Also, previous research has shown that when the predictor (independent) variables are correlated, different sets of positive weights tend to yield similar correlations. The results obtained with the three differential weighting models confirm this conclusion.

Another interesting result was that how the unit weight judgments were computed made a difference. Simple unit weights were less effective than standardized unit weights. The unit weighting scheme used non-standardized cue values, while the equal weighting scheme used standardized cue values. A judgment policy of simply adding the independent variables (unit weighting) would not be very effective in predicting committee members' judgments. Thus, it was important to see how the predicted judgments were computed.

Boundary conditions were established from the two data sets. The weighting models were most effective for the correlated data; accuracy decreased for the orthogonal data. Therefore, the intercorrelation of the data should be examined before certain weighting schemes are used, because it makes a difference in their effectiveness.

Limitations of the Study

The limitations of this study fall within two general categories: one related to external validity, the other to internal validity. Limits on external validity are as follows. First, the generalizability of the study is limited to admissions committees similar to the tested subjects. Second, other independent variables could have been presented to the judges. Third, there were additional ways to measure the success or goodness of fit of the various weighting models. Fourth, different restrictions could have been placed on the judgment task.

The results are restricted to subjects who are similar to the group tested in this study. These committee members were elected to three-year terms and were trained through workshops to become familiar with the existing admissions policy. Members sit on the committee for one year before they are allowed to participate in setting new policy each year.
In this manner, the members have at least one year of experience upon which to base any changes in the process. Since the subjects of this study had an average of two years' experience, these results may generalize to committees constructed in a similar fashion.

Four independent variables were selected for study; additional or different variables could have been chosen. Although the selected variables are the most widely used, other valuable information is examined in the admissions process. The results might be restricted by the number and type of variables; increasing the number of variables or changing their type might change the results of this study.

Ratings of two variables (i.e., personal statement and interview scores) were presented to the judges. The judges did not have to rate the personal statements or interview the applicants; the ratings were done in advance and presented to them. Had the judges done the ratings or interviews themselves, the results might have been altered.

This study relied primarily on the weights themselves and how these weights were used to generate judgments. Regression and correlation techniques were the method of analysis. Other techniques could have been used: for example, decision trees, computer simulations, or thinking-aloud protocols could be used to test the efficacy of different weighting models. These types of analyses would further the understanding of how judges weight information when making decisions.

Finally, judges were not limited in the number of times they could give a particular rating to an applicant. That is, there was no pressure to decide on a set number of applicants who would be accepted; if judges wanted to give all applicants a high rating, they could do so. This freedom is not possible in the actual admissions process, where committees are restricted by the number of places available. This restriction was not placed on the subjects in this study and thus may have reduced the generalizability of the research.

Two factors might limit the internal validity of the study. First, some of the subjects have worked together as a committee. Second, the subjects volunteered to participate in this study. That the subjects worked together has been discussed in part: it will be recalled that there was high inter-judge agreement. This agreement may have resulted from the fairly extensive training members received. Members may have been aware of how other members might be judging applicants and adjusted their judgments accordingly, which would contaminate some of the results.

The subjects who participated in this study were unpaid volunteers. In any study with volunteers, there is a risk of self-selection biasing the results. However, the fifteen committee members who served as judges represented all but one member of the sixteen-member committee. Thus, the entire committee was essentially represented; but, again, the results are limited to similar committees.

Implications

Based upon the findings of this study and some of the questions raised in the literature, a number of implications are suggested. They relate to (1) the use of linear models, (2) the use of subjective weights, (3) the use of differential vs. unit weights, and (4) the use of different data sets.

The linear model proved successful in representing another judgment task. Admissions committee members were successfully simulated by such a model.
Another kind of decision maker (i.e., medical school admissions committee members) and another judgment task (i.e., medical school admissions) have been added to a growing list in the judgment paradigm.

The use of subjective weights to model judgments is a relatively new area in the judgment paradigm. Research has shown discrepancies between subjective and objective weights, but it has typically used a single performance criterion: the correlation between objective and subjective weights. The results of this study showed that the subjective weights performed quite admirably in terms of three different performance criteria: (1) the subjective weights correlated positively with the objective weights; (2) there were no significant differences in the prediction of actual judgments when subjective and objective weights were used; and (3) subjective weights yielded predicted judgments that correlated highly with the predicted judgments generated from objective weights. By each performance criterion, the subjective weights worked quite well. These findings are also consonant with the research showing that when predictor variables are positively correlated, different sets of positive weights tend to yield similar correlations. Thus, the subjective weights performed as well as the objective weights.

A major implication for medical school admissions is to use subjective weights in a linear model. Weights could be elicited from committee members and then used in a linear equation, thus bootstrapping the committee member's judgment policy. Factors such as fatigue, boredom, daydreaming, and malaise may cause a judge to be inconsistent in making decisions; a linear model is resilient to such sources of error and is consistent with the judge's policy. An even stronger case can be made for the use of subjective weights when the predictor variables are correlated.

When examining the issue of differential and unit weights, two concerns arise. The first cautions readers not to overgeneralize researchers' findings. The lens model has two sides: one deals with a criterion, the other with actual judgments. Weights used to predict actual judgments are not necessarily the ones used to predict a criterion. Therefore, conclusions about weights derived from one side should not be generalized to the other side.

The second concern alerts the reader to note how researchers compute various scores. For example, this study found that the use of standardized vs. non-standardized cue values could lead to different conclusions. An equal-weighting model using standardized unit cue values was found to be superior to a unit-weighting model using non-standardized cue values. Although this finding is consistent with previous research that found an equal-weighting model to be effective in predicting a performance criterion, it points to the need not to ignore the measurement components of modeling a judgment task.

A final implication concerns judges' perceptions of the data sets. The use of two data sets established boundary conditions for the various weighting schemes and incidentally produced an interesting finding: experienced committee members could not differentiate representative from non-representative data. This may be partially explained by script theory, or by the fact that the judges had no reason to believe that non-representative data were being used in one of the testing sessions.
However the finding is explained, the judges were not able to recognize the independence of the four variables in the orthogonal data set. These findings only scratch the surface in identifying judges' perceptions of data and how these perceptions affect various weighting schemes.

Recommendations for Future Research

This study is a first step toward a better understanding of how admissions committee members make judgments about the acceptability of applicants. The next steps must not only expand the findings of this study but also overcome its limitations. Several possibilities present themselves.

The first would be to examine how committee members make judgments when they are presented with an applicant's entire admissions folder. Obviously, more variables are contained in this folder than the ones presented in this study, and it would be interesting to note which variables account for the most variance in the judgments. These variables could then be compared to the ones typically used in the admissions process. The task generalizability would be greatly increased if the judgments were made on the entire folder.

A second possibility would be to design a study in which the criterion values are known; that is, a study involving both sides of the lens model. Committee members would be asked to make judgments about applicants for whom some criterion information is known. This criterion could be a rating of performance during the first year (e.g., GPA, or some combination of scores).

APPENDIX G
CORRELATION BETWEEN JUDGES' SUBJECTIVE WEIGHTS FOR CORRELATED DATA SET
[Correlation matrix illegible in the source scan.]

APPENDIX H
CORRELATION BETWEEN JUDGES' SUBJECTIVE WEIGHTS FOR ORTHOGONAL DATA SET
[Correlation matrix illegible in the source scan.]