Michigan State University

This is to certify that the dissertation entitled

A Comparison of the Oral Interview and Behavioral Consistency
Evaluation Methods for Selecting Job Applicants

presented by

Sally Adrienne Hildebrand Mc Attee

has been accepted towards fulfillment of the requirements for the
Ph.D. degree in Measurement, Evaluation, and Research Design.

Major professor

Date: October 22, 1985


A COMPARISON OF THE ORAL INTERVIEW AND BEHAVIORAL CONSISTENCY
EVALUATION METHODS FOR SELECTING JOB APPLICANTS

by

Sally Adrienne Hildebrand Mc Attee

A DISSERTATION

Submitted to Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1985

Copyright by
SALLY ADRIENNE HILDEBRAND MC ATTEE
1985


ABSTRACT

A COMPARISON OF THE ORAL INTERVIEW AND BEHAVIORAL CONSISTENCY
EVALUATION METHODS FOR SELECTING JOB APPLICANTS

By

Sally Adrienne Hildebrand Mc Attee

This study compared the oral and behavioral consistency examination methods in the selection process for two managerial positions. The need for such a study arose from the researcher's desire to find a testing method which possessed the desirable characteristics of the oral interview but which avoided its disadvantages. The behavioral consistency approach was used as an alternative to the oral interview because it is parallel in development, content, and administration but involves no interaction between raters and candidates.

For each position, test development for both approaches was based on a job analysis which defined the essential job dimensions. Test content was parallel. The behavioral consistency examination asked candidates to describe major achievements which demonstrated their capabilities in each job dimension.
The oral examination consisted of two questions developed by subject matter experts for each job dimension. There were 18 subjects in the first sample and 14 in the second. The findings were as follows:

1. The results regarding the comparability of the two methods were inconclusive. Correlations between the methods were significant and meaningful for one sample but were non-significant for the other.

2. There were no significant differences in reliability between the two methods for either the overall ratings or the dimension ratings for either sample, with one exception among the dimension ratings.

3. Convergent validity results were inconclusive. The methods demonstrated convergent validity for one sample but not for the other. The methods did not demonstrate discriminant validity for either sample.

4. There were no significant differences between the methods regarding their acceptability to the raters. However, based on descriptive comparisons, the behavioral consistency method was superior in terms of rater time.

5. Based on descriptive comparisons, time efficiency for the candidate favored the oral examination. However, candidate time included only actual examination time; it did not include time for travel or preparation.


DEDICATION

to Eric Mc Attee and Janice and Stuart Hildebrand


ACKNOWLEDGMENTS

There are many people who have provided guidance and support for this study or who have participated in carrying out the research. I would first like to thank my advisor, Dr. William Mehrens, who was my dissertation committee chairman and provided advice and support throughout the project. I would also like to thank the other members of my dissertation committee, Drs. Frederick Ignatovich, William Schmidt, and Neal Schmitt, for their suggestions at the proposal stage and their review of and comments on the dissertation.

The Personnel Department and Bureau of Sanitation of the City of Milwaukee participated in this project and gave me their unstinting cooperation and support. I would like to thank James P. Springer and Fagan D. Stackhouse of the Personnel Department and William Kappel, James Kosmatka, and Will Hudson, Jr. of the Bureau of Sanitation for their participation in the study. I would also like to thank other members of the two departments who participated in the study or who supported me in carrying it out. I would like to give special thanks to Timothy Keeley, Adrian Foster, and Stephen Smith of the Personnel Department for their support in seeing me through the entire project.


TABLE OF CONTENTS

LIST OF TABLES

Chapter
I. INTRODUCTION
   Need
   Purpose
   Research Questions
II. REVIEW OF THE LITERATURE
   Oral Interview Literature
      Introduction
      Examination Characteristics
      Rater, Applicant and Situational Characteristics
      Comparison to Other Methods
   Training and Experience Evaluation Literature
      Method
      Incidence of Use
      Studies
   Comparison of Methods
III. METHODOLOGY
   Methodology for the Affirmative Action Officer Study
      Subjects
      Raters
      Job Analysis
      Behavioral Consistency Examination Development
      Oral Examination Development
      Procedure
   Methodology for the Sanitation District Manager Study
      Subjects
      Raters
      Job Analysis
      Behavioral Consistency Examination Development
      Oral Examination Development
      Procedure
   Method of Analysis
IV. RESULTS
   Results for the Affirmative Action Officer Study
      Comparability
      Reliability
      Convergent and Discriminant Validity
      Rater Acceptability and Efficiency
      Candidate Efficiency
   Results for the Sanitation District Manager Study
      Comparability
      Reliability
      Convergent and Discriminant Validity
      Rater Acceptability and Efficiency
      Candidate Efficiency
V. SUMMARY AND CONCLUSIONS
   Summary
   Conclusions
      Comparability
      Reliability
      Convergent and Discriminant Validity
      Rater Acceptability and Efficiency
      Candidate Efficiency
   Discussion
   Implications for Future Research
APPENDICES
   Appendices for Affirmative Action Officer
      A. Task Inventory
      B. KSA Inventory
      C. Task Groups and Essential Tasks
      D. Revised Dimensions and Critical KSA's
      E. Link-up of Task Groups/KSA Dimensions
      F. Dimensions and KSA's Tested
      G. Dimensions
      H. Achievement History Questionnaire
      I. Rater Evaluation Form for the Behavioral Consistency Examination
      J. Rater Evaluation Form for the Oral Examination
      K. Significance Tests for Differences Between Oral and Behavioral Consistency Reliability Coefficients for Dimension and Overall Ratings
   Appendices for Sanitation District Manager
      L. Critical Incident Inventory
      M. Dimensions
      N. Achievement History Questionnaire
      O. Rater Evaluation Form for the Behavioral Consistency Examination
      P. Rater Evaluation Form for the Oral Examination
      Q. Significance Tests for Differences Between Oral and Behavioral Consistency Reliability Coefficients for Dimension and Overall Ratings
BIBLIOGRAPHY


LIST OF TABLES

2.1 Comparison of Training and Experience Methods
4.1 Multitrait-multimethod Matrix for the Oral and Behavioral Consistency Methods for Affirmative Action Officer
4.2 Multitrait-multimethod Matrix for the Oral and Behavioral Consistency Methods for Sanitation District Manager


Chapter One

INTRODUCTION

Need

There has been a great deal of research literature on the employee selection interview. This research has been summarized in at least eight literature reviews, which have uniformly deplored the lack of reliability and validity evidence for oral interviews. However, recent studies have found reliability and validity under the following conditions: information about the job was available to raters, the interview was structured so that the same questions were asked of all applicants, behaviorally anchored rating scales were used, and raters received training prior to the examination process.
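Of these conditions, the behaviorally anchored rating scale is the easiest to make concrete: each point on the rating continuum is tied to a written description of observable behavior, so that every rater applies the same standard. The following minimal sketch illustrates the idea; the dimension name and anchor texts are hypothetical illustrations, not material from the examinations developed for this study.

```python
# Minimal sketch of a behaviorally anchored rating scale (BARS).
# The dimension and anchors below are hypothetical examples.

BARS = {
    "dimension": "Oral Communication",
    "anchors": {
        5: "Presents complex material clearly; adapts delivery to the audience",
        3: "Explains routine matters adequately; sometimes needs follow-up questions",
        1: "Responses are disorganized; key points must be drawn out by the rater",
    },
}

def describe(score: int) -> str:
    """Return the behavioral anchor nearest to a given score."""
    nearest = min(BARS["anchors"], key=lambda level: abs(level - score))
    return BARS["anchors"][nearest]

if __name__ == "__main__":
    for s in (1, 2, 4):
        print(s, "->", describe(s))
```

In practice the anchor statements are typically written from critical incidents supplied by subject matter experts, so that each scale point describes job behavior rather than a trait label.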
These recent results are encouraging because there is a need for oral examinations. For jobs with interpersonal and oral communication dimensions, and for managerial jobs where there are relatively few candidates and technical knowledge and managerial skills must be assessed, the oral examination is the most practical selection technique. For the first type of job, the oral examination is the only content valid method available. For the second type, content validity can be achieved, and the oral examination is efficient compared to written tests and assessment centers. Further, it usually is accepted by candidates and hiring agencies. Although the appropriateness of a content validity rationale for many of the examples discussed in the literature would be debatable, content validity and fairness can be supported for an oral interview when it is developed and administered according to certain procedures.

However, in spite of their potential content validity and efficiency, oral examinations have some serious problems which have been examined in the literature. First, there is evidence that race, sex, attractiveness, and other non-job-related applicant characteristics can affect oral test scores. Second, ratings may be subject to a halo effect due to the applicant's general likeability or oral communications skill over and above its importance to the job. Third, certain rater characteristics such as degree of accountability, responsibility, or authoritarianism may affect the ratings. Fourth, certain situational characteristics such as the quality of the previous applicant or the timing of unfavorable information can affect the ratings. And fifth, it is logistically difficult to assemble the raters and candidates, who may come from different parts of the country, together on the same days; in the process the best raters and/or candidates are sometimes lost.

Because of these problems, it would be desirable to identify another testing method which possessed the desirable characteristics of the oral interview but which avoided its disadvantages. Few studies have compared the oral to other types of examinations. However, a relatively new approach to assessing training and experience (the behavioral consistency approach) appears promising as an alternative to the oral examination for the following reasons:

1. The behavioral consistency approach has been based on job analysis.
2. It covers comparable job dimensions.
3. It is based on the logical rationale that past behavior is the best predictor of future behavior.
4. It is typically scored by means of behaviorally anchored rating scales.
5. It is not affected by the race, sex, or attractiveness of the applicants.
6. Because it does not assess interpersonal and oral communications skills directly, it avoids halo effects due to those factors.
7. Because raters and candidates do not have to assemble in one place, it avoids the logistics problem.

The behavioral consistency method is parallel in development and content to the oral interview method. It is also similar in administration, except that the presentation format is written rather than oral and there is no interaction between raters and candidates. Therefore, it seems a logical alternative to the oral interview.

Purpose

The primary purpose of this study was to compare the oral interview evaluation method with the behavioral consistency evaluation method. The need for such a study arose from the practical need to continue conducting oral examinations combined with the practical problems they present. Because hiring agencies are typically convinced of the validity of oral examinations, it would be infeasible to substitute a training and experience method without doing a comparative study.

A secondary purpose for the proposed study was to fulfill a need for research on oral examinations in an applied setting. Much of the oral interview research has been done in laboratory settings with perhaps unwarranted assumptions regarding its generalizability to applied settings. The raters used in such studies are typically students or recruiters, the ratees are students or simulated applicants, the stimulus material often consists of videotapes, applications, or protocols (a type of resume with additional biographical information regarding the applicant), the rating outcome is a hiring decision or lack thereof, and in many of the studies the interview content is unspecified.

Some research has been done on the generalizability issues, but results have been inconsistent. More research studies need to be done with subject matter experts as raters, job applicants as ratees, oral interviews as the stimulus material, a continuum of ratings as the outcome measure, and structured questions based on job-related dimensions as the interview content. This type of research would be beneficial both for private companies using oral interviews to make individual hiring decisions and for governmental agencies using oral examinations to obtain comparison ratings on candidates.

Research Questions

This study was designed to answer the following research questions:

1. Do the oral and behavioral consistency examination methods provide comparable ratings of applicants for employment?
2. Are the oral and behavioral consistency methods equally reliable?
3. Do the oral and behavioral consistency methods demonstrate convergent and discriminant validity?
4. Are the oral and behavioral consistency methods equally acceptable to raters and equally efficient in terms of rater time?
5. Are the oral and behavioral consistency methods equally efficient in terms of candidate time?


Chapter Two

REVIEW OF THE LITERATURE

Oral Interview Literature

Introduction

There has been a great deal of research literature on the oral interview. This research has been summarized in at least eight reviews, the earliest a 1949 review by Wagner and the most recent a 1982 review by Arvey and Campion. These reviews (Arvey and Campion, 1982; Carlson, Thayer, Mayfield and Peterson, 1971; Dunnette, 1962; Mayfield, 1964; Schmitt, 1976; Ulrich and Trumbo, 1965; Wagner, 1949; and Wright, 1969) have uniformly deplored the lack of reliability and validity evidence for oral interviews and have urged additional research, including research which compares the oral to other selection methods.

Although there have been many negative research findings on oral interviews, some recent studies have found reliability and validity under the following conditions:

1. The raters receive information about the job.
2. The interview is structured so that each candidate is asked the same questions.
3. There are behaviorally anchored rating scales, so that each rater is using the same definition for each rating level.
4. The raters receive training before beginning the interviewing process.

Because many of the later studies have investigated the effect of these factors and found them to improve reliability and validity, the later reviews have been more positive regarding the use of oral examinations.

This literature review is focused on research studies of the oral examination characteristics leading to reliability and validity. The model proposed by Schmitt in his 1976 review of the oral interview literature is used to summarize research on rater, applicant, and situational characteristics. This review concludes with a discussion of studies comparing the oral examination to other examination methods.

Examination Characteristics

A group of studies has investigated the effect of how the oral examination is conducted. Interviews have varied in how much information about the job was given to the raters, how structured the questions were, what type of questions were used, what type of rating scale was used, and what type of training was received by the raters.

Job information. Studies investigating the effect of job information have found that giving the raters a job description improves the reliability and validity of the oral examination. Langdale and Weitz, in a 1973 study, found that raters who received a complete job description had higher interrater reliability and discriminated more among the candidates than raters who received only a job title. Similarly, Rothstein and Jackson (1980) found that raters were able to discriminate more accurately between congruent and incongruent applicants (whose characteristics matched or did not match the job) when they received job descriptions than when they received only job labels.

In a related study, Wiener and Schneiderman (1974) found that raters focused on relevant applicant information when job information was available. Although the effect of irrelevant applicant information was not removed, its effect was stronger when no job information was available. In addition, Leonard (1974) found that when raters were given job descriptions, interrater reliability was higher for ratings of job relevant factors than for irrelevant factors. Finally, a study by Osburn, Timmreck, and Bigby (1981) investigated the effect of providing job-related dimensions to the raters. Two applicants, each of whom was well suited to one of two jobs, were rated accurately by raters who had access to specific and relevant job dimensions and inaccurately by raters who had access to only general job dimensions.

The above studies show quite clearly the positive effect of relevant job information. The literature also shows that raters with job knowledge tend to rate the same applicant information as important. The Langdale and Weitz study showed that there was substantial consistency among raters in rating the importance of applicant information regardless of the amount of job information they possessed. However, the two rater groups (one with job information and one without) did differ significantly in their ratings, with uninformed raters giving lower ratings on item importance. Hakel, Dobmeyer, and Dunnette (1970) showed that rater groups which differed in their knowledge of the job (students and interviewers) differed on which content dimension they considered most important. However, within groups the raters agreed on content importance, content importance determined the ratings, and the effect of favorable information depended on content importance.
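Studies of cue use, including the Valenzi and Andrews study discussed next, typically infer each rater's implicit weighting of applicant information by regressing that rater's overall ratings on coded applicant cues, a design usually called policy capturing. A minimal sketch of that logic follows; the cue names and data are hypothetical, and a real design would use far more applicants per rater.

```python
# Minimal policy-capturing sketch: infer a rater's implicit cue weights
# by regressing overall ratings on coded applicant cues.
# Cue names and all numbers are hypothetical.
import numpy as np

# Columns: experience (years), education (coded 1-5), test score (0-100)
cues = np.array([
    [2, 3, 70],
    [5, 4, 85],
    [1, 2, 60],
    [8, 5, 90],
    [4, 3, 75],
], dtype=float)

ratings = np.array([55, 80, 45, 95, 70], dtype=float)  # one rater's overall ratings

# Standardize the cues so the fitted weights are comparable across cues
Z = (cues - cues.mean(axis=0)) / cues.std(axis=0)
X = np.column_stack([np.ones(len(Z)), Z])           # add an intercept column
beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)  # least-squares fit

for name, w in zip(["intercept", "experience", "education", "test score"], beta):
    print(f"{name:>10}: {w:6.2f}")
```

Comparing the fitted weights across raters is what allows a researcher to say whether raters agree on cue importance in practice as well as in theory.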
In a related study, Valenzi and Andrews (1973) found that raters, all of whom had job descriptions, differed in applicant cue use (how they weighted different types of applicant information), and their ratings of applicants differed according to these weights. However, the same raters agreed in theory on cue importance. Differences in candidate ratings were due to the inability of the raters to apply their own rating strategies to the actual rating situation.

As a whole, these studies provide excellent support for the conclusion that an oral interview developed according to a content validity model would indeed be valid. The first group of studies cited supports the idea that oral examinations which are based on job analysis and which define the job dimensions to be evaluated will be reliable, whereas interviews not so based will not be. This is only logical; it is not surprising that raters without a job description would define the job for themselves differently and would therefore be assessing the candidates on different factors. The second group of studies shows that raters with job knowledge tend to rate the same dimensions as important. The inconsistent ratings of Valenzi and Andrews' subjects, all of whom had job descriptions, were due not to disagreement on content importance but to the inability of the raters to apply their rating strategies correctly.

Interview structure. A second group of studies has investigated the effect of interview structure (asking all candidates the same questions). In support of this approach, studies done by Reynolds (1979) and Mayfield, Brown, and Hamstra (1980) showed moderate to high interrater reliability for structured oral interviews used to select police officers and insurance agents.

Latham, Saari, Pursell, and Campion (1980) demonstrated both reliability and validity for a structured situational interview. They conducted three studies using samples of hourly workers, foremen, and entry-level workers. Results of the studies showed moderate to high interrater and internal consistency reliability and concurrent and predictive validity. Additional studies of the structured situational interview by these researchers (Latham and Saari, 1984) also showed moderate to high internal consistency and interrater reliability and concurrent validity.

On a logical basis, one would expect structure to increase interrater reliability, since one source of variation, the questions, has been removed. In support of this contention, Schwab and Heneman (1969) investigated the effect of interview structure on interrater reliability and found that degree of structure corresponded to degree of interrater reliability for the position of clerk-stenographer.

In a similar study, Janz (1982) investigated a patterned behavioral interview and concluded that this type of structured interview was more valid but less reliable than unstructured interviews. This study also found that content differed for the two formats, with unstructured interviews focusing on credentials and self-perception content and structured interviews focusing on behavior descriptions. This difference in content was largely due to the differences in training received by the interviewers. Interviewers using the unstructured format were trained in establishing rapport and control, while those using the structured format were trained in specific behavioral description techniques.
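The interrater reliability coefficients cited throughout this section are, in the simplest case, correlations between independent raters' scores for the same candidates, sometimes stepped up to estimate the reliability of a panel average. A minimal sketch with hypothetical ratings:

```python
# Minimal sketch of interrater reliability as used in this literature:
# the correlation between two raters' independent scores for the same
# candidates. All ratings below are hypothetical.
import numpy as np

rater_a = np.array([62, 71, 55, 80, 68, 74, 59, 85])
rater_b = np.array([65, 69, 50, 84, 70, 71, 61, 88])

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"interrater reliability (single rater): r = {r:.2f}")

# Reliability of the two raters' averaged rating, via the standard
# Spearman-Brown step-up formula for k raters
k = 2
r_panel = k * r / (1 + (k - 1) * r)
print(f"reliability of the {k}-rater average: {r_panel:.2f}")
```

When more than two raters are involved, or when absolute agreement rather than rank-order consistency matters, an intraclass correlation is usually preferred; the correlation shown here is the simplest case.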
In spite of the positive findings reviewed above and the rationale supporting the use of structured interviews as a way of at least increasing interrater reliability, several studies have had negative results. A 1971 study by Hakel showed that interrater reliability was low to moderate even with highly structured interviews. A follow-up study by Heneman, Schwab, Huett, and Ford (1975) showed low interrater reliability and low validity for both structured and unstructured interviews for social worker jobs.

Finally, in an unpublished study, Davey (1984) investigated the effect of highly structured interviews on interview validity. He concluded that high interrater reliability does not necessarily correspond to high validity. Trained oral panels with high within-panel interrater reliability (.95 or greater) who used the same structured interview differed significantly across panels in the validity of their ratings. This study was done on a structured oral examination for State Police Trooper, which had six oral interview panels, each interviewing over 100 candidates. A high degree of structure was achieved by standardized panel training and videotape practice; standardized questions, factors, and scales; and examples of good and poor responses. Oral examination ratings correlated .23 with police academy rank for 104 graduates. However, even though sample sizes were small, there were highly significant differences between the validities of individual panels, which ranged from -.04 to .79.

Although the majority of these studies support the conclusion that structured orals are at least more reliable than non-structured orals, the contradictory results obtained in the Davey, Hakel, and Heneman et al. studies are puzzling and disturbing. Heneman et al. offered the suggestion that their negative findings may have been due to the lack of behaviorally anchored rating scales, while Davey hypothesized that high within-panel agreement was achieved by the interaction of the members throughout the process, a factor which was absent across panels. Whatever the cause of these inconsistent results, it must be concluded that structured orals are preferable to unstructured orals, since different questions are a potentially undesirable source of variation in an oral examination.

Question type. Two studies have investigated the topic of question type. Latham and Saari (1984) developed an interview with both situational questions, which required applicants to state how they would behave in hypothetical situations, and questions regarding applicants' past experiences. The situational question ratings correlated significantly with job performance, while the past experience question ratings did not. The positive results for situational interviews are supported by the previous findings of Latham et al. (1980). However, any conclusions on the relative effectiveness of various question types are only tentative because of the limited evidence. Also, the low validity for the past experience questions may have been affected by the small number of such questions compared to the number of situational questions.

Tengler and Jablin (1983) focused on different aspects of question type. They investigated interviewers' use of open-ended versus closed-ended questions and of primary questions (introducing new topics) versus secondary questions (probing previously introduced topics). The researchers concluded that applicant responses were longer for open-ended and secondary questions, that open-ended and secondary questions occurred mainly during the later parts of the interviews, and that there was no relationship between question type and whether applicants were offered second interviews.

Results of the Tengler and Jablin study become moot in the case of structured interviews because interviewers have no leeway in the types of questions asked. Although results of the Latham and Saari study are more relevant to this research, any conclusions regarding the superiority of situational to past experience questions must be considered tentative. Question type was not a part of this research design.

Rating method. A third group of studies investigated the effect of rating method on oral examination quality. In spite of the conflicting results cited above on the use of structured interviews, the studies of rating method effect for oral examinations give more consistently positive results, supporting the theory underlying the use of behaviorally anchored rating scales as well as showing the increased reliability and validity of such scales.

Two studies, while not providing direct support for the use of behaviorally anchored rating scales, provide support for the theory behind their use. In a 1963 study, Rowe showed that there are significant between-rater differences and within-rater consistencies in where raters set passing points. That is, raters differ among each other but are consistent as individuals in how high a standard they set for passing applicants. Different job standards affected where passing points were set. This study was conducted under the condition that passing and failing categories were undefined for the raters. If such categories were defined for the raters, then it follows that the between-individual differences should decrease and the within-individual consistencies could become an advantage in overall consistency.

A second study (London and Hakel, 1974) showed that ratings are affected by ideal applicant stereotypes held by raters (whether the ideal applicant was well or not well qualified). It follows from this finding that if ideal, average, and unsatisfactory applicant stereotypes were based on job analysis and were provided to the raters through behaviorally anchored rating scales, interrater reliability and validity should be improved.

There have been at least five studies which demonstrated the increase in test quality with specifically anchored rating scales. Maas (1965) demonstrated that a scaled expectation rating method had significantly higher interrater reliability than an adjective rating scale. Results further suggested the use of rating panels rather than individual raters. This would eliminate both question and candidate inconsistencies, which occur when different raters ask questions at different times and when candidates appear before different raters at different times. The 1980 study by Latham, Saari, Pursell, and Campion and the 1984 study by Latham and Saari showed adequate interrater and internal consistency reliability and concurrent validity (and predictive validity for the 1980 study) for situational interviews, which were characterized by questions based on critical incidents and behavioral statements as benchmarks. And, in an investigation of behavioral versus graphic rating scales, Vance, Kuhnert, and Farr (1978) found higher interrater reliability and accuracy for behavioral than for graphic rating scales. There were also more halo and leniency errors for the graphic scales. (A halo error occurs when an applicant's rating on one dimension unduly influences his or her ratings on the others. A leniency error occurs when most of the applicants are given high or low ratings.) Finally, Fay and Latham (1982) showed that two types of behaviorally anchored rating scales were less subject to rating errors of contrast and first impression than trait-based scales. However, they were not less subject to halo errors. (A contrast error occurs when an applicant's rating is influenced by those of the preceding applicants. A first impression error occurs when an applicant's rating is based primarily on the first few minutes of the interview.) One of the behavioral scales [...] anchored rating scales had not demonstrated greater interrater reliability, greater discrimination among ratees, or fewer halo and leniency errors than other types of scales. These researchers suggested that behaviorally anchored rating scales may not be worth the time and effort necessary for development if the above criteria are paramount.

These conclusions from the performance evaluation literature substantially weaken the argument for including behaviorally anchored rating scales in a plan to strengthen oral examination reliability and validity. However, oral raters may benefit more from their use than performance evaluation raters, since they usually know less about both the job and the ratee than performance evaluation raters do. Also, research on the use of behaviorally anchored rating scales in oral interviews is certainly more directly applicable to additional research on oral interviews than is similar research from the performance evaluation literature. For these reasons, the use of behaviorally anchored rating scales in oral examinations still seems to have potential as a way of increasing reliability and validity, and it seems reasonable to include them in the design for a content valid oral examination.

Rater training. Rater training will be discussed as an examination characteristic since it is part of the test administration process. Research on the other rater effects will be summarized later in this review.

Most of the studies of rater training have focused on the reduction of rating errors, with the majority finding that training was successful in reducing such errors. Several studies have compared training methods in their ability to reduce errors. In a 1973 study, Wexley, Sanders, and Yukl found that combining warnings regarding the errors with anchoring the rating points failed to eliminate contrast errors (the effect of the previous applicant); however, a workshop eliminated this type of error. A comparative study by Latham, Wexley, and Pursell in 1975 showed that a control group committed similarity, contrast, and halo errors; a discussion group committed first impression errors; and a workshop group committed none of these types of errors. In a 1981 study, Ivancevich and Smith found that training with role playing and either videotape or lecture was superior to no training in a goal setting situation. Finally, Fay and Latham in a 1982 study concluded that training reduced rating errors significantly regardless of the type of rating scale used, behavioral or trait.

There was only one contradictory study in this group of training studies: a study by Vance, Kuhnert, and Farr (1978), which concluded that training had no effect on rating errors with the use of either behavioral or graphic scales.
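The rating errors at issue in these training studies can be given simple operational definitions. The sketch below shows one common way of indexing halo (inflated intercorrelation among dimension ratings) and leniency (displacement of the mean rating from the scale midpoint); the data are hypothetical, and published studies use more refined variants of these indices.

```python
# Minimal sketch of halo and leniency indices computed from a ratings
# matrix (applicants x dimensions). All data are hypothetical.
import numpy as np

# Rows: applicants; columns: job dimensions rated on a 1-9 scale
ratings = np.array([
    [7, 8, 7, 8],
    [4, 5, 4, 4],
    [8, 8, 9, 8],
    [5, 6, 5, 6],
    [6, 6, 7, 6],
], dtype=float)

# Halo: average intercorrelation among dimensions across applicants.
# Values near 1.0 suggest one overall impression drives all dimensions.
dim_corr = np.corrcoef(ratings.T)
upper = dim_corr[np.triu_indices_from(dim_corr, k=1)]
print(f"halo index (mean inter-dimension r): {upper.mean():.2f}")

# Leniency: displacement of the mean rating from the scale midpoint.
midpoint = 5.0
print(f"leniency index (mean - midpoint): {ratings.mean() - midpoint:+.2f}")
```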
A possible explanation for the conflicting Vance et al. result concerns the length and intensity of training. The training program investigated in the Vance et al. study was minimal in length and involvement, while those in the studies previously cited were longer and more intensive. It is likely that a minimum amount of time and involvement is necessary for any training program to be effective.

In contrast to the above studies, which focused on rater error, Pulakos (1984) investigated the differential effectiveness of rater training programs focusing on error, accuracy, and both error and accuracy. A no-training condition was also included in the study. Findings were as follows: the most accurate ratings corresponded to accuracy training, while the least accurate corresponded to no training; less leniency corresponded to accuracy training and error/accuracy training; less halo corresponded to error training and error/accuracy training; and the effectiveness of training differed across dimensions.

The results of the research studies on rater training clearly show its effectiveness in increasing rater accuracy and in reducing rating errors, and they support its inclusion in a test administration program. Given the results of the Pulakos study, a focus on accuracy rather than error training seems appropriate.

Summary of examination characteristics results. In spite of some contradictory findings concerning the effect of interview structure and type of rating scale, the research literature on the effect of examination characteristics seems to warrant the following conclusions: oral examinations are more reliable and valid when the raters have detailed information about the job, when the same questions are asked of all applicants, when there are definitions for each rating point along the rating continuum, and when rater training is provided. Question type was not included in this research design.

Rater, Applicant, and Situational Characteristics

As the model proposed by Schmitt in his 1976 review of the literature suggested, characteristics of the oral examination itself are only one group of factors affecting oral examination quality. Other major factors in oral examination ratings are the effects due to the raters, the effects due to applicants (on non-job-related factors), and the effects due to situational factors. Findings regarding these factors will be presented because they are major sources of unreliability and invalidity in oral examinations. Job-related applicant effects are of course positive, since examinations are designed to assess the applicants on those factors.

Rater characteristics. Studies of rater characteristics and their effect on orals have included investigations of rater experience, rater selective attention ability and memory demand, rater accountability, rater authoritarianism and prejudice, and rater sex.

The findings of studies investigating rater experience have not been consistent, with two investigations showing differences between experienced and inexperienced raters and five studies showing no differences. Rowe (1963), in a study of armed forces personnel, showed that rank, which was used as an indicator of length of experience, determined where passing standards were set. Hakel, Dobmeyer, and Dunnette (1970) found that the relative importance of content categories differed between rater groups comprised of students and interviewers. In contrast, Langdale and Weitz (1973) found no differences between experienced and inexperienced interviewers in the use of job descriptions and in ratings of dimension importance. Wiener and Schneiderman (1974) also found that experienced and inexperienced interviewers did not differ in use of relevant or irrelevant job information, although experienced interviewers tended to reject applicants more often. In a 1974 study, Moore and Lee found no difference between interviewer and managerial groups in rating errors, and Heneman et al. (1975) found no differences between the ratings of students and social worker subject matter experts in reliability and validity. Finally, Mullins (1982) found that the ratings of students and experienced interviewers were comparable.

Cardy and Kehoe (1984) investigated the relationship between rater accuracy, selective attention ability, and memory demand. They distinguished between raters who were field-independent (inclined to perceive things analytically) and field-dependent (inclined to perceive things holistically). The researchers found that field-independent raters were more accurate, although field independence accounted for only a small part of the variance in rater accuracy. They also found that memory demand affected accuracy of ratings (with high memory demand corresponding to lower accuracy) but did not interact with selective attention ability. However, memory demand is not an issue in oral examinations if ratings are made immediately after the candidate is interviewed.

A single study has been done on the important issue of rater accountability and responsibility. Rozelle and Baxter (1981) found that under the condition of high accountability and responsibility, there was high interjudge agreement and low within-judge overlap (similar ratings by one judge of several applicants), in contrast to the low accountability and responsibility condition.

The subject of rater authoritarianism and prejudice has likewise received surprisingly little attention. Two studies have linked rater authoritarianism and prejudice with applicant race and sex. Simas and McCarrey (1979) found that high authoritarian raters of both sexes rated males higher than females and made more job offers to males than did low authoritarian raters. In a 1982 study, Mullins investigated the impact of rater prejudice (measured by an attitude inventory), applicant race, and applicant quality on ratings and found that while high quality applicants were rated high regardless of race, marginally qualified blacks were rated better than marginally qualified whites. Prejudiced raters rated blacks higher.

A final set of studies has investigated the effect of rater sex. Ferris and Gilmore (1977) found that male raters gave higher ratings than female raters. In contrast, Parsons and Liden (1984) found that female raters gave higher ratings than males but that this effect was inconsequential compared to the effect of non-verbal applicant behavior. Interviewer sex also interacted with applicant non-verbal behavior, with female interviewers giving higher ratings of non-verbal cues. Two other studies showed the interaction of rater sex and applicant non-verbal language and attractiveness. Sterrett (1978) found that men and women raters differed in their ratings of applicants with differing body language, and Baron (1983) found that males gave lower ratings to scented applicants while females gave them higher ratings.

Summary of rater characteristics results. Although the results of the studies investigating rater characteristics are not completely consistent, the following conclusions can be drawn: the effects of rater experience and selective attention ability are negligible; the effect of memory is substantial but not an issue; raters who are accountable and responsible produce more reliable ratings; raters who are authoritarian or prejudiced favor males and blacks; and male and female raters differ in their responses to non-verbal communication. As a whole these results are encouraging, because these factors can potentially be controlled in the test development process. Of the six factors discussed, experience, selective attention ability, and memory are not concerns, and raters can be chosen so that they are accountable and responsible. Although the effects of rater authoritarianism, prejudice, and sex are disturbing, it is probable that they could be alleviated by proper training.

Applicant characteristics. In addition to characteristics of the interview and the rater, another important component of the oral interview model is applicant characteristics. Studies investigating the effect of non-job-related applicant characteristics upon oral examinations have dealt with the following topics: applicant race, sex, age, attractiveness, similarity to the rater, non-verbal factors, motivation and anxiety, and training.

Considering the importance of race effects, there has been a paucity of studies concerned with applicant race. Five studies have been done. Rand and Wexley (1975) found no significant effect for race, while McDonald and Hakel (1985) found significant but inconsequential effects. Parsons and Liden (1984) found some race effects on applicant ratings, with whites receiving higher ratings, but these were inconsequential compared to non-verbal effects. Race was also related to non-verbal behavior, with whites receiving higher ratings on non-verbal cues than blacks. In contrast, McIntyre, Moberg, and Posner (1980) and Mullins (1982) found preferential treatment for blacks over whites. The Mullins study, cited earlier in this paper, showed that high quality applicants were highly rated regardless of race but that marginally performing blacks were rated better than comparably performing whites.

In contrast to the dearth of studies on race, there has been a multiplicity of studies on applicant sex. Several studies have shown that raters have a general tendency to prefer males. In a 1975 study, Dipboye, Fromkin, and Wiback found that both professional interviewers and students preferred males to females. This finding was confirmed by two other studies: McIntyre et al. (1980) and Cann, Siegfried, and Pearce (1981) both found that male applicants were preferred over female applicants.

The remainder of the applicant sex studies were concerned with the interaction of applicant sex with some other factor. Simas and McCarrey (1979) found that high authoritarian personnel officers of both sexes rated males more favorably than females and made more job offers to males. Several researchers studied the interaction of sex with type of job. A study by Heilman (1980) showed that women were rated lower by male and female raters when females comprised less than twenty-five percent of the applicant pool. Heilman concluded that the effect of sex was mediated by the degree to which sex stereotypes operated. In similar studies, Cash, Gillen, and Burns (1977) found that males were rated more favorably for masculine jobs and females were rated more favorably for feminine jobs, and Cohen and Bunker (1975) found that males were preferred for a male-oriented position and females were preferred for a female-oriented position. Finally, Heilman and Saruwatari (1979) found that men were preferred for managerial positions while women were preferred for non-managerial positions. They further found that attractiveness was a third interacting factor. Attractiveness was an advantage for men regardless of type of job, while it was a disadvantage for women seeking managerial positions and an advantage for women seeking non-managerial positions. A recent study by Forsythe, Drake, and Cox (1985) extended the investigation of sex effects to women's clothing. Results of their study showed that women wearing more masculine clothing received higher ratings when applying for managerial jobs than women wearing more feminine clothing.

Another factor considered by researchers to have a possible interaction with applicant sex was the strength of an organization's employment policy. The results of this study were puzzling. Rosen and Mericle (1979) found that the strength of the employment policy and the sex of the applicant had no effect on the hiring decision, but females received lower starting salaries from companies with strong employment policies.

This review of the literature on applicant sex effects concludes with three studies which contradict the consistent results cited previously. Ferris and Gilmore (1977) found that a female applicant received higher overall favorability ratings from student raters than a male applicant; however, this study had a relatively high alpha level (.10). Parsons and Liden (1984) found some sex effects on applicant ratings, with females receiving higher ratings, but these were inconsequential compared to non-verbal effects. Sex was also related to non-verbal behavior, with females receiving higher ratings on non-verbal behavior than males. However, the preference for females in this case may have been due to the nature of the jobs, which were in an amusement park and may have demanded stereotypically feminine traits. A last study, by McDonald and Hakel (1985), found significant but inconsequential sex effects in a study involving student raters.

There was one study on applicant age effects. Rosen and Jerdee (1976) investigated the interaction of age of applicant with sex of rater. They concluded that older employees were judged less reactive, more cautious, less physically capable, less interested in technology, and less trainable than younger workers by both male and female raters.

There have been numerous studies on applicant attractiveness, all concluding that attractiveness affects ratings positively, although it may interact with applicant sex. Dipboye et al. (1975) found that professional interviewers and students both preferred attractive to unattractive applicants. The results were less strong in a study by Cash et al. (1977): attractive applicants were preferred for in-role and neutral jobs, but only on one of three criteria, ratings of qualifications. The other criteria were ratings of success expectancy and hiring recommendations. As mentioned earlier, Heilman and Saruwatari (1979) found that attractiveness interacted with sex; it was an advantage for men but only for women seeking non-managerial positions. Cann et al. (1981) found that male and attractive applicants were preferred. In contrast to these findings, Carlson (1967) found no effect for appearance. However, when both appearance and job-related factors were complimentary, there was an additional component in the ratings greater than that contributed by the separate ratings alone. A final study of attractiveness was conducted by Baron (1983), who found that rater sex and use of scent interacted in the ratings of applicants on job-related dimensions and personal characteristics: males assigned lower ratings to scented applicants, while females assigned higher ratings to scented applicants.

Several researchers have studied the effect of similarity of the applicant to the rater. Results from these studies have been mixed. Frank and Hackman (1975) found that the effect of similarity varied according to the rater; there was no general similarity effect. However, Baskett (1973) found that while applicant competency influenced the hiring decision and the salary offered, similarity also affected the salary. Rand and Wexley (1975) also found a significant effect for similarity.

The effect of non-verbal factors on interview ratings has also been the subject of several studies. Washburn and Hakel (1973) and Imada and Hakel (1977) both found a significant effect due to non-verbal applicant communication. Sterrett (1978), cited earlier, found that men and women differed in their ratings of applicants with different body language: women interpreted high intensity body language to mean low ambition, while men interpreted low intensity body language as meaning low ambition. Since this study confounded body language with attractiveness, these results may not be meaningful. Finally, Parsons and Liden (1984) found that non-verbal cues were highly correlated with applicant qualification ratings and that they accounted for a large part of the variance in applicant ratings compared to objective biographical information, applicant race or sex, or rater sex. There were also significant relationships between non-verbal cues and applicant sex, applicant race, and interviewer sex, with females and whites receiving higher ratings and female interviewers giving higher ratings. The researchers also investigated the relative contributions of various types of non-verbal cues and found that speech characteristic cues accounted for most of the variance in qualification ratings, whereas personal appearance cues accounted for little or none of the variance after speech characteristics were taken into account.

Results of two studies contradict these previously cited findings. Hollandsworth, Kazelskis, Stevens, and Dressel (1979) concluded that non-verbal behavior was unimportant compared to content, fluency, and composure on the part of the applicant, and Rasmussen (1984) concluded that non-verbal behavior was unimportant compared to resume credentials and verbal content. Rasmussen also found an interaction between verbal content and non-verbal behavior, with an effect for non-verbal behavior when verbal content was high but not when verbal content was low.

A final group of studies on applicant effects has to do with self-esteem, anxiety, motivation, and training of the applicant. King and Manaster (1977) found that applicants' self-esteem and body satisfaction had no effect on the interview ratings they received. Keenan (1978) found that anxiety had no effect on the ratings but that there was a motivation effect, with intermediate motivation on the part of applicants having a positive effect on their ratings. Applicants were more confident of success when they were highly motivated and liked the interviewer. Finally, studies of applicant training in interviewing skills were conducted by Barbee and Keil (1973) and Hollandsworth, Dressel, and Stevens (1977). According to the Barbee and Keil study, a combined treatment of videotape feedback and behavior modification improved applicants' interview ratings over the videotape-only condition or the control condition. Hollandsworth et al. (1977) investigated behavioral training versus group discussion and found that interviewees from the discussion group increased their speaking times and were superior to those from the behavioral and non-trained groups in explaining their skills and opinions.

Summary of applicant characteristics results. Studies of applicant effects lead to the following general conclusions. The effect due to applicant race is unclear; there is an applicant sex effect, with males preferred; there is an applicant age effect, with younger applicants preferred; there is an applicant attractiveness effect, with attractive applicants preferred except for females desiring non-traditional jobs; there is an applicant similarity-to-rater effect, with similar applicants preferred; and there is a non-verbal communication effect, with vivacious applicants preferred. There is limited evidence on the effects of applicant self-esteem, anxiety, and motivation, but training improves applicant performance.

If the desired outcome of the oral interview process is to rate all applicants fairly based on their demonstrated skills on job-related dimensions, the results of these studies on applicant effects provide reason to doubt its effectiveness. However, many of these studies suffered from problems of poor test development. If the raters were not given a job description, if the job dimensions were undefined, and if there were no behavioral anchors for the rating scales, then the Wiener and Schneiderman study (1974) suggests that ratings are more liable to be based on irrelevant information such as race and sex. Their study showed that when job information was provided, the irrelevant applicant information, while still having an effect, had considerably less effect than the relevant information.

Situational characteristics. Situational characteristics comprise the final major factor affecting interview results. Studies investigating the effect of situational characteristics upon oral examinations have dealt with the following topics: primacy/recency effects, contrast effects, typical expected applicant effects, and interview length effects.

Primacy effects refer to the predominance of information presented early in the interview, while recency effects refer to the predominance of information presented late in the interview. Most of the primacy/recency studies have also investigated the interaction between information favorability and its timing. Results of these studies have been inconsistent, with some studies finding primacy effects, some finding recency effects, and some finding both.

Studies by Bolster and Springbett (1961), Blakeney and MacNaughton (1971), and Tucker and Rowe (1979) all found primacy effects. Tucker and Rowe provided the clearest demonstration of primacy effects in their investigation of reference letters. Applicants with negative reference letters were given less credit for past successes and more blame for past failures than applicants with positive reference letters. This study also showed a greater primacy effect for negative information, with negative reference letters penalizing applicants more than positive reference letters benefitted them. Bolster and Springbett found primacy effects for the first piece of inconsistent information, especially when the information was negative. Blakeney and MacNaughton found primacy effects for negative information but concluded that these effects were not meaningful, accounting for only a small percentage of the variance of the final results.

In contrast to these studies, investigations by London and Hakel (1974) and Okanes and Tschirgi (1978) found recency effects. London and Hakel found recency effects for unfavorable information, and, in direct contrast to the Tucker and Rowe study, Okanes and Tschirgi found that initial ratings based on other materials were changed with the addition of an interview.

Several studies found both recency and primacy effects, depending on the consistency of information. Carlson (1971) found no recency or primacy effects for consistently favorable information but primacy effects for consistently unfavorable information and recency effects for inconsistent unfavorable information. Farr (1973) found recency effects with repeated judgments but no effect with single judgments, except for a primacy effect in one condition. Farr and York (1975) found recency effects for repeated judgments and primacy effects for a single judgment.

A second set of studies on situational characteristics investigated contrast effects, which concern the effect of the previous applicants on the rating of each following applicant. Wexley, Yukl, Kovacs, and Sanders (1972) found significant contrast effects. While contrast accounted for only a small part of the variance for strong or weak applicants, it accounted for a large part of the variance for average applicants. In addition, an investigation by Heneman et al. (1975) found that interviewee order affected ratings, while amount of interview structure and biographical data did not. In contrast, Landy and Bates (1973) found no contrast effects in two studies, and Wexley, Sanders, and Yukl (1973) found that a rater training workshop eliminated contrast errors. Finally, several studies (Carlson, 1970; Hakel, Ohnesorge, and Dunnette, 1970; and Kopelman, 1975) found contrast effects but also found that they accounted for an insignificant portion of the variance of ratings. Kopelman also found that contrast effects were most influential for candidates of average performance; this coincides with the Wexley finding cited earlier.

There has been a single study on the effect of the typical expected applicant. London and Hakel (1974) found no main effect on ratings for the level of the typical expected applicant. However, there was an interaction with information favorability: a high caliber typical expected applicant led to a better rating of unfavorable information.

A final group of studies on situational effects had to do with interview length. Three studies have investigated this topic. Anderson (1960) found that applicants talked the same amount of time and interview length was constant regardless of the hiring decision, but that interviewers spoke at greater length with accepted applicants.
Tullar, Mullins and Caldwell (1979) found that raters took more time to make their rating decisions with high quality applicants and that they took more time to decide when interviews were expected to last longer. Tengler and Jablin (1983) found that applicants who were offered second interviews differed on a composite measure of interview response time from those who were not offered second interviews. The successful applicants spent less time answering questions but more time talking than the unsuccessful applicants. However, these differences were not significant.

Summary of the situational characteristics results. Results of the situational research literature cannot be summarized easily because they are inconsistent. However, several of the contrast effect studies and one of the primacy/recency effect studies showed that, while these effects were present, they did not account for a meaningful portion of the variance of ratings. Also, there was no main effect due to the typical expected applicant and no clear conclusion regarding differences in speaking or interview times of successful and unsuccessful applicants. Based on the above analysis, it seems likely that situational effects are not meaningful and, unlike applicant effects, do not pose a serious threat to oral interview validity.

Furthermore, the situational effect studies have in general not described the test development process. As pointed out in the discussion of applicant effects, if the interview were not based on job analysis, if job dimensions were not defined for the raters, if behaviorally anchored rating scales were not provided, and if raters received no training, then the Wiener and Schneiderman study (1974) suggests that ratings are liable to be based on irrelevancies such as situational characteristics. Finally, the Wexley et al. study (1973) demonstrated the effectiveness of rater training in eliminating contrast errors.

Comparison to Other Methods

Research comparing the oral interview to other testing methods has been extremely limited. Tubiana and Ben Shakhar (1982) compared the results of an objective questionnaire to results of an interview assessing personality factors for Israeli army officers and found that the two methods were comparable. In a similar study, James, Campbell, and Lovegrove (1984) compared the results of a personality test to the results of an oral interview assessing suitability. One of the personality test scales (social conformity) correlated moderately with the oral suitability score for men; however, there was no relationship between the personality test and oral interview for women.

Several studies, while not providing direct evidence regarding comparison of the oral interview with other approaches, have contributed related information by trying to get at the issue of generalizability. Unfortunately, the results of these studies are inconsistent. Moore and Lee (1974) found no differences between live and videotaped interviews, and Ferris and Gilmore (1977) showed that there were no differences between ratings of resumes, videotapes or audiotapes. On the other hand, Imada and Hakel (1977) found significant differences along a rater proximity continuum (whether the rater observed a videotape, observed an interview or was him or herself the interviewer). Also, Washburn and Hakel (1973) found that there were differences due to whether the presentation mode was audiovisual, visual or transcript.
Finally, a recent study by Ricchiute (1985) found that mode of presentation (visual, auditory, and visual/auditory) and task importance affected the decision-making of auditors.

The studies presented above are not only inconsistent in their findings but are not completely relevant, in that the content of the oral interview was undefined, the interview was based on personality characteristics rather than job dimensions, or the oral presentation mode did not consist of an interview. Therefore, the need for a study to assess the comparability of the oral to other testing methods which are intended to assess job-related characteristics of applicants has been confirmed.

An issue related to the relationship between the oral interview and other testing methods is whether oral interviewers should have access to application materials and, if so, what effect these materials have on the ratings. A study by Tucker and Rowe (1977) supports the use of the application in its finding that applications increased the amount of relevant information the raters had. However, a second study by the same authors (Tucker and Rowe, 1979) found that raters were liable to develop unfavorable expectations of a candidate when presented with an unfavorable reference letter and were then liable to give the applicant less credit for past successes and hold the applicant more personally responsible for past failures. They were also more likely to reject these applicants for jobs since they attributed their failures to internal rather than external reasons. A negative application effect was also shown by Dipboye, Fontenelle, and Garner (1984). These researchers found that raters without applications made more reliable ratings of applicants' fit to the job and interview performance. Raters with applications gathered more correct information but were more variable in information gathering.

Several studies have produced results counter to the above, showing that the effect of the interview is liable to prevail over the effect of prior information. Carlson (1971) showed that the interview changed the effect of valid test information. Heneman et al. (1975) showed that biographical information had no effect on interview results. Similarly, Okanes and Tschirgi (1978) found that judgments made before interviewing based on application information shifted significantly due to the interview. Most of the ratings shifted either up or down from the neutral category, but half of the positive recommendations were changed to either the neutral or low category. The low category had the fewest changes. Based upon the results of four studies, Sackett (1982) concluded that interviewer decisions are not based upon previous hypotheses about applicants; that is, interviewers do not try to confirm their initial hypotheses about applicants based on previous information. In a final study, McDonald and Hakel (1985) found that raters did not select questions based on previous resume judgments and that resume effect was small compared to interview effect.

Although the majority of these studies showed a lack of effect for application materials, it seems reasonable to conclude that the application should not be a part of most oral interviews intended to be job-related ranking devices. The interview should be planned to assess the most important and relevant job dimensions, thus eliminating the need for further information that was suggested by the 1977 Tucker and Rowe study.
If application information is indeed relevant, it should probably be assessed separately by different raters so that the dimensions to be assessed by the oral are not contaminated. Since the results of the other studies show either that application materials have a negative effect on the oral or that they have no effect, they support this conclusion.

Training and Experience Evaluation Literature

There is little research on the methods of rating training and experience questionnaires or job applications, although employers have been using them for years. Five major methods have been used to rate the education and experience of job candidates. These have been discussed in three articles by Ash and Levine (Ash, 1984; Ash, 1983; and Ash and Levine, 1982) and will be summarized below.

Method

The point method. When the point method is used, each applicant's education and experience is compared to certain previously specified requirements that the applicant must meet to be considered. If the applicant meets the minimum requirements, then he or she receives additional points for any experience he or she has beyond the amount required. The extra experience or education must be of equal quality to that described by the requirements.

The grouping method. When the grouping method is used, raters make a holistic judgment about the quality of the applicants' experience and education and place them in groups according to quality. A similar method, not discussed separately here, is the holistic method. Raters make a holistic judgment about each candidate but then place them in rank order according to quality or give them a numerical rating according to quality.

The task-based method. When the task-based method is used, applicants are asked to complete a form listing the tasks that are done on the job. The applicants indicate their degree of experience for each task by checking one of the following: they have not performed the task, they have performed it under supervision, they have performed it independently, or they delegated the task to subordinates and reviewed their performance. Applicants receive points for each task based on their degree of experience, and their total scores consist of the total of the task scores. Another method, the job element or KSA method, is similar to the task-based method except that candidates rate their degree of knowledge or skill for each of the knowledges, skills or abilities deemed necessary for efficient job functioning.

The behavioral consistency method. Use of the behavioral consistency method involves having candidates provide examples of their past achievements that relate to important job dimensions. Subject matter experts then typically rate a sample of the applicant achievements in order to provide examples of behavior along a performance continuum for each dimension. That is, the candidate achievements are identified as examples of low, moderate, and high performance. Behavioral consistency raters then rate all of the candidates' past achievements by comparing them to the examples along the performance continuum provided by the subject matter experts.

The activity/achievement indicator. The activity/achievement indicator method is based on the behavioral consistency approach and was developed by Ash (1984) to avoid some of the problems engendered by that method. The development of the activity/achievement indicator is similar to that of the behavioral consistency questionnaire. However, the benchmarks identified by the subject matter experts are included as part of the questionnaire. The applicant is asked to choose, for each dimension, the type of accomplishment most similar to those he or she has done.
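To make the task-based scoring rule concrete, the following minimal sketch in Python totals per-task points for one applicant. The point values assigned to the four experience levels are hypothetical, since the articles summarized above describe the scoring rule but not specific weights.

# A minimal sketch of task-based scoring. The point values per
# experience level are hypothetical, not taken from the studies cited.
TASK_POINTS = {
    "not_performed": 0,   # has not performed the task
    "supervised": 1,      # performed it under supervision
    "independent": 2,     # performed it independently
    "delegated": 3,       # delegated it to subordinates and reviewed the work
}

def task_based_score(responses: dict) -> int:
    """Total score is the sum of the per-task points."""
    return sum(TASK_POINTS[level] for level in responses.values())

# Example: an applicant's self-reported experience on three job tasks.
applicant = {
    "prepare affirmative action plan": "independent",
    "conduct compliance review": "supervised",
    "supervise clerical staff": "delegated",
}
print(task_based_score(applicant))  # -> 6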
Incidence of Use

It would be hard to overestimate the degree to which applications and resumes are used throughout government, industry, and education, although the formal scoring methods discussed above probably predominate in the public sector. A 1984 survey of state and municipal jurisdictions conducted by the State of Alabama showed that the use of training and experience methods to rate applicants for public sector jobs is quite high. The survey found that all except one jurisdiction used training and experience evaluations to select candidates for some job classes and that use varied from complete to seldom. The survey also showed that seventy-five percent of the agencies used training and experience evaluations 25% of the time and that most used the point evaluation system.

Studies

Two groups of researchers (Schmidt, Caplan, Bemis, Decuir, Dunn, and Antone, 1979, and Hough, 1984) have conducted studies describing the development of the behavioral consistency approach. The purpose of the Schmidt et al. (1979) study was to explain the rationale and content validity basis for the new approach and to conduct a study of its empirical validity and utility. The study included a comparison of the point and KSA rating systems and was conducted on a sample of budget analysts. Since the return rate for all types of questionnaires was low (20%), the validity and utility analyses could not be done. However, reliability data showed superiority for the behavioral consistency approach, which had a reliability of .78 for one rater compared to .48 and .52 for the point and KSA approaches respectively. With three raters, however, the point and KSA reliabilities improved to .74 and .77 respectively. Correlations among the three rating methods showed high comparability between the point and KSA methods (.94) but low comparability between the behavioral consistency approach and both of the other two methods (.11 and .05). Efficiency of scoring (the amount of time needed for rating) was low for the behavioral consistency method in comparison to the other approaches.
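The gain in reliability from adding raters follows the Spearman-Brown prophecy formula, which approximately reproduces the three-rater figures reported by Schmidt et al. (small differences reflect rounding in the reported single-rater values):

# The Spearman-Brown prophecy formula predicts the reliability of a
# composite of k parallel raters from the single-rater reliability.
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the mean of k parallel ratings."""
    return (k * r_single) / (1 + (k - 1) * r_single)

for r in (0.48, 0.52, 0.78):
    print(f"single rater: {r:.2f} -> three raters: {spearman_brown(r, 3):.2f}")
# single rater: 0.48 -> three raters: 0.73
# single rater: 0.52 -> three raters: 0.76
# single rater: 0.78 -> three raters: 0.91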
A second study of this method was done by Hough (1984). Hough applied the behavioral consistency approach to attorneys in a federal regulatory agency, asking them to describe their accomplishments for each of nine job dimensions. Hough found that the reliability of the Accomplishment Record Inventory varied from .75 to .80 for the nine dimensions and was .85 for the total dimension scores. Concurrent validities for the dimensions (with performance ratings comprised of both dimension and task ratings) were significant, varying from .17 to .25 for the single dimensions, with .25 for the overall behavioral consistency evaluation. Behavioral consistency ratings were related to amount and level of experience as an attorney but not to educational variables such as scores on law aptitude and knowledge tests, school grades, honors, or quality, or to self-perception or other prior-experience variables. The study also showed no effect for race or sex.

Johnson, Guffey, and Perry (1980) conducted a study comparing the behavioral consistency method with the point and task-based methods for selecting senior eligibility counselors. Their study showed a significant concurrent validity coefficient of .25 for the behavioral consistency method and low and non-significant validity coefficients for the other two methods. Relationships between the traditional point method and the other two methods were low and non-significant. Amount of experience with the hiring agency was highly related to point method ratings but was unrelated to ratings of the other two approaches or to ratings of job performance.

Ash conducted several studies comparing the behavioral consistency approach with other methods. A 1983 study compared the behavioral consistency method with the holistic method for the evaluation of students applying for a job of planner. Results showed that intrarater reliability was higher for the behavioral consistency approach (.95) than for the holistic approach (.77). However, interrater reliability was comparable (.84 and .83 respectively). Time for applicant completion of the questionnaires was 99.3 minutes and 43.2 minutes for the behavioral consistency and holistic approaches respectively. Scoring time was higher for the behavioral consistency method. The correlation between the two methods was .36, and their correlations with an IQ test were low.

A second study by Ash (1982) compared four training and experience methods: the point method, the grouping method, the task-based method, and the behavioral consistency method. The study was carried out using three jobs: Auto Equipment Repair Foreman, Computer Operations Supervisor, and Medical Disability Examinations Supervisor. A comparison of the reliabilities of the four methods showed the task-based method to be superior, with reliabilities of .98 to .99 for the three jobs. Reliabilities for the point method ranged from .77 to .92, for the grouping method from .44 to .78, and for the behavioral consistency method from .76 to .93. The grouping method was superior to the others with respect to validity, which was based on peer ratings. It had a significant validity coefficient of .35, while the task-based method had a marginally significant validity of .21 and the other two methods had non-significant and low validities. Completion rate was highest for the point and grouping methods, with return rates of .97. The task-based method had a completion rate of .90 and the behavioral consistency method a completion rate of .56. Raters spent less time on the grouping and task-based methods and more time on the point and behavioral consistency methods. Finally, the correlation between the behavioral consistency and the task-based method was moderate (.36 to .54), but the correlation between the behavioral consistency method and the other methods was low.

The purpose of a final study by Ash (1984) was to compare the behavioral consistency approach with a new approach (the activity/achievement indicator method) designed to overcome a major problem of the behavioral consistency approach--low completion rate on the part of applicants. The study also compared these two measures with a KSA-based questionnaire. Reliability results showed the superiority of the behavioral consistency method. Interrater reliability was .50 to .80 for the dimensions and .74 for the total scores, while reliability for the activity/achievement indicator using coefficient alpha was .51 to .73 for the dimensions and .56 for the total scores. The test-retest reliability for the KSA-based approach was .60 to .89 for the dimensions and .71 for the total scores. A multitrait-multimethod matrix showed convergent validity for some of the dimensions. However, discriminant validity was not shown for any dimension. Correlation of the behavioral consistency and activity/achievement indicator was moderate (.58). However, both methods had low correlations with the KSA-based method. Applicant completion time was unknown for the behavioral consistency method, but it was under 30 minutes for both of the other methods. Rater time was substantial for the behavioral consistency approach, while rating was done by computer for the other two methods.
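In a multitrait-multimethod analysis, convergent validity is indicated when the same dimension measured by two methods correlates highly, and discriminant validity when those correlations exceed the correlations between different dimensions measured by different methods. The following sketch illustrates the arithmetic with fabricated ratings; it is not Ash's data or procedure.

import numpy as np

# Illustrative multitrait-multimethod check with fabricated ratings:
# rows are candidates, columns are three dimensions scored by two methods.
rng = np.random.default_rng(0)
n = 30
true_skill = rng.normal(size=(n, 3))                          # underlying dimensions
method_a = true_skill + rng.normal(scale=0.5, size=(n, 3))    # e.g., behavioral consistency
method_b = true_skill + rng.normal(scale=0.5, size=(n, 3))    # e.g., activity/achievement indicator

for d in range(3):
    # Convergent: same dimension, different methods (monotrait-heteromethod).
    convergent = np.corrcoef(method_a[:, d], method_b[:, d])[0, 1]
    # Discriminant check: different dimensions, different methods.
    others = [np.corrcoef(method_a[:, d], method_b[:, e])[0, 1]
              for e in range(3) if e != d]
    print(f"dimension {d}: convergent r = {convergent:.2f}, "
          f"max heterotrait r = {max(others):.2f}")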
A final study of training and experience approaches was done by Pannone (1984), who described the development of a task-based questionnaire for the selection of applicants to take a written examination for electrician. The tasks performed by electricians were listed on the questionnaire, and applicants indicated whether they had performed the tasks and at what level. Pannone found that the reliability (using coefficient alpha) of the task-based questionnaire was .96 and that the validity (correlation with the written test) was .42. Results of the task-based method were compared with years of experience and education. Experience and education correlated with the written test .13 and .11 respectively and with the task-based method .30 and .09 respectively. When faking was taken into account, the validity was higher; the questionnaire and written test correlated .55 for non-fakers. The correlation for fakers was .26.

Comparison of Methods

While the results from the studies of training and experience methods have been neither consistent nor completely conclusive, they allow comparison of the methods along several important dimensions. Table 2.1 shows the training and experience methods, the dimensions along which they have been compared, and a summary of study results for each method-dimension intersection. Several factors not discussed above, but which are important in a comparison of the methods, have also been included. This summary may be somewhat misleading because some of the conclusions have been based on only one study while others have been based on several. However, it is nevertheless useful in a comparison of the methods.

Table 2.1
Comparison of Training and Experience Methods

                            Method
Dimension          Point     Grouping  Task      BC(a)     AAI(b)
Content(c)         no        no        yes       yes       yes
Rationale          poor      poor      good      good      good
Reliability        mod/high  low/mod   high      mod/high  low/mod
Validity           low       mod       low/mod   low/mod   ---
Correlation(d)     low/high  low/mod   low/mod   low/mod   mod
Completion         high      high      high      low       high
Rater time         mod       mod       low       high      low
Fakeability        no        no        yes       no        yes

Note. Mod = moderate. (a)BC = behavioral consistency. (b)AAI = activity/achievement indicator. (c)Content = content validity base. (d)Correlation = correlation with other methods.

A primary criterion in selecting the training and experience method to be investigated in this study was that it be parallel in content and development to the oral examination proposed earlier. This necessitates a content validity base or, in other words, development based on the job tasks or dimensions determined to be relevant in a job analysis. Three methods fit that criterion--the task- or KSA-based method, the behavioral consistency method, and the activity/achievement indicator method. The point and grouping methods are not based on a content validity model since they are not based on job analysis. A second major criterion focused on rationale.
The rationale for the behavioral consistency model, explicated by Schmidt et al. (1979), is that past behavior is the best predictor of future behavior. What the person accomplished on the job is important, rather than the fact that he or she merely held the job and was exposed to the job tasks. The Schmidt et al. approach provides a way of determining the level of past performance rather than assuming that performance on similar jobs is similar. Since the activity/achievement approach is derived from the behavioral consistency approach, it has an identical rationale, and the task- and KSA-based methods could be considered to have a similar rationale. However, the point and grouping methods assume that performance on similar jobs is similar.

Reliability and empirical validity are obviously critical factors in evaluating testing methods. Reliability has generally been high for the point, task, and behavioral consistency methods. However, the Schmidt et al. study showed higher reliability for the behavioral consistency method than for the point system, and the Johnson et al. study showed higher validity for the behavioral consistency method than for the point or task-based methods. Empirical validity has generally been low for the point method and low to moderate for the behavioral consistency and task-based methods. Both reliability and validity are promising for the behavioral consistency and task-based methods. However, because there have been few reliability and fewer validity studies of these methods, the evidence is inconclusive. It seems prudent to rely on a content-valid method and to seek to improve reliability by having several raters. Several raters are also necessary for fairness, even though for some methods there is adequate reliability with one rater. It does seem clear according to the studies that choice of method is a meaningful decision, since the correlations among the methods are generally low to moderate.

The last set of factors to be compared among the methods includes completion rate, rater time, and fakeability. The comparisons with respect to these factors are clear. Completion rate is generally high for all methods except the behavioral consistency approach. Rater time is low for the task and activity/achievement approaches, moderate for the point and grouping approaches, and high for the behavioral consistency approach. The task and activity/achievement approaches are particularly prone to fakeability, while the point and grouping methods are much less so and the behavioral consistency method least of all.

Based on the purpose of the proposed study and the factors discussed above, the behavioral consistency method was selected for comparison to the oral examination for the following reasons: it is based on a content validity model with development parallel to that of the oral approach; it is based on a well-developed rationale; it has high interrater reliability, especially with several raters; it has shown promising though not strong empirical validity; and it is not as easily fakeable as the other methods. Problems with use of this method have to do with completion rate and rater time. However, completion rate was high for at least one study (Hough, 1984), and low completion rates may have been due in part to lack of incentive for the applicants, who were taking part in concurrent validity studies and had nothing personal to gain from participating.
Rater time is higher for this method, but that is not a critical criterion since time per candidate for the oral interview is probably longer.

Chapter Three

METHODOLOGY

This research study compared the use of an oral examination to the use of a behavioral consistency examination in the selection process for two positions, Affirmative Action Officer and Sanitation District Manager, both of which were managerial positions in a large midwestern city.

Methodology for the Affirmative Action Officer Study

Subjects

The subjects were 18 applicants for the position of Affirmative Action Officer, a managerial position in the Personnel Department in a large midwestern city. Each met the following minimum education and experience requirements: a Bachelor's Degree with a major in personnel management, public administration, business, the social sciences, or a related field, and either five years of affirmative action experience performing duties closely related to those of the job or five years of experience in personnel management with significant responsibility for affirmative action. Subjects applied for the position based upon a job announcement listing job duties, job requirements, and type of examination.

Raters

The raters were eight subject matter experts in the areas of personnel management or affirmative action who were not employed by the Personnel Department and who were willing to donate their time to the hiring process. Four of the raters were used as oral examination raters while the other four were used as behavioral consistency examination raters. There were two minorities and two non-minorities, and two women and two men, on each panel. Because of the large time demand of each type of test method upon the raters, it was infeasible to use the same raters for both methods.

Job Analysis

Test development for both the oral examination and the behavioral consistency examination was based on a job analysis method which defined essential job tasks and critical knowledges, skills, and abilities for the Affirmative Action Officer job.

Development of job task statements and a job task inventory. An initial group of task statements was taken from a job description for the position developed when recruitment of applicants began in November, 1984. Then a group of four subject matter experts from the Personnel Department (supervisors and colleagues of the position) developed additional statements in a two- to three-hour brainstorming session held during December, 1984. A final group of applicable task statements was added to this list from a task inventory for the position of Personnel Assessment Specialist, which had been recently developed for a professional organization of personnel assessment specialists.

These three sources provided 55 task statements, which were grouped into five major areas by the researcher. These areas were: Affirmative Action Plan Activities, Compliance Expert Activities, Affirmative Action Status and Accountability Activities, Affirmative Action Project Activities, and Supervisory/Management Activities. These groupings were verified by a test analyst assigned to this project by the Personnel Department. A task inventory (see Appendix A) was then developed in order to determine which of the job tasks were essential to the Affirmative Action Officer job.
The inventory consisted of the job tasks grouped according to major area and three five-point rating scales developed to ascertain the importance, relative amount of time spent, and consequence of error for each task.

Development of knowledges, skills, and abilities (KSA's) and a KSA inventory. An initial list of relevant KSA's was developed by the four subject matter experts, who selected appropriate statements from a set of KSA's for the position of Assistant City Personnel Director during the same meeting in which task statements were generated. This list of KSA's was augmented by the researcher, who developed additional KSA's based on the previously developed tasks. These KSA's were grouped into seven major dimensions by the researcher, with these dimensions verified by the test analyst. The dimensions were: Knowledge of Affirmative Action; Knowledge of Personnel Management; Planning and Analysis Skills; Decision-making, Judgment, and Independence Skills; Communication Skills/Interpersonal Skills; Supervision Skills; and Other Management Skills. Then three rating scales were developed to assess importance, necessity at time of hire, and usefulness in distinguishing effective from ineffective workers for each KSA. The rating scale for necessity at time of hire was a three-point scale, while the other two scales were five-point scales. The KSA inventory (see Appendix B) consisted of 91 KSA's grouped into seven major dimensions and three rating scales used to determine which KSA's were critical for effective job performance.

Task inventory results. Fifty-five tasks were independently evaluated by five subject matter experts from the Personnel Department (three of whom had been involved in task statement development) during January, 1985. The initial criteria set for defining critical tasks were as follows: each task defined as critical had to receive three or more ratings at Point 4 or above (the high scale values) on each of the three scales. However, when these criteria were applied to the tasks, only 16 survived. Since so many tasks were discarded using this approach, the reasons for removal were analyzed, particularly with respect to the second criterion, which concerned relative time spent. Of the 39 tasks omitted, 22 had been omitted because of the second criterion and 17 because of at least one other criterion. Since a task could conceivably be critical although rarely performed, this scale was dropped for those tasks meeting more stringent criteria for the first and third scales--at least 4 ratings at Point 4 or 5. Application of the new criteria resulted in 16 additional surviving tasks for a total of 32. Appendix C contains the list of task groups and essential tasks.
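The two-stage screening rule just described can be made concrete with a short sketch. The rating data below are hypothetical; the decision rules themselves are those stated above.

# A sketch of the two-stage critical-task screen, applied to hypothetical
# ratings (five raters on each of the three five-point scales).
def high_ratings(scores, point=4):
    """Count ratings at the given scale point or above."""
    return sum(s >= point for s in scores)

def is_critical(importance, time_spent, consequence):
    # Initial criteria: three or more ratings at Point 4 or above
    # on each of the three scales.
    if all(high_ratings(s) >= 3 for s in (importance, time_spent, consequence)):
        return True
    # Relaxed criteria: drop the time-spent scale for tasks with at
    # least four ratings at Point 4 or 5 on the other two scales.
    return high_ratings(importance) >= 4 and high_ratings(consequence) >= 4

# Example: a rarely performed but important task survives the relaxed screen.
print(is_critical([5, 4, 4, 5, 3], [2, 1, 3, 2, 2], [5, 5, 4, 4, 3]))  # True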
KSA inventory results. Ninety-one KSA's were independently evaluated by the five subject matter experts. Critical KSA's were defined as those receiving at least three ratings of 4 or above on the importance scale, at least three ratings of 4 or above on the effectiveness scale, and at least three ratings of 3 on the necessity at time of hire scale. When these criteria were applied to the KSA's, 62 survived, and two which almost met the criteria were discussed by the researcher and test analyst and retained as critical KSA's.

The surviving KSA's were combined into a slightly different set of ten dimensions. One dimension was eliminated because only one of its KSA's survived the criteria. (The surviving KSA was added to another appropriate dimension.) In addition, several dimensions were broken up to provide clearer definitions. The resulting ten dimensions were as follows: Knowledge of Affirmative Action; Planning and Organizing Skills; Analytical and Quantitative Reasoning Abilities; Decision-making, Judgment, and Independence Skills; Oral Communication and Interpersonal Skills; Written Communication Skills; Supervisory Skills; Initiative/Creativity/Intelligence; Toleration of Stress; and Professionalism. Appendix D contains the list of revised dimensions and critical KSA's.

Task-KSA link-up. After the essential tasks and critical KSA's had been defined based on inventory results, the researcher completed a rational link-up of the task groups and KSA dimensions. Each KSA dimension had a link-up with at least one task group (see Appendix E).

Dimensions tested. Based on reported problems in the research literature regarding low return rates for behavioral consistency questionnaires, it was decided to reduce the number of dimensions to be tested in the research design to five. The researcher and test analyst assessed each dimension for practical testability and dropped four dimensions from consideration by discussion and consensus, leaving six dimensions to be assessed. Those considered difficult to assess, particularly by the oral examination, were Decision-making, Judgment, and Independence Skills; Initiative/Creativity/Intelligence; Toleration of Stress; and Professionalism. Since it was untestable by the oral examination method, the Written Communication dimension was also eliminated from the design, although, since it was so important to job performance, it was included as a separate part of the examination. The candidates completed a written problem prior to the oral examination, which was scored separately by the oral raters after they had completed scoring the oral portion. Candidates also completed a multiple-choice test which was not part of the research design. This test was developed to further test the Analytical and Quantitative Reasoning Abilities dimension. It was included as an unweighted component in the testing process because it was critical that anyone hired be minimally competent in this area.

The decision to eliminate dimensions could have been based on the number of their related task groups. However, it was decided that this would be inappropriate since a small number of related task groups did not imply lack of criticality. For example, the supervisory dimension was related to only one task group but was very important to job performance according to consensus between the researcher and test analyst.

The final test plan was as follows. Five dimensions (Knowledge of Affirmative Action, Planning and Organizing Skills, Analytical and Quantitative Reasoning Abilities, Oral Communication and Interpersonal Skills, and Supervisory Skills) were tested by both the behavioral consistency and oral examination approaches. The Analytical and Quantitative Reasoning Abilities dimension was further tested by a multiple-choice examination, and the Written Communication dimension was tested by a written problem. The written problem and multiple-choice test were not part of this research study. Appendices F and G contain the list of dimensions and KSA's tested and definitions for the dimensions tested by the behavioral consistency and oral examination methods.

Behavioral Consistency Examination Development
Questionnaire development. Development of the behavioral consistency questionnaire (entitled the Achievement History Questionnaire) was based upon examples available in the research literature and the dimensions generated in the job analysis. Dimension definitions were developed from the KSA's comprising the dimensions. Because the behavioral consistency approach consists of asking candidates to describe their major achievements for each job dimension, development of specific questions for each dimension was not necessary. However, it was necessary to provide a relevant example and to modify the instructions found in the research literature. Criteria for selecting an example dimension and constructing an example accomplishment were that the dimension not be in the group of critical dimensions to be evaluated in the examination but be one with which the candidates would likely be familiar. The dimension selected was that of training skill, and a relevant accomplishment was developed based on the experience of the researcher.

Rating scale development. Several options for the development of benchmark accomplishments for the behavioral consistency questionnaire rating scales were considered. The research literature suggested that subject matter experts use a sample of the accomplishments returned by the candidates to develop benchmarks (Ash, 1984; Hough, 1984; and Schmidt et al., 1979). Other approaches considered by the researcher included asking the behavioral consistency questionnaire rating panel to use a sample of the questionnaires to develop benchmarks, asking the subject matter experts to generate benchmarks, and asking the subject matter experts to generate general criteria.

The use of actual accomplishments provided by the candidates was desirable since such accomplishments would be realistic and meaningful. As discussed in the research literature, subject matter experts are typically asked to rate the examples on a seven-point scale, and examples whose ratings have high interrater agreement are included in the dimension rating scales to be used by the behavioral consistency raters, who are usually personnel generalists or psychology students. However, development of benchmarks by the subject matter experts seemed inappropriate in this case because, with such a small group of candidates, the majority of questionnaires would have to be used for benchmark development, with few left to be rated according to the benchmarks, making the task of the rating panel superfluous. There was also a timing problem with this approach, since the accomplishments were due only one day before the rating panel was scheduled to meet.

Having the actual rating panel develop benchmarks from the accomplishments furnished by the candidates also seemed inappropriate due to the small applicant sample size. The rating panel would have had to use most of the accomplishments for benchmark development, leaving few to be rated according to the benchmarks. These two approaches were rejected, and the researcher decided to have subject matter experts brainstorm accomplishments that they considered to be examples of high, moderate, and low performance with respect to the dimensions. Examples rather than general criteria were chosen for development since they seemed more useful to the raters.

Four subject matter experts from the Personnel Department, a different but overlapping group from those involved previously, were asked to brainstorm accomplishments of previous effective or ineffective Affirmative Action Officers. They were also asked to generate examples of high, moderate, and low accomplishments that they thought would be typical of those the applicant group would submit. The subject matter experts brainstormed 49 such accomplishments during February, 1985. These accomplishments were placed in a randomized list, and the same subject matter experts independently sorted them into dimensions (a choice of more than one dimension was allowed) and rated them according to level of performance on a seven-point scale. Accomplishments were assigned to a dimension if two or more raters chose that dimension. Some accomplishments were assigned to more than one dimension. Means and standard deviations for the level of performance were determined for each accomplishment. It was intended that those with standard deviations of 1.5 or greater be eliminated. However, there were none.
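The sorting-and-rating arithmetic just described can be sketched as follows. The data for the single accomplishment shown are hypothetical, and whether the study used the sample or population standard deviation is not stated; the sample version is assumed here.

from collections import Counter
from statistics import mean, stdev

# Hypothetical data for one brainstormed accomplishment: four subject
# matter experts each chose one or more dimensions and gave a 7-point
# performance rating.
dimension_choices = [
    ["Supervisory Skills"],
    ["Supervisory Skills", "Planning and Organizing Skills"],
    ["Supervisory Skills"],
    ["Planning and Organizing Skills"],
]
ratings = [6, 5, 6, 5]

# Assign the accomplishment to every dimension chosen by two or more raters.
votes = Counter(dim for choices in dimension_choices for dim in choices)
assigned = [dim for dim, n in votes.items() if n >= 2]

m, sd = mean(ratings), stdev(ratings)
keep = sd < 1.5  # accomplishments with SDs of 1.5 or greater were to be dropped

print(assigned, round(m, 2), round(sd, 2), keep)
# ['Supervisory Skills', 'Planning and Organizing Skills'] 5.5 0.58 True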
The next stage of the process consisted of developing a matrix of accomplishments organized by dimension and level. This was done to provide a means of ascertaining scale coverage for each dimension and the degree of overlap of accomplishments across dimensions. The goal at this stage was to reduce the number of accomplishments at each level of each dimension to two, to use as many different accomplishments as possible, and to minimize the overlap of accomplishments across dimensions. Selection of the accomplishments was done by the researcher and test analyst with these goals in mind. This process resulted in the development of two new accomplishments which covered empty cells in the matrix. The final step was to rewrite the accomplishments to provide additional clarity and to correct grammar. The final behavioral consistency questionnaire rating scales consisted of benchmarks along seven-point rating scales for each dimension. The number of accomplishments per dimension varied from six to nine, with 34 accomplishments for all five dimensions. The scale points defined varied on each scale.

Description of the behavioral consistency examination. The behavioral consistency examination had content parallel to that of the oral examination except that candidates were asked to describe their major past achievements which were illustrative of their capabilities in each of the job dimensions identified through the job analysis. Each dimension was rated according to a behaviorally anchored rating scale with exemplary statements along various points of the scale. There was a separate rating form for each dimension on which raters recorded the ratings for all candidates.

The behavioral consistency examination (see Appendix H) consisted of a list of the job dimensions and their definitions, a form on which to provide the requested information, instructions, and an example. Applicants were requested to describe an accomplishment for each dimension which illustrated their degree of competency on that dimension. For each achievement they were asked to provide: what they did and the objective of the achievement, the outcome of the achievement, the amount of credit they claimed if there were shared responsibilities, when the achievement occurred and for what employer, and the name, address, and telephone number of a reference who could verify their statements. Acceptable accomplishments could be drawn from educational, job-related or volunteer experiences. They could be based upon one-time incidents or long-term projects or policies.
Oral Examination Development

Question development. During February, 1985, each of the four subject matter experts was presented with the task inventory and dimension definitions and was asked to independently develop two questions for each of the five dimensions to be tested. This resulted in 64 questions. All questions for each dimension were assembled, and the same four subject matter experts met to select two questions per dimension from the set. First, each subject matter expert independently selected his or her four preferred questions for each dimension. Then differences in choice were discussed and consensus reached on the top two questions for each dimension. Several questions were switched from one dimension to another by group consensus. The final oral examination consisted of ten questions (two per dimension) with a general question at the end allowing the candidate to add anything he or she wished.

Development of question rating scales. The goal in developing rating scales for the oral questions was to provide benchmark answers for seven-point rating scales at high, mid, and low scale values for each question. The same four subject matter experts who had developed and selected the questions were asked to brainstorm likely answers along a seven-point rating scale for each question. They provided criteria for an answer at the top scale value for all dimensions but were unable to consistently provide criteria for the other scale values. Benchmarks for poor answers were developed for seven of the ten questions, but benchmarks for moderate-quality answers were developed for only four of the questions. However, for many of the questions it was reasonable to define moderate and/or poor answers as those which covered only some or few of the points expected in a high-quality answer.

Description of the oral examination. The oral examination consisted of two questions for each of the five dimensions identified through the job analysis procedure. An additional question at the end allowed the candidates to add anything they wished. Each question was rated on a seven-point numerically anchored rating scale with benchmark answers at all high scale values and at some mid and low scale values. Candidates' answers were compared to the benchmarks included on the question rating scale, and ratings were recorded on a question rating form. Then each rater combined the ratings on the relevant questions to arrive at a score for each of the dimensions. The questions pertaining to each dimension were listed on the dimension rating form. Although two questions were developed specifically for each dimension, most of the questions contributed to more than one dimension. For example, each question contributed to the rating on the Communication/Interpersonal Skill dimension. Dimensions were rated on a scale of 60 to 100, with 70 defined as the passing score. Each candidate's final score consisted of the average of his/her average dimension scores. After completion of the oral dimension ratings, the oral board raters also rated the Written Communication dimension separately, based on a written problem which each candidate had completed prior to his or her oral examination.
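The final-score rule for the oral examination is simple arithmetic: each dimension score is the average across raters, and the final score is the average of the dimension scores. A minimal sketch with hypothetical ratings on the 60-100 scale:

# A sketch of the oral final-score rule described above. Ratings are
# hypothetical: dimension -> one rating per rater, on the 60-100 scale.
PASS_POINT = 70

ratings = {
    "Knowledge of Affirmative Action": [85, 80, 90, 85],
    "Planning and Organizing Skills": [75, 70, 80, 75],
    "Analytical and Quantitative Reasoning": [70, 75, 70, 65],
    "Oral Communication and Interpersonal Skills": [90, 85, 90, 95],
    "Supervisory Skills": [80, 80, 75, 85],
}

dimension_scores = {dim: sum(r) / len(r) for dim, r in ratings.items()}
final_score = sum(dimension_scores.values()) / len(dimension_scores)

print(round(final_score, 1), "pass" if final_score >= PASS_POINT else "fail")
# 80.0 pass

By contrast, as described below, each candidate's behavioral consistency score was the total, rather than the average, of his or her dimension ratings.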
Procedure

Rater training for the behavioral consistency examination. There were separate training sessions for the two groups of raters, one for those who were oral examination raters and one for those who were behavioral consistency examination raters. Training lasted approximately two hours for the behavioral consistency questionnaire raters and occurred immediately prior to the rating session. The behavioral consistency questionnaire training session included discussion of the following points: purpose of the ratings, rationale for the selection process being used, explanation of the dimensions and KSA's and how they were developed, definition of the dimensions to be rated, explanation of the rating scale including its development, and candidate experience requirements. Discussion of the rating scales included a modification of the Supervisory Skills dimension rating scale. (The benchmark for the scale value of 6 was changed to a scale value of 4, and the wording for one of the Scale Value 5 examples was changed slightly.)

Raters were also cautioned and advised on the following points: they were asked to rate the candidates independently without reference to the ratings of the other judges; they were asked to rate one dimension at a time, putting the questionnaires in three quality groups for that dimension and then making finer distinctions within the groups; they were asked to be lenient regarding a candidate's selection of an accomplishment for one dimension versus another, since an accomplishment could be expected to relate to more than one dimension; and finally they were told that there would be a wrap-up session after all candidates had been scored, with an opportunity for discussion and consensus if there were significant disagreements regarding the candidates.

The following materials were also provided to the raters: a job announcement sheet, the job task inventory, a list of dimensions with definitions, the criteria for rating dimensions, the dimension rating form, the completed candidate questionnaires, and an introductory booklet for oral board members (provided approximately a week prior to the examination) which discussed the Personnel Department's general guidelines for board members. After discussion of the preceding points, the raters were given two candidate questionnaires and asked to score them on all dimensions. Scoring of the two questionnaires was followed by a discussion of the differences among the ratings and some consensus on points of disagreement.

Behavioral consistency evaluation. The behavioral consistency questionnaires were sent by mail to the candidates who met the requirements two weeks prior to the due date for their return in mid-February, 1985. Twenty-nine of the 35 accepted candidates (83%) returned completed questionnaires. Six candidates who were initially judged not to have met the requirements for the position also returned completed questionnaires, along with supplemental application forms in which they further explained their experience. Prior to rating the behavioral consistency questionnaires of the accepted applicants, the panel was asked to make a determination on whether the six rejected applicants met the requirements. The panel accepted one of the applicants and confirmed the rejection decisions for the other five. The final number of questionnaires rated by the panel was therefore 30.

Ratings of the questionnaires, unidentified by name to the raters, were completed during mid-February, 1985. The questionnaire raters worked individually but in a group setting. The group setting was optimal for ensuring that the raters were not distracted while completing the ratings and for ensuring that the raters actually completed the ratings, a tedious and arduous task.
Raters rated each candidate on each dimension; each candidate's final score was the total of his or her dimension ratings. After independent ratings of the questionnaires had been completed, there was discussion and some changes of dimension scores which were substantially different. However, only the independent ratings were used for this study.

Rater training for the oral examination. Training for the oral examination raters took approximately one and one-half hours and included a brief presentation by the Personnel Director and a discussion of the following points by the researcher: definition and development of the job dimensions, critical KSA's, and job tasks; candidate experience requirements; the selection process, including the behavioral consistency questionnaire, oral examination, and written problem; and the questions, question rating criteria, question rating form, and dimension rating form. The following materials were also provided to the raters: a job announcement sheet, definitions of the job dimensions and the job task inventory, the oral examination questions and written problem, the rating criteria for the questions and written problem, question and dimension rating forms, and a schedule of candidates. Approximately a week prior to the oral examination, the raters were also provided with an introductory booklet for board members which discussed the Personnel Department's general policies regarding ratings. Raters were not provided with any application or behavioral consistency questionnaire materials.

The raters were also instructed to make independent ratings, to ask follow-up questions only for purposes of clarification (that is, to ask no new questions), and to indicate any candidates with whom they were too familiar to rate. They were informed that there would be a wrap-up session at the end of the process for discussion of major disagreements regarding the candidates. A mock candidate (one of the subject matter experts used in the test development process) was then presented to the raters as an example for trying out the oral examination process. He had been instructed to give answers of varying quality to the questions, and the question ratings given by the board members reflected the quality of his answers. Agreement among the raters on question ratings was high; because of lack of time, dimension ratings were not completed.

Oral examination. Rater availability precluded testing all 30 of the behavioral consistency questionnaire candidates by the oral method. Therefore, candidates were selected for inclusion in the oral examination process based on their scores on the behavioral consistency questionnaire. Twenty-two candidates were invited to take the oral examination, with three people declining the invitation and one not appearing at the time of examination. One person who had not been rated by the behavioral consistency examination raters was added. This resulted in 19 candidates taking the oral examination, but only 18 were included in the research sample.

The oral examination was held during late February, 1985 and consisted of a 30-minute structured oral interview for each candidate. Candidates were assigned their order of appearing before the board according to their convenience, since many were from out of town. Raters alternated in asking questions of the candidates. The raters rated each candidate independently as he or she was interviewed.
Raters rated each candidate on each dimension; each candidate's final score was the average of his or her average dimension ratings. The written problem was rated after the oral dimension ratings had been completed. After all of the independent ratings were completed, substantial differences were discussed. However, only the independent ratings were used in this research study.

Ratings of examination efficiency and acceptability. Both sets of raters were asked to complete forms assessing examination efficiency and acceptability after each examination (see Appendices I and J). Since there was time available, the achievement history questionnaire raters completed these ratings immediately after the rating process. The oral examination raters received their forms by mail after completion of the process, since there was no time available immediately after the examination. Rater time for both examinations and candidate time for the oral examination was recorded by the test administrator.

Methodology for the Sanitation District Manager Study

Subjects

The subjects were 14 applicants for the position of Sanitation District Manager, a managerial position in the Bureau of Sanitation in a large midwestern city. Each met an experience requirement of current status as a city employee and two years of experience as a Sanitation Supervisor with the Bureau of Sanitation. Subjects applied for the position based upon a job announcement listing job duties, job requirements, and type of examination.

Raters

The raters were nine subject matter experts in the areas of public works or personnel management who were not employed by the Sanitation Bureau and who were willing to donate their time to the hiring process. Five raters (including one white female, one black male, and three white males) were used as oral examination raters, while four raters (one black male and three white males) were used as behavioral consistency examination raters. Because of the large time demand of each type of test method upon the raters, it was infeasible to use the same raters for both methods.

Job Analysis

Development of the oral examination and the behavioral consistency questionnaire was based upon Flanagan's (1954) critical incident job analysis technique and Smith and Kendall's (1963) retranslation of expectations technique. Development proceeded according to the following steps:

Critical incident generation. Two groups of subject matter experts developed examples of effective and ineffective behavior for the job being studied. Subject matter experts, defined as supervisors or incumbents of the job to be assessed, were assembled in separate groups and asked to generate examples of effective and ineffective behavior based on their observations of job incumbents. The supervisors, consisting of three area managers (immediate supervisors of the subject position), one district manager on special assignment, and the three top-level managers in the Bureau, developed incidents in two group brainstorming sessions lasting approximately three hours each during September, 1984. Then each manager committed to individually providing five examples of effective behavior and five of ineffective behavior. These were returned within the next few weeks. The incumbents, consisting of eight district managers, met in two group sessions during September and October, 1984. Brainstorming was done initially to ensure understanding of the objective.
Then the managers worked individually, with each committed to providing five examples of effective and ineffective behaviors. For the individually produced items, the managers were asked to provide the following information: what the person did that was effective or ineffective; the circumstances surrounding the incident; why the incident was an example of effective or ineffective behavior; and when the incident occurred. The critical incident generation process produced 246 incidents, with the supervisors providing approximately two-thirds and the incumbents providing approximately one-third. Approximately one-half were brainstormed, and one-half were produced individually.

Generation of job dimensions. Job dimensions are the general knowledges, skills, and abilities necessary to do a job. The object of this part of the process was to generate job dimensions related to the critical incidents and to group the related critical incidents within those job dimensions. Generation of preliminary job dimensions was done during October, 1984 by one of the test analysts assigned to this project by the Personnel Department, with advice from the researcher. (Personnel Department staff included the researcher and two test analysts.) The same test analyst then grouped together similar instances of effective and ineffective behavior into the job dimensions. Nine dimensions were initially generated for the Sanitation District Manager job: Planning/Organizing, Analyzing/Decision-making, Researching/Investigating, Supervision, Controlling, Communication (Relaying Information), Oral/Written Communication Skill, Training, and Professionalism.

Retranslation of dimensions/incidents. The purpose of the retranslation phase (completed during October, 1984) was to determine whether an independent judge would make similar judgments regarding the grouping of incidents and dimensions. This phase was carried out by having the second test analyst group the same incidents and dimensions from randomized groups of each. Agreement between the two judges was determined by dividing the number of examples both judges placed in a category by the total number of examples placed in the category by either. Dimensions with less than 70% agreement between judges were reevaluated for clarity. Degree of agreement between the two judges on placement of the critical incidents in the nine dimensions was as follows: Planning and Organizing--80%; Analyzing and Decision-making--75%; Researching and Investigating--71%; Supervision--83%; Controlling--29%; Communication (Relaying Information)--86%; Oral/Written Communication--67%; Training--100%; and Professionalism--90%.
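The agreement index just described is the size of the intersection of the two judges' placements divided by the size of their union. A minimal sketch, with hypothetical incident identifiers:

# Two-judge agreement for one dimension: incidents both judges placed
# there, divided by incidents either judge placed there.
def agreement(judge1: set, judge2: set) -> float:
    return len(judge1 & judge2) / len(judge1 | judge2)

# Hypothetical placements for the Supervision dimension.
supervision_j1 = {"incident_03", "incident_11", "incident_17", "incident_21"}
supervision_j2 = {"incident_03", "incident_11", "incident_17", "incident_29"}

print(f"{agreement(supervision_j1, supervision_j2):.0%}")  # 60%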
Dimension determination. Completion of the job analysis and test development process had been scheduled for October through December of 1984, shortly after completion of the retranslation phase described above. However, two problems intervened to delay progress. First, the managers in the Sanitation Bureau became unavailable for meetings due to snow emergencies, and, second, the researcher became involved with the development and administration of the Affirmative Action Officer examination, the other project comprising this study. Consequently, job analysis and test development were interrupted during the fall and did not recommence until the following March.

Resolution of disagreements regarding dimension definitions and placement of incidents in categories was done by discussion and consensus between the two judges and the researcher. (One judge participated in only part of the process.) The results of the discussions were to combine the Oral/Written Communication dimension with the Communication (Relaying Information) dimension, to combine the Researching/Investigating dimension with the Analyzing/Decision-making dimension, and to combine the Controlling dimension with the Supervision dimension. These decisions were based not only upon the degree of agreement between the two judges but also upon the intended definitions of the dimensions and the overlap between them. The eliminated dimensions were considered subparts of other dimensions and not as important in their own right as the other dimensions. The decisions were also based on the researcher's decision to reduce the number of dimensions to no more than five or six, based on experience with the Affirmative Action Officer examination. This seemed prudent to avoid examinee and rater fatigue, particularly in completing and rating the behavioral consistency questionnaire. As these decisions were being made, the definitions of the remaining dimensions were clarified, and additional incidents (including some initially agreed upon) were moved from one dimension to another. The six dimensions and number of related incidents resulting from the discussion and consensus were as follows: Planning and Organizing--35; Analyzing and Decision-making--69; Supervision--68; Communication--40; Training--21; and Professionalism--13.

Review of dimensions and critical incidents. Prior to development of the critical incident inventory, the next step in the job analysis process, the researcher reviewed the incidents in their original format to ensure that all had been included in the analysis and that there were no duplicate incidents. Incidents were also rewritten by the researcher to provide additional clarity and completeness and to correct grammatical errors. As a result of this review, several previously omitted incidents were added, several duplicate incidents were omitted, and incidents which were a combination of two or more examples were separated. This process yielded 10 additional incidents, which were added to the original set of 246 incidents and placed in appropriate dimensions by the researcher. Because of the changes made in the dimension resolution stage (the elimination of several dimensions and the changes in the dimension by critical incident match) and because of the addition of new critical incidents to the dimensions, the researcher did a second review of the placement of critical incidents in dimensions. Degree of agreement with the previous decisions was as follows for the six dimensions: Planning and Organizing--85%; Analyzing and Decision-making--60%; Communication and Interpersonal Skill--95%; Supervision--79%; Training--100%; and Professionalism/Dedication--55%. An analysis of these differences revealed that there was confusion between the Analyzing/Decision-making dimension and three other dimensions: Planning/Organizing, Supervision, and Professionalism/Dedication. Consideration was given to combining the Analyzing/Decision-making and Professionalism/Dedication dimensions with the most appropriate of the other dimensions. However, that idea was rejected because these dimensions were logically different from the other two and seemed meaningful as critical dimensions for the Sanitation District Manager job. Instead, the definitions for the four dimensions were further clarified and distinguished from one another.
It was evident, however, that many of the critical incidents related to more than one dimension and that they could reasonably be placed under more than one. Final placement of the incidents in the dimensions was done by consensus between the researcher and senior test analyst. The six dimensions and number of related incidents resulting from the discussion and consensus were as follows: Planning and Organizing--35; Analyzing and Decision-making--60; Communication/Interpersonal Skill--39; Supervision--76; Training--23; and Professionalism/Dedication--23.

Critical incident inventory. The purpose of the critical incident inventory was to determine average effectiveness ratings for the critical incidents, which could then be used as benchmarks for dimension effectiveness scales. The critical incident inventory (see Appendix L) consisted of six dimensions and 256 incidents randomly listed within dimension. The incidents were independently scored on a seven point effectiveness scale by five subject matter experts (two top managers and three area managers) during April, 1985. Most of the critical incidents received average ratings near the end points of the effectiveness scale, and all dimensions contained incidents with means at both high and low scale values. Most of the standard deviations were low. (Only 9% were over 1.5.) The standard deviations were highest for critical incidents with average ratings near the midpoint. Considering how critical incidents were defined (as examples of either effective or ineffective behavior), these results were quite reasonable. Examples purposely developed to be extreme would be expected to have average ratings near the end points of the range, while examples averaging a moderate rating could be expected to do so because of disagreement rather than agreement on a moderate rating.

Dimension effectiveness scale development. The final stage of the job analysis process was to develop dimension effectiveness scales based on the results of the critical incident inventory. The objective in developing the dimension effectiveness scales was to provide benchmarks (critical incidents) at each point of a seven point rating scale for each dimension. Criteria for selection of the critical incidents as benchmarks were as follows: a high degree of agreement on scale value by the subject matter experts, evidenced by a small standard deviation; scale coverage as complete as possible; and reduction of the number of critical incidents at each scale point to a meaningful number. These criteria were applied to the critical incidents as follows:

1. Twenty-four critical incidents with standard deviations above 1.5 were omitted.

2. Critical incidents with standard deviations above 1.0 were omitted if their scale values were covered by other critical incidents. Three critical incidents were omitted according to this criterion.

3. Since there was a dearth of incidents with values at the middle of the scale, critical incidents at both whole and midpoint scale values were used (e.g., at values 5.0, 5.5, and 6.0) to provide as much information as possible to the raters.

4. The number of critical incidents at each whole or midpoint scale value was limited to five, since application of the above criteria still left too many incidents at various scale values to be meaningful. Where choices had to be made, the critical incidents with average ratings closest to the scale value were retained; ties were then broken in favor of the incidents with the lowest standard deviations; and retention decisions for any remaining ties were based upon the judgment of the researcher.

Application of the above criteria to the critical incidents resulted in effectiveness scales on six dimensions with five or fewer critical incidents listed as benchmarks at as many points as possible on seven point rating scales. There were few examples on any of the scales for the middle values between 3.0 and 5.0. The number of incidents on each scale varied from 17 to 42.
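The four criteria above amount to a small filtering and tie-breaking procedure. The sketch below illustrates one way to implement it, under the assumption that each incident's mean is assigned to the nearest whole or midpoint scale value; the incident data shown are hypothetical, and the sketch is an illustration rather than a reproduction of the study's procedure.

```python
from collections import defaultdict

def select_benchmarks(incidents, max_per_value=5):
    """incidents: list of (text, mean, sd) tuples for one dimension.
    Returns {scale_value: [incident texts]} for the retained benchmarks."""
    # Criterion 1: drop incidents with standard deviations above 1.5.
    pool = [inc for inc in incidents if inc[2] <= 1.5]
    by_value = defaultdict(list)
    for text, mean, sd in pool:
        value = round(mean * 2) / 2          # nearest whole or midpoint value
        by_value[value].append((text, mean, sd))
    benchmarks = {}
    for value, cands in sorted(by_value.items()):
        # Criterion 2: drop sd > 1.0 incidents when the value is covered
        # by tighter incidents.
        tight = [c for c in cands if c[2] <= 1.0]
        keep = tight if tight else cands
        # Criterion 4: at most five per value -- prefer the closest mean,
        # then the lowest standard deviation.
        keep.sort(key=lambda c: (abs(c[1] - value), c[2]))
        benchmarks[value] = [c[0] for c in keep[:max_per_value]]
    return benchmarks

# Hypothetical incidents: (description, mean rating, standard deviation).
incidents = [("Reassigned crews during a snow emergency", 6.4, 0.5),
             ("Ignored a citizen complaint for two weeks", 1.6, 0.7),
             ("Held an unfocused staff meeting", 3.9, 1.8)]
print(select_benchmarks(incidents))
```

Any remaining ties would fall to the researcher's judgment, which is the one step that resists automation.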
Dimensions tested. As a result of a discussion with the Sanitation Bureau managers during March, 1985, a final modification was made to the dimensions by the addition of Written Communication Skill as a separate dimension. Written Communication Skill was defined as a separate dimension because it was important to successful job performance apart from Oral Communication/Interpersonal Skill and could not be tested by an oral examination. Because of this, the Written Communication Skill dimension was not included in the research design. However, it was rated by both the behavioral consistency and oral examination raters. The oral raters evaluated a written problem produced by each candidate prior to the oral examination, and the behavioral consistency examination raters evaluated the writing ability displayed throughout each candidate's behavioral consistency questionnaire. The final set of dimensions tested included the following: Planning and Organizing, Analyzing and Decision-making, Oral Communication/Interpersonal Skill, Supervision, Training, Professionalism/Dedication, and Written Communication Skill (see Appendix M for definitions). The first six dimensions were tested by the behavioral consistency and oral methods. The Written Communication Skill dimension was tested by the same two sets of raters but on the basis of a written problem and the writing sample produced by the behavioral consistency questionnaire.

Behavioral Consistency Examination Development

Questionnaire development. Development of the behavioral consistency questionnaire (entitled the Achievement History Questionnaire) was based upon examples available in the research literature and the dimensions generated in the job analysis. Since the behavioral consistency approach consists of asking candidates to describe their major achievements for each job dimension, development of specific questions for each dimension was not necessary. However, it was necessary to provide a relevant example, which was based on a critical incident from the job analysis, and to modify the instructions found in the research literature.

Rating scale development. As discussed earlier in this paper, previous researchers have developed benchmarks for behavioral consistency rating scales by using samples of the accomplishments provided by the applicants themselves. Subject matter experts are typically asked to rate the examples on a seven point scale, and examples whose ratings have high interrater agreement are included in the dimension rating scales to be used by the behavioral consistency raters, who are usually personnel generalists or psychology students. This approach is ideal if there are a large number of candidates since it provides benchmarks which are realistic and meaningful.
However, as pointed out earlier, it is not appropriate with a small number of candidates because drawing a large enough sample of accomplishments to develop the benchmarks would leave so few accomplishments for the behavioral consistency raters to rate that their task would be superfluous. For the same reason, a second approach, having the behavioral consistency raters themselves develop benchmarks, is also inappropriate. A third approach considered by the researcher for rating scale development was having the subject matter experts brainstorm benchmarks based upon the types of accomplishments they expected to be submitted. This approach was used in the Affirmative Action Officer study for development of both the behavioral consistency and oral examination rating scales and in the present study for development of the oral examination rating scales. However, use of this approach was rejected due to the difficulty experienced by the subject matter experts in coming up with examples based on their expectations in all three instances. The fourth approach considered by the researcher was to use the dimension effectiveness scales developed as part of the job analysis as the basis for rating accomplishments. The job incumbents could be considered comparable to the applicants because their duties were similar, although at a higher level, and many duties overlapped the two types of positions. However, this approach was also rejected because of the researcher's concern that there might not be enough overlap between the experiences of applicants and incumbents and because of other scale properties that made the scales inappropriate for this use. These were the dearth of midpoint scale values against which accomplishments could be judged and the large number of negatively worded incidents on the lower ends of the scales. (Applicants could be expected to submit low value accomplishments but not negatively worded accomplishments.) Because none of the above approaches seemed appropriate, the researcher decided to use graphic rating scales rather than behaviorally anchored rating scales. This decision seemed reasonable in light of the performance evaluation research literature, which indicates that behaviorally anchored rating scales are not superior to other types of rating scales with respect to the reliability and validity of ratings.

Description of the behavioral consistency examination. The behavioral consistency examination had content parallel to that of the oral examination except that candidates were asked to describe their major past achievements which were illustrative of their capabilities in each of the job dimensions identified through the job analysis. Each dimension was rated on a graphic rating scale with numeric and qualitative anchors. After completion of the behavioral consistency dimension ratings, the behavioral consistency raters also rated the Written Communication Skill dimension separately, using the questionnaire itself as a writing sample. The behavioral consistency examination materials consisted of a list of the job dimensions and their definitions, a form on which to provide the requested information, instructions, and an example (see Appendix N). Applicants were requested to describe an accomplishment for each dimension which illustrated their degree of competency on that dimension.
For each accomplishment they were asked to provide: what they did and the objective of the achievement, the outcome of the achievement, the amount of credit they claimed if there were shared responsibilities, when the achievement occurred and for what employer, and the name and title of a reference who could verify their statements. Acceptable accomplishments could be drawn from educational, job-related, or volunteer experiences. They could be based upon one-time incidents or long-term projects or policies.

Oral Examination Development

Question development. This stage consisted of presenting the group of subject matter experts (in this case, the top three managers) with the original list of dimensions and their associated examples of effective and ineffective job behavior. The subject matter experts were asked to brainstorm at least two questions per dimension, each based on a critical behavior exemplifying that dimension. Brainstorming took place in a series of five two- to three-hour meetings beginning in November of 1984 and ending in March of 1985. Sixteen questions were developed through the group brainstorming process. Two questions per dimension were then selected for inclusion in the oral examination by the researcher and test developer.

Development of question rating scales. The goal in developing rating scales for the oral questions was to provide benchmark answers for seven point rating scales at high, mid, and low scale values for each question. Development of the rating scales occurred simultaneously with question development. After the generation of each question, the subject matter experts were asked to develop examples of effective, ineffective, and adequate answers for each question based on how they would expect high, low, and moderate performers to answer the questions. They provided criteria for a seven point answer for all dimensions but were unable to consistently provide criteria for the other scale values. Benchmarks for poor answers were developed for seven of the twelve questions, but benchmarks for moderate-quality answers were developed for only three of the questions. However, for many of the questions, it was reasonable to define moderate and/or poor answers as those which covered only some or few of the points expected in a high quality answer rather than being substantially different.

Description of the oral examination. The oral examination consisted of two questions for each of the six job dimensions identified through the job analysis procedure. Each question was rated on a seven point numerically anchored rating scale with benchmark answers at all high scale values and at some mid and low scale values. Candidates' answers were compared to the benchmarks included on the question rating scale, and ratings were recorded on a question rating form. Then each rater combined the ratings on the relevant questions to arrive at a score for each of the dimensions. Although two questions were developed specifically for each dimension, most of the questions contributed to more than one dimension. For example, each question contributed to the rating on the Oral Communication/Interpersonal Skill dimension. Dimensions were rated on a scale of 60 to 100, with 70 defined as the passing score. Each candidate's final score consisted of the average of his/her average dimension scores.
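The arithmetic of this two-stage average is simple; the sketch below uses invented dimension scores for a single candidate from five raters.

```python
import numpy as np

# Hypothetical dimension scores (scale 60-100) for one candidate;
# rows = the five raters, columns = the six dimensions.
scores = np.array([[85, 78, 90, 72, 80, 88],
                   [82, 75, 88, 70, 78, 85],
                   [88, 80, 92, 74, 82, 90],
                   [84, 77, 89, 71, 79, 86],
                   [86, 79, 91, 73, 81, 89]])

dimension_means = scores.mean(axis=0)    # average rating per dimension
final_score = dimension_means.mean()     # average of the dimension averages
print(dimension_means.round(1), round(final_score, 1))
print("Pass" if final_score >= 70 else "Fail")   # 70 was the passing score
```

With every rater rating every dimension, the average of the dimension averages equals the grand mean of all the ratings; the two-stage formulation simply mirrors how the rating forms were organized.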
After completion of the oral dimension ratings, the oral board raters also rated the Written Communication dimension separately, based on a written problem which each candidate had completed prior to his or her oral examination.

Procedure

Rater training for the behavioral consistency examination. There were separate training sessions for the two groups of raters, one for those who were oral examination raters and one for those who were behavioral consistency examination raters. Training lasted approximately two hours for the behavioral consistency raters and occurred immediately prior to the rating session. The behavioral consistency questionnaire training session included discussion of the following points: purpose of the ratings, rationale for the selection process being used, explanation of the dimensions and critical incidents and how they were developed, definition of the dimensions to be rated, explanation of the dimension effectiveness scales including their development, explanation of the graphic rating scale, and candidate experience requirements. Raters were also cautioned and advised on the following points: they were asked to rate the candidates independently, without reference to the ratings of the other judges; they were asked to rate one dimension at a time, putting the questionnaires in three quality groups for that dimension and then making finer distinctions within the groups; they were asked to be lenient regarding a candidate's selection of a behavioral example for one dimension versus another, since an example could be expected to relate to more than one dimension; and finally they were told that there would be a wrap-up session after all candidates had been scored, with an opportunity for discussion and consensus if there were significant disagreements regarding the candidates. The following materials were also provided to the raters: a job announcement sheet, a list of dimensions with definitions, the dimension effectiveness scales from the job analysis, the graphic dimension rating form, the 14 questionnaires, and an introductory booklet for board members (provided approximately a week prior to the examination) which discussed the Personnel Department's general guidelines for oral board members. After discussion of the preceding points, the raters were given three examples to score prior to scoring the candidates. The examples were provided by three higher level supervisors in the department. The training session concluded with a discussion of the differences among the independent ratings and some consensus on the initial points of disagreement.

Behavioral consistency evaluation. The behavioral consistency questionnaire was administered to 14 candidates in May, 1985 in a group session lasting up to four hours. There were 15 original applicants, with one person dropping out prior to the administration of either examination. A group session was necessary in order to prevent candidates from collaborating. For the same reason, only one session, prior to the oral examination, was given. Although test method was confounded with order, this was preferable to candidate collaboration. Ratings of the questionnaires, unidentified by name to the raters, were completed during May, 1985. The questionnaire raters worked individually but in a group setting. The group setting was optimal for ensuring that the raters were not distracted while completing the ratings and for ensuring that the raters actually completed the ratings, a tedious and arduous task.
Raters rated each candidate on each dimension; each candidate's final score was the total of his or her total dimension ratings. After independent ratings of the questionnaires had been completed, there was discussion and some changing of dimension scores which were substantially different. However, only the independent ratings were used for this study.

Rater training for the oral examination. Training for the oral examination raters took approximately three hours and included a brief presentation by the two top Bureau managers and a discussion of the following points by the researcher: definition and development of the job dimensions, critical incidents, and dimension effectiveness scales from the job analysis; candidate experience requirements; the selection process, including the behavioral consistency questionnaire, oral examination, and written problem; and a review of the questions, question rating criteria, question rating scale, and dimension rating scale. The following materials were also provided to the raters: a job announcement sheet, definitions of the job dimensions and the dimension effectiveness scales from the job analysis, the oral examination questions and written problem, the rating criteria for the questions and written problem, question and dimension rating forms, and a schedule of candidates. Approximately a week prior to the oral examination, the raters were also provided with an introductory booklet for board members which discussed the Personnel Department's general policies regarding ratings. Raters were not provided with any application or behavioral consistency questionnaire materials. The raters were also instructed to make independent ratings, to ask follow-up questions only for purposes of clarification (that is, to ask no new questions), and to indicate any candidates with whom they were too familiar to rate. They were informed that there would be a wrap-up session at the end of the process for discussion of major disagreements regarding the candidates. A mock candidate (one of the top three managers in the Bureau) was then presented to the raters for the oral examination. He had been instructed to give answers of varying quality to the questions; however, he was unable to bring himself to do this and gave consistently excellent responses. In consequence there was high agreement among board members on his question ratings, with no real opportunity for disagreement. Because of this and because of time pressure, dimension ratings were not made for the mock candidate.

Oral examination. The oral examination was held during May, 1985 and consisted of a 30-minute structured oral interview for each candidate. Candidates were assigned their order of appearance before the board according to the convenience of the Sanitation Bureau. Raters alternated in asking questions of the candidates. The raters rated each candidate independently as he or she was interviewed. Raters rated each candidate on each dimension; each candidate's final score was the average of his or her average dimension ratings. The written problem was rated after the oral dimension ratings had been completed. After all of the independent ratings were completed, those with substantial differences were discussed. However, only the independent ratings were used for this research study.

Ratings of examination efficiency and acceptability. Both sets of raters were asked to complete forms assessing examination efficiency and acceptability after each examination (see Appendices O and P).
Since there was time available, the behavioral consistency questionnaire raters completed these ratings immediately after the rating process. The oral examination raters received their forms by mail after completion of the process since there was no time available immediately after the examination. The time devoted to each examination by both the candidates and raters was recorded by the test administrator.

Method of Analysis

The first research question was whether the oral and behavioral consistency methods provided comparable ratings of applicants for employment. This question was analyzed by computing Pearson product-moment correlation coefficients between final overall ratings for the two approaches and between final dimension ratings for the two approaches. Statistical significance tests (to determine whether the population correlations were greater than zero) were conducted by using a table of critical correlation values based on the t distribution.

The second research question was whether the oral and behavioral consistency methods were equally reliable. This question was assessed by comparing interrater reliabilities for the overall rating and separate dimension ratings for the two approaches. Interrater reliability was determined according to the intraclass correlation method suggested by Ebel (1951), with between-raters variance omitted from the error term. (Between-raters variance was omitted from the error term because each rater rated each candidate and final ratings were based on averages or totals including ratings from all raters.) Significant differences between dimension and overall reliability coefficients for the two approaches were determined according to a method developed by Feldt (1980) to test whether Cronbach's coefficient alpha is the same for two tests administered to the same sample. Using one of the three procedures recommended by Feldt, the test statistic was

$$t_{N-2} = \frac{(W - 1)\sqrt{N - 2}}{\sqrt{4W\left(1 - r_{12}^{2}\right)}}, \qquad W = \frac{1 - r_{2}}{1 - r_{1}},$$

where $r_{1}$ is the reliability for Test 1, $r_{2}$ is the reliability for Test 2, $r_{12}$ is the correlation between Tests 1 and 2, and $N$ is the sample size. The null hypothesis was that the population reliabilities were equal, and the alternative hypothesis was that the population reliabilities were different, for each set of reliabilities tested. The t test statistic was referred to the t distribution with N - 2 degrees of freedom. Oral reliabilities were adjusted to correspond to an average of four raters. (There were five oral and four behavioral consistency raters for the Sanitation District Manager examination, while there were four raters for each method for the Affirmative Action Officer examination.)
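A computational sketch of this reliability machinery is given below: an intraclass correlation with between-raters variance removed from the error term, the Spearman-Brown restatement used to put the five-rater oral reliabilities on a four-rater footing, and the Feldt statistic just defined. The ANOVA decomposition shown is one common reading of the Ebel approach and is offered as an illustration, not a reproduction of the study's exact computations.

```python
import numpy as np
from scipy import stats

def icc_average(ratings):
    """Reliability of the average of k raters, via a two-way ANOVA with
    between-raters variance excluded from the error term (one common
    implementation of Ebel, 1951). ratings: candidates x raters array."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_persons = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_resid = ((ratings - grand) ** 2).sum() - ss_persons - ss_raters
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return (ms_persons - ms_resid) / ms_persons

def adjust_raters(r_k, k, m):
    """Spearman-Brown: restate a k-rater average reliability as an
    m-rater average reliability."""
    r_1 = r_k / (k - (k - 1) * r_k)        # step down to one rater
    return m * r_1 / (1 + (m - 1) * r_1)   # step back up to m raters

def feldt_test(r1, r2, r12, n):
    """Feldt's (1980) statistic for equal reliabilities on the same
    sample, referred to t with n - 2 degrees of freedom."""
    w = (1 - r2) / (1 - r1)
    t = (w - 1) * np.sqrt(n - 2) / np.sqrt(4 * w * (1 - r12 ** 2))
    return t, 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p value

# The five-rater overall oral reliability of .92 restates to .90 on a
# four-rater basis, matching the adjusted value reported later.
print(round(adjust_raters(0.92, k=5, m=4), 2))
```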
The third research question was whether the two approaches demonstrated convergent and discriminant validity. This question was explored through the multitrait-multimethod matrix approach proposed by Campbell and Fiske (1959), which argues that, in order for tests to demonstrate validity, they should have high correlations with other measures of the same trait and should have low correlations with similar measures of different traits. The multitrait-multimethod matrix is comprised of correlations between at least two traits and at least two methods. The correlations are organized into monomethod and heteromethod blocks. Monomethod blocks are comprised of correlations between the same and different traits which have been measured by the same method and include reliability diagonals (correlations between the same traits using the same methods). Heteromethod blocks are comprised of correlations between the same and different traits which have been measured by different methods and include validity diagonals (correlations between the same traits using different methods). According to Campbell and Fiske, convergent validity is demonstrated when there are significant and meaningful correlations between different methods measuring the same traits. (Values in the validity diagonals were tested for statistical significance as described above for the first research question.) Discriminant validity is demonstrated when three criteria are met: correlations between different methods measuring the same traits are higher than correlations between different methods measuring different traits (values in the validity diagonals are higher than those in corresponding rows and columns of the heteromethod blocks); correlations between different methods measuring the same traits are higher than correlations between the same methods measuring different traits (values in the validity diagonals are higher than relevant values in the monomethod blocks); and correlations between the traits should reflect the same pattern across methods.

The fourth research question was whether the oral and behavioral consistency methods were equally acceptable to the raters and equally efficient in terms of rater time. This question was analyzed by comparing mean responses to the examination evaluation forms completed by the raters for the two approaches and by comparing total rater time for the two approaches. For the acceptability ratings, a statistical significance test (to determine whether the population means of the oral and behavioral consistency rater evaluations were equal) was conducted by using a t test statistic for testing differences between means of independent samples, referred to the t distribution. Differences between rater times per candidate could not be tested for significance due to the absence of standard deviations for one or both methods in both studies.

The last research question was whether the oral and behavioral consistency approaches were equally efficient in terms of candidate time. This question was assessed descriptively by comparing mean (or estimated mean) candidate times for the two approaches. Differences in candidate time were not tested for significance because candidate time for the behavioral consistency approach was not available for the Affirmative Action Officer study (and so was estimated) and a standard deviation for the behavioral consistency approach was not available for the Sanitation District Manager study. The same methods of analysis were used for both of the research samples.

Chapter 4

RESULTS

Results for the Affirmative Action Officer Study

Comparability

The first research question was whether the oral and behavioral consistency methods provided comparable ratings of applicants for employment. This question was analyzed by computing Pearson product-moment correlation coefficients between final overall ratings for the two approaches and between final dimension ratings for the two approaches. The correlation between final overall ratings for the two approaches was .17. This figure increased to .25 when corrected for restriction in range of the behavioral consistency scores. The behavioral consistency scores were restricted in range because only 18 of the 30 candidates who took the behavioral consistency examination also took the oral examination. (The top 22 behavioral consistency candidates were invited to take the oral examination; however, four people dropped out.)
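The significance test behind the df = 16 criterion is the usual t transformation of r, and the range-restriction correction is presumably of the standard Thorndike Case 2 form for direct selection on the restricted variable; the dissertation does not state its exact formula, so the version below, like the demonstration values, is an assumption.

```python
import math

def r_to_t(r, n):
    """t statistic for testing that a population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def correct_restriction(r, sd_restricted, sd_unrestricted):
    """Assumed Thorndike Case 2 correction for direct range restriction."""
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1 - r ** 2 + (r ** 2) * (k ** 2))

# With df = 16 the one-tailed .05 critical t is about 1.75, so r must
# exceed roughly .40; even the corrected overall correlation falls short.
print(round(r_to_t(0.25, n=18), 2))   # t of about 1.03, not significant
```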
Correlations between dimension ratings for the two approaches were as follows, with corrections for restriction in range in parentheses: Knowledge of Affirmative Action (Dimension 1)--.13 (.13), Planning and Organizing Skills (Dimension 2)--.04 (.06), Analytical and Quantitative Reasoning Abilities (Dimension 3)--.18 (.25), Oral Communication and Interpersonal Skills (Dimension 4)--.08 (.09), and Supervisory Skills (Dimension 5)--.34 (.39). Neither the correlation for the overall ratings nor any of the correlations for the dimension ratings was significant at the .05 level (df = 16, one-tailed test), even when corrected for restriction in range. Therefore, it must be concluded that the ratings from the two approaches were not comparable.

Reliability

The second research question was whether the oral and behavioral consistency methods were equally reliable. This question was assessed by comparing interrater reliabilities for the overall rating and separate dimension ratings for the two approaches. Interrater reliability, based on the average of four raters for both of the methods, was .84 for overall oral examination ratings and .81 for overall behavioral consistency examination ratings. Interrater reliabilities for the dimension scores for the oral examination were as follows: Knowledge of Affirmative Action--.82, Planning and Organizing Skills--.83, Analytical and Quantitative Reasoning Abilities--.82, Oral Communication and Interpersonal Skills--.76, and Supervisory Skills--.71. Interrater reliabilities for the dimension scores for the behavioral consistency examination were: Knowledge of Affirmative Action--.64, Planning and Organizing Skills--.72, Analytical and Quantitative Reasoning Abilities--.79, Oral Communication and Interpersonal Skills--.64, and Supervisory Skills--.75. Testing for significant differences by using Feldt's method with an alpha level of .10 revealed no significant differences between dimension reliabilities or overall reliabilities (see Appendix K for the t values).

Convergent and Discriminant Validity

The third research question was whether the two approaches demonstrated convergent and discriminant validity. This question was explored through the multitrait-multimethod matrix approach proposed by Campbell and Fiske (1959). Table 4-1 contains a multitrait-multimethod matrix based on the two approaches and five dimensions of interest in this study.

Table 4-1

Multitrait-multimethod Matrix for the Oral and Behavioral Consistency Methods for Affirmative Action Officer

                        Oral                                BC
Method        D1     D2     D3     D4     D5     D1     D2     D3     D4     D5

Oral   D1    .82
       D2    .96    .83
       D3    .89    .93    .82
       D4    .91    .91    .88    .76
       D5    .55    .62    .60    .75    .71

BC     D1    .13    .09    .01    .23    .29    .64
            (.13)  (.09)  (.01)  (.23)  (.29)
       D2    .06    .04    .11   -.04   -.13    .11    .72
            (.08)  (.06)  (.15) (-.06) (-.18)
       D3    .21    .22    .18    .07   -.10    .46    .43    .79
            (.28)  (.29)  (.25)  (.10) (-.14)
       D4    .12    .14    .10    .08   -.13    .05    .59    .43    .64
            (.13)  (.15)  (.11)  (.09) (-.14)
       D5   -.09    .01   -.02    .12    .34    .38    .36    .30    .37    .75
           (-.10)  (.01) (-.02)  (.14)  (.39)

Note. BC = behavioral consistency; D1 = Knowledge of Affirmative Action; D2 = Planning/Organizing Skills; D3 = Analytical/Quantitative Reasoning Abilities; D4 = Oral Communication/Interpersonal Skills; D5 = Supervisory Skills. Figures in parentheses beneath each value in the heteromethod block were corrected for range restriction. Monomethod block figures were based on sample sizes of 19 and 30 for the oral and BC approaches, respectively; heteromethod block figures were based on a sample size of 18. (Only the top behavioral consistency candidates took the oral.) No validity diagonal value was significant (p > .05, df = 16, one-tailed).

According to the Campbell and Fiske approach, convergent validity is demonstrated when there are significant and meaningful correlations between different methods measuring the same traits. The validity diagonal in Table 4-1 does not demonstrate convergent validity for any dimension; all of the dimension correlations between the methods were low and non-significant (alpha = .05). When convergent validity has not been demonstrated, discriminant validity is not possible. However, the discriminant validity criteria were applied to the matrix to further explore the relationships between the methods and dimensions.
The first criterion was that the correlations between the different methods measuring the same traits be higher than correlations between different methods measuring different traits. A comparison of the values in the validity diagonal with the values in the corresponding rows and columns of the heteromethod block revealed that this criterion was met for only the Supervisory Skills dimension. The correlations among dimensions across methods were uniformly low. The second criterion was that correlations between different methods measuring the same traits be higher than correlations between the same methods measuring different traits. Inspection of the values in the monomethod blocks revealed that this criterion was not met for any dimension. The correlations between dimensions for the oral method were considerably higher than the correlations between methods for the same dimensions. Except for the Supervisory Skills dimension, the correlations between dimensions for the behavioral consistency method were also generally higher than the correlations between methods for the same dimensions. The last criterion was that correlations between the dimensions reflect the same pattern across methods. This criterion was not met since there was no apparent pattern across the monomethod blocks and the correlations in the heteromethod block were all low. Application of the three criteria clearly showed a method effect and failed to suggest anything except a lack of relationships between dimension-method combinations.

As a last step in the exploration process, the monomethod blocks were analyzed to determine the relationships between different dimensions when rated according to the same method. For the oral examination, these relationships were high (.88 to .96), with the exception of those involving the Supervisory Skills dimension, which were moderate. Except for the Supervisory Skills dimension, most of the interdimension correlations exceeded the reliability coefficients for the related dimensions. Interdimension correlations for the Supervisory Skills dimension exceeded its reliability coefficient in one case. For the behavioral consistency examination, the relationships were generally low to moderate. In this case the reliability coefficients exceeded the interdimension correlations for every dimension.
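These comparisons can be mechanized. The sketch below checks the first Campbell and Fiske criterion -- each validity-diagonal value must exceed every other correlation in its row and column of the heteromethod block -- on a small hypothetical matrix; the remaining two criteria are analogous comparisons against the monomethod blocks.

```python
import numpy as np

def check_first_criterion(hetero):
    """hetero: square array of correlations between two methods,
    rows = traits under method A, columns = traits under method B.
    The diagonal holds the same-trait, different-method values."""
    results = {}
    for i in range(hetero.shape[0]):
        validity = hetero[i, i]
        others = np.concatenate([np.delete(hetero[i, :], i),
                                 np.delete(hetero[:, i], i)])
        results[i] = bool(validity > others.max())
    return results

# Hypothetical heteromethod block for three dimensions.
hetero = np.array([[0.60, 0.55, 0.30],
                   [0.40, 0.29, 0.35],
                   [0.25, 0.30, 0.70]])
print(check_first_criterion(hetero))   # {0: True, 1: False, 2: True}
```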
Rater Acceptability and Efficiency

The fourth research question was whether the oral and behavioral consistency methods were equally acceptable to the raters and equally efficient in terms of rater time. This question was analyzed by comparing mean responses to the examination evaluation forms completed by the raters for the two approaches and by comparing total rater time for the two approaches. The examination evaluation forms consisted of questions regarding the effectiveness, fairness, difficulty, and reasonableness of the two approaches. The mean rating for the behavioral consistency raters was 15.75, with 20 being the highest rating possible. The mean rating for the oral raters was 17.00 of 20. These mean ratings were not significantly different, t(5) = 1.38, p > .05, two-tailed test (a computational sketch of this comparison appears at the end of this section). Both sets of raters considered their respective methods positive with respect to the questions asked. These results were based on ratings from all four of the behavioral consistency raters but from only three of the four oral raters. One of the oral raters did not return the evaluation form, even after a follow-up telephone call.

Average total time for the behavioral consistency raters was 14.8 hours; average time per candidate was one-half hour. These averages included training time and lunch time in addition to the time actually spent rating the candidates. They also included time that some of the raters spent at home during the evening between rating sessions. Average total time for the oral raters was 17.5 hours; average time per candidate was .9 hours. These averages also included time for training and lunch, but none of the raters took materials home during the evening. Since the behavioral consistency raters assessed 30 candidates while the oral raters assessed 19, the average time per candidate must be used to assess rater efficiency. Differences between rater times per candidate could not be tested for significance due to the absence of a standard deviation for the oral group. However, the difference between approximately one-half hour and one hour per candidate does have practical significance in the opinion of the researcher, especially when raters are volunteers. If both the oral and behavioral consistency raters had rated 30 candidates, there would have been a difference of about one and one-half days, with the oral raters taking longer. Despite this conclusion, inspection of the ratings for the question regarding time revealed that both sets of raters found the time demand reasonable.

Candidate Efficiency

The last research question was whether the oral and behavioral consistency approaches were equally efficient in terms of candidate time. This question was not formally assessed for this study because candidate time for the behavioral consistency approach was not available. However, candidate time for the oral examination was approximately one-half hour. Based on the researcher's previous experience, a likely estimate of candidate time for the achievement history questionnaire is ten hours. Therefore, time efficiency for the candidate is in favor of the oral examination unless significant travel time is involved, as it was for many of these examination candidates. In that case, time would be approximately the same. However, the majority of the time used for behavioral consistency questionnaire completion would involve actual work on the questionnaire, while the majority of the time used for the oral examination would involve travel. The apparent time advantage of the oral may also prove illusory because application materials are required in most selection processes. Such materials may be unnecessary if the behavioral consistency approach is used but required if an oral examination is used.
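The acceptability comparison referenced above is an ordinary independent-samples t test on the raters' evaluation-form totals. A minimal sketch, with invented totals standing in for the actual forms (four behavioral consistency raters, three responding oral raters, df = 5 as in the study):

```python
from scipy import stats

# Hypothetical evaluation-form totals out of 20 possible points.
bc_totals = [15, 16, 17, 15]    # behavioral consistency raters (n = 4)
oral_totals = [17, 18, 16]      # oral raters who returned forms (n = 3)

t, p = stats.ttest_ind(bc_totals, oral_totals)   # pooled-variance t test
print(round(t, 2), round(p, 3))                  # df = 4 + 3 - 2 = 5
```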
Results for the Sanitation District Manager Study

Comparability

For the Sanitation District Manager study, the correlation between final overall ratings for the oral and behavioral consistency approaches was .67. Correlations between dimension ratings for the two approaches were as follows: Planning and Organizing (Dimension 1)--.58, Analyzing and Decision-making (Dimension 2)--.29, Oral Communication/Interpersonal Skill (Dimension 3)--.62, Supervision (Dimension 4)--.74, Training (Dimension 5)--.31, and Professionalism/Dedication (Dimension 6)--.62. The correlation for the overall ratings was significant at the .01 level (df = 12, one-tailed test), as were three of the six correlations for the dimension ratings. One additional dimension correlation was significant at the .025 level. Therefore, it must be concluded that the ratings from the two methods were comparable.

Reliability

Interrater reliability for this study was based on the average of five raters for the oral examination and four raters for the behavioral consistency examination. For testing significant differences between reliabilities, the oral reliabilities were adjusted to correspond to an average of four raters. The numbers in parentheses indicate the adjusted reliabilities. Interrater reliability was .92 (.90) for overall oral examination ratings and .86 for overall behavioral consistency examination ratings. Interrater reliabilities for the dimension scores for the oral examination were as follows: Planning and Organizing--.90 (.88), Analyzing/Decision-making--.91 (.89), Oral Communication/Interpersonal Skill--.89 (.87), Supervision--.80 (.76), Training--.83 (.80), and Professionalism/Dedication--.87 (.84). Interrater reliabilities for the dimension scores for the behavioral consistency examination were: Planning and Organizing--.80, Analyzing/Decision-making--.61, Oral Communication/Interpersonal Skill--.73, Supervision--.68, Training--.54, and Professionalism/Dedication--.88. Testing for significant differences by using the Feldt method revealed a significant difference between the methods for one dimension--Analyzing and Decision-making--with the oral reliability higher (p < .05). There were no significant differences (at the .10 level) between reliabilities for the overall ratings or for the other five dimensions (see Appendix Q for the t values).

Convergent and Discriminant Validity

Table 4-2 contains a multitrait-multimethod matrix based on the two approaches and six dimensions of interest in this study.

Table 4-2

Multitrait-multimethod Matrix for the Oral and Behavioral Consistency Methods for Sanitation District Manager

                      Oral                          BC
Method        D1   D2   D3   D4   D5   D6   D1   D2   D3   D4   D5   D6

Oral   D1     90
       D2     92   91
       D3     88   93   89
       D4     91   94   83   80
       D5     91   91   81   94   83
       D6     90   90   83   91   91   87

BC     D1     58*  61   69   48   56   42   80
       D2     30   29   47   29   30   28   38   61
       D3     52   41   62** 29   25   35   45   58   73
       D4     80   73   79   74** 81   69   58   49   51   68
       D5     43   29   46   24   31   22   65   28   64   54   54
       D6     67   68   79   58   60   62** 64   63   61   77   51   88

Note. BC = behavioral consistency; D1 = Planning and Organizing; D2 = Analyzing and Decision-making; D3 = Oral Communication/Interpersonal Skill; D4 = Supervision; D5 = Training; D6 = Professionalism/Dedication. Decimal points have been omitted from correlations; all figures in the matrix are based on a sample size of 14.
*p < .025, df = 12, one-tailed. **p < .01, df = 12, one-tailed.

According to the Campbell and Fiske approach, convergent validity is demonstrated when there are significant and meaningful correlations between different methods measuring the same traits.
The validity diagonal in Table 4-2 demonstrates convergent validity for four of the six dimensions, with significant correlations ranging from .58 (p < .025) to .74 (p < .01). Based on this finding, the discriminant validity criteria were applied to the matrix to further explore the relationships between the methods and dimensions.

The first criterion was that the correlations between the different methods measuring the same traits be higher than correlations between different methods measuring different traits. A comparison of the values in the validity diagonal with the values in the corresponding rows and columns of the heteromethod block revealed that this criterion was not met for any dimension. The correlations among the dimensions across methods varied considerably, and many were comparable to or higher than the correlations between methods for the same dimensions. The second criterion was that correlations between different methods measuring the same traits be higher than correlations between the same methods measuring different traits. Inspection of the values in the monomethod blocks revealed that this criterion was not met for any dimension for either method. The correlations among the dimensions within method were generally comparable to or higher than the correlations between methods for the same dimensions. The last criterion was that correlations between the dimensions reflect the same pattern across methods. There was no pattern apparent across methods. However, the heteromethod block showed consistently high correlations between the Supervision and Professionalism/Dedication dimensions as measured by the behavioral consistency approach and all dimensions as measured by the oral. There were consistently low correlations between the Analyzing and Decision-making and Training dimensions as measured by the behavioral consistency method and the other dimensions as measured by the oral. Application of the three criteria clearly showed a lack of discriminant validity for the methods. Relationships between dimension-method combinations were generally moderate.

As a last step in the exploration process, the monomethod blocks were analyzed to determine the relationships between different dimensions when rated according to the same method. For the oral examination, these relationships were uniformly high (in the 80s and 90s), with most of the interdimension correlations equalling or exceeding the reliability coefficients for the related dimensions. For the behavioral consistency examination, the relationships were generally moderate. In this case the interdimension correlations exceeded or equalled the reliability coefficients for three of the six dimensions.

Rater Acceptability and Efficiency

The mean rating on the examination evaluation forms for the behavioral consistency raters was 29.75, with 40 being the highest rating possible. The mean rating for the oral raters was 33 of 40. Although the mean rating for the oral examination was higher, ratings for the two approaches were not significantly different, t(7) = 1.57, p > .05, two-tailed test.
Both sets of raters considered their respective methods positive with respect to the questions asked. These results were based on ratings from all nine of the raters involved in the Sanitation District Manager examination. Average total time for the behavioral consistency raters was 7.5 hours; average time per candidate was one-half hour. These averages included training time and lunch time in addition to the time actually spent rating the candidates. Average total time for the oral raters was 15 hours; average time per candidate was one hour. These averages also included time for training and lunch. Differences between rater time per method could not be tested for significance due to the absence of standard deviations for the two methods. However, the difference between one-half hour and one hour per candidate does have practical significance, especially when raters are volunteers. There was essentially a day's difference in total time for 14 candidates, with the oral raters taking longer. Despite this conclusion, inspection of the ratings for the question regarding time revealed that both sets of raters found the time demand reasonable.

Candidate Efficiency

Candidate time for the oral examination was approximately one-half hour, and candidate time for the behavioral consistency approach, completed in a group session, was approximately 3.5 hours. Differences in candidate time per method were not tested for significance due to the absence of a standard deviation for the behavioral consistency method. Based on descriptive statistics, time efficiency for the candidate was in favor of the oral examination since travel time was necessary for both methods. However, the point made earlier in the discussion of the Affirmative Action Officer study regarding application materials still holds. These may be unnecessary when the behavioral consistency approach is used but necessary when candidates take an oral examination.

Chapter 5

SUMMARY AND CONCLUSIONS

Summary

The purpose of this study was to compare the oral examination method to the behavioral consistency examination method. The need for such a study arose from the researcher's desire to find a testing method which possessed the desirable characteristics of the oral interview but which avoided its disadvantages. The oral examination is a practical selection technique for jobs with interpersonal and oral communication dimensions and for managerial jobs where there are relatively few candidates and technical knowledge and managerial skills must be assessed. Further, it can be considered content valid and fair if developed according to the content validity model proposed earlier. However, the research literature has shown that oral examinations have been plagued by several serious problems. Non-job-related applicant characteristics, rater characteristics, and situational characteristics can affect oral ratings. Ratings may also be subject to a halo effect due to the applicant's general likeability or oral communication skill over and above its importance to the job. Finally, it is logistically difficult to assemble the raters and candidates together on the same days.
The behavioral consistency approach, a relatively new approach to assessing training and experience, appeared promising as an alternative to the oral examination because it is parallel in development and content to the oral method, it is similar in administration except that the presentation format is written rather than oral, and there is no interaction between raters and candidates. There has apparently been no previous research comparing the oral examination with the behavioral consistency approach. In fact, there has been little research comparing the oral examination with any other method, and the behavioral consistency method has been compared mainly to other training and experience approaches.

This research study compared the use of an oral examination to the use of a behavioral consistency examination in the selection process for two positions, Affirmative Action Officer and Sanitation District Manager, both of which were managerial positions in a large midwestern city. For the Affirmative Action Officer study, there were 18 subjects and eight raters: four behavioral consistency raters and four oral examination raters. For the Sanitation District Manager study, there were 14 subjects and nine raters: four behavioral consistency raters and five oral examination raters.

For the Affirmative Action Officer study, test development for both approaches was based on a job analysis method which defined essential job tasks and critical knowledges, skills, and abilities for the Affirmative Action Officer job. Subject matter experts developed job task statements and KSA's and completed inventories designed to assess essentiality and criticality. Similar job tasks and KSA's were combined into task groups and dimensions, respectively, and a rational link-up was completed between the dimensions and task groups. Five dimensions were tested by the two approaches: Knowledge of Affirmative Action, Planning and Organizing Skills, Analytical and Quantitative Reasoning Abilities, Oral Communication and Interpersonal Skills, and Supervisory Skills.

For the Sanitation District Manager study, job analysis was done according to Flanagan's (1954) critical incident technique and Smith and Kendall's (1963) retranslation of expectations technique. Subject matter experts generated critical incidents (examples of effective and ineffective behavior). The researcher and two test analysts then developed job dimensions related to the critical incidents, grouped the critical incidents within those job dimensions, and retranslated the incidents and dimensions. The retranslation phase consisted of an independent regrouping of the dimensions and incidents and discussion and consensus on disagreements. The job analysis resulted in the identification of six job dimensions to be tested and, for each dimension, a dimension effectiveness scale consisting of critical incidents arrayed along a seven point effectiveness scale. The six dimensions were as follows: Planning and Organizing, Analyzing and Decision-making, Oral Communication/Interpersonal Skill, Supervision, Training, and Professionalism/Dedication.
Development of the behavioral consistency questionnaire was based upon examples available in the research literature and the dimensions generated in the job analysis. For the Affirmative Action Officer examination, each dimension was rated according to a behaviorally anchored rating scale with exemplary statements along various points of the scale. For the Sanitation District Manager examination, each dimension was rated on a graphic rating scale. The oral examination for both studies consisted of two questions for each of the dimensions identified through the job analysis procedure. Each question was rated on a seven point numerically anchored rating scale with benchmark answers at all high scale values and at some mid and low scale values. Questions were developed by subject matter experts who, in the first study, independently generated several questions per dimension and then chose the final questions through discussion and consensus. In the second 119 study, subject matter experts generated questions through group brainstorming. In this case, final questions were chosen by the researcher and test analyst. In both cases, the same subject matter experts brainstormed likely answers to the questions along a seven point rating scale. They provided criteria for a seven point answer for all dimensions but were unable to consistently provide criteria for the other scale values. Benchmarks for poor and moderate quality answers were provided for only some of the questions. For both examinations, there were separate training sessions for the two groups of raters, one for those who were oral examination raters and one for those who were behavioral consistency examination raters. Training lasted from one and one—half to three hours for the four sessions. The Affirmative Action Officer candidates completed the behavioral consistency questionnaires individually while the Sanitation District Manager candidates completed them in a group session. However, the behavioral consistency raters for both examinations were assembled as a group to rate the questionnaires. The oral examination for both studies consisted of a 30 minute structured interview for each candidate, and the raters were present as a group to evaluate each candidate. Findings for the two studies were as follows: 1. For the Affirmative Action Officer study, the oral and behavioral consistency examination methods were not 120 comparable. Neither the correlation between final overall ratings for the two approaches nor any of the correlations between dimension ratings was significant. For the Sanitation District Manager study, the oral and behavioral consistency examination methods were comparable. The correlation between final overall ratings for the two approaches and correlations between four of the six dimension ratings were significant and meaningful. 2. For the Affirmative Action Officer study, there were no significant differences between the oral and behavioral consistency methods for either dimension or overall reliabilities. For the Sanitation District Manager study, there were significant differences in reliability between the two methods for one dimension with the oral reliability higher for that dimension. However, there were no significant differences in reliability for the overall ratings and the other five dimension ratings. 3. For the Affirmative Action Officer study, dimension correlations between the methods were low and non-signifi- cant, demonstrating lack of convergent validity. 
The discriminant validity criteria were not met either. Correlations between different methods measuring different traits were higher than correlations between different methods measuring the same traits, and correlations between the same methods measuring different traits were higher (considerably higher for the oral method) than the correlations between different methods measuring the same traits. No consistent pattern was apparent across methods. Finally, many of the interdimension correlations for the oral method were higher than the respective dimension reliabilities. Sanitation District Manager study results demonstrated convergent but not discriminant validity. Convergent validity was demonstrated by significant and meaningful correlations between the final overall ratings and four of the six dimension ratings. However, the criteria for demonstrating discriminant validity were not met. Correlations between different methods measuring different traits were higher than correlations between different methods measuring the same traits, and correlations between the same methods measuring different traits were higher than correlations between different methods measuring the same traits. No consistent pattern was apparent across methods. Finally, many of the interdimension correlations within method were higher than or equal to the respective dimension reliabilities, except for three dimensions measured by the behavioral consistency approach.

4. For both studies, the two methods were equally acceptable to the raters but were not equally efficient in terms of rater time. Differences in rater time could not be tested for significance due to the absence of standard deviations for one or both approaches in both studies. Using descriptive statistics, the behavioral consistency method was more efficient than the oral method.

5. Based on descriptive comparisons, time efficiency for the candidate was in favor of the oral examination for both studies. (Differences in candidate time were not tested for significance because candidate time for the behavioral consistency method was not available for the Affirmative Action Officer study, and so was estimated, and a standard deviation for the behavioral consistency method was not available for the Sanitation District Manager study.) However, this time advantage could prove illusory if significant travel time were necessary for the oral examination or if additional application materials were necessary for the oral but not for the behavioral consistency examination.

Conclusions

Comparability

The first research question was whether the oral and behavioral consistency methods provided comparable ratings of applicants for employment. Study results were inconclusive for this question since correlations between the methods were significant and meaningful for one study but were non-significant for the other.

Reliability

The second research question was whether the oral and behavioral consistency methods were equally reliable. Results for this question were generally consistent across the two studies and indicated that there were no significant differences in reliability between the two methods for either the overall ratings or the dimension ratings. There was one significant difference between dimension reliabilities for the two methods: for the Analyzing and Decision-making dimension in the Sanitation District Manager study, the oral method demonstrated higher reliability.
However, the combined evidence from both studies seems to indicate that there was little difference in reliability between the two methods. These data also indicate that the dimension and overall score reliabilities of both methods were generally adequate, comparing favorably to the reliabilities achieved for typical employee selection tests. The behavioral consistency reliabilities were also comparable to those achieved in other studies of this approach. For these two studies, the overall reliabilities were .81 and .86 for the behavioral consistency method and .84 and .92 for the oral method. Dimension reliabilities across the studies ranged from .54 to .88 for the behavioral consistency method and from .71 to .91 for the oral method. Although some of the individual dimension reliabilities were relatively low, they were acceptable because the dimension scores were part of a composite. A final consideration in evaluating the dimension reliabilities has to do with their comparison to the other values in the monomethod blocks of the multitrait-multimethod matrices. Inspection of the monomethod blocks for the oral method revealed that, although the oral reliabilities were generally high, they were exceeded in each case by at least one and usually more of the related interdimension correlations. This leads to the conclusion that the separate dimensions were not rated accurately and to the supposition that the raters made global judgments of the candidates, perhaps based on likeability, poise, or interpersonal skill, and then based dimension scores on their overall judgments rather than rating each dimension separately. This situation was not as true for the behavioral consistency approach. In the Affirmative Action Officer study, behavioral consistency dimension reliabilities were generally moderate to low but exceeded the interdimension correlations in every case. For the Sanitation District Manager study, behavioral consistency dimension reliabilities exceeded the related interdimension correlations in only three of the six cases. This indicates that, for the Sanitation District Manager examination, some global factor such as writing skill may have been operating. Based on the above analysis, the reliabilities of the dimension ratings cannot be considered satisfactory for either method. Dimension reliabilities for the oral approach are questionable because the interdimension correlations exceeded the respective reliabilities in every case. This also happened for three of the six dimensions measured by the behavioral consistency approach for the Sanitation District Manager examination. The behavioral consistency approach in the Affirmative Action Officer study did not have this failing, but its reliabilities were only moderate to low.

Convergent and Discriminant Validity

The third research question was whether the two approaches demonstrated convergent and discriminant validity. Results were inconclusive regarding convergent validity, with the Affirmative Action Officer study demonstrating lack of convergent validity and the Sanitation District Manager study demonstrating possession of convergent validity. Neither of the studies demonstrated discriminant validity.
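The Campbell and Fiske comparisons applied above reduce to a set of mechanical checks on the multitrait-multimethod matrix. The sketch below is illustrative only; it does not use the actual matrices from these studies, and the matrix layout and function name are assumptions of this example. It shows one way the convergent values and the two discriminant checks could be computed for a two-method matrix.

```python
import numpy as np

# Illustrative sketch of the Campbell-Fiske checks described above.
# R is a hypothetical 2k x 2k correlation matrix for k traits (dimensions)
# rated by two methods, ordered [oral d1..dk, behavioral consistency d1..dk],
# with reliabilities on the diagonal.
def mtmm_checks(R, k):
    hetero = R[0:k, k:2 * k]           # heteromethod block
    validity = np.diag(hetero)         # monotrait-heteromethod (convergent) values
    results = []
    for i in range(k):
        # heterotrait-heteromethod values sharing trait i's row or column
        hh = np.concatenate((np.delete(hetero[i, :], i), np.delete(hetero[:, i], i)))
        # heterotrait-monomethod values involving trait i within each method
        hm = np.concatenate((np.delete(R[0:k, 0:k][i, :], i),
                             np.delete(R[k:2 * k, k:2 * k][i, :], i)))
        results.append({
            "convergent_r": float(validity[i]),
            "exceeds_heterotrait_heteromethod": bool(np.all(validity[i] > hh)),
            "exceeds_heterotrait_monomethod": bool(np.all(validity[i] > hm)),
        })
    return results
```

A dimension satisfies the discriminant criteria only when its convergent correlation exceeds both comparison sets; it is exactly this pattern that the two studies above failed to show.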
Rater Acceptability and Efficiency

The fourth research question was whether the oral and behavioral consistency methods were equally acceptable to the raters and equally efficient in terms of rater time. Results were consistent for the two studies, with both methods demonstrating equal acceptability to the raters and the behavioral consistency method demonstrating superior efficiency in terms of rater time. However, the rater efficiency results were based on descriptive rather than inferential statistics. Statistical significance tests were not done due to the absence of standard deviations for rater time for one or both methods in both studies.

Candidate Efficiency

The last research question was whether the oral and behavioral consistency approaches were equally efficient in terms of candidate time. Candidate time was not available for the behavioral consistency approach for the Affirmative Action Officer study, but based on the researcher's experience and descriptive evidence from the Sanitation District Manager study, the oral examination was more efficient for candidates. (A statistical significance test could not be done for the Sanitation District Manager study due to the absence of a standard deviation for the behavioral consistency approach.) However, this conclusion was based only upon actual examination time for the candidate. If travel or preparation time for either method were significant, this conclusion would be affected.

Discussion

There has been limited research comparing the oral interview to alternative selection devices. This study has contributed to filling this gap by comparing the interview to the behavioral consistency method, a relatively new type of approach to the evaluation of training and experience. The study also had the advantage of being conducted in an applied setting as described in the introduction, with subject matter experts as raters, job applicants as ratees, oral interviews as the stimulus material, a continuum of ratings as the outcome measure, and structured questions based on job-related dimensions as the interview content. The specific jobs selected for the study were typical low to mid-management positions, one with an in-house candidate pool and the other with a candidate pool based on nationwide recruitment. Because this setting applies to many actual selection situations, the study was more realistic than many of the typical studies conducted on selection interviews. Although this study was based on two samples, its major limitation was small sample size and the resulting lack of power. The intention of the study was to compare the oral examination method with the behavioral consistency method. However, results regarding the primary question of comparability were inconclusive, perhaps due to the small sample size. Results showing only one significant difference in reliability between the methods may also have been affected by low power. A related limitation was the high alpha level, caused by doing separate significance tests for the dimensions in each study. However, the researcher was willing to accept a high alpha level in order to achieve greater power.
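To make that trade-off concrete: when several tests are each run at the same per-test alpha, the probability of at least one false positive grows quickly. The following is a hypothetical illustration only (it assumes independent tests, and the dissertation does not report a familywise calculation); the per-test alpha of .10 is taken from the two-tailed criterion used in Appendix K.

```python
# Hypothetical illustration: familywise error rate for m independent tests
# run at a common per-test alpha (not a calculation reported in the study).
def familywise_alpha(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# Six dimension tests at alpha = .10 each give roughly a 47% chance of at
# least one spurious "significant" difference:
print(round(familywise_alpha(0.10, 6), 3))  # 0.469
```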
A third major problem centered on the issue of reliability. For the oral method in particular, dimension reliabilities were lower than related interdimension correlations. This was not as true for the behavioral consistency method but applied to three of the Sanitation District Manager dimensions. Given that there may have been legitimate relationships among the dimensions, this is still an unsatisfactory situation. Dimension ratings cannot be considered accurate if their correlations with other dimensions exceed their correlations with themselves. This situation certainly affected the discriminant validity of the methods and obscured what was being measured. The study may also have been affected by several other factors, some of which were related to the above reliability problem: rater characteristics, dimension overlap, lack of behaviorally anchored rating scales, and limited administration time. Of these problems, the most serious was rater characteristics. The previous discussion regarding the high intercorrelations among the dimensions within method suggested that some of the raters formed global judgments of the candidates and then rated the dimensions based on their overall impressions. This was borne out during the discussion and consensus session that occurred after the oral examination for Sanitation District Manager, when it became apparent that two of the raters had misunderstood or disregarded the instructions and had made dimension ratings based on their overall judgment of the candidate rather than on their separate consideration of the candidate on each separate dimension. When completing his examination evaluation form, one of these raters protested that he preferred making global judgments about candidates rather than assessing them on specific dimensions. This tendency may have been due to the limited training provided. However, since the training covered this issue, the researcher thinks it was more likely due to other factors. First, those raters who preferred the global method were experienced raters, having served on previous oral examination panels which used the global method. These raters may have been unwilling to try a new method, preferring to use one with which they felt comfortable. Second, such raters may have been demonstrating field-dependency, a characteristic which, according to the Cardy and Kehoe study, would incline them to perceive things holistically rather than analytically. Finally, since these raters were unpaid subject matter experts rather than City employees, their accountability was relatively low, and they may have felt little compunction about doing the ratings their way. A second factor encountered in this study involved legitimate dimension overlap. The job analysis and test development for both studies were based on the assumption that the dimensions were relatively independent and could be assessed separately. However, it is more likely that at least some of the dimensions for the same job were closely related. This was evidenced in the retranslation phase of the Sanitation District Manager examination and in the high intercorrelations among dimensions for the oral examinations. These high intercorrelations among dimensions were probably caused by both legitimate relationships among the dimensions and rater unwillingness or inability to rate them separately. Except for the Analyzing/Decision-making dimension, low reliability and lack of comparability between methods for specific dimensions in the Sanitation District Manager study did not appear to be associated with dimension definition problems encountered in the job analysis. Another potential problem was the lack of behaviorally anchored rating scales for three of the four examinations. (There was a behaviorally anchored rating scale for the Affirmative Action Officer behavioral consistency examination.) The researcher made a case for including behaviorally anchored rating scales in a content validity model but, without large applicant samples or previous applicant answers, was unable to develop them in practice.
The primary rationale for considering behaviorally anchored rating scales desirable was their supposed contribution to reliability. It stands to reason that having clear examples for various rating scale points would provide raters with common rating criteria and therefore would provide for more reliable ratings. However, the opposite was true. Interrater reliability for the Affirmative Action Officer behavioral consistency questionnaire was lower than interrater reliability for any of the other three examinations with respect to both overall scores and dimension scores. This evidence, along with evidence from the performance evaluation literature regarding the lack of superiority of behaviorally anchored rating scales, has caused the researcher to conclude that the lack of behaviorally anchored rating scales did not significantly affect study results. A final limitation of the study had to do with administration time. If the raters had had more time to rate the candidates, reliability, at least, might have been improved. However, extension of administration time does not seem practical with the use of volunteer raters, which is a common practice in governmental agencies. This practice would have to be changed to one of using employees as raters or to one of using paid consultants as raters. There were several differences between the two studies, some of which have been discussed above as limitations, which could potentially have accounted for the differences in findings regarding the primary question of comparability. These were differences in applicant group characteristics, rater characteristics, job analysis methodology, and use of behaviorally anchored rating scales. The applicant groups differed considerably in degree of heterogeneity. The Affirmative Action Officer candidates were a nationwide group with diverse backgrounds, while the Sanitation District Manager candidates were City employees occupying similar positions. The rater groups for the two studies differed in degree of expertise. Most of the Affirmative Action Officer raters had backgrounds in personnel management and were more knowledgeable, although not necessarily more experienced, than the Sanitation District Manager raters in the use of rating scales. Job analysis for the two studies differed, with the Affirmative Action Officer study based on a task analysis and the Sanitation District Manager study based on a critical incident/retranslation of expectations approach. The last major difference between the studies was in the use of behaviorally anchored rating scales. Both studies used modified behaviorally anchored rating scales for the oral approach, but the Affirmative Action Officer study employed behaviorally anchored rating scales for the behavioral consistency approach while the Sanitation District Manager study used a graphic rating scale for this approach. These differences between the studies would provide a logical explanation for the differences in comparability results if the comparability differences were in the opposite direction. Except for the job analysis methodology, each of the differences between the studies (heterogeneity of sample, rater expertise, and use of behaviorally anchored rating scales) would lead to the expectation that the oral and behavioral consistency approaches would be significantly correlated for the Affirmative Action Officer examination but not for the Sanitation District Manager examination.
Because the opposite occurred, these differences did not appear to influence the study results, and the above analysis has not provided insight on the problem, leaving the researcher unable to furnish an explanation.

Implications for Future Research

The most obvious implication for future research is to replicate the study using a large sample of examination candidates. This would increase power and confidence in study results. Such a study should seek to increase the reliability of the behavioral consistency approach and the accuracy of the separate dimension ratings. The usefulness of such a study would be increased if ratings from both approaches could be compared to job performance, enabling conclusions to be drawn regarding the empirical validity as well as the comparability of the methods. Problems encountered in the study also point to directions for future research. Additional studies of rater characteristics, including field-dependency and experience, would be very useful to practitioners. These studies should compare subject matter experts rather than students on these factors. Studies comparing behaviorally anchored rating scales to other types of rating scales should be done for oral and behavioral consistency examinations as well as for performance evaluations. In general, the previous research on oral interviews has not included careful development of an oral examination based on a content validity model. Interview content has been unspecified in many studies, with candidates perhaps given global ratings based on different dimensions and different questions. Oral examination research in which the oral examination is developed according to a content validity model is still sorely needed.

APPENDICES

Appendices for Affirmative Action Officer

APPENDIX A

INSTRUCTIONS FOR THE AFFIRMATIVE ACTION OFFICER TASK INVENTORY

This Task Inventory consists of lists of task statements grouped into five major areas and three rating scales which will be used to determine which tasks are critical to the Affirmative Action Officer job.

1. Please begin by detaching the Task Inventory Rating Scales page (last page of the inventory) from the rest of the inventory.

2. Apply Rating Scale A to each task statement, indicating how important each task is relative to other tasks to be performed. Use the task inventory answer sheet for Rating Scale A to record your response.

3. Apply Rating Scale B to each task statement, indicating the relative amount of time spent for each task. Use the task inventory answer sheet for Rating Scale B to record your response.

4. Apply Rating Scale C to each task statement, indicating the consequence of error which may result from inadequate or incorrect performance of each task. Use the task inventory answer sheet for Rating Scale C to record your response.

5. After rating each task on each rating scale, please consider whether any tasks are missing. Please include any missing tasks on the last page of the inventory.

Personnel Department, Examination Division
January 23, 1985

AFFIRMATIVE ACTION OFFICER TASK INVENTORY

AFFIRMATIVE ACTION PLAN ACTIVITIES

1. Responsible for the development, implementation and annual updating of a comprehensive affirmative action plan for the City of Milwaukee.

2. Responsible for assisting in developing, amending or rejecting affirmative action plans of the various City departments and ensuring that each plan is consistent with the overall City affirmative action plan.
3. Responsible for reviewing the employment practices of the various City departments and evaluating their progress in meeting their affirmative action goals.

4. Reviews the City's affirmative action goals in terms of fairness and feasibility and makes recommendations to the Common Council and City Service Commission on proposed modifications.

COMPLIANCE EXPERT ACTIVITIES

5. Reviews all state and federal rules and regulations concerning equal employment opportunity to ensure that the City is in conformance.

6. Serves as subject matter expert and provides authoritative advice on fair employment practices.

7. Keeps current on affirmative action related legislation/court decisions and implements necessary changes in department policies/practices.

8. Serves as affirmative action advocate in Personnel Department and City.

9. Consults with legal counsel on matters related to EEO and/or affirmative action.

10. Testifies as an expert witness in court or by deposition regarding personnel practices.

11. Identifies problem areas in affirmative action.

12. Writes/rewrites personnel policies or procedures or suggestions for changes related to EEO/affirmative action.

13. Keeps current on affirmative action programs, policies and issues.

AFFIRMATIVE ACTION STATUS AND ACCOUNTABILITY ACTIVITIES

14. Maintains all necessary statistics such as the proportion of affected and underrepresented group members at all levels and job classifications in the City's workforce and the availability of affected and underrepresented group members in the relevant labor force.

15. Presents information in written and oral form to CSC, MCCR, Finance Committee and other official bodies.

16. Responds to charges, challenges, complaints, allegations regarding the City's affirmative action plan.

17. Responsible for answering questions from a variety of sources on affirmative action.

18. Writes reports on EEO/affirmative action matters.

19. Responsible for computation of mathematical sums, averages, or percentages.

20. Designs new or modifies existing records management systems.

21. Responsible for verifying the accuracy of numerical data.

22. Selects, applies, and interprets statistical indices appropriate to the situation.

AFFIRMATIVE ACTION PROJECT ACTIVITIES

23. Responsible for receiving and investigating complaints of discriminatory employment practices from employees and prospective employees in conjunction with the City Attorney.

24. Develops and implements special programs such as departmental succession plans and career ladder plans to accelerate training and experience in underrepresented areas.

25. Responsible for designing and implementing training programs related to affirmative action.

26. Coordinates and oversees affirmative action related recruitment activities.

27. Provides various types of counseling to current and prospective employees and to City supervisors and managers.

28. Researches and remedies problems associated with retaining protected class members in the workforce.

29. Develops and implements new programs and policies to increase the employment and retention of minorities and women.

30. Supervises the Disabled Employees Placement Program.

31. Plans and develops recruiting networks of minorities and females.

32. Responsible for presentations before groups of potential minority and female applicants to explain job opportunities, requirements, procedures, etc.

33. Reviews resumes and/or makes reference checks.

34. Responsible for administering alternative selection devices/tests to handicapped applicants.
35. Supervises training needs assessment related to affirmative action.

36. Plans and budgets affirmative action projects and programs.

37. Evaluates the effectiveness of affirmative action programs.

38. Supervises the preparation of requests for proposals, evaluates proposals and specifications from vendors of consulting services, equipment, supplies, etc., and monitors affirmative action related contracts.

39. Identifies specific topics for research.

40. Writes research proposals and grant applications.

41. Works with Testing staff on recruitment and other issues pertaining to affirmative action.

42. Recommends or determines organizational or geographical area of competition based on affirmative action concerns.

43. Discusses the qualifications and/or suitability of minority/female candidates for positions to be filled with hiring managers or supervisors.

SUPERVISORY/MANAGEMENT ACTIVITIES

44. Supervises Affirmative Action Program staff: motivating, training, assigning work, evaluating, directing and disciplining.

45. Develops and monitors Affirmative Action Program budget.

46. Monitors work unit expenditures to insure overall compliance with budget.

47. Sets goals and objectives for employees in the work unit.

48. Assigns or adjusts work responsibilities of employees based on organizational needs, experience and competency of staff, developmental needs of staff, emergencies, and other factors.

49. Checks (monitors) the progress of work assignments periodically or at critical points to insure objectives and timetables are being met.

50. Reviews work products, correspondence, recommendations, and other written materials prepared by staff to insure that the quality is satisfactory, that policy is being followed or interpreted correctly, that they are technically correct, etc.

51. Evaluates work of employees against criteria identifying strengths and deficiencies in products, performance, or other dimensions of importance to the unit or organization.

52. Defines courses of action to correct deficiencies in performance and provides positive feedback and reinforcement for successful employee performance.

53. Counsels employees with regard to developmental objectives, career plans, promotional opportunities, etc.

54. Serves as a primary representative and communication link between the work unit and other work units in the department and City.

55. Prepares reports detailing work unit activities, program status, or reportable statistics for other work units, outside agencies, or management information.

ADDITIONAL TASK STATEMENTS

If any Affirmative Action Officer tasks have been omitted from the inventory, please add them below.
[Task Inventory Rating Scales page: the scanned original is not recoverable here. It presented Rating Scales A, B, and C, with response anchors for rating each task's importance, relative time spent, and consequence of error.]

APPENDIX B

INSTRUCTIONS FOR THE AFFIRMATIVE ACTION OFFICER KSA INVENTORY

This KSA Inventory consists of lists of KSAs grouped into seven major areas and three rating scales which will be used to determine which KSAs should be tested for the Affirmative Action Officer job.

1. Please begin by detaching the KSA Inventory Rating Scales page (last page of the inventory) from the rest of the inventory.

2. Apply Rating Scale A to each KSA statement, indicating how important each KSA is relative to other KSAs. Use the KSA inventory answer sheet for Rating Scale A to record your response.

3. Apply Rating Scale B to each KSA statement, indicating the necessity for each KSA at time of hire. Use the KSA inventory answer sheet for Rating Scale B to record your response.

4. Apply Rating Scale C to each KSA statement, indicating which KSAs distinguish between effective and ineffective workers. Use the KSA inventory answer sheet for Rating Scale C to record your response.

5. After rating each KSA on each rating scale, please consider whether any KSAs are missing. Please include any missing KSAs on the last page of the inventory.

Personnel Department, Examination Division
January 23, 1985

AFFIRMATIVE ACTION OFFICER KSA INVENTORY

KNOWLEDGE OF AFFIRMATIVE ACTION

1. Knowledge of affirmative action goal setting
2. Knowledge of affirmative action plans
3. Knowledge of labor force availability methods
4. Knowledge of state/federal affirmative action regulations
5. Knowledge of affirmative action information sources
6. Knowledge of City of Milwaukee affirmative action plans and policies
7. Ability to provide authoritative advice on fair employment practices on short notice
8. Knowledge of concepts and concerns relevant to affirmative action and equal employment opportunity in the public sector
9. Knowledge of/sensitivity to needs of disadvantaged
10. Knowledge of comparable worth
11. Ability to research affirmative action issues
12. Ability to recognize discriminatory practices (systemic)
13. Commitment to affirmative action

KNOWLEDGE OF PERSONNEL MANAGEMENT

14. Ability to design/implement training programs
15. Knowledge of recruitment methods
16. Ability to counsel
17. Knowledge of retention problems
18. Knowledge of personnel testing
19. Knowledge of human resources planning
20. Knowledge of labor relations
21. Knowledge of wage and salary administration, compensation systems
22. Knowledge of merit system concepts
23. Knowledge of impact of laws and other environmental influences on recruitment and selection
24. Knowledge of job classification/evaluation systems
25. Knowledge of ways of making job evaluation more relevant to employee and organizational productivity
26. Knowledge of the legal environment of disciplinary actions and grievance procedures
27. Knowledge of the relationship of disciplinary actions to other personnel activities
28. Knowledge of internal personnel maintenance issues (promotions, transfers, demotions, terminations, lay-offs, retirements...)
29. Ability to construct and conduct surveys

PLANNING, ANALYSIS SKILLS

30. Planning/organizing ability
31. Knowledge of descriptive statistics
32. Accuracy in performing mathematical computations
33. Ability to determine impact of decisions on others or other components of the organization
34. Skill in identifying problems, securing relevant information and identifying possible causes of problems
35. Ability to critically analyze issues and challenges
36. Ability to assess a course of action in terms of its long-range effects
37. Ability to draw valid conclusions from data
38. Ability to analyze problems and develop alternatives
39. Ability to meet deadlines
40. Ability to establish priorities
41. Ability to use data processing resources effectively
42. Ability to anticipate and solve problems
43. Ability to develop alternative solutions to problems

DECISION-MAKING, JUDGMENT, INDEPENDENCE SKILLS

44. Ability to work independently
45. Ability to stand up for ideas
46. Ability to make decisions within guidelines
47. Decisiveness: readiness to make decisions, render judgments, take action, or commit oneself
48. Ability to apply policies
49. Willingness to accept responsibility
50. Self-confidence
51. Judgment

COMMUNICATION SKILLS/INTERPERSONAL SKILLS

52. Oral presentation skills
53. Writing skills
54. Interpersonal/persuasive skills
55. Oral communication skill
56. Listening skill
57. Sensitivity
58. Ability to negotiate
59. Ability to relate to policy makers
60. Ability to read and understand written material
61. Impact
62. Leadership
63. Ability to receive information tactfully from abrasive people
64. Ability to testify effectively
65. Ability to see more than one side of an issue

SUPERVISION SKILLS

66. Knowledge of motivation and ability to motivate staff members
67. Ability to train staff
68. Knowledge of disciplinary methods, grievance procedures
69. Ability to delegate
70. Ability to develop quality/timeliness controls
71. Ability to evaluate work performance
72. Ability to develop and monitor budget
73. Knowledge of methods to increase productivity
74. Ability to assign work based on appropriate criteria
75. Knowledge of Management by Objectives
76. Ability to analyze reasons for poor or untimely work performance
77. Ability to give appropriate feedback to employees
78. Ability to counsel subordinates

OTHER MANAGEMENT SKILLS

79. Creativity
80. Ability to generate and support good ideas
81. Ability to perform under pressure/extreme stress
82. Ability to learn rapidly
83. Ability to work with frequent interruptions
84. Ability to accept constructive criticism
85. Ability to contribute effectively as a member of a team; ability to cooperate
86. Ability to act in accordance with high ethical and professional standards
87. Ability to work in a political system
88. High motivation for professional growth
89. Conscientiousness
90. Discretion
91. Initiative

ADDITIONAL KSA'S

If any Affirmative Action Officer KSA's have been omitted from the inventory, please add them below.
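The ratings collected on Scales A, B, and C of this inventory feed the selection of critical KSA's summarized in Appendix D. The dissertation does not publish the numeric decision rule used, so the following sketch is a purely hypothetical illustration of how the three scale ratings might be aggregated; the thresholds, field names, and function name are all assumptions.

```python
from statistics import mean

# Hypothetical aggregation only: the study rated each KSA on Scale A
# (importance), Scale B (necessity at time of hire), and Scale C
# (distinguishing effective from ineffective workers), but the cutoff
# rule below is assumed, not taken from the dissertation.
def is_critical_ksa(ratings, min_mean=3.0):
    """ratings: dict mapping rater -> (scale_a, scale_b, scale_c)."""
    a, b, c = (mean(vals) for vals in zip(*ratings.values()))
    # Treat a KSA as critical when all three mean ratings clear the cutoff.
    return a >= min_mean and b >= min_mean and c >= min_mean

sample = {"rater1": (4, 3, 4), "rater2": (5, 4, 3), "rater3": (4, 4, 4)}
print(is_critical_ksa(sample))  # True under these assumed thresholds
```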
[KSA Inventory Rating Scales page: the scanned original is not recoverable here. It presented Rating Scale A (importance of each KSA), Rating Scale B (necessity of each KSA at time of hire), and Rating Scale C (the degree to which each KSA distinguishes effective from ineffective workers), with their response anchors.]

APPENDIX C

Task Groups and Essential Tasks

Task Groups                                               Essential Tasks
A. Affirmative Action Plan Activities                     1-4
B. Compliance Expert Activities                           5-9, 11-13
C. Affirmative Action Status and
   Accountability Activities                              14-19, 21, 22
D. Affirmative Action Project Activities                  24, 32, 37
E. Supervisory/Management Activities                      44, 47-52, 54, 55

APPENDIX D

Revised Dimensions and Critical KSA's

Revised Dimensions                                        Critical KSA's
1. Knowledge of Affirmative Action                        1, 2, 4, 7, 8, 11, 12, 13, 23
2. Planning and Organizing Skills                         30, 39, 40, 42
3. Analytical and Quantitative Reasoning Abilities        31, 32, 33, 34, 35, 36, 37, 38, 43, 60
4. Decision-Making, Judgment and Independence Skills      44-51
5. Oral Communication and Interpersonal Skills            52, 54, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65
6. Written Communication Skills                           53
7. Supervisory Skills                                     66, 69, 70, 71, 74, 76, 77
8. Initiative/Creativity/Intelligence                     79, 80, 82, 91
9. Toleration of Stress                                   81, 83
10. Professionalism                                       84-90

APPENDIX E

Link-up of Task Groups/KSA Dimensions

Dimensions:
Knowledge of Affirmative Action
Planning and Organizing Skills
Analytical and Quantitative Reasoning Abilities
Decision-Making, Judgment and Independence Skills
Oral Communication and Interpersonal Skills
Written Communication Skills
Supervisory Skills
Initiative/Intelligence/Creativity
Toleration of Stress
Professionalism

[The matrix of X's linking each dimension (rows) to Task Groups A through E (columns) collapsed in scanning; the individual entries are not recoverable.]

*Task Group A: Affirmative Action Plan Activities
 Task Group B: Compliance Expert Activities
 Task Group C: Affirmative Action Status and Accountability Activities
 Task Group D: Affirmative Action Project Activities
 Task Group E: Supervisory/Management Activities

APPENDIX F

Dimensions and KSA's Tested

Dimensions                                                KSA's
1. Knowledge of Affirmative Action                        1, 2, 4, 7, 8, 11, 12, 13, 23
2. Planning and Organizing Skills                         30, 39, 40, 42
3. Analytical and Quantitative Reasoning Skills           31, 32, 33, 34, 35, 36, 37, 38, 43
4. Oral Communication and Interpersonal Skills            52, 54, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65
5. Written Communication Skill                            53
6. Supervisory Skills                                     66, 69, 70, 71, 74, 76, 77

APPENDIX G

DIMENSIONS

I. Knowledge of Affirmative Action

The Affirmative Action Officer must be able to develop affirmative action goals and plans. He or she must have current knowledge of state and federal laws, regulations and court cases related to affirmative action and be able to provide authoritative advice on fair employment practices on short notice. In addition, he or she must be knowledgeable regarding concepts and concerns relevant to affirmative action and equal employment opportunity in the public sector and be able to recognize discriminatory practices and to research affirmative action issues. The Affirmative Action Officer must be committed to affirmative action.
II. Planning and Organizing Skills

The Affirmative Action Officer must be able to plan and organize work effectively. This involves the ability to establish a situationally appropriate plan or course of action for one's self and others to attain specific goals or accomplish defined tasks. It involves the ability to employ a systematic approach to the work, to set meaningful priorities, to manage time and resources effectively and to meet established deadlines.

III. Analytical and Quantitative Reasoning Abilities

The Affirmative Action Officer must have analysis and quantitative reasoning skills. This involves the ability to identify problems, secure relevant information and identify possible causes of problems; the ability to develop alternative solutions to problems; the ability to assess a course of action in terms of its long-range effects and to determine the impact of decisions on others or other components of the organization. It also involves accuracy in performing mathematical computations, knowledge of descriptive statistics, the ability to draw valid conclusions from data, and the ability to relate and compare data from a variety of sources.

IV. Oral Communication and Interpersonal Skills

The Affirmative Action Officer must be able to effectively and accurately communicate ideas and information to others orally in both formal and informal one-to-one and group situations, extemporaneously or with prior preparation. In addition, he or she must possess interpersonal and persuasive skills. This involves the ability to interact effectively with peers, superiors and subordinates, staff of other departments and members of the various "publics" served. It also involves the ability to see more than one side of an issue, to work cooperatively, to adapt one's behavior to enable the effective pursuit of goals despite obstructions presented by conflicting attitudes, opinions or the action of others, and the ability to generate the trust and confidence needed to obtain agreement or acceptance with respect to ideas, plans or programs.

V. Supervisory Skills

The Affirmative Action Officer must be able to supervise both professional and clerical staff members. He or she must be able to delegate work and to assign work based on appropriate criteria, to develop quality and timeliness controls, to motivate staff members and evaluate work performance, and to analyze reasons for poor or untimely work performance and give appropriate feedback to employees.

APPENDIX H

CITY OF MILWAUKEE ACHIEVEMENT HISTORY QUESTIONNAIRE FOR AFFIRMATIVE ACTION OFFICER

NAME
ADDRESS
PHONE (DAY)              (EVENING)

NOTE: Deadline for return of this questionnaire is Tuesday, February 12, 1985 in order to be guaranteed consideration by the rating panel.

GENERAL INSTRUCTIONS

PLEASE NOTE: It is very important to read these instructions very carefully before you begin to complete the questionnaire.

1. Please answer the sets of questions which appear on the attached pages. Each set of questions relates to an important dimension of the Affirmative Action Officer position. The questions ask you to describe what you consider to be your major achievement(s) which would demonstrate that you possess the job-related knowledge, skills or abilities identified. In other words, we are looking for concrete examples of things you have actually done or accomplished pertaining to each dimension rather than a general description of your skills.
These achievements may be either specific incidents or examples of sustained high performance over a period of time. An example of an achievement demonstrating skills related to another dimension is attached.

2. You should select your strongest achievements regardless of where they were attained. You need not restrict yourself to those related to affirmative action positions.

3. If you cannot have your responses typewritten, please write as neatly as possible in black ink. Attach additional pages as necessary. Your responses will be photocopied for distribution to the panel of raters.

4. Do not substitute your resume for any responses to the questionnaire. All responses must be in the format provided in order to ensure a fair evaluation.

5. You are asked to describe one achievement for each of the five dimensions in the questionnaire; however, if you would like, you may describe more than one. If so, attach a separate sheet with the same format.

6. Return your completed questionnaire to:

Timothy J. Keeley
City of Milwaukee Personnel Department
Room 706, City Hall
200 East Wells Street
Milwaukee, WI 53202

7. Questionnaires must be received in our office (not postmarked) by Tuesday, February 12, 1985 to be guaranteed consideration by the rating panel.

PLEASE READ AND SIGN THE FOLLOWING STATEMENT AND RETURN WITH THE QUESTIONNAIRE

I certify that all information provided herein relating to my own achievements and experience is true to the best of my knowledge, and that the information can be verified through persons I have so listed in the questionnaire. I understand that falsification of information may result in disqualification or removal from a City position.

DATE                    SIGNATURE

EXAMPLE

This is an example of an achievement demonstrating skills related to the training dimension for another position (that of Personnel Officer). You do not need to describe an achievement for this dimension.

Example Dimension: Training Skills

The Personnel Officer must be able to plan and implement training programs for groups of employees as well as orient new employees to their jobs.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

While working as a personnel administrator for the State of Ohio, I was responsible for supervising a federally funded project whose objective was to provide assistance to State and local governments in the area of personnel testing. This assistance was provided through training seminars and workshops as well as research projects. The objective of the training function was to determine training needs of local jurisdictions, develop training programs to meet these needs and implement the training programs. The project began with a needs assessment survey of local jurisdictions throughout Ohio. Based on the survey, we determined that two sets of training programs were needed: one on oral examinations and legal guidelines for testing and the other on job analysis and test development. Development of the seminar on oral examinations and legal guidelines was carried out by a staff of four personnel management specialists under my direction. We used several techniques to present the material including lecture, tryout, group discussion and role-playing. This was a one day seminar given in five locations across the state. I was one of five presenters.
Development of the seminar on job analysis and test development was carried out by myself and one of my staff members. The primary techniques used were lecture, tryout and group discussion. This was a one day seminar given in one location. I was one of two presenters.

2. What was the outcome or result?

The oral examination and legal guidelines seminars were attended by groups varying in size from 10 to 25. The job analysis seminar drew a group of 75. Evaluations from seminar participants showed that they considered the seminars useful, informative and well-presented.

3. Was this achievement entirely attributable to you? Yes ___ No _x_

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)? [figure illegible in the original] However, this was all done under my supervision.

4. (a) When did this achievement take place (approximate date)?

The oral examination seminars were given in April and May of 1975. The job analysis seminar was given in September of 1975. Seminar development took place throughout the Spring and Summer of 1975.

(b) For what employer? State of Ohio

5. Please give the name, address and telephone number of someone who can verify this information.

Joseph Jones
Ohio Department of Administrative Services
P.O. Box 30007
Columbus, Ohio 43215
(614) [number illegible in the original]

I. Knowledge of Affirmative Action

The Affirmative Action Officer must be able to develop affirmative action goals and plans. He or she must have current knowledge of state and federal laws, regulations and court cases related to affirmative action and be able to provide authoritative advice on fair employment practices on short notice. In addition, he or she must be knowledgeable regarding concepts and concerns relevant to affirmative action and equal employment opportunity in the public sector and be able to recognize discriminatory practices and to research affirmative action issues. The Affirmative Action Officer must be committed to affirmative action.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely attributable to you? Yes ___ No ___

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)?

4. (a) When did this achievement take place (approximate date)?

(b) For what employer?

5. Please give the name, address and telephone number of someone who can verify this information.

II. Planning and Organizing Skills

The Affirmative Action Officer must be able to plan and organize work effectively. This involves the ability to establish a situationally appropriate plan or course of action for one's self and others to attain specific goals or accomplish defined tasks. It involves the ability to employ a systematic approach to the work, to set meaningful priorities, to manage time and resources effectively and to meet established deadlines.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely attributable to you? Yes ___ No ___

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)?

4. (a) When did this achievement take place (approximate date)?
(b) For what employer?

5. Please give the name, address and telephone number of someone who can verify this information.

III. Analytical and Quantitative Reasoning Abilities

The Affirmative Action Officer must have analysis and quantitative reasoning skills. This involves the ability to identify problems, secure relevant information and identify possible causes of problems; the ability to develop alternative solutions to problems; the ability to assess a course of action in terms of its long-range effects and to determine the impact of decisions on others or other components of the organization. It also involves accuracy in performing mathematical computations, knowledge of descriptive statistics, the ability to draw valid conclusions from data, and the ability to relate and compare data from a variety of sources.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely attributable to you? Yes ___ No ___

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)?

4. (a) When did this achievement take place (approximate date)?

(b) For what employer?

5. Please give the name, address and telephone number of someone who can verify this information.

IV. Oral Communication and Interpersonal Skills

The Affirmative Action Officer must be able to effectively and accurately communicate ideas and information to others orally in both formal and informal one-to-one and group situations, extemporaneously or with prior preparation. In addition, he or she must possess interpersonal and persuasive skills. This involves the ability to interact effectively with peers, superiors and subordinates, staff of other departments and members of the various "publics" served. It also involves the ability to see more than one side of an issue, to work cooperatively, to adapt one's behavior to enable the effective pursuit of goals despite obstructions presented by conflicting attitudes, opinions or the action of others, and the ability to generate the trust and confidence needed to obtain agreement or acceptance with respect to ideas, plans or programs.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely attributable to you? Yes ___ No ___

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)?

4. (a) When did this achievement take place (approximate date)?

(b) For what employer?

5. Please give the name, address and telephone number of someone who can verify this information.

V. Supervisory Skills

The Affirmative Action Officer must be able to supervise both professional and clerical staff members. He or she must be able to delegate work and to assign work based on appropriate criteria, to develop quality and timeliness controls, to motivate staff members and evaluate work performance, and to analyze reasons for poor or untimely work performance and give appropriate feedback to employees.

1. Describe an achievement which would show that you have the knowledges, skills or abilities described above. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?
3. Was this achievement entirely attributable to you? Yes ___ No ___

If no, what is the estimated percentage of this achievement which is due to the efforts of other people (excluding clerical staff)?

4. (a) When did this achievement take place (approximate date)?

(b) For what employer?

5. Please give the name, address and telephone number of someone who can verify this information.

APPENDIX I

Rater Evaluations

We appreciate your serving as a rater for the Affirmative Action Officer examination. Your answers to the following questions will be used to improve our selection process.

1. Do you consider it fair to use accomplishment ratings to determine which applicants will participate in an oral examination?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

2. Do you consider it fair to use accomplishment ratings to determine final rankings in the examination process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

3. How effective do you consider the accomplishment rating process in determining the best qualified applicants to be called for an oral examination?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

4. How effective do you consider the accomplishment rating process in determining the best qualified applicants for a job?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

5. How difficult did you consider the accomplishments to rate?
A. Very easy   B. Easy   C. Somewhat difficult   D. Difficult   E. Very difficult

6. How reasonable do you consider the time you spent rating accomplishments for a position of this level and type?
A. Very reasonable   B. Reasonable   C. Somewhat reasonable   D. Unreasonable   E. Very unreasonable

APPENDIX J

Rater Evaluation

We appreciate your serving as a rater for the Affirmative Action Officer oral examination. Your answers to the following questions will be used to improve our selection process.

1. Do you consider it fair to use oral dimension ratings to determine final rankings in the examination process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

2. How effective do you consider the oral dimension rating process in determining the best qualified applicants for a job?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

3. How difficult did you consider the dimensions to rate?
A. Very easy   B. Easy   C. Somewhat difficult   D. Difficult   E. Very difficult

4. How reasonable do you consider the time you spent rating dimensions for a position of this level and type?
A. Very reasonable   B. Reasonable   C. Somewhat reasonable   D. Unreasonable   E. Very unreasonable

Please add any comments in the space below.

APPENDIX K

Significance Tests for Differences Between Oral and Behavioral Consistency Reliability Coefficients for Dimension and Overall Reliabilities for the Affirmative Action Officer Study

        r1     r2     r12     W       t
D1     .82    .64    .13    2.00    1.43
D2     .83    .72    .04    1.65    1.01
D3     .82    .79    .18    1.17     .32
D4     .76    .64    .08    1.50     .82
D5     .71    .75    .34     .86    -.32
O      .84    .81    .17    1.19     .35

Note. D1 = Knowledge of Affirmative Action; D2 = Planning and Organizing; D3 = Analytical and Quantitative Reasoning Abilities; D4 = Oral Communication and Interpersonal Skills; D5 = Supervisory Skills; O = Overall Score. Column definitions are as follows: r1 = oral reliability, r2 = behavioral consistency reliability, r12 = correlation between oral and behavioral consistency ratings, W = (1 - r2)/(1 - r1), t(N-2) = (W - 1)(N - 2)^(1/2) / [4W(1 - r12^2)]^(1/2), and N = sample size. None of the t values were significant (p > .10, df = 16, two-tailed).
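For readers who want to verify the table, the statistic defined in the note is simple to compute from r1, r2, r12, and N. The following minimal sketch is not part of the original study materials (the function name is ours); it implements the formula in the note, which follows a Feldt-type test for comparing two dependent reliability coefficients, and reproduces the tabled values.

```python
import math

# Minimal sketch of the Appendix K statistic for comparing two reliability
# coefficients estimated on the same sample of N candidates.
def reliability_difference_t(r1, r2, r12, n):
    """Return (W, t), where t has n - 2 degrees of freedom."""
    w = (1 - r2) / (1 - r1)
    t = (w - 1) * math.sqrt(n - 2) / math.sqrt(4 * w * (1 - r12 ** 2))
    return w, t

# First row of the table (D1, N = 18): W = 2.00, t = 1.43
w, t = reliability_difference_t(0.82, 0.64, 0.13, 18)
print(round(w, 2), round(t, 2))  # 2.0 1.43
```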
Appendices for Sanitation District Manager

APPENDIX L

INSTRUCTIONS FOR THE SANITATION DISTRICT MANAGER CRITICAL INCIDENT INVENTORY

This Critical Incident Inventory consists of lists of critical incidents grouped into six major areas. Determine how effective each incident is compared to other incidents in carrying out the major functions of the position. Please rate the effectiveness of each critical incident using the following scale:

1 extremely ineffective
2 considerably ineffective
3 moderately ineffective
4 somewhat effective
5 moderately effective
6 considerably effective
7 extremely effective

Record your rating in the space next to each incident on the inventory. Please rate each incident even if it duplicates or closely resembles another incident. After rating each incident, please consider whether you would like to add any incidents. If so, please include them on the last page of the inventory.

SANITATION DISTRICT MANAGER CRITICAL INCIDENTS

PLANNING AND ORGANIZING

1. Assigns all regular sanitation supervisors to work during the initial wave of a snow emergency, thereby not having any supervisors to staff interim operations.

2. Updates and routes weed-cutting locations for mowers and consolidates a list for mowers to follow.

3. Anticipates need for pick-up of garbage if special occasion occurs.

4. Assigns and requests appropriate equipment for bulky item collection.

5. Holds informal meetings periodically for district staff to plan seasonal needs.

6. Gives inaccurate indication of supply needs.

7. Unavailable when staff meetings with the area manager are necessary to plan manpower needs during the winter, resulting in inadequate manpower and infrequent collections.

8. Has up-to-date list of streets prone to develop drifting problems.

9. Uses a cluster technique to insure all carts in district are serviced without overtime. (Crews that finish daily route assist crews that are behind schedule.)

10. Assigns, moves and reassigns equipment and/or manpower as needed.

11. Ignores routing for bulky item collection.

12. Does not attempt to reduce manpower in using the end-loader and hopper for leaf collection.

13. Does not adequately plan manpower needs or adjust to shortage of collectors.

14. Makes up the winter supervisor duty roster ensuring that holiday assignments are shared equally and that no supervisor has two duty weeks in a row.

15. Balances supervisors' assignments to avoid hardship, overworking.

16. Establishes inappropriate priorities and policies for vehicle and operator assignment.

17. Develops weed route books and maps indicating salt routes and wall maps indicating limits of collection routes.

18. Does not set up or enforce good office procedures in district office.

19. Does not have maps of salt or plow routes in district, and the individual route sheets are not up to date.
20. Does not follow pre-planned snow operation, assigning snowplow drivers to routes in the order in which routes were numbered and drivers arrived.
21. Does not establish policies for vehicle and operator assignment.
22. Balances overtime hours of supervisors.
23. Maintains and updates an efficient, accurate, up-to-date routing system, reviewed annually (or as need develops).
24. Establishes and maintains an accurate filing system and record keeping procedure in district headquarters.
25. Does not anticipate clean-up assignments of equipment, personnel and material delivery after snow storm.
26. Routing system for ice control is inefficient or not current.
27. Anticipates and sets up clean-up assignments of equipment, personnel and material delivery after snow storm.
28. Revises filing system for cart chits by filing all chits by quarter section numbers.
29. Does not maintain accurate records due to inaccurate documentation or ignoring records.
30. Ensures adequate staffing for collection operations during daytime salting operations.
31. Establishes priorities and policies for assignment of vehicles and operators according to expertise and capability.
32. Exceeds the allotment of scheduled vacations, causing a shortage of manpower by granting an unscheduled vacation request.
33. Utilizes resources across district to ensure that weekly collection schedules are met.
34. Gradually reduces manpower in the fall to avoid unnecessary costs for the Bureau.
35. Ignores balancing overtime hours or inappropriately balances hours.

ANALYZING & DECISION MAKING

36. Utilizes resources available to solve collection problem.
37. Investigates complaint regarding a damaged metal bushel basket and finds crew to be innocent. Follows through by contacting complainant.
38. Appropriately assigns bulky item collection to regular residential crews.
39. Does not evaluate effectiveness of ice control operations or utilize tools at his disposal.
40. Inappropriately responds to an emergency snow plowing situation by not providing information or a course of action to be taken at initial phone call to his home.
41. Orders leaf equipment before there are enough leaf piles to use the equipment for a full day, to ensure having the equipment at times of greater demand.
42. Acts immediately to clear a private parking lot at Children's Hospital due to emergency situation, permitting a helicopter to land.
43. Takes initiative to address emergency ice control situation immediately.
44. Calls superintendent at home to recommend plowing due to developing poor conditions after the area manager was informed and rejected recommendation.
45. Assigns a county worker to help the yardman collect bulky items in order to bring the large number of bulky item requests down to a manageable limit.
46. Analyzes why progress in collection is not effective (volume, manpower and equipment).
47. Orders crews to plow snow while salting the streets on a weekend when there is a low salt supply and snow from a previous storm is present.
48. Reviews plow routes and eliminates several streets from the "Mains" list that no longer carry traffic to justify priority treatment.
49. Formulates a selection method, has it approved by the superintendent and makes a recommendation for appointment following a newly established procedure in filling a job.
50. Ensures that all citizen hardship requests are handled in person and not over the phone.
51. Assigns a supervisor to an Alternate Plowing Situation who is unfamiliar with the problem and cannot be relied upon, resulting in trucks being used incorrectly and problems not being addressed.
52. Denies request for use of a bulky collection packer for one hour to help a crew which had its regular packer break down.
53. Does not pre-check bulky collection items (just assigns items to be picked up).
54. Drives through district on a daily basis to determine where problem areas exist and what might have occurred overnight that needs immediate attention, e.g., litter problems.
55. Waits for direction from main office regarding ice control operations.
56. Takes appropriate follow-up measures regarding an elderly person who could not move brush to proper location.
57. Puts unresolved problems on hold.
58. Grants request of additional snow equipment without delay during an ice control operation.
59. Checks field conditions first hand and orders out heavy equipment to open a subdivision cut off by wind-driven snow.
60. Interprets litter policy literally; sends trucks to empty containers already empty.
61. Ignores policy on Friday litter; lets trash accumulate.
62. Makes no recommendations or inappropriate recommendations regarding operational needs during an ice control operation.
63. Utilizes resources in order to reduce a growing number of bulky collection requests by using regular garbage collection crews to make two furniture stops daily while the bulky collection truck handles brush pick-ups.
64. Is unable to make decisions when necessary due to procrastination or letting problem take care of itself.
65. Makes reliable recommendations for operational needs, e.g., equipment, during ice control operations.
66. Ignores ineffective collection methods and does not attempt to improve them.
67. Does not physically check status of buildings and issues free carts to multiple unit buildings.
68. Transfers men and equipment where needed to make up for deficiencies in manpower in order to maintain schedule.
69. Pre-checks bulky material to be collected to ensure appropriate action.
70. Designs Hardship Factor for cart hardships, determining how long it takes to service a hardship vs. a regular cart collection.
71. Based on Aldermanic complaint, dispatches truck and helper to plow street without investigating, resulting in unnecessary overtime since street had already been plowed.
72. Begins use of a hopper and end loader for the leaf program rather than hand or vac pickups to increase efficiency.
73. Ensures that regular crew picks up special collection when warranted.
74. Ignores requests, does not respond immediately, or action taken is incorrect.
75. Orders out men and equipment for weekend work to clean up after a parade on his own authority when he could not reach and get approval of area manager or a higher level supervisor.
76. Routinely checks district for potential problems using interaction with employees and subordinate supervisors.
77. Recommends trial use of log loader trucks in the curb pick-up of brush to increase efficiency.
78. Requests, schedules, reassigns equipment and manpower as conditions warrant during snow plowing.
79. Removes all Dead Ends and Cul-De-Sacs from regular plow routes and places them on special end-loaders' routes for plowing, resulting in too much work for end-loaders.
80. Evaluates/critiques individual salting (plowing) operations in terms of time, quality and complaints.
81. Takes appropriate action upon discovering a complaint being mishandled.
82. Makes recommendations for policy change to area manager.
83. Sends inappropriate truck to pick up bulky collection items.
84. Does not move equipment and manpower when needed.
85. Investigates aldermanic complaint regarding weeds before acting and finds complaint to be unjustified, the result of a feud between neighbors.
86. Allows personnel to act alone in emergency situation.
87. Recommends that two more salt trucks be parked at the District Headquarters, making use of two empty stalls.
88. Handles complaints uniformly.
89. Fails to investigate breakdown of cart collection truck and requests unnecessary replacement for minor breakdown.
90. Waits until main office gives approval to address emergency ice control situation.
91. Does not make any bulky item assignments to residential crews.
92. Learns about a new area not by hearsay, but rather through actual observation to determine what is going on in the area.
93. Addresses any problem or complaint that is still unresolved the day after a snow operation.
94. Pulls plows off a route where the main streets were complete and moves them to a route that has equipment shortages.
95. Does not order clean up plows to handle snow islands, complaints and missed streets the day after a storm.

COMMUNICATION/INTERPERSONAL SKILL

96. Fails to follow up on a citizen complaint, not informing the complainant of the legitimate reason for the lack of collection and the appropriate action that will be taken.
97. Does not participate during a brainstorming session related to improving bureau services.
98. Communicates problems to superiors.
99. Does not correct misinformation being given out.
100. Maintains good relations and communication with other bureaus during snow operations.
101. Informs Alderman of results regarding a citizen/Aldermanic complaint.
102. Fails to pass along an ice control alert to the next district manager.
103. Informs main office of changing conditions.
104. Promises to handle complaint and get back to person.
105. Follows up on a complaint handed up from the route supervisor regarding crews setting out cans at the alley line ahead of the truck. Compromises with the complainant by not setting out cans in her block and notifies supervisor of the complaint and action taken.
106. Does not meet with supervisors at all, or meets only infrequently.
107. Warns higher level supervisors of problem that was not solved regarding a citizen that refused to move garbage to curb.
108. Inappropriately responds to an Aldermanic service request by not explaining why the City does not plow alleys and not suggesting any alternatives.
109. Responds in writing to Aldermanic service request, clearly and concisely, and to the heart of the problem.
110. Becomes involved in staff meeting by voicing opinions and informing others what is going on in the field.
111. Explains that everyone in neighborhood has same problem when answering complaint.
112. Informs main office of action taken in emergency situations.
113. Establishes poor relations/communications with other bureaus or departments.
114. Makes timely and appropriate response to Aldermanic requests during storm and emergency situations.
115. Introduces himself/herself to the employees and supervisors when going into the new area and lets them know what he/she expects.
116. Does not have open communication with field supervisors and does not inform field supervisors of changes in various procedural matters.
117. Conveys incorrect information to subordinate supervisors regarding how to conduct the survey prior to the installation of carts.
118. Maintains an open door policy to all employees to allow them to come and speak freely.
119. Examines and writes down critical complaints (cart-related, property damage, rude behavior) from other City bureaus/departments, Aldermen, area manager.
120. Subordinate supervisors do not know what to expect or are unsure how to proceed.
121. Does not take notes during the monthly District-Area Managers meeting, thus not communicating the essence of the meeting to subordinate supervisors.
122. Returns a call in a timely manner to a citizen or an Alderman on a matter relating to a Sanitation practice or procedure.
123. Fails to inform upper management that crews are unable to collect garbage due to impassable alleys.
124. Conducts monthly meetings with supervisors in the district, insuring broader downward communication.
125. Stifles communication with subordinates and has a poor listening ability.
126. Tactfully handles a citizen complaint regarding the use of carts for garbage collection.
127. During an ice control, notifies headquarters that a street has drifted shut and he has taken action to plow.
128. Does not inform main office of action taken during an ice control.
129. Calls Water Department and main office in case of water main breaks.
130. Allots inadequate number of vacations/weeks due to lack of communication.
131. Does not communicate ideas for policy change to area manager.
132. Holds a "pre-season" meeting with staff to discuss procedures and M.O.'s.
133. Meets with supervisors when new routes are being changed over in the area to permit as few complications as possible.
134. Takes detailed notes at monthly staff meeting to ensure that information will be relayed accurately to subordinate supervisors.

SUPERVISION

135. Spot-checks routing for collection of bulky items.
136. Maintains inventory control of supplies.
137. Does not ensure that crew picks up special collection when warranted.
138. Meets with subordinate supervisors regularly.
139. Backs up and reinforces subordinate supervisors' judgement when questioned (if appropriate).
140. Checks salt application rates (# of lbs./mile).
141. Does not allow subordinate supervisors to make decisions on their own.
142. Does not reprimand or send home an intoxicated employee brought into District Headquarters by employee's supervisor.
143. Ensures that litter cans are empty on Friday.
144. Shows a lack of trust and respect in subordinates. "Subordinates are treated as pawns to be used-discarded!"
145. Checks daily progress reports.
146. Investigates citizen complaint regarding abusive language of crew member before giving disciplinary action.
147. Sits in the office and does not get involved in the operation during a plowing operation.
148. Assigns plowing routes to drivers he favors, resulting in drivers being unfamiliar with the streets and streets remaining unplowed.
149. Allows collection routes to fall short of weekly collection objectives.
150. Does not ask for/require recommendations from line supervisors regarding disciplinary actions.
151. Delegates responsibility for snow plowing operations or goes home.
152. Reprimands a subordinate supervisor for referring to a female employee in a derogatory manner.
153. Fails to monitor salt application rates.
154. Delegates responsibility of reworking salt routes to line supervisors, giving them guidelines and monitoring the operation.
155. Makes periodic checks during crew coffee and lunch breaks to determine if bureau policy in this area is being carried out.
156. Makes adjustments in salt application rates as necessary.
157. Inappropriately advises employee to file a grievance even though employee's action clearly violates bureau policies.
158. Instructs supervisors to inform all employees of job opening and especially encourages minorities to apply.
159. Follows up on an assignment delegated to a subordinate to make sure that the job was completed and done properly.
160. Does not check daily progress reports.
161. Does not refer employee to the Employee Assistance Program, ignoring obvious employee problems that he was aware of.
162. Justifies and receives bureau permission to issue "Favorable Occurrences" to deserving collection crews.
163. Ensures maintenance of accurate records during snow plowing operations (e.g., drivers' time sheets, payroll records, emergency personnel hours).
164. Violates union contract by having non-union personnel perform duties done regularly by union personnel.
165. Allows a Sanitation crew to take a mid-morning break less than one hour after starting time.
166. Ensures that procedures are consistently and uniformly applied.
167. Asks employee if he would be able to move the time of a dental appointment to late afternoon to avoid loss of his services.
168. Corrects the directions of a subordinate supervisor which created an unsafe situation for the collection crew.
169. Ensures that the schedule for biweekly maintenance checks for garbage trucks is in the truck, in the immediate supervisor's route book and on the District Manager's calendar.
170. Leaves meeting weekly collection schedules to subordinate supervisors.
171. Does not follow a course of progressive discipline, allowing an employee who was 10 minutes late to work to begin working without giving him any disciplinary action.
172. Tolerates or endorses drinking in the field office at the end of the day, allowing employees to wind down.
173. Spot-checks performance and accuracy of first line supervisors and reports progress of crew.
174. Does not take action to correct long breaks taken by crew on a daily basis.
175. Does not immediately correct the unsafe work habits of an employee, which results in a fellow employee getting injured.
176. Does not check the total hours from an employee's time sheet against the actual hours via the tachograph.
177. Requires recommendations from line supervisors regarding disciplinary action.
178. Fails to review previous disciplinary action, resulting in an inappropriately lenient action being taken.
179. Gives incomplete orders and instructions to subordinates and tries to hold them accountable.
180. Does not check back to ensure promised action (as result of a complaint) was carried out.
181. Allows unauthorized use of supplies.
182. Does not check daily weight sheets, resulting in a truck with low weight continuing to go to the transfer station twice per day when once would have been sufficient.
183. Based on a citizen complaint, corrects behavior of subordinate supervisor.
184. Follows guidelines/policies of progressive discipline, avoiding potential grievances.
185. Corrects improper procedures followed by clerk on radio and instructs clerk on how situation should have been handled.
186. Makes an impromptu visit with a different garbage collection crew every day to inspect and develop rapport with the crew.
187. Spends several days in the field observing collection crews to ensure that directives are being carried out.
188. Assumes control from start to finish of snow plowing operation.
189. Fails to check up to see if an assignment is complete after delegating it to someone.
190. Does not make adjustments in salt application rates.
191. Investigates employee complaint and handles on a one-on-one basis.
192. Ensures that correct information is being given out.
193. Directs and monitors activity of emergency personnel from other bureaus.
194. Takes no action when complaint being mishandled.
195. Favors specific first line supervisors in district.
196. Checks to determine if all work is completed before operation is ended.
197. Makes arrangements for injured county worker to receive medical attention and transportation to the hospital.
198. Utilizes supplemental supervisors appropriately.
199. Approves a written warning for an employee without first checking the employee's record.
200. Ends operations before verifying if all work is complete.
201. Is not consistent or uniform in applying procedures.
202. Remains knowledgeable of status of equipment and progress of operation during snowplowing.
203. Follows up and completes investigation based upon a citizen complaint about a supervisor before taking action.
204. Takes time to meet with an employee who has drinking problems, his family and priest to insist on formal treatment to prevent the employee from being fired.
205. Does not use supplemental supervisors appropriately. (Gives them larger or more difficult routes than they can handle, lets them sit around, or doesn't use them at all.)
206. Fails to be compassionate, not allowing subordinate supervisor to take 2 hours off due to his child having been in an accident.
207. Shows empathy to terminally ill employee by disregarding disciplinary action ordered by another district manager.
208. Utilizes the Employee Assistance Program to save an employee's life.
209. Shows empathy for employee in a predicament, going to help him when he had car trouble on a freeway on his way to work.
210. Allows the mid-morning rest period and/or paid lunch hour to exceed what is specified in the union contract.

TRAINING

211. Fails to train first line supervisors, resulting in wasting time answering questions.
212. Leaves policy training to subordinates' peers.
213. Knows why training is necessary.
214. Ensures that subordinates can anticipate problems.
215. Shares training opportunities with employees.
216. Does not keep subordinates informed of training opportunities.
217. Encourages supplemental training for subordinates.
218. Personally trains subordinates in Bureau policies and procedures.
219. Trains subordinate supervisor to handle manpower, equipment, resolve problems and citizen complaints.
220. Ensures that transfer subordinates from the Bureau know its policies and procedures.
221. Does not show subordinates how to make adjustments in the salt application rates.
222. Delegates nuts-and-bolts training to subordinate supervisors.
223. Talks to new supervisor about how to deal with employees, providing the supervisor with a better idea of how to get results without ordering subordinates around.
224. Explains to new supervisor how to handle a water main break problem.
225. Trains subordinates on how to make adjustments in the salt application rates.
226. Subordinates transferred lack training in Bureau policies and procedures.
227. Ensures that subordinates are familiar with list of streets prone to develop drifting problems.
228. Lets subordinates learn job on their own.
229. Trains subordinates to take his/her place if absent, giving authority along with responsibility.
230. Communicates with subordinates to know their training needs.
231. Fails to take responsibility for training subordinate supervisor, turning job of orientation and training over to a sanitation laborer.
232. Does not train subordinates to investigate complaints, resulting in crew picking up hazardous material.
233. Discourages or avoids discussion of supplemental training.

PROFESSIONALISM/DEDICATION

234. Responds inappropriately when no collection for 3 weeks (response: "We'll pick it up next time").
235. Refuses to take ownership for policy responsibility ("downtown says").
236. Takes ownership and responsibility for policies.
237. When garbage not picked up and street under construction, his/her response is "we can't get in."
238. Minimizes importance of complaints.
239. Contacts duty area manager when off duty to inform manager of freezing rain that had started to fall near his home.
240. Does not support a new collection program that is being field tested, thereby not providing a true indication of what the program could do.
241. Has not used any sick leave for 12 straight years, which indirectly contributed to other supervisors in that district being reluctant to use sick leave.
242. Does not comply with several bureau policies, distorts policies and is unwilling to make efforts involved in policy changes.
243. Voices disapproval of policies outside of staff meetings.
244. Does not provide assistance when asked what to do concerning a situation involving sexual harassment in another bureau.
245. Supports the Bureau's affirmative action plan, explaining it to subordinates.
246. Arrives late and intoxicated for a plowing operation; end result is delayed plowing activities.
247. Is available day and night for call.
248. Treats complaints as problems to be solved.
249. Shows approval/support of policies when outside staff meetings.
250. Unavailable at certain times.
251. Investigates a citizen complaint regarding vehicle used by another bureau and handles in an effective manner.
252. Handles complaints differently based on where located.
253. Ensures that catch basins stay open in case of water main breaks.
254. Volunteers use of special service crew and truck to other districts after rapid drop-off of special service requests.
255. Is intoxicated during working hours, setting a bad example for subordinates and being unable to perform duties.
256. When brush not picked up, his/her response is "It's too large for us to handle."

APPENDIX M

SANITATION DISTRICT MANAGER DIMENSIONS

PLANNING AND ORGANIZING SKILLS

The Sanitation District Manager must be able to plan and organize work effectively. This involves the ability to plan ahead in order to attain specific goals or accomplish defined tasks. It involves the ability to set meaningful priorities, to manage time and resources effectively and to meet established deadlines.

ANALYZING AND DECISION-MAKING SKILLS

The Sanitation District Manager often encounters situations requiring immediate analysis and decision-making skills. This involves the ability to identify problems, obtain relevant information and identify possible causes of problems; develop alternative solutions to problems; and determine the possible effects of an action.
It also involves the ability to choose from among alternatives based on the facts of the situation and to commit oneself to a course of action.

ORAL COMMUNICATION AND INTERPERSONAL SKILLS

The Sanitation District Manager must be able to speak clearly and communicate ideas and information to others in both one-to-one and group situations, such as discussing how a snow emergency will be handled. In addition, he or she must possess good interpersonal skills and must be persuasive in dealing with people such as irate citizens. This involves the ability to interact effectively with Aldermen, citizens, supervisors, subordinates, peers and staff of other departments. It also involves the ability to see more than one side of an issue, to work cooperatively and to obtain the trust and confidence needed to reach agreement.

SUPERVISORY SKILLS

The Sanitation District Manager must be able to supervise subordinate sanitation supervisors and sanitation laborers. This involves the ability to direct district operations, assign work and delegate work to lower level supervisors, support and motivate staff, check to see if work is done properly and evaluate staff and discipline when necessary.

TRAINING ABILITY

The Sanitation District Manager must have the ability to train or provide training for subordinate supervisors and sanitation laborers. This involves recognizing the need for training, a commitment to providing training opportunities for others and the ability to clearly explain policies and procedures.

PROFESSIONALISM/DEDICATION

The Sanitation District Manager must act as a professional who is dedicated to his/her job. This involves the willingness to accept responsibility, to support department policies, to work beyond one's job description if necessary, to set high goals for one's performance, to strive for accuracy and thoroughness in one's approach to work, to exhibit a positive attitude and to set a good example for others.

WRITTEN COMMUNICATION SKILLS

The ability to effectively and accurately communicate ideas and information to others in writing. Involves clarity and conciseness, use of appropriate vocabulary and grammar, appropriate punctuation and acceptable business style.

APPENDIX N

CITY OF MILWAUKEE ACHIEVEMENT HISTORY QUESTIONNAIRE FOR SANITATION DISTRICT MANAGER

GENERAL INSTRUCTIONS

PLEASE NOTE: It is very important to read these instructions very carefully before you begin to complete the questionnaire.

1. Please answer the sets of questions which appear on the attached pages. Each set of questions relates to an important dimension of the Sanitation District Manager position. The questions ask you to describe what you consider to be your major achievement(s) which would demonstrate that you possess the job-related knowledge, skills or abilities identified. In other words, we are looking for concrete examples of things you have actually done or accomplished pertaining to each dimension rather than a general description of your skills.

2. These achievements may be either specific incidents or examples of sustained high performance over a period of time. Examples of achievements demonstrating skills related to the Planning and Organizing and Analyzing and Decision Making dimensions are attached.

3. You should select your strongest achievements regardless of where they were attained. You need not restrict yourself to those attained as a Sanitation Supervisor.

4. Please write as neatly as possible in black ink. Attach additional pages as necessary.
5. Your responses will be photocopied for distribution to a panel of raters.

6. You are asked to describe one achievement for each of the six dimensions in the questionnaire; however, if you would like, you may describe more than one. If so, attach a separate sheet with the same format.

PLEASE READ AND SIGN THE FOLLOWING STATEMENT AND RETURN WITH THE QUESTIONNAIRE

I certify that all information provided herein relating to my own achievements and experience is true to the best of my knowledge, and that the information can be verified through persons I have so listed in the questionnaire. I understand that falsification of information may result in disqualification for this examination.

DATE                         SIGNATURE

EXAMPLE

This is an example of an achievement demonstrating skills related to the planning and organizing dimension.

Example Dimension: Planning and Organizing

The Sanitation District Manager must be able to plan and organize work effectively. This involves the ability to plan ahead in order to attain specific goals or accomplish defined tasks. It involves the ability to set meaningful priorities, to manage time and resources effectively and to meet established deadlines.

1. Describe an achievement which would show that you have the ability to plan and organize. Tell us what you actually did, and include the objective or the problem.

While working as a Sanitation District Manager, I was responsible for weed-cutting in my district. We operated from a card file of weed-cutting locations, but the cards were out-of-date, with some including locations no longer to be cut, and the cards didn't indicate the most efficient route for the tractor mowers. Also, I either had to make out a daily list for the mower operator or give him the loose cards. This made keeping records for charging for weed cutting difficult. I investigated the locations to see which still needed to be cut and developed the best route for the mower operator to take. I also set up a weed cutting book which included all information about each lot, including a diagram, and which provided a record of when each lot was cut for charging purposes.

2. What was the outcome or result?

Weed-cutting became more efficient because this action: 1) removed locations that no longer were to be cut from the list, 2) cut down on travel time for mowers by routing locations, 3) provided easy reference for any questions regarding the locations, including cutting charges, and 4) cut the time spent by the supervisor making out daily lists.

3. Was this achievement entirely due to you? Yes X  No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)? Summer, 1984
(b) In what district or for what employer? City of Milwaukee, Bureau of Sanitation, Central Area 1

5. Please give the name and title of someone who can verify this information. Joseph Jones, Central Sanitation Area Manager

EXAMPLE

This is an example of an achievement demonstrating skills related to the analyzing and decision-making dimension.

Example Dimension: Analyzing and Decision-Making

The Sanitation District Manager often encounters situations requiring immediate analysis and decision-making skills. This involves the ability to identify problems, obtain relevant information and identify possible causes of problems; develop alternative solutions to problems; and determine the possible effects of an action.
It also involves the ability to choose from among alternatives based on the facts of the situation and to commit oneself to a course of action.

1. Describe an achievement which would show that you have the ability to analyze situations and make good decisions. Tell us what you actually did, and include the objective or the problem.

While working as a Sanitation District Manager for the City of Milwaukee, I was responsible for bulky item collection as well as regular household garbage collection in my district. During a time of heavy demand for bulky item collection, which is done by a separate crew, I was able to reduce a growing number of requests efficiently by separating brush stops from furniture stops and having the regular garbage collection crew also collect furniture. There had been a lot of tree waste to be collected during the month of July because of the severe thunderstorm which had swept through the Milwaukee area. Bulky collection requests mounted. This is also a typically high volume period for household garbage collection. However, I thought that the collection crews could still afford to help out in this situation and asked my supervisors to give two furniture stops to each crew on a daily basis until such time as the requests could be handled reasonably well by the bulky collection crew. In the meantime, I had the bulky truck concentrate on the more time-consuming brush pick-ups. I was able to reduce the number of bulky collection requests in a significantly shorter amount of time than other districts and was able to lend a hand to a neighboring district which had a greater quantity of requests to handle.

2. What was the outcome or result?

I was able to handle the situation without requesting additional resources, which cost money. I also prevented complaints of delayed pickup of the bulky items either by the individual citizen who made the request or the Alderman's office, which sometimes forwards complaints of this nature.

3. Was this achievement entirely due to you? Yes X  No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %
I was solely responsible for analyzing the situation and making the decision. However, the lower level supervisors and sanitation laborers carried it out.

4. (a) When did this achievement take place (approximate date)? This occurred during July and August, 1984.
(b) In what district or for what employer? City of Milwaukee, Bureau of Sanitation, South Area 2.

5. Please give the name and title of someone who can verify this information. Joseph Jones, South Sanitation Area Manager

I. Planning and Organizing Skills

The Sanitation District Manager must be able to plan and organize work effectively. This involves the ability to plan ahead in order to attain specific goals or accomplish defined tasks. It involves the ability to set meaningful priorities, to manage time and resources effectively and to meet established deadlines.

1. Describe an achievement which would show that you have the ability to plan and organize. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.
II. Analyzing and Decision-Making Skills

The Sanitation District Manager often encounters situations requiring immediate analysis and decision-making skills. This involves the ability to identify problems, obtain relevant information and identify possible causes of problems; develop alternative solutions to problems; and determine the possible effects of an action. It also involves the ability to choose from among alternatives based on the facts of the situation and to commit oneself to a course of action.

1. Describe an achievement which would show that you have analyzing and decision-making skills. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.

III. Oral Communication and Interpersonal Skills

The Sanitation District Manager must be able to speak clearly and communicate ideas and information to others in both one-to-one and group situations, such as discussing how a snow emergency will be handled. In addition, he or she must possess good interpersonal skills and must be persuasive in dealing with people such as irate citizens. This involves the ability to interact effectively with Aldermen, citizens, supervisors, subordinates, peers and staff of other departments. It also involves the ability to see more than one side of an issue, to work cooperatively and to obtain the trust and confidence needed to reach agreement.

1. Describe an achievement which would show that you have oral communication and interpersonal skills. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.

IV. Supervisory Skills

The Sanitation District Manager must be able to supervise subordinate sanitation supervisors and sanitation laborers. This involves the ability to direct district operations, assign work and delegate work to lower level supervisors, support and motivate staff, check to see if work is done properly and evaluate staff and discipline when necessary.

1. Describe an achievement which would show that you have supervisory skills. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.

V. Training Ability

The Sanitation District Manager must have the ability to train or provide training for subordinate supervisors and sanitation laborers. This involves recognizing the need for training, a commitment to providing training opportunities for others and the ability to clearly explain policies and procedures.
1. Describe an achievement which would show that you have the ability to train subordinates. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.

VI. Professionalism/Dedication

The Sanitation District Manager must act as a professional who is dedicated to his/her job. This involves the willingness to accept responsibility, to support department policies, to work beyond one's job description if necessary, to set high goals for one's performance, to strive for accuracy and thoroughness in one's approach to the work, to exhibit a positive attitude and to set a good example for others.

1. Describe an achievement which would show that you act as a professional who is dedicated to his/her job. Tell us what you actually did, and include the objective or the problem.

2. What was the outcome or result?

3. Was this achievement entirely due to you? Yes    No
If no, what is the estimated percentage of this achievement which is due to the efforts of other people? %

4. (a) When did this achievement take place (approximate date)?
(b) In what district or for what employer?

5. Please give the name and title of someone who can verify this information.

APPENDIX O

RATER EVALUATIONS

We appreciate your serving as a rater for the Sanitation District Manager examination. Your answers to the following questions will be used to improve our selection process.

1. Do you consider it fair to use accomplishment ratings to assess candidates on job dimensions?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

2. Do you consider it fair to use accomplishment ratings to determine which applicants will participate in an oral examination?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

3. Do you consider it fair to use accomplishment ratings as a weighted part of the exam process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

4. Do you consider it fair to use accomplishment ratings to determine final rankings in the examination process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

5. How effective do you consider accomplishment ratings in assessing candidates on the job dimensions?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

6. How effective do you consider accomplishment ratings in determining the best qualified applicants to be called for an oral examination?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

7. How effective do you consider accomplishment ratings as a weighted part of the exam process?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

8. How effective do you consider accomplishment ratings in determining the best qualified applicants for a job?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

9. How difficult did you consider the accomplishments to rate?
A. Very easy   B. Easy   C. Somewhat difficult   D. Difficult   E. Very difficult

10. How reasonable do you consider the time you spent rating accomplishments for a position of this level and type?
A. Very reasonable   B. Reasonable
C. Somewhat reasonable   D. Unreasonable   E. Very unreasonable

APPENDIX P

RATER EVALUATIONS

We appreciate your serving on the oral examination panel for the position of Sanitation District Manager. As you recall, the examination involved rating candidates on individual dimensions such as Planning & Organizing Skills, Analytical and Decision-Making Skills, Supervisory Skills, etc. Your answers to the following questions will be used to improve our selection process.

1. Do you consider it fair to use oral examination ratings to assess candidates on job dimensions?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

2. Do you consider it fair to use oral dimension ratings as a weighted part of the exam process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

3. Do you consider it fair to use oral dimension ratings to determine final rankings in the examination process?
A. Very fair   B. Fair   C. Somewhat fair   D. Unfair   E. Very unfair

4. How effective do you consider oral examination ratings in assessing candidates on the job dimensions?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

5. How effective do you consider oral dimension ratings as a weighted part of the exam process?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

6. How effective do you consider oral dimension ratings in determining the best qualified applicants for a job?
A. Very effective   B. Effective   C. Somewhat effective   D. Ineffective   E. Very ineffective

7. How difficult did you consider the oral dimensions to rate?
A. Very easy   B. Easy   C. Somewhat difficult   D. Difficult   E. Very difficult

8. How reasonable do you consider the time you spent rating oral dimensions for a position of this level and type?
A. Very reasonable   B. Reasonable   C. Somewhat reasonable   D. Unreasonable   E. Very unreasonable

9. Please add any comments in the space below.

APPENDIX Q

Significance Tests for Differences Between Oral and Behavioral Consistency Reliability Coefficients for Dimension and Overall Reliabilities for the Sanitation District Manager Study

        r1      r2      r12     W       t
D1     .88     .80     .58    1.67    1.10
D2     .89     .61     .29    3.55    2.44*
D3     .87     .73     .62    2.08    1.65
D4     .76     .68     .74    1.33     .74
D5     .80     .54     .31    2.30    1.56
D6     .84     .88     .62     .75    -.64
O      .90     .86     .67    1.40     .79

Note. D1 = Planning and Organizing; D2 = Analyzing and Decision-Making; D3 = Oral Communication/Interpersonal Skill; D4 = Supervision; D5 = Training; D6 = Professionalism/Dedication; O = Overall Score. Column definitions are as follows: r1 = oral reliability based on average of four raters, r2 = behavioral consistency reliability, r12 = correlation between oral and behavioral consistency ratings, $W = (1 - r_2)/(1 - r_1)$, $t_{N-2} = (W - 1)\sqrt{N - 2} \, / \, \sqrt{4W(1 - r_{12}^{2})}$, and N = sample size.
* p < .05, df = 12, two-tailed.
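As an arithmetic check on the one significant difference in the table, substituting the D2 row into the formulas defined in the note gives $W = (1 - .61)/(1 - .89) = 3.55$ and $t = (3.55 - 1)\sqrt{12} \, / \, \sqrt{4(3.55)(1 - .29^{2})} \approx 2.44$, which reproduces the starred entry and exceeds 2.18, the two-tailed .05 critical value of t on 12 degrees of freedom.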
BIBLIOGRAPHY

Anderson, C. W. (1960). The relation between speaking times and decision in the employment interview. Journal of Applied Psychology, 44, 267-268.

Ash, R. A. (1984, May). The activity/achievement indicator: A possible alternative to the behavioral consistency method of training and experience evaluation. Paper presented at the annual conference of the International Personnel Management Association Assessment Council, Seattle, WA.

Ash, R. A. (1983, May). A comparative study of the behavioral consistency and wholistic judgment methods of job applicant training and work experience evaluation. Paper presented at the meeting of the International Personnel Management Association Assessment Council, Washington, DC.

Ash, R. A., & Levine, E. L. (1982, August). Job applicant training and work experience evaluation: An empirical investigation. Paper presented at the meeting of the American Psychological Association, Washington, DC.

Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of recent research. Personnel Psychology, 35, 281-322.

Barbee, J. R., & Keil, E. C. (1973). Experimental techniques of job interview training for the disadvantaged: Videotape feedback, behavior modification and microcounseling. Journal of Applied Psychology, 58, 209-213.

Baron, R. A. (1983). "Sweet smell of success"? The impact of pleasant artificial scents on evaluations of job applicants. Journal of Applied Psychology, 68, 709-713.

Baskett, G. D. (1973). Interview decisions as determined by competency and attitude similarity. Journal of Applied Psychology, 57, 343-345.

Blakeney, R. N., & Mac Naughton, J. F. (1971). Effects of temporal placement of unfavorable information on decision making during the selection interview. Journal of Applied Psychology, 55, 138-142.

Bolster, B. I., & Springbett, B. M. (1961). The reaction of interviewers to favorable and unfavorable information. Journal of Applied Psychology, 45, 97-103.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Cann, A., Siegfried, W. D., & Pearce, L. (1981). Forced attention to specific applicant qualifications: Impact on physical attractiveness and sex of applicant biases. Personnel Psychology, 34, 65-75.

Cardy, R. L., & Kehoe, J. F. (1984). Rater selective attention ability and appraisal effectiveness: The effect of a cognitive style on the accuracy of differentiation among ratees. Journal of Applied Psychology, 69, 589-594.

Carlson, R. E. (1967). Selection interview decisions: The relative influence of appearance and factual written information on an interviewer's final rating. Journal of Applied Psychology, 51, 461-468.

Carlson, R. E. (1970). Effect of applicant sample on ratings of valid information in an employment setting. Journal of Applied Psychology, 54, 217-222.

Carlson, R. E. (1971). Effect of interview information in altering valid impressions. Journal of Applied Psychology, 55, 66-72.

Carlson, R. E., Thayer, P. W., Mayfield, E. C., & Peterson, D. A. (1971, April). Improvements in the selection interview. Personnel Journal, 268-317.

Cash, T. F., Gillen, B., & Burns, D. S. (1977). Sexism and "beautyism" in personnel consultant decision making. Journal of Applied Psychology, 62, 301-310.

Cohen, S. L., & Bunker, K. A. (1975). Subtle effects of sex role stereotypes on recruiters' hiring decisions. Journal of Applied Psychology, 60, 566-572.

Davey, B. (1984, May). Are all oral panels created equal? A study of differential validity across oral panels. Paper presented to the International Personnel Management Association Assessment Council, Seattle, WA.

Dipboye, R. L., Fontenelle, G. A., & Garner, K. (1984). Effects of previewing the application on interview process and outcomes. Journal of Applied Psychology, 69, 118-128.

Dipboye, R. L., Fromkin, H. L., & Wiback, K. (1975). Relative importance of applicant sex, attractiveness, and scholastic standing in evaluation of job applicant resumes. Journal of Applied Psychology, 60, 39-43.

Dunnette, M. D. (1962). Personnel management.
Annual Review of Psychology, 13, 285-314.

Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407-424.

Farr, J. L. (1973). Response requirements and primacy-recency effects in a simulated selection interview. Journal of Applied Psychology, 57, 228-232.

Farr, J. L., & York, C. M. (1975). Amount of information and primacy-recency effects in recruitment decisions. Personnel Psychology, 28, 233-238.

Fay, C. H., & Latham, G. P. (1982). Effects of training and rating scales on rating errors. Personnel Psychology, 35, 105-116.

Feldt, L. S. (1980). A test of the hypothesis that Cronbach's alpha reliability coefficient is the same for two tests administered to the same sample. Psychometrika, 45, 99-105.

Ferris, G. R., & Gilmore, D. C. (1977). Effects of mode of information presentation, sex of applicant, and sex of interviewer on simulated interview decisions. Psychological Reports, 45, 566.

Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327-358.

Forsythe, S., Drake, M. F., & Cox, C. F. (1985). Influence of applicant's dress on interviewer's selection decisions. Journal of Applied Psychology, 70, 374-378.

Frank, L. L., & Hackman, J. R. (1975). Effects of interviewer-interviewee similarity on interviewer objectivity in college admissions interviews. Journal of Applied Psychology, 60, 356-360.

Hakel, M. D. (1971). Similarity of post-interview trait rating intercorrelations as a contributor to interrater agreement in a structured employment interview. Journal of Applied Psychology, 55, 443-448.

Hakel, M. D., Dobmeyer, T. W., & Dunnette, M. D. (1970). Relative importance of three content dimensions in overall suitability ratings of job applicants' resumes. Journal of Applied Psychology, 54, 65-71.

Hakel, M. D., Ohnesorge, J. P., & Dunnette, M. D. (1970). Interviewer evaluations of job applicants' resumes as a function of the qualifications of the immediately preceding applicants: An examination of contrast effects. Journal of Applied Psychology, 54, 27-30.

Heilman, M. E. (1980). The impact of situational factors on personnel decisions concerning women: Varying the sex composition of the applicant pool. Organizational Behavior and Human Performance, 25, 386-395.

Heilman, M. E., & Saruwatari, L. R. (1979). When beauty is beastly: The effects of appearance and sex on evaluations of job applicants for managerial and nonmanagerial jobs. Organizational Behavior and Human Performance, 23, 360-372.

Heneman, H. G., III, Schwab, D. P., Huett, D. L., & Ford, J. L. (1975). Interviewer validity as a function of interview structure, biographical data, and interviewee order. Journal of Applied Psychology, 60, 748-753.

Hollandsworth, J. G., Jr., Dressel, M. E., & Stevens, J. (1977). Use of behavioral versus traditional procedures for increasing job interview skills. Journal of Applied Psychology, 62, 502-510.

Hollandsworth, J. G., Jr., Kazelskis, R., Stevens, J., & Dressel, M. E. (1979). Relative contributions of verbal, articulative and nonverbal communication to employment decisions in the job interview setting. Personnel Psychology, 32, 359-367.

Hough, L. M. (1984). Development and evaluation of the "Accomplishment Record" method of selecting and promoting professionals. Journal of Applied Psychology, 69, 135-146.

Imada, A. S., & Hakel, M. D. (1977). Influence of nonverbal communication and rater proximity on impressions and decisions in simulated employment interviews. Journal of Applied Psychology, 62, 295-300.
Ivancevich, J. M., & Smith, S. V. (1981). Goal setting interview skills training: Simulated and on-the-job analyses. Journal of Applied Psychology, 66, 697-705.

Jacobs, R., Kafry, D., & Zedeck, S. (1980). Expectations of behaviorally anchored rating scales. Personnel Psychology, 33, 595-640.

James, S. P., Campbell, I. M., & Lovegrove, S. A. (1984). Personality differentiation in a police-selection interview. Journal of Applied Psychology, 69, 129-134.

Janz, T. (1982). Initial comparisons of patterned behavior description interviews versus unstructured interviews. Journal of Applied Psychology, 67, 577-580.

Johnson, J. C., Guffey, W. L., & Perry, R. A. (1980, July). When is a T and E rating valid? Paper presented at the annual conference of the International Personnel Management Association Assessment Council, Boston, MA.

Keenan, A. (1978). The selection interview: Candidates' reactions and interviewers' judgements. British Journal of Social and Clinical Psychology, 17, 201-209.

King, M. R., & Manaster, G. J. (1977). Body image, self-esteem, expectations, self-assessments and actual success in a simulated job interview. Journal of Applied Psychology, 62, 589-594.

Kopelman, M. D. (1975). The contrast effect in the selection interview. British Journal of Educational Psychology, 45, 333-336.

Landy, F. J., & Bates, F. (1973). Another look at contrast effects in the employment interview. Journal of Applied Psychology, 58, 141-144.

Langdale, J. A., & Weitz, J. (1973). Estimating the influence of job information on interviewer agreement. Journal of Applied Psychology, 57, 23-27.

Latham, G. P., Fay, C. H., & Saari, L. M. (1979). The development of behavioral observation scales for appraising the performance of foremen. Personnel Psychology, 32, 299-311.

Latham, G. P., & Saari, L. M. (1984). Do people do what they say? Further studies on the situational interview. Journal of Applied Psychology, 69, 569-573.

Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational interview. Journal of Applied Psychology, 65, 422-427.

Latham, G. P., Wexley, K. N., & Pursell, E. D. (1975). Training managers to minimize rating errors in the observation of behavior. Journal of Applied Psychology, 60, 550-555.

Leonard, R. L., Jr. (1974). Relevance and reliability in the interview. Psychological Reports, 34, 1331-1334.

London, M., & Hakel, M. D. (1974). Effects of applicant stereotypes, order and information on interview impressions. Journal of Applied Psychology, 59, 157-162.

Maas, J. B. (1965). Patterned scaled expectation interview: Reliability studies on a new technique. Journal of Applied Psychology, 49, 431-433.

Mayfield, E. C. (1964). The selection interview--A re-evaluation of published research. Personnel Psychology, 17, 239-260.

Mayfield, E. C., Brown, S. H., & Hamstra, B. W. (1980). Selection interviewing in the life insurance industry: An update of research and practice. Personnel Psychology, 33, 725-739.

McDonald, T., & Hakel, M. D. (1985). Effects of applicant race, sex, suitability, and answers on interviewer's questioning strategy and ratings. Personnel Psychology, 38, 321-334.

McIntyre, S., Moberg, D. J., & Posner, B. Z. (1980). Preferential treatment in preselection decisions according to sex and race. Academy of Management Journal, 23, 738-749.

Moore, L. F., & Lee, A. J. (1974). Comparability of interviewer, group and individual interview ratings. Journal of Applied Psychology, 59, 163-169.

Mullins, T. W. (1982). Interviewer decisions as a function of applicant race, applicant quality and interviewer prejudice. Personnel Psychology, 35, 163-174.

Okanes, M. M., & Tschirgi, H. (1978). Impact of the face-to-face interview on prior judgments of a candidate. Perceptual and Motor Skills, 46, 322.
Interviewer decisions as a function of applicant race, applicant quality and interviewer prejudice. Personnel Psychology, 55, 163-174. Okanes, M. M., & Tschirgi, H. (1978). Impact of the face-to- face interview on prior judgments of a candidate. Perceptual and Motor Skills, 45, 322. 214 Osburn, H. G., Timmreck, C., & Bigby, D. (1981). Effect of dimensional relevance on accuracy of simulated hiring decisions by employment interviewers. Journal of Applied Psychology, 55, 159-165. Pannone, R. D. (1984). Predicting test performance: A content valid approach to screening applicants. Personnel Psychology, 51, 507-514. Parsons, C. K., & Liden, R. C. (1984). Interviewer percep- tions of applicant qualifications: A multivariate field study of demographic characteristics and nonverbal cues. Journal of Applied Psychology, 55, 557—568. Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 55, 581-588. Rand, T. M., & Wexley, K. N. (1975). Demonstration of the effect, "Similar to Me," in simulated employment inter- views. Psychological Reports, 55, 535-544. Rasmussen, K. G., Jr. (1984). Nonverbal behavior, verbal behavior, resume credentials, and selection interview outcomes. Journal of Applied Psychology, 69, 551-556. ‘— Reynolds, A. H. (1979). The reliability of a scored oral interview for police officers. Public Personnel Management, Sep-Oct, 324-328. Ricchiute, D. N. (1985). Presentation mode, task importance, and cue order in experimental research on expert judges. Journal of Applied Psychology, 15, 367—373. Rosen, B., & Jerdee, T. H. (1976). The influence of age stereotypes on managerial decisions. Journal of Applied Psychology, 51, 428-432. Rosen, B., & Mericle, M. F. (1979). Influence of strong versus weak fair employment policies and applicant's sex on selection decisions and salary recommendations in a management simulation. Journal of Applied Psychology, 54, 435-439. Rothstein, M., & Jackson, D. N. (1980). Decision making in the employment interview: An experimental approach. Journal of Applied Psychology, 55, 271-283. Rowe, P. M. (1963). Individual defferences in selection decisions. Journal of Applied Psychology, 41, 304-307. 215 Rozelle, R. M., & Baxter, J. C. (1981). Influence of role pressures on the perceiver: Judgments of videotaped in- terviews varying judge accountability and responsibility. Journal of Applied Psychology, 55, 437-441. Sackett, P. R. (1982). The interviewer as hypothesis tester: the effects of impressions of an applicant on interviewer questioning strategy. Personnel Psychology, 15, 789—804. Schmidt, F. L., Caplan, J. R., Bemis, S. E., Decuir, D., Dunn, L., & Antone, L. (1979). The behavioral consis— tency method of unassembled examining. (TM—79-21) Washington, DC: Office of Personnel Management. (NTIS NO. PB80-l39942). Schmitt, N. (1976). Social and situational determinants of interview decisions: Implications for the employment interview. Personnel Psychology, 15, 79-101. Schwab, D. P., & Heneman, H. G. III. (1969). Relationship between interview structure and interinterviewer relia— bility in an employment situation. Journal of Applied Psychology, 55, 214—217. Schwab, D. P., Heneman, H. G. III, & DeCotiis, T.A. (1975). Behaviorally anchored rating scales: A review of the literature. Personnel Psychology, 15, 549—562. Simas, K., & McCarrey, M. (1979). Impact of recruiter au- thoritarianism and applicant sex on evaluation and selec- tion decisions in a recruitment interview analogue study. 
Journal of Applied Psychology, 54, 483—491. Smith, P., & Kendall, L. M. (1963). Retranslation of expec— tations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 41, 149-155. State of Alabama Personnel Office. (1984). Summary of T and E Examinations Data Reported by State and Municipal Jurisdictions. Montgomery, AL. Sterrett, J. (1978). The job interview: Body language and perceptions of potential effectiveness. Journal of Applied Psychology, 55, 388-390. Tengler, C. D., & Jablin, F. M. (1983). Effects of question type, orientation, and sequencing in the employment screening interview. Communication Monographs, 55, 245-263. 216 Tubiana, J. H., & Ben—Shakhar, G. (1982). An objective group questionnaire as a substitute for a personal interview in the prediction of success in military training in Israel. Personnel Psychology, 15, 349-357. Tucker, D. H., & Rowe, P. M. (1977). Consulting the applica— tion form prior to the interview: An essential step in the selection process. Journal of Applied Psychology, 51, 283-287. Tucker, D. H., & Rowe, P. M. (1979). Relationship between expectancy, causal attributions and final hiring deci- sions in the employment interview. Journal of Applied Psychology, 54, 27-34. Tullar, L., Mullins, T., & Caldwell, S. (1979). Effects of interview length and applicant quality on interview deci- sion time. Journal of Applied Psychology, 64, 669-674. ’— Ulrich, L., & Trumbo, D. (1965). The selection interview since 1949. Psychological Bulletin, 55, 100-116. Valenzi, E., & Andrews, I. R. (1973). Individual differences in the decision process of employment interviewers. Journal of Applied Psychology, 55, 49-53. Vance, R. J., Kuhnert, K. W., & Farr, J. L. (1978). Interview judgments: Using external criteria to compare behavioral and graphic scale ratings. Organizational Behavior and Human Performance, 11, 279-294. Wagner, R. (1949). The employment interview: A critical sum— mary. Personnel Psychology, 1, 17-46. Washburn, P. V., & Hakel, M. D. (1973). Visual cues and verbal content as influences on impressions formed after simulated employment interviews. Journal of Applied Psychology, 55, 137-140. Wexley, K., Sanders, R., & Yukl, G. (1973). Training inter— viewers to eliminate contrast effects in employment in- terviews. Journal of Applied Psychology, 51, 233-236. Wexley, K., Yukl, G., Kovacs, S., & Sanders, R. (1972). Importance of contrast effects in employment interviews. Journal of Applied Psychology, 55, 45-48. Wiener, Y., & Schneiderman, M. L. (1974). Use of job infor- mation as a criterion in employment decisions of inter— viewers. Journal of Applied Psychology, 59, 699-704. '—_ 217 Wright, 0. R. Jr. (1969). Summary of research on the selec- tion interview since 1964. Personnel Psychology, 11, 391-413. MICHIGAN STATE UNIV. LIBRARIES WWIWIWIHH INWWW”WWW 31293007962636