A STUDY OF THE RNFLUENCE OF CERTAIN SELECTED FACTORS ON THE RATINGS OF SPEECH PERFORMANCES Thais far Hm Beam of Ed. D. MECHIGAN‘ STATE COLLEGE Em?! R. Miss?» T955 This is to certify that the thesis entitled A Study of the Influence of Certain Selected Factors on the Ratings of Speech Performances presented by Emil R. Pfister has been accepted towards fulfillment of the requirements for Ed.D. Higher Education degree in Major professor 0-169 M J 3' > \ .‘ D - A STUDY OF THE INFLUENCE OF CERTAIN SELECTED FACTORS ON THE RATINGS OF SPEECH PERFORMANCES By 4 {.1 (z «.4 Q" Emil R? Pfister m A THESIS Submitted to the School of Graduate Studies of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of DOCTOR OF EDUCATION School of Education 1955 TN 115;“ ACKNOWLEDGMENTS The writer wishes to express his appreciation to the many people who helped him throughout the de- velopment of this study. . The encouragement and guidance came primarily from Dr. Milosh muntyan, Dr. Wilson B. Paul, and Dr. Barry W. Sundwall. Their cooperation, sympathetic attitgde, and helpful suggestions are deeply appre- c ate . Sincere appreciation also should be given to the Juniors, seniors and faculty of the Department of Speech and Drama, Central Michigan College of Education, who cooperated in the collection of data during the 1952-53 academic year. Furthermore, acknowledgment is given to those of the staff of Michigan State College and the writer's colleagues at Central Michigan College of Education, especially Dr. Wilbur Moore and Dr. Karl Pratt, who were kind enough to answer questions and offer sug- gestions during the progress of the investigation. Invaluable help in setting the data up so that it could be tabulated by IBM equipment was given the writer by Dr. Willard Warrington of the Board of Examiners and Mr. Frank Martin, Supervisor of Tabulation, Michigan State College. This dissertation is dedicated to my wife, Frances Pfister, whose patience, confidence, and de- votion have fostered the continuous concentration necessary to the successful completion of this study. {3.653586 VITA Emil Robert Pfister candidate for the degree of Doctor of Education Final Examination: May 6, 1955 Thesis: A Study of the Influence of Certain Selected Factors on the Ratings of Speech Performances. Outline of Studies: Major subject -- Higher Education Cognate Field -- Speech Biographical Items: Born: January 3, 1913, Chicago, Illinois Undergraduate Studies: Central Michigan College of Education, 1931-35 Graduate Studies: Master of Arts, University of Michigan, 1939 Columbia University, Summer, 1 47 Denver University, Summer, 194 Michigan State College, 1949-55 Experience: Speech Teacher and High School Principal, Kingston, Michigan, 1935-40 High School Principal and Director of Debate, Clare, Michigan, 1940-45 Assistant Professor of Speech, Central Michigan College of Education, Mount Pleasant, Michigan, 1945-53 Associate Professor of Speech, Central Michigan College of Education, Mount Pleasant, Michigan, 1953- Member of: Pi Kappa Delta; Kappa Delta Pi; Speech Association of America; American Associa— tion of University Professors; National Society for the Study of Communication; American Forensic Association; National Education Association A STUDY OF THE INFLUENCE OF CERTAIN SELECTED FACTORS ON THE RATINGS OF SPEECH PERFORMANCES BY." c"\“ (I Emil RSQPfister AN ABSTRACT Submitted to the School of Graduate Studies of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of DOCTOR OF EDUCATION School of Education 1955 Approved W M M Emil R. Pfister Thesis Abstract This study was designed to determine whether any sta— tistically significant relationships existed between the ratings given by Speech evaluators and (1) their academic speech training, (2) their acquaintanceship with the speaker, (3) their experience with the rating scale, and (4) their sex in relation to the sex of the speaker. The five hundred and forty-nine speakers who partici— pated in this project were freshmen enrolled in Fundamentals 9; Speech classes at Central Michigan College of Education during the 1952—53 academic year. The fifty—five evaluators (speech faculty members and juniors and seniors who were speech majors or minors) compiled a total of 4392 ratings. Precautions were taken and controls were employed with re— spect to Speaker, speech, audience, and occasion with a view toward making these ratings comparable. The Evaluator's Rating Scale devised for this study employed ten criteria based on a study of existing Speech rating instruments. Appropriate tests of reliability and validity were made. All of the data obtained from these rat— ing scales were transferred to punch cards which permitted sorting and tabulating by IBM methods. 2 Emil R. Pfister (Thesis Abstract Continued) The data were analyzed by appropriate procedures to discover the role played by each of the four selected factors under investigation. Differences of the means were computed for groups that were comparable in all respects except the factor being studied. The "t test" for significance of the difference of the means was applied and coefficients of correlation were computed. The findings of this research led to the conclusion that the academic speech training of the evaluator influences his ratings. Undergraduate evaluators with majors or minors in speech gave significantly higher ratings than did evalua- tors with advanced degrees in speech. Furthermore, scores given by pairs of undergraduate evaluators had a higher cor- relation than did scores given by undergraduate-graduate pairs of evaluators. Pairs of evaluators with advanced de— grees in speech had the highest correlation. The investigation, in itself, provided inconclusive results with respect to the influence of acquaintanceship on the ratings of speech performances. However, the results of this study tend to substantiate the findings of previous research, i.e., that evaluators who are acquainted with the speakers give them higher ratings than do evaluators who are unacquainted with these speakers. 3 Emil R. Pfister (Thesis Abstract Concluded) In this particular study the experience of the evalua- tor with the rating scale employed was found to have no sig- nificant influence upon the scores given. However, all the evaluators had a certain minimum of speech training and had rated speeches previously. The literature and data of this study support the con- tention that male and female evaluators rate male and female speakers differently: (1) Female student evaluators gave higher ratings to both male and female speakers than did male student evalua- tors. (2) Female student evaluators gave higher ratings to male speakers than they gave to female speakers. (3) Male student evaluators gave higher ratings to female speakers than they gave to male speakers. TABLE OF CONTENTS CHAPTER I. II. III. INTRODUCTION . . . . . . . . . . . A. The Problem Statement of the problem . . Importance of the study . . B. Definition of Terms Selected factors . . . . . . Influence . . . . . . . . . Ratings . . . . Intergroup speech project Fundarentals of Speech . . . C. Summary . . . . . . . . . . . REVIEW OF THE LITERATURE . . . . . A. Speech Rating in General . . . B. The Four Factors The academic speech training of the evaluator . . . . The acquaintanceship of the evaluator with the speaker The experience of the evaluator with the rating scale . . The sex of the evaluator in relation to the sex of the speaker C. Summary........... PROCEDURE 0 O O O O O O O O O O O A. Devising the Evaluator's Rating Scale B. Collecting the Data Scheduling the evaluators . Preparing the speakers . . . Preparing the evaluators . . Description of the experimental setting Additional evaluators . . . Checking rater—speaker acquaintance PAGE aw \O (DCDCD'QVI IO 11 NNNNHHp—J \OOOWOCDVL» -¥>- bwwww |-‘ O\OU1 kw CHAPTER IV. C. D. Tabulating the Data Mechanical tabulation Organizing the data . Statistical procedure . . . . . . . summary 0 O O O O O O O O O O O O O 0 ANALYSIS OF THE DATA . . . . . . . . . . . A. D. The Rating Scale validity I O O O O O O O O O O I O 0 Reliability 0 O O O O O O O O O O O The Distribution of Scores Group I (First Semester Students) . Group II (Second Semester Students) Findings Regarding the Four Factors The academic speech training of the evaluator . . . . . . . The acquaintanceship of the evaluator with the speaker . . . . The experience of the evaluator with the rating scale . . . . . The sex of the evaluator in relation to the sex of the speaker . . . . sumary O O O O O O O O O O C O O O 0 CONCLUSIONS AND RECOMMENDATIONS . . . . . A. Principal Findings The academic speech training of the evaluator . . . . . . . The acquaintanceship of the evaluator with the speaker . . . . The experience of the evaluator with the rating scale . . . . The sex of the evaluator in relation to the sex of the speaker . . . . PAGE 42 42 44 46 48 49 53 58 63 66 7O 7O 77 79 80 80 81 82 |lllllllllllllllllllllll CHAPTER PAGE l B. Educational Implications Significance for Central Michigan College of Education . . . . . . . . . . . 83 Significance for education in general . . . . . . . . . . . . . . . . 85 C. Suggestions for Further Study . . . . . . . . 86 D O Smary O O O O O O O O O O O O O O O O O O O 88 BIBLIOGRAPHY O O O O O O O O O O O O O O O O O O O O O O 90 APPENDIX 0 O O O O O O O O O O O O O O O O O O O O O O O 103 LIST OF TABLES TABLE PAGE I. A Comparison of Marks Given by Evaluators with Marks Given the Same Students by Teachers . . . . 52 II. Range, Mean, and Standard Deviation of Total Scores Received by Students Participating in the Intergroup Speech Projects (Group I). . . . . 56 III. Range, Mean, and Standard Deviation of Total Scores Received by Students Participating in the Intergroup Speech Projects (Group II) . . . . 57 IV. Differences Between First and Second Intergroup Speech Project Scores (Group I) . . . . . . . . . 60 V. Total Scores and Academic Ratings Received by Students Participating in the Intergroup Speech Projects (Group I) . . . . . . . . . . . . 61 VI. Differences Between First and Second Intergroup Speech Project Scores (Group II). . . . . . . . . 64 VII. Total Scores and Academic Ratings Received by Students Participating in the Intergroup Speech Projects (Group II). . . . . . . . . . . . 65 VIII. A Comparison of the Mean Scores Given by Faculty and Student EvaluatorS‘Rating the Same Speakers 0 O O O O O O O O O O O 0 O O O 0 O O O 67 TABLE PAGE IX. Scores Given when the Student Evaluator is Acquainted and the Faculty Evaluator is Un- acquainted with the Speaker Compared with the Scores Given when the Faculty Evaluator is Acquainted and the Student Evaluator is Un- acquainted with the Speaker . . . . . . . . . . . 69 X. A Comparison of the Correlation of Ratings Given by Student and Faculty Evaluators Judging the Same Group of Students Three Months Apart . . . . 71 XI. A Comparison of the Mean Scores Given by Faculty and Student Evaluators Rating Female Speakers (Group I) . . . . . . . . . . . . . . . . . . . . 73 XII. A Comparison of the Mean Scores Given by Faculty and Student Evaluators Rating the Male Speakers (Group I). . . . . . . . . . . . . . . . 74 XIII. A Comparison of the Mean Scores Given by Faculty and Student Evaluators Rating Female Speakers (Group II). . . . . . . . . . . . . . . . . . . . 75 XIV. A Comparison of the Mean Scores Given by Faculty and Student Evaluators Rating Male Speakers (Group II). . . . . . . . . . . . . . . . . . . . 76 CHAPTER I INTRODUCTION The speech teacher cannot escape responsibility for the evaluation of the oral performance of his students. His educational philosophy may make academic marks seem undesirable, or he may be disturbed by the influence of subjective factors which impair the validity of such evalua- tion. Nevertheless, the practical necessities of the learn— ing situation, as well as customary institutional procedures, require that he evaluate the speech competency of his students. Research has shown that students believe that rating speeches is of primary importance in a Fundamentals pf Speech course. Graunkel, who administered student judgment questionnaires to 1,024 Fundamentals pf Speech students in four different universities, secured data showing that oral work was consistently judged by students to be of more value than reading assignments and written work. lDean F. Graunke, "The Use of Student Opinion in the Improvement of Instruction in Speech Fundamentals," (unpub- lished Master's thesis, The University of Nebraska, Lincoln, 1951), pp. 94-126. Reid2, in discussing computation of final grades for beginning speech classes, advocates giving two or three times as much weight to oral work as to written work. Hollister3 points out the need for good judging when he says: The question of judging contests is one of importance to every teacher of public speaking, for it influences his faith in contests, his spirit in classroom work, and the tone of public speaking in the school. Evaluation of speech must of necessity involve some 4 pointed out, subjective judgments. Nevertheless, as Pelsma every attempt should be made to improve the fairness and accuracy of these judgments since they are used as a basis for instructing and guiding the student as well as determin— ing his status. The crucial importance of accurate evaluation of the student is illustrated by a recent regulation of Central Michigan College of Education5 regarding the demonstration 2Leon D. Reid, Teaching Speech in High School. Columbia, Missouri: Artcraft Press, 1955, p. 191. 3R. D. T. Hollister, "Faculty Judging," Quarterly Joppnal pf Public S eakin , 3:235, July, 1917. 4J. R. Pelsma, "Standardization of Grades in Public Speaking," Quarteply Journal pf Public S eakin , 1:268, October, 1915. 5Bulletin, 1952—53 Sessions. Central Michigan College pf Education. Mount Pleasant, Michigan, 1952, p. 74. 3 of speech competency by candidates for teaching certificates. The regulation prescribes completion of the course, EEQQA- t mentals pf Spgppp (Speech 101), with achievement of at least a "C" in the course. This "C" is interpreted to mean "average" skill and facility in communicating information to a group of persons. A. The Problem Statement 2; phg problem. This research is designed specifically to study critics' ratings of speech performances by students in Fundamentals p; Sppgph classes at Central Michigan College of Education. The purpose of the study is to examine the role which certain factors play in the evalua- tion of Speech competency. The experiment will attempt to secure evidence in answer to four questions: (I) Do ratings given by college juniors and seniors who are speech majors or minors differ significantly from those given by members of the speech faculty? (2) Do ratings given by the evaluators who know a speaker differ significantly from those given by the evalua- tors who do not know the speaker? (3) Do the ratings given by the evaluators who have had experience with the rating scale differ significantly from those given by the evaluators who have had no experi- ence with the rating scale? 4 (4) Do the ratings of speakers of each sex differ sig- nificantly according to the sex of the evaluator? Impoppance pf php pppgy. This study is of particular concern to the students and faculty of Central Michigan College of Education. The significance that speech per- formance ratings have for all freshmen on that campus has already been explained. 6 Jones , who studied the current practices in the beginning Speech courses of 318 colleges, found that rating charts were used in the majority of them. Thus this study may be of general interest to a number of colleges in the United States. However, it is not the purpose of this experiment to determine the relationship of these factors in all schools but rather particularly in the situation at Central Michigan College of Education. Before any adaptations to other colleges are made, one must first determine the extent to which their students and faculties are comparable to those at Central Michigan College of Education. Naturally the characteristics of the evaluator who uses the rating scale are of primary consideration. As 6Horace Redman Jones, "The Development and Present Status of Beginning Speech Courses in the Colleges and Uni— versities in the United States," (unpublished Doctor's dis- sertation, Northwestern University, Evanston, 1952), 216 pp. 5 Hudgins7 points out, one way to increase the reliability of the evaluation is, of course, to reduce the variability among the evaluators. To do this, evidence must first be secured regarding the extent of influence that certain factors, other than the speech performance itself, have on the ratings. The four questions being considered in this experiment have been selected because they have not been answered by previous research. Furthermore, they involve factors that can be controlled in the ordinary classroom situation. The possible concrete results in terms of action may be seen by briefly considering the significance that each of the four factors might have when selecting evaluators: (1) If ratings given by upperclassmen who are speech majors or minors tend to differ significantly from those given by members of the speech faculty, speakers who are to be compared ought to be rated exclusively by students, ex- clusively by fabulty members, or by a like number of each. If, on the other hand, this difference in the academic speech training of the evaluators plays no significant role, one may select speech judges at random in this reSpect. (2) If evaluators who know a speaker tend in general to rate him higher, or lower, than the evaluators who do not know the speaker, the teacher must be sure that either each 7C1arence V. Hudgins, "The Validity of Speech Tests," V015; Review, 45:271-2, May, 1943. 6 evaluator has no acquaintance with the speaker or that a like number of each Speaker's acquaintances are used as evaluators. However, if acquaintanceship plays no role in this regard, then this factor need not be considered in securing evalua— tors. (3) If evaluators who have had experience with the rating scale that is being used make more reliable evalua- tions than those who do not have such experience, teachers must be sure that all evaluators receive practice experi— ences. However, if experience with the rating scale makes no significant difference, mere verbal instruction may suffice; and any time spent in practice will be wasted. (4) If evaluators rate Speakers of the same sex differently than speakers of the opposite sex, this factor must be taken into consideration when securing evaluators. If, on the other hand, neither men nor women evaluators show an appreciable sex-tied favoritism in their rating of speak- ers, the evaluators may be secured at random without regard to sex. This study has been predicated on the assumption that these sources of uncertainty in the evaluation of speech performance warrant careful investigation. There are, of course, other important factors such as the social back- ground, intelligence, and physical health of the speech evaluators which have not been explored in this research. B. Definition 93 Terms Selected factors. Factors may be considered as cer- tain characteristics of the raters. In this study the char— acteristics being considered are: (1) The academic speech training of the evaluator. The ratings given by students who are in either their third or fourth year of college will be compared with the ratings given by faculty members who have advanced degrees. (2) The acquaintanceship of the evaluator with the speaker. The ratings given by evaluators who are acquainted with the speakers will be compared with the ratings given by evaluators who are not acquainted with the speakers. (3) The experience of the evaluator with the rating scale. The ratings given by evaluators experienced with the rating scale will be compared with the ratings given by these same evaluators before they had experience with the rating scale. (4) The sex of the evaluator in relation to the sex of the speaker. The ratings given female Speakers by male evaluators will be compared with the ratings given the same speakers by female evaluators. Likewise, the ratings given male speakers by male evaluators will be compared with the ratings given the Same speakers by female evaluators. Influence. Influence may be assumed to exist whenever a statistically significant relationship is established be- tween ratings given in any one of the above categories. Ratings. Ratings in this study are the judgments of the "intergroup Speech projects" as expressed by critics on the Evaluator's Rating Scale.8 Intergroup speech project. This is a phrase used at Central Michigan College of Education to designate an ex- pository speech of approximately three minutes duration. It is delivered before a group of fifteen freshmen who are mem- bers of various sections of the Fundamentals 9; Speech course. In each audience there are two evaluators, one a student and the other a faculty member, who make independent ratings of each speaker. Each student evaluator is a col- lege junior or senior who is also a speech major or minor and who has been approved by the speech faculty as a competent student. The faculty evaluators are members of the Speech Department of Central Michigan College of Education. Fundamentals pf Speech. This class, bearing the col- lege catalog designation of Speech 101, is a two semester- hour course required of all freshmen on campus. They may register for it either the first or second semester. Ap- proximately six hundred students take this course each year. 8See Appendix A. C . Summary In the light of the defined terms the problem may be regarded as an attempt to determine what statistically sig- nificant relationships, if any, exist between the judgments expressed by critic-judges on the Evaluator's Rating Scale and: (a) their academic speech training, (b) their ac- quaintance with the speaker, (c) their experience with the rating scale, and (d) their sex in relationship to the sex of the speaker. I This study is of particular concern to the students and faculty of Central Michigan College of Education where speech competency is a prerequisite to candidacy for a teaching certificate. However, it may also have implica- tions for other comparable institutions that have similar programs. CHAPTER II REVIEW OF THE LITERATURE In order to explore the literature pertinent to the present study, the writer not only read published and un- published research in the Speech field,l’ 2’ 3 but also examined psychological, sociological, and pedagogical writ- ings.4’ 5’ 6 lLester w. Thonssen and Elizabeth Fatherson, Bib- liography p; Speeph Education. New York: H. W. Wilson and Company, 1939. 800 pp. 2Lester W. Thonssen, Mary Margaret Robb, and Dorthea Thonssen, Bibliography pf Speech Education - Supplement 1939-48. New York: H. W. Wilson and Company, 1950- 393 pp. 3Franklin H. Knower, Table pf Contents 92 phg 'ua terl' Journal pf Speech (1915-19223; speech Monographs Wane Speech Teacher 1 2 wiph g Revised Ipdex Compiled Through 1952. Columbia, Missouri: Speech Association of America, 1953. 61 pp. 4Walter S. Monroe, editor. Encyclopedia %f Edg- capional Research Revised Edition. New York: he Macmillan Company: 1950. 1520 pp. 5Alice F. Moench and others, editors. Thg Int :— ngtional Inde; pg Periodicgls Devoped Chiefly pp phg Humanities and Sciencpg. New York: H. W. Wilson Company, V018 o I-XII , 1913-19573 0 6Isabell Towner and Ross Carpenter, editors. T e Educapion Index; A Cumulapive Auphor gpd Subjecp Inde; pp p Selected Lisp p; Egucational Periodicals, Books, gpd Pamphlets. New York: H. W. Wilson Company, Vols. I-VIII, January, 1929 - June, 1953. Also Educg- pion Index Monthly Check-Lisp, July, 1953 - April, 1954. 11 The bibliographies compiled by Rosenberg7 were con- sulted to discover the masters' and doctoral theses completed before 1945 which might have some bearing upon the problem. More recent studies were listed by Knower8 and Auer.9 The latter even included this study.10 The literature relating to this study will be re- viewed under two categories: (1) speech rating in general, and (2) studies which give Specific consideration to any of the four factors selected for investigation in the present study. A. Speech Rgping ip Genezal Although much has been written about the use of 11 little has been published on rating scales in general, the rating of oral performances in Fundamentals 9; Speech classes. 7Ra1ph P. Rosenberg, "Bibliographies of Theses in America," Bulle in p; Biblio h , 18:181-82, Septem- ber-December, 1945. 8Franklin H. Knower, "Graduate Theses——An Index of Graduate Work in Speech," Speech Monographs, 21:108- 35, June, 1954. 9J. Jeffery Auer, "Doctoral Dissertations in Speech: Work in Progress, 1954," Speech Mono ra ha, 21: 136-41, June, 1954. 10Ibid., p. 141 110arter v. Good, A. s. Barr, and Douglas E. Scates, The Mephodology pf Educapiongl Reseagch. New York: D. Appleton-Century Crofts, 1941, pp. 424-37. 12 Symondsl2 points out that group rating is more reli— able than is individual judgment: A Single observation is unreliable, a single rating is unreliable, a Single test is unreliable, a single measurement is unreliable, a single answer to a question is unreliable. Reliability is achieved by heaping up observations, ratings, tests, questions, measures... An adequate rating requires the judgment of several raters in several situations at several different times. Reliable evidence must be multiplied evidence. Rugg13 recommends the use of pooled or averaged rat- ings of not less than three independent raters. In each instance it is assumed that the several raters are all com— petent to rate and that the reliability of pooled ratings tends to increase according to the Spearman—Brown formula.14 Holcombls found that, although most judges take care- ful notes and have a number of definite points on which to judge, they do have personal standards which vary widely from one judge to another. 12Percival M. Symonds, Diggnosing Pepsonalipy gpd Conduct. New York: D. Appleton—Century Company, 1931, p. 5. 13Harold C. Rugg, "Is the Rating of Human Character Practicable?" Journal pf Educational Psychology, 12:425-38, November, 1921. 14Joy P. Guilford, Psychometrip Methods. New York: McGraw-Hill Book Company, Inc., 193 , p. ZZI. 15Martin J. Holcomb, "The Critic-Judge System," Quarperly Journal 93 Speech, 19:28-38, February, 1933. 13 who has done a great deal of research in the 16 Knower, field of speech evaluation, says: The objectivity of observational evaluation is en- tirely a matter of the objectivity of raters. Although the standards of evaluation in this process are osten- sibly subjective, it remains a fact that such judgments may be as accurate, or even more accurate than an arbitrarily assigned score derived from items on an objective paper and pencil test. B. The Four Fact0pp The almost complete lack of research in the area out- lined by this study, namely the four factors which may affect the rating of speech performances, indicates the need for this work to be done. Furthermore, where studies have been conducted, as in the area of sex influences, the evidence is inconclusive and even contradictory. 1. Egg gcademic ppggph ppgining p: th evaluator. West and Larsen17 experimented with students in the required freshman course in speech in the State University of Iowa. Students ranked their classmates, and their "class ratings" were compared with ”grades" given by the instructor. They reported: 16Franklin H. Knower, "What is a Speech Test?" Quapteply Journal 9; S eech, 30:485-93, December, 1944. 17Robert West and Helen Larsen, "Some Statistical Investigations in the Field of Speech," Quarterly Joppnal pf Speech, 7:375-82, November, 1921. 14 The relation between the combined judgment of the class on each speaker and the instructor's judgment on each speaker, or the correlation between "class ratings" and "grades" computed on about 300 cases, is .453. Assuming that the instructor's grade is made upon a reasonable basis, one would say that comparatively just marks could be given a student of speech by getting a rating from the class. This conclusion embodies as one of its basic assump- tions the acceptance of the standard postulated by Rugg18 as a test of Significance: The experience of the present writer in examining many correlation tables has led him to regard correla- tion as 'negli ible' or 'indifferent' when r (coefficient of correlation is less than .15 to .20; as being 'preS— ent but low' when r ranges from .15 or .20 to .35 or .40; as being 'markedly present' or 'marked' when r ranges from .35 or .40 to .50 or .60; as being 'high' when it is above .60 or .70. Knower19 investigated the extent of agreement between students and instructors in their rating of student speakers. He had instructors and students rate thirty-three speakers. The ratings given by students were correlated, by the rank order method, with the raw score given by the instructors. In light of these correlations he concludes: Since the correlations were consistently higher, with one exception, between the students' ratings and the in- structors' ratings than between the ratings of the in- structors, we have a more objective criterion of effective public speaking in the average of a number of student scores than we have in the scores assigned by one in— structor. 18Haro1d o. Rugg, Spatispical Methods Applied ;9 Education. New York: Houghton Mifflin Company, 1917, p. 256. 19Franklin H. Knower, "A Suggestive Study of Public Speaking Rating-Scale Values," Quarteply Journgl pf Speech, 15:30-41, February, 1929. 15 20 made a study of the ratings given by 169 Anderson students to their classmates in a basic communications course, at that time called Written and Spoken En lish, at Michigan State College. The student Speakers were in eight different rooms. Each of these eight groups was rated by three faculty members as well as by their fellow classmates. By comparing these ratings she came to the conclusion that the students were more in agreement as to the ratings the speaker should get while the faculty varied more in their judgments. How- ever, this is not necessarily a valid comparison of faculty raters with student raters since the faculty used a rating scale listing five traits while the students evaluated only upon one of these five traits. Gibbs21 investigated the degree of consistency in evaluations made by faculty members and students listening to recordings of three minute speeches. The students were classmates of the speakers. According to this study student evaluators place more Speakers in the below average classi- fication than do faculty evaluators. However, the evalua- tions were made only of voice and articulation; and, Since 2QMaryMargaret Anderson, "An Analysis of Some of the Sources of Variation Involved in Rating Speeches," (unpublished Master's thesis, Michigan State College, East Lansing, Michigan, 1945), 19 pp. 21David Elmore Gibbs, "A Study of Reliability and Variation of Critical Rating of Speech by Trained and Un— trained Observers," (unpublished Master's thesis, The University of Washington, Seattle, 1948), 87 pp. 16 none of the students had any courses in this field, training may have been the important factor here. There is no evi— dence to indicate whether the results would be the same if the students were juniors and seniors who had Speech training. Andregg22 made an analysis of the ratings on Six traits: "thinking, knowledge, initiative, cooperation, organizing ability, and expression." These ratings were made by students and instructors on the performance of student officers attending the Air Command and Staff School at the Air University. The officers participated in the planning of the tactical and strategic air Operations; rated each other's performances; and were rated by their instructors, also officers, who devoted full time to observation and rating. The study Showed that students and instructors rated most reliably on "expression." Also students rated their fellow staff officers more leniently than did instructors. However, the Situation of rating officers on general char- acteristics in the Air Command and Staff School may not be entirely comparable to rating freshmen in Eppggmpppglp pg Speech by faculty members and upperclassmen who are Speech majors and minors. 22Neal Berry Andregg, "A Critical Study of Graphic Rating Scales," (unpublished Doctoral dissertation, Michigan State College, East Lansing, Michigan, 1951), 138 pp. l7 2. Thp gcquaintanceship 2f the evaluator,piph php speaker. Seedorf23 conducted a study to find out how much agreement there is among individuals in their response to an oral interpretation of literature. Among other things she answered the question: How does acquaintance with a fellow- classmate's quality of work affect the amount of agreement among judges? She states: The correlation of the mean scores of each member of the two groups of student-judges, the acquainted and the unacquainted, for one group of readers, were .958 and .887 respectively, indicating that when evaluated by fellow students of approximately the same degree of training, the readers received about the same rank whether given by fellow classmates or by students who were not classmates. Knight24 analyzed ratings of 1948 public school teachers of one school system made by the supervisors under whom the teachers were working. He concluded: The factor of acquaintance operates to make ratings more lenient, i. e., increases the over—rating, and to make ratings less critical and less analytical, i. e., increases the influence of the halo of general estimate. In a way it is literally true to say of a judge's estimate: "His judgment is of doubtful validity be- cause he has known this man too long." 23Evelyn H. Seedorf, "An Experimental Study in the Amount of Agreement Among Judges in Evaluating Oral Inter- pretation," Journal p; Egucational Research, 43:10-21, September, 1949. 24Frederic B. Knight, "The Effect of the Acquaint- ance Factor upon Personal Judgments," Journal pf Educa- Liongl PS cholo , 14:129-42, March, 1923. l8 HenricksonZS made a study of one hundred and seventy- nine students in Fundamentals pf Speech courses, eighty-one from three classes at the University of Montana and ninety- eight from four classes at Iowa State Teachers College. He asked them to rate their classmates at the end of a semester or quarter on: (1) how well they knew the person; (2) how well they liked the person; (3) how good they thought the person was as a speaker. From a study of these data he came to the conclusion that the better known students are appar— ently liked better and are judged to be somewhat better speakers. 3. Thg experience pf the evaluator with the gaping l26 conducted an experiment in rating musical gpglg. Carrol selections played on phonograph records. He used two sec- tions of a class in educational psychology as raters; and each section rated the selections according to (1) volume, (2) expression, (3) quality, (4) melody, (5) harmony, and (6) rhythm. One section, the control group, rated three records. Some weeks later they rated the same three records again. The second section, the drill group, rated the same three records at the same times as the control 25Ernest H. Henrickson, "The Relation Among Knowing a Person, Liking a Person, and Judging Him as a Speaker," Speech Monographs, 7:22-25, 1940. 26Robert P. Carroll, "Practice in Rating," Joupnal pf Experimentgl Psychology, 14:299-302, June, 1931. 19 group. However, between the first and second rating periods, this second section was given new records to rate three times a week. A great deal of discussion relative to method of rating was done in the drill group between ratings. Carroll criticized his research in that, due to the absences, there were changes in each group. Also he did nothing to equate the two groups. However, he concludes that "in general, the results of the experiment seem to indicate that by training in subjective ratings individuals may improve; that their ratings may more nearly agree and that they may become more reliable." Thompson27 in his investigation to determine the accuracy of typical speech rating techniques collected data from eleven classes in Fundamentalg pf Spgpgh. Both speakers and raters were members of these classes. The raters were freshmen taking their first college course in speech and had little or no formal training in rating speeches. The pro- cedure was to divide the class into two groups of raters. Both groups listened to the same speeches, but one used rat— ing scales while the other used letter grades. The data were analyzed to find the variance for each method. Then, on the assumption that the system which produces the least variance 27Wayne N. Thompson, "An Experimental Study of the Accuracy of Typical Speech Rating Technique," (unpublished Doctor's dissertation, Northwestern University, Evanston, 1943), 204 pp. 20 is the best, the investigator concluded that a rating scale is superior the first day of the experiment but that after practice (or fatigue) letter grades are more accurate. 4. The sex pf the evaluator ip relation 39 the sex pf the Speaker. There have been many studies conducted and much has been written, often quite contradictory, in the field of sex differences. A few of these are somewhat relevant to the aspect of sex differences being considered in this study. Anastasi,28 who has done considerable writing in the field of sex differences says: It is apparent that the effectual environment of the two sexes are fundamentally diverse from an early age. Under such conditions, we Should expect pronounced varia~ tion in the emotional and intellectual development of the two sexes. Lehman and Witty29 caution that in collecting data from males and females to determine variability the same age—levels only should be compared. He states: One source of error is hasty generalization due to the inclusion of all or nearly all age-levels in the form— ulation of conclusions. Because of the irregular develop- ment of numerous human characteristics, one may expect to find that at a certain age—level girls will be more vari— able than boys in some regards, and at an earlier or later age the reverse may be true. 28Ann Anastasi, Differential Psychology: Individual and Group Differences in Behavior. Macmillan Company, New York, 1937. p. 386? 29Harvey G. Lehman and Paul A. Witty, "Sex Differ- ences: Some Sources of Confusion and Error," American Journal pf Psychology, 42:140-47, January, 1930, p. 143. 21 Symonds3O studied the differences of areas of interest according to sex. Fifteen areas of human interest were ranked by 784 high school boys, 857 high school girls, 276 college men, 387 college women, 73 men graduate students, and 111 women graduate students in order of interest for reading or discussion. By taking average ranks for the various groups he noted differences by sex for different maturity levels. He found a greater difference in the area of inter- est between college men and college women than he found be- tween high school boys and high school girls or between men and women doing graduate work. Carter31 made an investigation of the assignment of marks by teachers of beginning algebra to determine whether or not teachers tend to favor one sex and whether the sex favored tends to be determined by the sex of the teacher. Nine classes, five taught by men and four taught by women, were used. In these classes there were 135 boys and 100 girls. Intelligence, achievement, and personality of these students were measured by standardized tests; and the rela- tionship between teachers' marks and (1) intelligence, 30Percival M. Symonds, "Changes in Sex Differences in Problems and Interests of Adolescents with Increasing Age," Journal pf Genepic PS cholo , 50:83-89, iarch, 1937. 31Robert Scriven Carter, "Non—intellectual Variables Involved in Teachers' Marks," Journal 91 Educapional Reseapch, 47:81-95, October, 1953. 22 (2) achievement, and (3) personality was determined by com- puting the correlation coefficient. The results showed that the relationship between teachers' marks and intelligence was higher for the boys than for the girls. Also that the rela- tionship between teachers' marks and achievement scores was higher for the boys than the girls. However, the relation- ship between teachers' marks and personality was higher for the girls than for the boys. Marks given by women teachers correlated higher with personality test scores than did the marks given by men. Conversely, marks given by men corre- lated higher with achievement and intelligence than did marks given by women. Douglas and Newman,32 who made a study of achievement and marks of 3366 students in four Minnesota high schools, say: In the light of the data of this and other investi- gations, it seems probable that marks are determined by factors other than achievement, especially marks assigned by women teachers, and that these influences result in the slight overrating of girls generally and the peculiar underrating of boys by women teachers. However, this refers to marks in English, history, and mathematics given to high school students by faculty members and may not be comparable to ratings given in a college speech course by college students or faculty members. 32Harl R. Douglas and Olson E. Newman, "The Relation of High School Marks to Sex in Four Minnesota Senior High Schools," School Review, 45:481-88, April, 1937, p. 288. 23 Fifty members of the faculty at Northwestern Univer— sity ranked one hundred and four students, fifty-three men and fifty-one women, by classifying them into ten different groups according to estimated intelligence. These were cor- related with scores received by the students on a battery of intelligence tests. As Webb33 reports: Each group Showed some partiality to the opposite sex in estimating intelligence; that is, the men gave evi- dence of placing a slightly higher value on the intelli- gence of women than they do that of men. The women appear to do the same thing in regard to the men. The writer34 made a survey of the attitudes of 227 intercollegiate debaters towards debate judges. He found that, as far as the feelings of debaters were concerned, the sex of the judge made a difference in more than half of the cases. Only 39 per cent of the men and 49 per cent of the women felt that the sex of the debate judge had no effect upon the ratings they received. Both men and women debaters preferred male over female judges. However, this preference for male judges was slightly more pronounced among women debaters (45 per cent) than among men debaters (40 per cent). On the other hand 21 per cent of the men debaters thought 33L. w. Webb, "The Ability of Men and Women to Judge Intelligence," School ppg Sociepy, 20:251-54, August 23, 1924. 34Emil R. Pfister, "A Survey of Attitudes Toward Debate Judges," Forensic pg 2; Kappa Delta, 39:102—03, May, 1954. 24 female judges rated them higher while only 6 per cent of the women debaters thought they were rated higher by female judges. Knower35 administered one form of the Smith and Thur- stone Attitude towarg Prohibition Scale before a speech and gave an equated form after the presentation. He concluded that the delivery of speeches produces a change of attitude statistically significant and that women speakers are more influential with a male audience than are men speakers. Similarly, a female audience is influenced more by men speak- ers than by women Speakers. Graunke36 found that female instructors gave higher ratings than did the male instructors. However, these ratings were not broken down to determine whether female instructors rated both male and female students higher than did the male instructors. Penland37 made a study of the ratings given to eighty- seven university sophomores, fifty-three women and thirty—four men. These students read orally and were rated by both male 3SFranklin H. Knower, "Experimental Studies of Changes in Attitudes," Journal 9: Social Psychology, 6:315-44, August, 1935. 36Graunke, pp. p$p., p. 102. 37Virgil Darrell Fenland, "An Experimental Study to Measure Effectiveness in Oral Reading by Means of a Rating Scalme Technique," (unpublished Doctor's dissertation The UHlVWErsity of Southern California, Los Angeles, 1948), 177 pp. 25 and female judges. He found one "probably significant dif- ference," i.e. that female judges tended to be more "severe" in rating women performers in this field of oral reading. C . Summary Speech rating ip genera . A survey of the literature that deals with Speech rating in general indicates that: (1) Group rating is more reliable than individual judgment. (2) Reliability of pooled ratings increases as the number of competent raters is increased. (3) Personal standards vary widely among judges. 122 four factors. Although no research identical to this experiment has been conducted, studies have been made that are related to the four factors being considered in this study: (1) Five studies compare ratings given by faculty members with ratings given by students. One experiment con- cludes that rating by a group of students is more accurate than by a single faculty member. Two studies point out that ratings given by students are more in agreement than are ratings given by faculty. One study indicates that student raters give more lenient ratings than do the faculty while another indicates that faculty give the more lenient ratings. However, all five of these studies use freshmen evaluators who care the speakers' classmates. These ratings may not be 26 equivalent to those given by speech majors and minors in their junior or senior year of college. (2) Three studies consider the factor of acquaintance with the Speaker. Here, too, the students used as evaluators were classmates of the Speakers. All three studies agreed that evaluators acquainted with the speakers were more leni- ent than evaluators unacquainted with the speakers. (3) Only two research studies could be found that were concerned with the evaluator's experience with the speech rating scale. The first of these studies found that training improved the reliability of the rater. However, it should be noted that this experiment was conducted by rating music on phonograph records and that coaching as well as practice was used. The second study used classmates to rate speeches. It concluded that a rating scale is superior to letter grades the first day, but after that letter grades are more accu- rate. (4) The studies regarding the relationship between ratings given a speaker and the sex of the evaluator are in- conclusive. This survey concurs with an earlier report by 8 McNemar and Terman3 regarding variability in mental traits between sexes: 38Quinn McNemar and Lewis M. Terman, "Sex Differences in Variational Tendency," Genetic Psychology Mono a hs, 18:8, February, 1936. 27 Research has not proved either the presence or absence of a sex difference in variability with respect to psychological traits. There are few problems in psychology on which investigations that would appear to be comparable have yielded results so discordant. CHAPTER III PROCEDURE The data for this study were collected during the 1952-53 academic year at Central Michigan College of Educa- tion, Mount Pleasant, Michigan. Six hundred and four people cooperated to make this experiment possible. They may be divided into two groups: (1) There were the speakers, five hundred and forty— nine of the five hundred and ninety-eight freshmen enrolled in Fundamentals g; Spgggp classes.l Three hundred and five of these were in the nineteen sections which were taught during the first semester and two hundred and forty—four were in the nineteen sections taught during the second se- mester. Each of these students gave two different three minute informational Speeches before audiences which aver- aged about fifteen people, most of whom were unacquainted with the Speaker. Thus the freshmen gave a total of 1098 three minute informational speeches. Furthermore, since each of these speeches was given before two different audi- ences, the students compiled a total of 2196 speech per- formances. lThe forty-nine freshmen excluded from this experi- ment were those who, because of some reason such as illness, were unable to participate in all four of the intergroup Speech projects. 29 (2) There were the fifty-five evaluators, forty-six students and nine faculty members.2 The student evaluators were juniors and seniors who were speech majors or minors while the faculty evaluators were members of the Department of Speech. Each of the 2196 speech performances was rated by at least one student and one faculty member. These raters sat in the audience, worked independently, and used a stand- ard rating scale. Therefore, by having various pairs of evaluators, one student and one faculty, rate the 2196 speech performances, a total of 4392 ratings was collected. As will be explained later in this chapter, precautions were taken so that these ratings would be comparable. A. Devising the Evaluapor's Raging SCQIC The Evaluator’s Rating Scale3 that was used in this experiment was devised by the writer who employed the fol- lowing procedure:' (1) A study was made of the Speech rating scales 28ee Appendix B. 3See Appendix A. 30 which have appeared in speech textbooks and periodicals pub- lished in the United States.4-14 4Arthur W. Cable, "A Criticism Card for Class Use," Journal 9; Speech Education, 12:186-88, April, 1926. 5J. Stanley Gray, "Objective Measurement for Public Speaking," Journal 9; Expression, 2:20-26, March, 1928. 6Wilmer E. Stevens, "A Rating Scale for Public Speakers," Qparterly Journal p; Speech Education, 14:223-32, April, 1928. 7Alice J. Bryan and Walter H. Wilke, "A Scale for Measuring Speaking Abilities," Psychological Sulletin, 33:605-0 , October, 1936. 8Harry G. Barnes, "Appendix," S eech Handbook. Iowa City: Privately printed, 1936. 13 pp. 9Helen L. Ogg and Ray K. Immel, "Speech Criticism Chart," Speech Improvepent. New York: F. S. Crofts and Company, 193 . 190 pp. loElwood Murray, The Speech PegSonalipy. New York: J. B. Lippincott Company, 1944. pp. 271-391. llWilhelmina G. Hedde and William N. Brigance, "A Score Sheet for Judging Speeches " Americ n S eech. New York: J. B. Lippincott Company,’l946. pp. 581—8 . 12Alice Evelyn Craig, The S eech Art . New York: The Macmillan Company, 1947, p. 252. 13A. Craig Baird and Franklin H. Knower, "Appendix D," General Speech. New York: McGraw-Hill Company, 1949, p. 294. 14Karl F. Robinson, Spaching Speech Sp Secondapy School. New York: Longmans, Green and Company, 1951. pp. 123-28. 31 (2) A survey was made of the literature regarding the construction of speech rating scales.15—22 (3) Taking into consideration the conclusions from previous research conducted in the field of rating scale con- struction, the writer devised a rating instrument. This in- strument incorporated the elements common to other speech rating scales that had been used by others with some satis— faction in the past. This was revised and refined in the light of suggestions offered by faculty members of the Speech Department as well as members of the advisory committee for this thesis. 15J. B. Miner, "The Evaluation of a Method for Finely Graduated Estimates of Ability," Journal p; Applied Psy— cholo , 1:123—33, June, 1917. 16Max Freyd, "The Graphic Rating Scale," Journal p: Educational Psychology, 14:83-102, February, 1923. l‘7Percival M. Symonds, "Notes on Rating," Journal p: Applied Psychology, 9:188-95, June, 1925. 18Paul H. Furfew, "An Improved Rating Scale Tech- nique," Journal pi Educational Psychology, 17:45-48, January, 19Percival M. Symonds, "Rating Methods," Diagnosing Personality and Conduct. New York: D. Appleton-Century Company, 1931. pp. 41-121. 2OLee Norvelle, "Development and Application of a Method for Measuring the Effectiveness of Instruction in a Basic Speech Course," Speech Monographs, 1:41-65, 1934. 21Alice J. Bryan and Walter H. Wilke, "A Technique for Rating Speeches," Journal p; Consulting Ps cholo , 5:80-90, March-April, 1941. 22Isabel Kincheloe, "On Refining the Speech Scales," English Journal, 34:204-07, April, 1945. 32 (4) The writer presented this speech rating scale to his colleagues at a departmental staff meeting of the speech faculty of Central Michigan College of Education where it was discussed and received unanimous approval. (5) The last two steps, (a) that of introducing the rating scale to the evaluators and (b) that of investigating its reliability and validity, will be discussed later. B. Collecting the Data Before conducting the experiment it was essential to secure the cooperation of the faculty members of the speech department. During September, 1952, several Speech Depart- ment staff meetings were held previous to the registration of students. At one of these the writer outlined a plan for conducting this experiment in evaluating oral perform- ances of students in Epndamentals p; Speech classes. The members of the speech faculty were not only willing to cooperate but they also liberally contributed ideas, time, and effort. The project also required the cooperation of the juniors and seniors who were on either a Speech major or Speech minor curriculum. When asked if they would be will- ing to serve as evaluators, their response indicated that in general they were eager to have the experience as a back- ground for preparation as future teachers of speech. 33 Scheduling the evaluatorp. Scheduling the evaluators from the Speech faculty was accomplished with little diffi— culty since the instructors of_Fundamentals p; Speech agreed that no class sessions of the course were to be held during the weeks that the intergroup speech projects were scheduled. This policy freed the faculty to serve as evaluators. The fact that each student missed two class sessions during that week could be justified because each student was having the experience of giving the same speech before two different audiences. Furthermore, he was getting the evaluations of four well qualified evaluators. Securing qualified student evaluators required more effort. The first step was to compile a list of the seventy students who were either speech majors or minors and who were also either juniors or seniors.23 This list was dupli— cated and copies were sent to each member of the speech faculty in order to determine (1) the number and type of speech courses that each student had taken, (2) professors with whom he had done speech work, and (3) whether the pro- fessor regarded the student as qualified to serve as an evaluator.24 23See Appendix C. 24See Appendix C. 34 Meanwhile, letters signed by the Head of the Depart- ment of Speech and Drama were sent to all juniors and seniors who were Speech majors and minors.25 These letters explained the intergroup speech project, solicited student cooperation, and included a student evaluator's preference report blank.26 When these blanks were filled out and returned, the juniors and seniors were assigned groups to evaluate. These assignments were made according to the student's availability and preference. Then each student was sent a letter inform- ing him of the time or times that he was scheduled to serve 2'7 as an evaluator. Preparing the Speakers. At the sixth meeting of the Fundamentals p; Speech classes during the fall as well as during the spring semester, the speech sections were given the following assignment: You are to prepare a three minute informative Speech and deliver it on the week of . You will be scheduled to speak before two different audiences com— posed of students from other Speech 101 classes. You will be rated in each of the performances by a student who is a Junior or Senior and a Speech major or minor. You will also be rated in each of the performances by a member of the speech faculty. 258cc Appendix D. 26See Appendix E. 27See Appendix F. 35 At the seventh meeting of the class a sheet was given to each student on which he could list the dates and times that he preferred as well as those when he could not speak.28 This helped in scheduling students for speech performances. Each instructor was given a schedule on which were listed the date, time, and room that each student was assigned. After the teacher read this aloud and the student wrote down his speech schedule for that week, the sheets were posted outside the speech secretary's office so that any student might double check his speaking assignments. During each semester every student enrolled in Epppp- menpals p; Speech classes was expected to participate in two Intergroup Speech Projects. First semester students gave their first project Speeches the week of October 27-31, 1952, and their second project performances the week of January 12- 16, 1953. The second semester students gave their first project speeches the week of March 9-13, 1953, and their second series of speeches the week of May 11-15, 1953. Eyeparing the evaluators. The first problem in pre- paring the evaluators was to familiarize them with the rating scale without any indoctrination that would make this experi- ment sterile. However, the evaluators had to have verbal 28See Appendix G. 36 agreement regarding what was to be rated. As Wilke29 says: The first difficulty which anyone attempting to rate individuals runs up against is the matter of attaching exact meanings to the terms used on the rating scale. Many previous users of rating devices have urged the use of careful definitions to establish unequivocal meaning. According to Monroe,30 research upon the problem of increasing the agreement among judges when rating scales are used discloses that an adequate definition of what is being rated is crucial. Symonds31 attempts to outline the method by which such definition is attained: Particular attention must be paid to the definition of the items in the scale. On this hinges much of the success or failure of ratings in general. One of the most potent factors causing unreliability of ratings is ambiguity in the meaning of items on the scale. Thus in every rating scale the items should be defined in some way. There are several possible ways of doing this. One, perhaps the least satisfactory, is to give synonyms of the original term. Another is a short paragraph amplifying the descriptive title. Another method is to ask a question which not only limits the meaning of the term but somehow helps the rater to see the problem of rating more clearly. Furfey32 conducted research which indicated that re- liability could be increased not only by increasing the 29Walter H. Wilke, "A Subjective Measurement in Speech: A Note on Method," Quarterly Journal p; S eech, 21:55, February, 1935. 30Monroe, pp. pip., p. 961. 318ymonds, pp. plp., p. 84. 32Furfey, pp. pip., p. 92. 37 number of judges but also by increasing the number of judg- ments which each judge makes. He explains: This is easily accomplished by analyzing the trait to be rated into several sub—traits, by having the judges rate all these sub-traits separately and then combining these separate ratings into a final score. This is quite comparable to the process of measuring intelligence by measuring separately a number of abilities which are be- lieved to correlate highly with intelligence and then com- bining the separate results into a final score. This subdividing of traits may be overdone, of course, but the need for more specific items cannot, according to Freeman,33 be ignored: Frequently it is held that the uniqueness of the in- dividual personality pattern renders futile any analysis into elements which, when isolated and measured, lose their meaning. Because this view is at variance with canons of scientific procedure, it should be examined very critically. There is a middle ground somewhere, and this we must find before real progress in personality assay is made. The students and faculty who agreed to serve as eval- uators were given copies of the Evaluator's Rating Scale34 to study two weeks before the intergroup speech projects were scheduled to begin. The week prior to the projects the eval- uators met twice and discussed the question: "What is meant by the various items on this rating scale?" Student-faculty committees were set up on each of the four major divisions: (1) "Thought," (2) Language," (3) "Voice," and (4) "Action." 33Graydon LaVern Freeman, The Energetics of Human fighavioz. Ithaca, New York: Cornell University Press, 0 O l 194 7. 34See Appendix A. 38 Students acted as committee chairmen while faculty members served as resource persons. Only those items that were sub- mitted by the committee and agreed upon unanimously by the evaluators were accepted as the official interpretation of the criteria used. These criteria were then mimeographed and distributed so that each evaluator would have a copy of the interpretation of the rating scale.35 Thus an attempt was made to reduce the variables inherent in interpreting the rating instrument. The speech faculty members who had considered and dis- cussed the intergroup speech project early in the semester compiled a list of instructions regarding how the project should be carried out so that the procedure would be con- sistent in all sections.36 These also were mimeographed and sent to all the evaluators. Thus an effort was made to prepare the evaluators so that they would understand and appreciate the meaning, use, and purpose of the rating scale. This was in accord with the advice offered by Strang:37 Only by taking into consideration the way in which the rating is used, the harm that may result from super— ficial or inaccurate rating, and the service which the 35See Appendix H. 368ee Appendix I. 37Ruth Strang, "Seven Ways to Improve the Rating Process," Occupations, 29:107-10, November, 1950. 39 rating may perform in preventing the individual from get- ting into situations in which he is likely to fail, can the rater appreciate the importance of the rating. Description pf pp; pgperimental setting. Directions given in "Instructions to Evaluators"38 were followed care- fully since it was essential to the success of this investi- gation that certain conditions be kept constant. To aid in achieving this objective, precautions were taken to see that several controls operated and that all of the speakers gave the same type of speeches under similar conditions before paired judges. No exceptions were considered.39 Accordingly, these procedures were followed: (1) Only college freshmen enrolled in Fundamentals pi Speech participated as Speakers. They gave the same length speeches (approximately three minutes) with the same general purpose (to inform). (2) All audiences were similar, being composed of approximately fifteen speakers from various sections of the class, and two evaluators, one a member of the speech fac- ulty and the other a college junior or senior majoring or minoring in speech. (3) Each pair of evaluators followed identical in- structions, heard the same speeches at the same time, used 388ee Appendix I. 39Where an exception occurred, the rating scales were kept separate and were not used in this study. 40 the standard rating scale, and had previously agreed upon the interpretation of that rating scale. (4) Each Speaker gave two intergroup speech project information talks. Both of these were given in the same room at the same time of day and the same day of the week, exactly 0 Each speaker also had the same audience 41 nine weeks apart.4 and the same pair of judges listen to both of his speeches. Additional evaluators. When pairs of evaluators heard the same speakers give their second series of information speeches during the week of January 12—16, a third evaluator was present in sixteen of the sections. Each of these ad- ditional sixteen evaluators was also either a member of the speech faculty or a junior or senior who was a Speech major or minor. This was done in order to be able to study the correlation of scores given by two different faculty mem- bers, or two different upperclassmen, who heard the same speech at the same time. 40During the first semester, 1952—53, the first intergroup speech project was conducted during the week of October 27-31, 1952, and the second project the week of January 12-16, 1953. The second semester the first pro— ject was March 9-13, and the second, May 11-15, 1953. 41Since the second series of intergroup speech pro— jects were scheduled for corresponding times and days, there was not a great deal of difficulty in securing the same evaluators. However, in such cases where substitute evaluators were necessary, the ratings were not considered in this experiment. 41 The additional evaluators, sitting in with the paired evaluators and rating the Speakers, made another comparison possible. Ratings given by evaluators hearing the speakers at the same time could be compared with the ratings given by the evaluators hearing these speakers give the same speech at a different time. Checking rater—speaker acquaintance. Immediately preceding the fourth intergroup speech project, May 11-15, 42 was added to the rating scale in order to an extra form determine whether or not the rater was acquainted with the speaker. A Similar form43 was given to each speaker so that he could indicate the extent of his acquaintance with the raters. C. Tabulating ppg Data The fact that this experiment was designed to study four separate factors and that the data consisted of nearly forty-five hundred rating scales, each filled out with twenty-five specific items of information, made machine tabulation a practical necessity. This need was met by the use of the IBM equipment.44 —.__._ 42See Appendix J. 43See Appendix K. 44IBM equipment, manufactured by International Business Machines, 590 Madison Avenue, New York City, New York, is available for research at Michigan State College. 42 Mechanical tabulation. Mechanical tabulation was fa- cilitated by the use of a special punch card.45 This punch card made it possible to record sixty separate items on each card by use of a zero through nine code.46 Four steps had to be taken in order to convert the raw data on the rating scales into a form which could be handled by IBM methods: (1) Data on the rating scales were reduced to a numer— ical code.47 (2) The data were then entered on the punch cards by a trained IBH Operator. (3) The cards were sorted by a mechanical sorter. (4) The data were then assembled by an electronic tabulator. Organizing the data. First the data were arranged in tables designed to facilitate determining how well each stu— dent performed in comparison with his fellow classmates as well as how much improvement he had shown during the nine weeks between the first and second series of intergroup 45This punch card was designed by Francis B. Martin, Supervisor of Tabulation, Michigan State College. 46See Appendix L. 47See Appendix M. 43 speech projects.48 This arrangement of the data, although useful in computing grades for the students, could not be used to answer the four primary questions being considered in this study. Secondly, the data were arranged into sixteen cate- gories, taking into consideration the academic speech train- ing of the raters and their sex in relationship to the sex of the speaker. These categories consisted of two major divisions, male speakers and female Speakers, each broken down into eight sub-groups: (1) Scores given by male faculty evaluators serving with male student evaluators. (2) Scores given by the male student evaluators serv- ing with the above male faculty evaluators. (3) Scores given by female faculty evaluators serving with female student evaluators. (4) Scores given by the female student evaluators serving with the above female faculty evaluators. (5) Scores given by male faculty evaluators serving with female student evaluators. (6) Scores given by the female student evaluators serving with the above faculty evaluators. 48These tables, consisting of forty-two pages, have not been included in the appendix of this thesis because of their bulkiness. The writer has a duplicate copy available for anyone's use. 44 (7) Scores given by female faculty evaluators serving with male student evaluators. (8) Scores given by the male student evaluators serv- ing with the above female faculty evaluators. Comparisons of these scores and their statistical // significance are presented in the following chapter. Thirdly, in order to determine the influence which experience with the rating scale had upon the ratings, the data were arranged so that the scores given by pairs of evaluators during the first intergroup speech project could be compared with the scores these pairs of evaluators gave the same speakers during the second intergroup speech pro- ject. These were then treated statistically as will be explained later. Lastly, in order to consider whether evaluators who were acquainted with the speakers whom they rated tended to give scores significantly higher or lower than the evalua- tors who were not acquainted with these same Speakers, the Evaluator's Acquaintanceship Check Sheet was used.49 Scores given by evaluators who indicated that they were unacquainted with the speakers were compared with scores given by evalua- tors who were acquainted with these same Speakers. Statistical procedure. The available literature de- scribing the principles and methods of population parameters 49See Appendix J. 45 and sample statistics is far too extensive to summarize in this study. However, certain citations are included in an attempt to provide examples of typical authoritative support which is available concerning the mathematical methods used in this study. An example of calculation of the standard deviation from original scores is given by Garrett.5O He also demon- strates how to find the limits in any normal distribution which will include a given percentage of cases.51 This was especially useful in allocating grades according to the normal probability curve. In order to determine the significance of the dif- ference in the means of scores given by evaluators influ— enced by one factor compared with judges influenced by another factor, the "Student‘s t" test was used.52 Coefficients of correlation, symbolized by "r," were computed by the product—moment method.53 This is described 50Henry E. Garrett, Statistics ip Psychology apd Education. New York: Longmans, Green and Company, 1947. p- 3- 511bid., pp. 197-208. 52A full account of this test and the table for its use will be found in Ronald A. Fisher's Statistical Methods for Research Workers, London: Oliver and Boyd, Ltd., 1941, pp. 116—17. Its originator published anonymously under the pseudonym "Student." 53The coefficient of correlation, "r," is often called the "Pearson r" after Professor Karl Pearson who developed the product—moment method. 46 in detail by Snedecor.54 Its importance in the determination of the reliability of an evaluation instrument (such as a rating scale) was pointed out by Good and others:55 Correlation has an extensive use in connection with the critical study of tests and other instruments. The correlation of two series of measure that are supposed to represent the same thing (such as two applications of a standard test, or of comparable forms of it), is known as the coefficient of reliability. The writer was fortunate to have at his disposal electric calculators.56 These were most useful when comput— ing correlation coefficients. Further references to statistical methods are made in the chapter presenting the analysis of the data obtained dur- ing the course of the investigation. D. Summary After a year of planning and experimentation the writer devised an instrument to measure speech proficiency, the Evaluator's Rating Scale. Then during the academic year, 1952-53, with the c00peration of Central Michigan College of Education's juniors and seniors who were speech 54George W. Snedecor, Statistical Methods. Ames, Iowa: The Iowa State College Press, 1946, pp. 123-41. 55Good, Barr, and Scates, pp. pit., p. 607. 56rechniques for the efficient operation of these machines are given in Katharine Pease's Machine Computa— tions 91 Elementary Statistics. New York: Chartwell House, Incorporated, 1949, 203 pp. 47 majors or minors, the speech faculty, and the freshmen in Fundamentals 9: Speech classes, the experiment was conducted. Five hundred and forty-nine freshmen prepared two speeches and gave each speech twice. Each speech was approximately three minutes long and its general purpose was expository. Two evaluators, one an upperclass student and the other a faculty member, were in each audience and rated each speech performance. Thus 4392 ratings were collected. Furthermore, a check was made of rater—speaker acquaintance- ship. Although some variability was unavoidable, every pos— sible effort was made to have sufficient controls operating regarding speaker, speech, audience, and occasion so that the ratings would be comparable. The data were transferred from the rating scales onto punch cards and IBM methods for sorting and tabulating were employed. Then with the use of electric calculators the data were dealt with statistically. The formulas used and the organized presentation of the findings will appear in the next chapter. CHAPTER IV ANALYSIS OF THE DATA The analysis of the data gathered in this experiment considers (l) the rating scale itself, particularly its va— lidity and reliability, (2) the distribution of scores and their practical application to a marking system, and (3) the statistical relationships between each of the four factors and the ratings given by the evaluators. A. The Rating Scale Two important considerations of any measuring instru- ment are its validity and reliability. Validity means the degree to which any device or technique measures that which it is designed to measure. As stated by Cook:1 A test is said to have high validity when it measures effectively the property it purports to measure. A meas- ure of validity of a test is secured by computing a co- efficient of correlation between scores on the test and an outside criterion. Reliability refers to the consistency with which an instrument measures whatever it does measure. It is defined by Thorndike:2 1Walter W. Cook in the Encyclopedia 92 Educational Research (Walter S. Monroe, ed. , New York: The Macmillan Company, 1950, p. 1473. 2Robert L. Thorndike in the Enc clo edia p; Educa- tional Research (Walter S. Monroe, ed.§, New York: The IMacmillan Company, 1950, p. 1016. 49 The reliability of measurement has to do with the pre- cision of a measurement procedure. Measurement in educa- tion is a process of estimating the amount of some quality or attribute possessed by individual objects or specimens. These estimates are usually expressed in numbers (scores) which correspond more or less accurately to the amount of the quality or trait in question. Validity. In this research validity is held to be the degree to which the rating scale actually measures speaking skill. However, it is difficult to determine the validity of a speech rating scale because this requires some acceptable measure of the trait being rated as a basis for comparison. One of the commonly accepted measures of speaking skill is the critical response of the listener. In discussing speech rating methods Monroe and others3 point out: The problem of validity may be viewed first of all qualitatively. On logical grounds the audience response constitutes the ultimate practical criterion of the effec- tiveness of any speech. This granted, it follows that to the extent to which the judgments recorded by means of a rating scale are reliable, they are also valid. Remmers,4 who made a study of students' ratings of their teachers, states: 3Allan Monroe, Hermann H. Remmers, and Elizabeth Venemann-Lyle, "Measuring the Effectiveness of Public Speech in a Beginning Course," Studies in Higher Education, XXIX, Bulletin pf Purdue University, 37:14, September, 1936. 4Hermann H. Remmers, "Reliability and Halo Effect of High School and College Students' Judgments of Their Teach- ers," Journal pf Applied Ps cholo , 18:621, October, 1934. 50 The problem of validity of judgments is hardly perti- nent. While reliability may be defined as the accuracy with which a measuring instrument measures whatever it does measure, validity is defined as the extent to which the instrument measures what it purports to measure. Since it is student judgments that constitute the cri- terion, reliability and validity are in this case synony- mous. While the writer believes that this use of the word "synonymous" is inaccurate, he does agree with Carps who, in discussing the validity of a speech rating form, points out: Agreement by experts is an accepted technique in es- tablishing validity and it is therefore plausible to maintain that validity and reliability may be derived from the same index of agreement among judges. Kelley6 has treated validity of rating scales simi- larly saying: If competent judges appraise Individual A as being as much better than Individual B as Individual B is better than Individual C, then it is so, and there is no higher authority to appeal to. Symonds7 in discussing the validity of ratings says, "In a certain sense there is nothing more valid than a judg- ment." He goes on to point out that all our knowledge has its origin in observation and in interpretations made of observations. 5Bernard Carp, A Study 9; the Influence pf Certain Personal Factors pp p Speech Judgment. New Rochelle, New York: The Little Print, 1945, p. 113. 6Truman Lee Kelley, The Influence p: Nurtppe Upon Individu 1 Differences. New York: The Macmillan Company, 1923, p- 9 7Symonds, pp. ppp., p. 108. 51 In the present study, since the student and faculty evaluators had discussed and agreed upon the speaking skills being evaluated, their ratings of the speakers were used as the criterion. Hence the validity of these ratings is deter- mined by inference from the reliability of the ratings. However, a second method, that of comparison with some other measure of speaking Skill, was possible. All of the speakers were members of Fundamentalp pf Speech classes at the time that they participated in the intergroup Speech pro- jects. At the end of the course each student was given a mark (A, B, C, D, or E) by his speech instructor. This mark was to be regarded as indicative of the student's speech effectiveness in general. Each student also was given a let- ter mark derived from the total score received by adding the ratings given by the four evaluators. By using the method of random sampling,8 a comparison was made of the marks that the students received in the intergroup speech projects with the marks they received for general effectiveness of speech. As indicated in Table I, eighty-four per cent of the students received the same mark from their speech teacher as from the evaluator. Thus they disagreed on the marks of only sixteen per cent of the students. Of this sixteen per cent 8The method of random sampling used was that described by Everett F. Lindquist in "The Technique of Random Selec- tion," Statistical Analysis pp Educational Research, Boston: lHoughton Mifflin Company, 1940, pp. 24-29. 52 TABLE I A COMPARISON OF MARKS GIVEN BY EVALUATORS WITH MARKS GIVEN THE SAME STUDENTS BY TEACHERS Mark given Mark given Percentage by Evaluator by Teacher of Students of Speech of Speech with These Project Class Marks Students with A A 2 Identical B B 24 Marks from C C 38 both Eval- D D 19 uator and E E _1_ Teacher Total 84 Students who were A B 0 Marked B C 2 Lower by C D l the Teacher D E _;_ than by Evaluator Total 4 Students who were Marked B A 1 Lower by C B 3 the Eval- D C 7 uator than E D _;_ by the Teacher Total 12 NCDte: In no case did the mark given by the evaluators of 1319 intergroup speech project differ two degrees (i.e., A tC> C or C to A, etc.) from the mark given by the teacher. 53 the speech teacher gave four per cent of the students lower marks and twelve per cent of the students higher marks than did the evaluators of the intergroup speech projects. Reliability. Reliability may be expressed by the ex- tent to which two independent measurements will yield the same quantitative score. In the present study it was assumed that if the Speech rating scale is a trustworthy device it should give approximately the same results when employed by evaluators having a certain minimum background of speech courses. Hence a calculation of the coefficient of correla- tion of the rating by pairs of judges Should furnish an index of reliability of the scale. In order to determine this coefficient of reliability the writer computed the correlation of the scores given by each pair of judges who rated a group of speakers. This cor- relation was extended by the use of the Spearman-Brown formula to include all judges.9 Since no machine exists for measuring speaking skills, any evaluative system involves some sort of human fallibility. This is substantiated by Shen:10 9Guilrord, pp. pip., p. 421. Also: E. L. Clark, "Spearman-Brown Formula Applied to Ratings of Personality frraits," Journal pf Educationpl Ps holo , October, 1935. pp. 552-55° 10Eugene Shen, "The Reliability Coefficient of Per- S<1na1 Ratings," Journal p; Educational Psychology, 163232, April, 1925. 54 The reliability of mental tests is usually measured by correlation between results of two comparable tests. By analogy, the reliability of personal ratings may be evaluated by a correlation between ratings by two comparable judges. By pairing elements of two tests such that they are similar in difficulty and type, an author can to a certain extent insure the comparability between his tests. But the compar- ability of judges is much more precarious; it is entirely beyond the control of the investigator except by a meager selective function that he may fallibly exercise. On account of this uncontrollable variability of judges, a correlation between two judges is a very crude approxima- tion of the reliability of either. The reliability of a judge thus crudely evaluated often varies considerably according to the judge with whom he happens to be correla- ted. In this study the coefficient of reliability, when correlating ratings given by student evaluators with those given by faculty evaluators, was .61 for the first semester and .62 for the second semester. However, when additional evaluators participated, the coefficient of reliability was .68 for the student evaluators and .72 for the faculty eval- uators. This, according to Slawson,ll indicates very high reliability for the use of a scale rating personal traits. B. The Distribution pf Scores *‘ A study of the distribution of scores is not only essential in order to measure the speech proficiency and llJohn Slawson, "The Reliability of Judgments of .Personal Traits," Journal 9; Applied PS cholo , 6:161-71, April, 1922. Also Symonds, pp. cit., p.95 55 degree of improvement made, but also worthwhile to provide background material for understanding the factors affecting these scores. The scores of the speakers in each of the four inter- group speech projects were treated separately. Since the rat- ing scale was constructed with a hundred points maximum, the highest possible total score that could be given to a student by adding the four evaluations given in any intergroup speech project would be four hundred. Actually during the 1952-53 academic year no one received a score over 371 or under 115. The range, mean, and standard deviation of the total scores in each of the intergroup speech projects may be seen in Tables II and III. Since Fundamentals pf Speech is a required course for all freshmen on the campus of Central Michigan College of Education, academic grades for each intergroup speech project 12 were computed according to institutional policy. This was done by plotting a curve and assigning marks as follows: (1) "C's" were given to all of the scores within the range of a point one-half standard deviation below the mean and a point one-half standard deviation above the mean. 12Central Michigan College of Education, Faculty Handbook; A Summary pf the More Important Policies, Regula— tions, and Procedures. Mt. Pleasant, Michigan, 1953, p. 58. TABLE II 56 RANGE, MEAN, AND STANDARD DEVIATION OF TOTAL SCORES RECEIVED BY STUDENTS PARTICIPATING IN THE INTERGROUP SPEECH PROJECTS GROUP 1* 1952-53 First Inter- group Speech Project Range 166-348 Mean 259 Standard Deviation 36 Second Inter- group Speech Project 181-371 287 35 *Group I is comprised of the 305 students who partic- ipated in the intergroup speech projects first semes- ter. 57 TABLE III RANGE, MEAN, AND STANDARD DEVIATION OF TOTAL SCORES RECEIVED BY STUDENTS PARTICIPATING IN THE INTERGROUP SPEECH PROJECTS GROUP 11* 1952-53 First Inter- Second Inter- group Speech group Speech Project Project Range 115-334 200-355 Mean 257 279 Standard Deviation 28.5 27.8 *Group II is comprised of the 244 students who par- ticipated in the intergroup speech projects second semester. 58 (2) "B's" were given to all of the scores between one half and one and a half plus standard deviations from the mean. (3) "D's" were given to all of the scores between one half and one and a half minus standard deviations from the mean. (4) "A's" were given to all of the scores on the plus end of the curve beyond one and a half standard deviations from the mean. (5) "E's" were given to all of the scores on the minus end of the curve beyond one and a half standard deviations from the mean. In a normal bell shaped curve this method would mean 38.30 per cent "C's," 24.17 per cent each for the "B's" and "D's," and 6.68 per cent each for the "A's" and the "E's."13 Qpppp I. (First Semester Students). In order to avoid confusion of terms "first semester" and "second semes- ter" with "first intergroup speech project" and "second in- tergroup speech project," first semester students will be referred to as Group I and second semester students as Group II. As indicated earlier in this study, Group I as well as Group II had two intergroup speech projects. There are three 13Harry‘W. Sundwall, "Normal Curve Score Probabili- ties." East Lansing: Michigan State College, 1950. (Mumeographed). 59 significant facts to notice regarding the distribution of scores of this Group I. (1) From Table II it may be noticed that the range of scores in the second intergroup speech project, 181-371, was greater on both the lower and upper level than the range of scores in the first intergroup Speech project. The mean rose from 259 in the first intergroup speech project to 287 in the second intergroup speech project with a mean improvement of 28 points per student. By applying the "t test" this was found to be statistically significant at the .01 level of confidence.14 0n the basis of this calculation the differ- ence of 28 points between these mean gains could happen by chance less than once in a hundred times. (2) Inspection of Table IV shows that 305 students in Fundamentals g§_Speech class participated in all of the per- formances in the experiment during the first semester. As indicated in this table, 80.7 per cent of these received a higher score while 19.3 per cent either received the same or a lower score in the second intergroup speech project. The greatest gain made by a student was 115 points while the greatest loss was 54 points. (3) Table V demonstrates that it was necessary for a student to have a higher score in the second intergroup l4Garrett, pp. cit., pp. 189—93. TABLE IV DIFFERENCES BETWEEN FIRST AND SECOND INTERGROUP SPEECH PROJECT SCORES GROUP I Point Differences Number of Per Cent between Scores in Students of the First and Second Making This Total Speech Project Gain (or Loss) Group 111 through 120 101 H H I! II N N 1| N n tl " 110 100 go 0 7O 6O 50 4O 3O 2O 10 Total with higher scores in second speech project 246 - 9 -29 -39 -49 -59 through I! H N N H O -10 -2O -30 -4O -50 28 13 9 6 O __3 Total with lower scores in second speech project 59 O O O O HNUINOWUJOOl—‘O O O O H-F-FF-‘(DNOQCfiVIWOUU HHHHH 00 O V l—‘OHme OO\O\OLAJO\ l-‘ \O O U) 60 61 TABLE V TOTAL SCORES AND ACADEMIC RATINGS RECEIVED BY STUDENTS PARTICIPATING IN THE INTERGROUP SPEECH PROJECTS GROUP I Academic First Inter- Second Inter— Rating group Speech group Speech Given Project Project A 331 or above 345 or above B 278 - 330 304 - 353 C 241 - 277 270 - 303 D 187 - 240 220 - 269 E 186 or below 219 or below 62 speech project in order to receive the Same mark as was given in the first intergroup speech project. This follows because grades were determined on the basis of the normal curve and the overall group improved. Thus: (a) Scores between 331 and 344, which were equal to "A's" in the first project, were valued as "B's" in the second. (b) Scores of 278-303, which were "B's" in the first project, were "C's" in the second. (c) Scores in the "C" range in the first project, 241-269, were given "D" grades in the second project. (d) Scores of 187 to 219 that had been "D" scores became "E's." This was true because, as indicated earlier in this chapter, the grades were allotted according to standard deviations from the mean; and the mean of the second intergroup speech project was 28 points higher than the first. 0 Qpppp II. (Second Semester Students). The findings that were evidenced by the data on Group I were substantially the same for Group II. They are as follows: (1) There was a statistically significant improvement shown in the total scores received by Group II speakers in the second intergroup speech project when compared with the scores received in the first project. By comparing data in Table II with that in Table III one can see that the mean 63 score of the first intergroup speech project of Group II was 257 compared with 259, the mean score of Group I. The mean score of the second intergroup Speech project of Group II was 279 compared to Group 1'5 287. Thus the mean improvement in points was 28 for Group II and 22 for Group I. Furthermore, it may also be noted from Table III that the range of scores was 115-334 in the first project and 200-355 in the second project. The same phenomena occurred with both groups, namely, the scores in the second intergroup speech projects were consistently higher than those in the first intergroup speech projects. (2) As presented in Table VI, 195 (79 per cent) of the 244 Group II speakers who participated in the experiment had a higher score in the second project than they had in the first. This 79 per cent is comparable to the 80.7 per cent of Group I students who also made higher scores in the second projects. (3) The results in Group II (presented in Table VII) were similar to those found in Group I, i.e., the over-all improvement gains necessitated higher scores by the speaker in the second intergroup speech project in order to maintain the same letter mark he earned in the first intergroup speech project. C. Findingp Regarding the Four Factors The academic speech training pf the evaluator. There was a statistically Significant difference between the rating TABLE VI DIFFERENCES BETWEEN FIRST AND SECOND INTERGROUP SPEECH PROJECT SCORES GROUP II Point Differences Number of Per Cent between Scores in Students of the First and Second Making This Total Speech Project Gain (or Loss) Group 1 through 100 2 0.8 l " go 1 0.4 71 " 0 5 1.6 61 " 70 5 1.6 51 " 60 15 6.1 41 " 5O 25 10.3 31 " 4O 31 13.1 21 " 30 33 1§.5 11 " 2O 44 1 .0 l " 10 .3& 1312 Total with higher scores in second speech project 195 79.0 — 9 " 0 21 8.6 '19 II “‘10 15 6 01 —29 " -20 7 2.9 -39 " -30 4 1.6 -49 " -4O __2 0.4 Total with lower scores in second speech project 49 21.0 65 TABLE VII TOTAL SCORES AND ACADEMIC RATINGS RECEIVED BY STUDENTS PARTICIPATING IN THE INTERGROUP SPEECH PROJECTS GROUP II 1952-53 Academic First Inter- Second Inter- Rating group Speech group Speech Given Project Project A 315 or above 355 or above B 270 - 314 294 - 334 C 243 - 269 265 - 293 D 200 - 242 225 - 264 E 199 or below 224 or below 66 given by students, who were in either their third or fourth year of college, and the ratings given by faculty members with advanced degrees. This is indicated in Table VIII. In rating 305 speakers (called "Group I" in this study) during the first semester of the 1952-53 academic year, the faculty evaluators gave a mean score of 62.08. The student evaluators, hearing the same speeches in the same room at the same time as the faculty evaluators, gave a mean score of 68.94. Thus the students' ratings averaged 11.1 per cent higher than the instructors' ratings. This is sta- tistically significant at the five per cent level of con- fidence. In other words, this phenomenon of student evalua- tors rating 11.1 per cent higher than the faculty evaluators could happen by chance only five in a hundred times.15 In rating the 244 speakers (called "Group II" in this study) during the second semester of the 1952-53 academic year, the faculty evaluators gave a mean score of 58.89 while the student evaluators gave a mean score of 69.95. Thus, the students' ratings averaged 19.9 per cent higher than the instructors' ratings. This is statistically Significant at the one per cent level of confidence. The acquaintanceship pf the evalgator with the speakep. Each of the twenty-one pairs of evaluators participating in 15Everett Franklin Lindquist, Statistical Analysis pp Educational Research. Boston: Houghton Mifflin Company, 1940. p. 72. TABLE VIII A COMPARISON OF THE MEAN SCORES GIVEN BY FACULTY AND STUDENT EVALUATORS RATING THE SAME SPEAKERS Group I* Mean Score Given by Student Evaluators Mean Score Given by Faculty Evaluators Difference between Student and Faculty Rating Group I;** Mean Score Given by Student Evaluators Mean Score Given by Faculty Evaluators Difference between Student and Faculty Rating 69.95 58.89 11.06 *N **N 305 244 67 68 the fourth intergroup speech project, May 11-15, filled out an Evaluator's Acquaintanceship Check Sheet.16 However, only eight cases occurred where one speaker was well known by a faculty evaluator and not known by a student evaluator while another Speaker in the same group was well known by this stu- dent evaluator and not known by the faculty evaluator. These eight cases are presented in Table IX. It can be seen here that the students who knew the speakers gave a mean score two points higher than did the faculty members who did not know the speakers. However, when these same faculty members knew a speaker, they gave him a mean score seven points higher than did the student evaluators who were unacquainted with these speakers. Thus in both cases evaluators who knew the speakers rated them higher than did evaluators who did not know the Speakers. This was especially true of faculty mem- bers. FUrthermore, as indicated in Table I speech teachers gave the same mark to eighty-five per cent of their students as was given by the evaluators. However, when these marks did differ, the teacher gave a higher rating than did the evaluators in three of every four cases. Since the teacher was acquainted with all of his students and the evaluators were acquainted with only about ten per cent of these speak- ers, one may assume that acquaintanceship was a positive 16See Appendix J. 69 .No. we maoxmmmm on» Socx pom ow on: enema can mummmmmm one Roux one mnouwsaw>m may moospon “av downwaonaoo Ho psoHOHhmmoo anew m m NI: mm mm mu: mm mm «Heuaao m Hm mm mu mm mm mmanmmo m we om m mm Hm Hemnmeo NH mm mm Ha- am on mmeuaeo ma mm mm mu me me mmcuauo ma- em we m mm mm moaimeo can mm mm a Hm mm oeeummo m we 45 e no He mmmnaeo umpcfiwswow mnmxmomm mnoxmomm wopufimsuow mnmxmoqm whoxmmgm mnopmsam>m lab madam 302M poz Scam nab mscafi Roam poz Roam Ho mama condemnaow om mummuzpm haddowm nopqfim5do< on spaswmflllmmmmmmwm you ocoo mononommfia "Coma Go>fiw monoom oodonwmmam ”awn; mmefio monoom mmmwmmm mma 39H; QMHZH¢DGU«ZD mH moadqu>m Bzmmbam mma Q24 amaszboow mH mOHm qubodm was 2mm; zm>Ho mmmoom Mme mHHS ammdmsoo mmmm Magbogm Ema Gad QmBZHm Hzmmbam mma zmm; zm>Ho mmmoom NH mqmwa 7O factor in securing a higher rating. Tpp experience pf the evaluator with the rating scale. In order to determine whether experience with the rating scale improved the reliability of the ratings, coefficients of correlation were computed on (1) the scores given by the raters when they first used the rating scale in October, and (2) the scores given by the same raters when, after some ex- perience, they used the rating scale again in January. These coefficients of correlation of ratings given by pairs of judges in October were compared with those of ratings given by the same pairs of evaluators rating the same speakers three months later. As indicated in Table X, there was no evidence of a Significant difference between the experienced and inexperienced evaluators. Ten pairs of evaluators showed a mean increase in correlation of fourteen points. However, nine pairs of evaluators showed a mean decrease in correla- tion of fifteen points. The correlations of the ratings given by two pairs of evaluators remained substantially the same 0 Tpp pp; pf tpp evaluato; 2p relation tp thg pp; pf thp speaker. A comparison was made of the mean scores given female speakers and male speakers by four combinations of judges, i.e., (l) a male faculty member judging with a male student evaluator, (2) a female faculty member judging with a female student evaluator, (3) a male faculty member judging 71 .QOprHmupoo no pcmfioammooo esp opmcmflmmc on com: ma en: Honamm chew .mmwsmwh ma .mocooomm ucoaowmflo mmw>fim .mpGoUSPm mo agonw mean on» copay menu was nonopoo ca mnoxmoam mo macaw a couch one «panama anamomm mmo use pmovdpm oao .mnopmSHw>o no pawn a now mcdwpm Amppoa nommx ov. me. mo. em. no. ww. Ho. no. HO. OH. ma. He. Ho. mm. Hm. HH. aw. mm. ON. mm. 5H. ow. mm. Nw. *n ma omwmmoom *h QH mmwonocH mmnflpwm hawSGmh 9H0 *.H mmdapwm monopoo .HO *H Noooo Bm¢m< mmazofi mmmme mazmmbam mo abomo m2¢m mma GZHUQDW mmOB¢DA¢>m Neabodm 02d HZMQDHm Mm zm>HU mozHB \J'l #LJU N H CO (D'Q O\\J\ 105 EVALUATORS Richard Torongo Petrine Churchill Robert Beckley John Kirn Alice Wagner Jacqueline Robinson Marian Sanborn Herbert Sanford Kenneth Downing L. D. Foster Anita Hoag Betty Borman Robert Gravelle Virgil Scott Royal Riggs Jack Clary Bernard Randolph Jean Conklin Jack White Jean Caldwell Norma Levi Martha Fuce Patricia Thwaits Jean Detzur Carol Clark Donna Clapp Keith Birdsall (1) Females who judged both semesters (2) Males who judged both semesters (3) Females who judged second semester only (4) Males who judged second semester only LETTER TO SPEECH FACULTY Dear Colleague: APPENDIX C 106 Below is a list of the juniors and seniors who are Speech majors and minors and who are not on academic pro- bation. Intergroup Speech Projects. They are being considered as evaluators for the Will you (1) please put our initials after each student you have had in class; (2 add a question mark after anyone who you feel may be an incom- petent judge. MAJORS Richard Balwinski Betty Borman Jean Caldwell Shirley Clark Jack Clary Jean Conklin Jean Detzur Kenneth Downing Phyllis Eichhorn L. D. Foster Phyllis Gordon Lonna Rae Hall James Jaska Don Kemp John Kirn Norma Levi Joyce A. McNamara Josephine Nickora Richard Powell Jacqueline Robinson Marian M. Sanborn Carla Snow Neil Suomela Loyal Thornton John Trask Alice Wagner Jack White Emil R. Pfister MINORS Brian Beckley John Bilsky Keith Birdsall Dale Brown Petrine Churchill Carol Clark Donna Clapp Dale Edgerle Martha Fuce Robert Gravelle Clyde W. Hatter Anita Hoag Vivienne Jack Jean Klozik Betty LaLone James McLennan Lorna Lesnick Robert Lucas Sheila Maule Barbara Moore John Murchie Alma Nevins Raymond Page MINORS Roger Parrish James Prough Bernard Randolph Arthur Rice Royal Riggs Herbert Sanford Virgil Scott J.D. Shuttleworth Thomas Simpson Art Stinchcomb Patricia Thwaites Richard Torongo Paul Totzke Everett Vincent Russell Ward Mary Weber Jack Weir Joyce Wells David West Doris Whitcomb 107 APPENDIX D INVITATION TO STUDENT EVALUATORS Central Michigan College of Education Mount Pleasant, Michigan October 13, 1952 Dear , You have been recommended by members of the Department of Speech and Drama to serve as one of the Evaluators in the Intergroup Speech Projects this year. In these projects Freshmen in Speech 101 give three minute speeches for speech majors and minors to evaluate on a rating blank. We should appreciate it very much if you could serve at two or three of the possible thirteen times. If you are willing to cooperate in conducting this speech project, please answer on the enclosed preference blank. Please return this to the Speech Office (W261) no later than 5:00 p.m. this Friday, October 17. The speech majors and minors will hold two meetings next week, Thursday, October 23, in Keeler Dining Room C to discuss the rating blank to be used. One will be from 4:00 - 5:00 p.m.; the other from 6:30 — 7:30 p.m. Since the worth of the project depends so much upon the coop- eration and mutual understanding among the evaluators, you are requested to attend both of these meetings. Cordially yours, Wilbur E. Moore, Head Department of Speech and Drama 108 APPENDIX E STUDENT EVALUATOR'S PREFERENCE REPORT I am willing to serve as an evaluator at two of the Speech 101 Intergroup Projects. I will be avail- able at the following times: (Put "P" in two or three of the blanks below to indicate preference; "X" for those times that are im- possible.) 12:05-1:55 1:05-1:55 4:05-4:55 Eonday, Oct . 27 Tuesday, Oct. 28 Wednesday, Oct. 29 Thursday, Oct. 30 Friday,Oct. 31 (signed) Plegse geturn tp Speech Offtpp pp p; befoge Frid , October 12. APPENDIX F 109 LETTER ASSIGNING STUDENT EVALUATORS Department of Speech and Drama Central Michigan College of Education Mount Pleasant, Michigan October 20, 1952 Dear : Thank you for filling out and returning the Stu- dent Evaluator's Preference Report. Please report to the Speech Office, W261, at the following times to be available as an evaluator of Intergroup Speech Pro- jects: (1) Time: Date: (2) Time: Date: (3) Time: Date: You will probably be asked to evaluate only twice and be an alternate the third time. Thank you for the cooperation you have given us. Sincerely, Emil R. Pfister, Director Intergroup Speech Projects 110 APPENDIX G SPEAKER'S PREFERENCE SHEET FOR INTERGROUP SPEECH PROJECT Name Instructor Class time and days Please mark with "P" the two dates you prefer to give your speech for the Intergroup Speech Project. Mark with an "X" the times that you are unable to participate. 12:05-l:55 1:05-1:55 4:05-4:55 Monday, Oct. 27 A Tuesday, Oct. 28 Wednesday, Oct. 29 Thursday, Oct. 30 111 APPENDIX H INTERPRETATION OF RATING SCALE CRITERIA Below is a list of the specific questions agreed upon by the evaluators. These may make for a better interpretation of the criteria used in the "Evaluator's Rating Scale for the Intergroup Speech Project." I. II. THOUGHT (A) Content 1. Is there enough material to cover the subject well? 2. Is subject and the material interesting enough to hold the attention of the listener? 3. Is the information accurate? (B) Organization 1. Is there an adequate introduction? 2. Is the body of the speech well planned? 3. Is there continuity? 4. Is there an adequate conclusion? LANGUAGE (A) Vocabulary (B) 1. Is pronunciation correct? 2. Is there an adequate variety? 3. Are words used correctly? 4. Is there an overuse of slang? 5. Is vocabulary suitable to audience? Sentence Structure 1. Are sentences grammatically correct? 2. Is there an adequate variety of sentences? 3. Is there an overuse of the word "and"? III. IV. 112 APPENDIX H (CONCLUDED) VOICE (A) Enunciation 1. Can you understand the speaker? 2. Does the speaker slur his words? 3. Is the speaker over-articulate? (B) Adequacy . Does the speaker speak too loudly or too softly? 2. Is the voice pleasant? 3. Is there enough variety? 4. Is there good timing and use of pauses? ACTION (A) Posture 1. Does he lean against anything? 2. Are his hands and feet in a comfortable, natural position? 3. Does he hold his head properly? (B) Gesture 1. Are the speaker's actions distracting? 2. Are the Speaker's actions suitable and meaningful? 3. Is there either insufficient or too much action? I10. 113 APPENDIX I INSTRUCTIONS TO EVALUATORS OF THE INTERGROUP SPEECH PROJECT Do all judging independently. Select seats near the center of the room. Try to get the meeting started as soon as possible. There will be a few participants who are on leave from other classes so please permit them to give their speeches at the beginning of the session. Please furnish all information called for at the top of the rating scale. All speeches are to be limited to three minutes. Do not permit any contestant to exceed this time limit. If you have failed to complete the rating on a Speaker by the time he has concluded, complete the scale before calling another speaker. Be impersonal in your judgment. Do not let personalities temper your evaluation. Rate the speakers on all ten items on the rating scale, and total the ratings. You can aid the speakers by being a considerate judge. Try to refrain from showing disagreement, disgust, or disinterest in the speeches. Be an attentive listener. Above all, remember that these meetings have been arranged to furnish additional speaking experience for the Speech Fundamentals students. For that reason, it is imperative that you do the best job of evaluating that you are capable of rendering. Do your judging fairly and conscientiously. At the end of the session return the rating scales to the Speech Office (W261). 114 APPENDIX J EVALUATOR'S ACQUAINTANCESHIP CHECK SHEET This speech: Date Time Room Name of Speaker Please circle the number below which is the most nearly correct answer: I. To what extent do you know the speaker? 1 2 3 4 5 Not at all Casually Moderately Very well Intimately II. I converse with this person: 1 2 3 4 5 Never About once Once a Once a Almost a semester month week daily Evaluator's Signature 115 APPENDIX K SPEAKER'S ACQUAINTANCESHIP CHECK SHEET This speech: Date Time Room Name of Evaluator Please circle the number below which pp the most nearly correct answer: I. To what extent do you know the evaluator? 1 2 3 4 5 Not at all Casually Moderately Very well Intimately II. I converse with this person: 1 2 3 4 5 Never About once Once a Once a Almost a semester month week daily Speaker's Signature 116 «:2 0:: 8 2 a... 2 2 a : 2 2 £2 2 3 m E 3 3 31.3% ammwmwmmwmammwmmmmwm Bogzhnfimhzahzlgsyeflieinsteaala _ 823232.122 :52235333335 NNNNNNNNNANNNNNNNNNN Bahzghznhznfia.gemosawsnoznaulw ___—_—___—____—_____ 9222223222288383383; aaaaaaaaaaaaaaaaaaaa 732229. 2: 2252333333333.333533333.233333933333.33Baflfinnnn—ndanfloufiunflvnnaa ommwmmwmmmommuaomomm.mmmmmmmmommmmmmammmam immmmmmmmmmmmmmmmm oommmmsnwmonvnnnmjp360932333 v9.13mmxmfimmnnvnnmmr Eommwmmfigflaannfi $9323,“ 3m... oommmnnm3mn3nn~n$ NNNNNNNNNN 3333333333 fi______—__ 3333333333 cceoocceca rah-pnnfimbznsfi52833833333 Easssmxaa; _ Savawnv'nvvvng; mmmnmnnnmn 893339333; NNNNNNNNNN —_____.___ 333339393398838333fi.n szagnaaxzai 9333.381. mug; mmmmmmnnmn 9%33393333 vvqcvvvvvv Snnzxnxawa mannnmnnnm Son-nunonnnxnnnun; NN---~N ______——._ 3933'9399; 39333939:SuvmngsnflflxuflfiS aoccceocococceoacoca lemmaseesawwaawmammamawmmaaaaemaeaaaaeaaa i3 8 3 : S 9 3 9 2 : ----~i-NN--~N~snshm--:RN--------ms--fiss~h Balmstwnmhizfi 1:3 monoamoanwwwaorm 53mm3$m 31.3% 3.?93399399 3.92.33»meva 333333333232 mmmmmmom¢m:owm mmmmmccmmmmomowmmmommmomoammomammmm 322333: 2N2 ZimuS “33333—333 333333?3333339373: mammmmmnmm:mmmmmmmmmmmmmmmmmm Emnmmmmmmm 3m.2:o~mn:22 Fobcsgficmouwwonoww £39333?!an 33933393991 quva¢vHm nommmm ccoomm can :o mpr* M Una h moxwocoagd mom mIH *nopoww aHnmoocmchwsuo< mm: m Hmnom mQEsHoo mo oawm no: son; ¢mmIHoo *AmQSSG mooo m.om©:h mmtmm mmuwm «menace now; mananwasoo mmuoa *onoom proa wmumm uNIwH mGSSHoo sza mHnwnmoaoo ONIQH *mwnmpHno m>Hw mzp do monoom Nmnmo uHIw massHoo :pHa oHnwnwmaoo OHIHo *mpwwnp sop mop do monoom qumm m xficcmag« mmm ¢mmuaoo ampssq muoo m.mmusa mm-om so>Hm mpma m.OOH oz mmIOH whoom pros mNumN < xHocmamd mom ONIOH mempHno o>Hw mnu no monoom uNlmH ¢ NHonao< mom OHIHo muwmnp cop map so monoom mHI mGHoHHsn mama onp :H HH< mIH am>Hm mm; nommqm muons aoom m .opo .oouH now N “Goo: NH now H mIH Go>Hw mas noomam pomp oSHa o .opm ..mosa now N m.:oz now H mIH Go>Hw was Summon pan» wan m uncomm now No mpmnww now Ho ONIHo noopo mqwxwmam vim on5 now N mmHmamw pow H NIH nmxmem wo xmm N ocoomm now N mpmnww pow H NIH am>wm mm: nooomm nopmmamm H mcoo Amnssz ammo so ammo :o ooonooom ammo on» wo omzoqsm meom wcwpwm smH no GOpranmxm muonssz mop Eopm wpwm :EsHoo ZOHB¢HDm