A STUDY OF THE RNFLUENCE OF CERTAIN
SELECTED FACTORS ON THE RATINGS OF

SPEECH PERFORMANCES

Thais far Hm Beam of Ed. D.
MECHIGAN‘ STATE COLLEGE
Em?! R. Miss?»
T955

This is to certify that the
thesis entitled

A Study of the Influence of Certain

Selected Factors on the Ratings of Speech Performances

presented by
Emil R. Pfister

has been accepted towards fulfillment
of the requirements for

Ed.D. Higher Education

degree in

Major professor

 

0-169

 

 

M

J 3'
> \
.‘ D -

 

 

A STUDY OF THE INFLUENCE OF CERTAIN
SELECTED FACTORS ON THE RATINGS
OF SPEECH PERFORMANCES

By 4

{.1
(z «.4

Q"
Emil R? Pfister
m

A THESIS

Submitted to the School of Graduate Studies of Michigan
State College of Agriculture and Applied Science
in partial fulfillment of the requirements
for the degree of

DOCTOR OF EDUCATION

School of Education
1955

TN 115;“

 

 

ACKNOWLEDGMENTS

The writer wishes to express his appreciation
to the many people who helped him throughout the de-
velopment of this study.

. The encouragement and guidance came primarily
from Dr. Milosh muntyan, Dr. Wilson B. Paul, and Dr.
Barry W. Sundwall. Their cooperation, sympathetic
attitgde, and helpful suggestions are deeply appre-
c ate .

Sincere appreciation also should be given to
the Juniors, seniors and faculty of the Department
of Speech and Drama, Central Michigan College of
Education, who cooperated in the collection of data
during the 1952-53 academic year.

Furthermore, acknowledgment is given to those
of the staff of Michigan State College and the writer's
colleagues at Central Michigan College of Education,
especially Dr. Wilbur Moore and Dr. Karl Pratt, who
were kind enough to answer questions and offer sug-
gestions during the progress of the investigation.

Invaluable help in setting the data up so that
it could be tabulated by IBM equipment was given the
writer by Dr. Willard Warrington of the Board of
Examiners and Mr. Frank Martin, Supervisor of Tabulation,
Michigan State College.

This dissertation is dedicated to my wife,
Frances Pfister, whose patience, confidence, and de-
votion have fostered the continuous concentration
necessary to the successful completion of this study.

{3.653586

VITA

Emil Robert Pfister
candidate for the degree of

Doctor of Education

Final Examination: May 6, 1955

Thesis: A Study of the Influence of Certain Selected
Factors on the Ratings of Speech Performances.

Outline of Studies: Major subject -- Higher Education
Cognate Field -- Speech

Biographical Items:
Born: January 3, 1913, Chicago, Illinois

Undergraduate Studies: Central Michigan College
of Education, 1931-35

Graduate Studies: Master of Arts, University of
Michigan, 1939
Columbia University, Summer, 1 47
Denver University, Summer, 194
Michigan State College, 1949-55

Experience: Speech Teacher and High School Principal,
Kingston, Michigan, 1935-40
High School Principal and Director of
Debate, Clare, Michigan, 1940-45
Assistant Professor of Speech, Central
Michigan College of Education, Mount
Pleasant, Michigan, 1945-53
Associate Professor of Speech, Central
Michigan College of Education, Mount
Pleasant, Michigan, 1953-

Member of: Pi Kappa Delta; Kappa Delta Pi; Speech
Association of America; American Associa—
tion of University Professors; National
Society for the Study of Communication;
American Forensic Association; National
Education Association

A STUDY OF THE INFLUENCE OF CERTAIN
SELECTED FACTORS ON THE RATINGS
OF SPEECH PERFORMANCES

BY."

c"\“
(I

Emil RSQPfister

AN ABSTRACT

Submitted to the School of Graduate Studies of Michigan
State College of Agriculture and Applied Science
in partial fulfillment of the requirements
for the degree of

DOCTOR OF EDUCATION

School of Education
1955

Approved W M M

Emil R. Pfister
Thesis Abstract

This study was designed to determine whether any sta—
tistically significant relationships existed between the
ratings given by Speech evaluators and (1) their academic
speech training, (2) their acquaintanceship with the speaker,
(3) their experience with the rating scale, and (4) their
sex in relation to the sex of the speaker.

The five hundred and forty-nine speakers who partici—
pated in this project were freshmen enrolled in Fundamentals
9; Speech classes at Central Michigan College of Education
during the 1952—53 academic year. The fifty—five evaluators
(speech faculty members and juniors and seniors who were
speech majors or minors) compiled a total of 4392 ratings.
Precautions were taken and controls were employed with re—
spect to Speaker, speech, audience, and occasion with a view
toward making these ratings comparable.

The Evaluator's Rating Scale devised for this study
employed ten criteria based on a study of existing Speech
rating instruments. Appropriate tests of reliability and
validity were made. All of the data obtained from these rat—

ing scales were transferred to punch cards which permitted

sorting and tabulating by IBM methods.

2
Emil R. Pfister
(Thesis Abstract Continued)

The data were analyzed by appropriate procedures to
discover the role played by each of the four selected factors
under investigation. Differences of the means were computed
for groups that were comparable in all respects except the
factor being studied. The "t test" for significance of the
difference of the means was applied and coefficients of
correlation were computed.

The findings of this research led to the conclusion
that the academic speech training of the evaluator influences
his ratings. Undergraduate evaluators with majors or minors
in speech gave significantly higher ratings than did evalua-
tors with advanced degrees in speech. Furthermore, scores
given by pairs of undergraduate evaluators had a higher cor-
relation than did scores given by undergraduate-graduate
pairs of evaluators. Pairs of evaluators with advanced de—
grees in speech had the highest correlation.

The investigation, in itself, provided inconclusive
results with respect to the influence of acquaintanceship on
the ratings of speech performances. However, the results of
this study tend to substantiate the findings of previous
research, i.e., that evaluators who are acquainted with the
speakers give them higher ratings than do evaluators who are

unacquainted with these speakers.

3
Emil R. Pfister

(Thesis Abstract Concluded)

In this particular study the experience of the evalua-
tor with the rating scale employed was found to have no sig-
nificant influence upon the scores given. However, all the
evaluators had a certain minimum of speech training and had
rated speeches previously.

The literature and data of this study support the con-
tention that male and female evaluators rate male and female
speakers differently:

(1) Female student evaluators gave higher ratings to
both male and female speakers than did male student evalua-
tors.

(2) Female student evaluators gave higher ratings to
male speakers than they gave to female speakers.

(3) Male student evaluators gave higher ratings to

female speakers than they gave to male speakers.

TABLE OF CONTENTS

CHAPTER

I.

II.

III.

INTRODUCTION . . . . . . . . . . .
A. The Problem

Statement of the problem . .
Importance of the study . .

B. Definition of Terms

Selected factors . . . . . .
Influence . . . . . . . . .
Ratings . . . .
Intergroup speech project

Fundarentals of Speech . . .

C. Summary . . . . . . . . . . .

REVIEW OF THE LITERATURE . . . . .

A. Speech Rating in General . . .
B. The Four Factors

The academic speech training

of the evaluator . . . .

The acquaintanceship of the
evaluator with the speaker

The experience of the evaluator

with the rating scale . .

The sex of the evaluator in relation

to the sex of the speaker
C. Summary...........

PROCEDURE 0 O O O O O O O O O O O

A. Devising the Evaluator's Rating Scale

B. Collecting the Data

Scheduling the evaluators .
Preparing the speakers . . .
Preparing the evaluators . .

Description of the experimental setting

Additional evaluators . . .

Checking rater—speaker acquaintance

PAGE

aw

\O (DCDCD'QVI

IO
11

NNNNHHp—J
\OOOWOCDVL»

-¥>- bwwww
|-‘ O\OU1 kw

CHAPTER

IV.

C.

D.

Tabulating the Data

Mechanical tabulation
Organizing the data .
Statistical procedure . . . . . . .

summary 0 O O O O O O O O O O O O O 0

ANALYSIS OF THE DATA . . . . . . . . . . .

A.

D.

The Rating Scale

validity I O O O O O O O O O O I O 0
Reliability 0 O O O O O O O O O O O

The Distribution of Scores

Group I (First Semester Students) .
Group II (Second Semester Students)

Findings Regarding the Four Factors

The academic speech training
of the evaluator . . . . . . .
The acquaintanceship of the
evaluator with the speaker . . . .
The experience of the evaluator
with the rating scale . . . . .
The sex of the evaluator in relation
to the sex of the speaker . . . .

sumary O O O O O O O O O O C O O O 0

CONCLUSIONS AND RECOMMENDATIONS . . . . .

A.

Principal Findings

The academic speech training
of the evaluator . . . . . . .
The acquaintanceship of the
evaluator with the speaker . . . .
The experience of the evaluator
with the rating scale . . . .
The sex of the evaluator in relation
to the sex of the speaker . . . .

PAGE

42
42
44
46

48

49
53

58

63
66
7O
7O
77
79

80
80
81
82

|lllllllllllllllllllllll

 

 

CHAPTER PAGE
l B. Educational Implications
Significance for Central Michigan
College of Education . . . . . . . . . . . 83
Significance for education
in general . . . . . . . . . . . . . . . . 85
C. Suggestions for Further Study . . . . . . . . 86
D O Smary O O O O O O O O O O O O O O O O O O O 88
BIBLIOGRAPHY O O O O O O O O O O O O O O O O O O O O O O 90

APPENDIX 0 O O O O O O O O O O O O O O O O O O O O O O O 103

LIST OF TABLES

TABLE PAGE
I. A Comparison of Marks Given by Evaluators with
Marks Given the Same Students by Teachers . . . . 52
II. Range, Mean, and Standard Deviation of Total
Scores Received by Students Participating in
the Intergroup Speech Projects (Group I). . . . . 56
III. Range, Mean, and Standard Deviation of Total
Scores Received by Students Participating in
the Intergroup Speech Projects (Group II) . . . . 57
IV. Differences Between First and Second Intergroup
Speech Project Scores (Group I) . . . . . . . . . 60
V. Total Scores and Academic Ratings Received by
Students Participating in the Intergroup
Speech Projects (Group I) . . . . . . . . . . . . 61
VI. Differences Between First and Second Intergroup
Speech Project Scores (Group II). . . . . . . . . 64
VII. Total Scores and Academic Ratings Received by
Students Participating in the Intergroup
Speech Projects (Group II). . . . . . . . . . . . 65
VIII. A Comparison of the Mean Scores Given by Faculty
and Student EvaluatorS‘Rating the Same

Speakers 0 O O O O O O O O O O O 0 O O O 0 O O O 67

TABLE PAGE
IX. Scores Given when the Student Evaluator is
Acquainted and the Faculty Evaluator is Un-
acquainted with the Speaker Compared with the
Scores Given when the Faculty Evaluator is
Acquainted and the Student Evaluator is Un-
acquainted with the Speaker . . . . . . . . . . . 69
X. A Comparison of the Correlation of Ratings Given
by Student and Faculty Evaluators Judging the
Same Group of Students Three Months Apart . . . . 71
XI. A Comparison of the Mean Scores Given by Faculty
and Student Evaluators Rating Female Speakers
(Group I) . . . . . . . . . . . . . . . . . . . . 73
XII. A Comparison of the Mean Scores Given by Faculty
and Student Evaluators Rating the Male
Speakers (Group I). . . . . . . . . . . . . . . . 74
XIII. A Comparison of the Mean Scores Given by Faculty
and Student Evaluators Rating Female Speakers
(Group II). . . . . . . . . . . . . . . . . . . . 75
XIV. A Comparison of the Mean Scores Given by Faculty
and Student Evaluators Rating Male Speakers
(Group II). . . . . . . . . . . . . . . . . . . . 76

CHAPTER I
INTRODUCTION

The speech teacher cannot escape responsibility
for the evaluation of the oral performance of his students.
His educational philosophy may make academic marks seem
undesirable, or he may be disturbed by the influence of
subjective factors which impair the validity of such evalua-
tion. Nevertheless, the practical necessities of the learn—
ing situation, as well as customary institutional procedures,
require that he evaluate the speech competency of his
students.

Research has shown that students believe that rating
speeches is of primary importance in a Fundamentals pf
Speech course. Graunkel, who administered student judgment
questionnaires to 1,024 Fundamentals pf Speech students in
four different universities, secured data showing that oral
work was consistently judged by students to be of more value

than reading assignments and written work.

 

lDean F. Graunke, "The Use of Student Opinion in the
Improvement of Instruction in Speech Fundamentals," (unpub-
lished Master's thesis, The University of Nebraska, Lincoln,

1951), pp. 94-126.

Reid2, in discussing computation of final grades for
beginning speech classes, advocates giving two or three times
as much weight to oral work as to written work.

Hollister3 points out the need for good judging when
he says:

The question of judging contests is one of importance

to every teacher of public speaking, for it influences
his faith in contests, his spirit in classroom work, and
the tone of public speaking in the school.

Evaluation of speech must of necessity involve some

4 pointed out,

subjective judgments. Nevertheless, as Pelsma
every attempt should be made to improve the fairness and
accuracy of these judgments since they are used as a basis
for instructing and guiding the student as well as determin—
ing his status.

The crucial importance of accurate evaluation of the

student is illustrated by a recent regulation of Central
Michigan College of Education5 regarding the demonstration

 

2Leon D. Reid, Teaching Speech in High School.
Columbia, Missouri: Artcraft Press, 1955, p. 191.

3R. D. T. Hollister, "Faculty Judging," Quarterly
Joppnal pf Public S eakin , 3:235, July, 1917.

4J. R. Pelsma, "Standardization of Grades in Public
Speaking," Quarteply Journal pf Public S eakin , 1:268,
October, 1915.

5Bulletin, 1952—53 Sessions. Central Michigan
College pf Education. Mount Pleasant, Michigan, 1952,
p. 74.

3

of speech competency by candidates for teaching certificates.
The regulation prescribes completion of the course, EEQQA- t
mentals pf Spgppp (Speech 101), with achievement of at least

a "C" in the course. This "C" is interpreted to mean "average"
skill and facility in communicating information to a group of

persons.

A. The Problem

Statement 2; phg problem. This research is designed
specifically to study critics' ratings of speech performances
by students in Fundamentals p; Sppgph classes at Central
Michigan College of Education. The purpose of the study is
to examine the role which certain factors play in the evalua-
tion of Speech competency.

The experiment will attempt to secure evidence in
answer to four questions:

(I) Do ratings given by college juniors and seniors
who are speech majors or minors differ significantly from
those given by members of the speech faculty?

(2) Do ratings given by the evaluators who know a
speaker differ significantly from those given by the evalua-
tors who do not know the speaker?

(3) Do the ratings given by the evaluators who have
had experience with the rating scale differ significantly
from those given by the evaluators who have had no experi-

ence with the rating scale?

4
(4) Do the ratings of speakers of each sex differ sig-

nificantly according to the sex of the evaluator?

Impoppance pf php pppgy. This study is of particular
concern to the students and faculty of Central Michigan
College of Education. The significance that speech per-
formance ratings have for all freshmen on that campus has
already been explained.

6

Jones , who studied the current practices in the
beginning Speech courses of 318 colleges, found that rating
charts were used in the majority of them. Thus this study
may be of general interest to a number of colleges in the
United States.

However, it is not the purpose of this experiment to
determine the relationship of these factors in all schools
but rather particularly in the situation at Central Michigan
College of Education. Before any adaptations to other
colleges are made, one must first determine the extent to
which their students and faculties are comparable to those
at Central Michigan College of Education.

Naturally the characteristics of the evaluator who

uses the rating scale are of primary consideration. As

 

6Horace Redman Jones, "The Development and Present
Status of Beginning Speech Courses in the Colleges and Uni—
versities in the United States," (unpublished Doctor's dis-
sertation, Northwestern University, Evanston, 1952), 216 pp.

5
Hudgins7 points out, one way to increase the reliability of

the evaluation is, of course, to reduce the variability among
the evaluators. To do this, evidence must first be secured
regarding the extent of influence that certain factors, other
than the speech performance itself, have on the ratings.

The four questions being considered in this experiment
have been selected because they have not been answered by
previous research. Furthermore, they involve factors that
can be controlled in the ordinary classroom situation.

The possible concrete results in terms of action may
be seen by briefly considering the significance that each of
the four factors might have when selecting evaluators:

(1) If ratings given by upperclassmen who are speech
majors or minors tend to differ significantly from those
given by members of the speech faculty, speakers who are to
be compared ought to be rated exclusively by students, ex-
clusively by fabulty members, or by a like number of each.
If, on the other hand, this difference in the academic
speech training of the evaluators plays no significant role,
one may select speech judges at random in this reSpect.

(2) If evaluators who know a speaker tend in general
to rate him higher, or lower, than the evaluators who do not

know the speaker, the teacher must be sure that either each

 

7C1arence V. Hudgins, "The Validity of Speech Tests,"
V015; Review, 45:271-2, May, 1943.

6
evaluator has no acquaintance with the speaker or that a like
number of each Speaker's acquaintances are used as evaluators.
However, if acquaintanceship plays no role in this regard,
then this factor need not be considered in securing evalua—
tors.

(3) If evaluators who have had experience with the
rating scale that is being used make more reliable evalua-
tions than those who do not have such experience, teachers
must be sure that all evaluators receive practice experi—
ences. However, if experience with the rating scale makes
no significant difference, mere verbal instruction may
suffice; and any time spent in practice will be wasted.

(4) If evaluators rate Speakers of the same sex
differently than speakers of the opposite sex, this factor
must be taken into consideration when securing evaluators.
If, on the other hand, neither men nor women evaluators show
an appreciable sex-tied favoritism in their rating of speak-
ers, the evaluators may be secured at random without regard
to sex.

This study has been predicated on the assumption that
these sources of uncertainty in the evaluation of speech
performance warrant careful investigation. There are, of
course, other important factors such as the social back-
ground, intelligence, and physical health of the speech

evaluators which have not been explored in this research.

B. Definition 93 Terms

Selected factors. Factors may be considered as cer-
tain characteristics of the raters. In this study the char—
acteristics being considered are:

(1) The academic speech training of the evaluator.
The ratings given by students who are in either their third
or fourth year of college will be compared with the ratings
given by faculty members who have advanced degrees.

(2) The acquaintanceship of the evaluator with the
speaker. The ratings given by evaluators who are acquainted
with the speakers will be compared with the ratings given by
evaluators who are not acquainted with the speakers.

(3) The experience of the evaluator with the rating
scale. The ratings given by evaluators experienced with the
rating scale will be compared with the ratings given by
these same evaluators before they had experience with the
rating scale.

(4) The sex of the evaluator in relation to the sex
of the speaker. The ratings given female Speakers by male
evaluators will be compared with the ratings given the same
speakers by female evaluators. Likewise, the ratings given
male speakers by male evaluators will be compared with the

ratings given the Same speakers by female evaluators.

Influence. Influence may be assumed to exist whenever

a statistically significant relationship is established be-

tween ratings given in any one of the above categories.

Ratings. Ratings in this study are the judgments of
the "intergroup Speech projects" as expressed by critics on

the Evaluator's Rating Scale.8

Intergroup speech project. This is a phrase used at
Central Michigan College of Education to designate an ex-
pository speech of approximately three minutes duration. It
is delivered before a group of fifteen freshmen who are mem-
bers of various sections of the Fundamentals 9; Speech
course. In each audience there are two evaluators, one a
student and the other a faculty member, who make independent
ratings of each speaker. Each student evaluator is a col-
lege junior or senior who is also a speech major or minor and
who has been approved by the speech faculty as a competent
student. The faculty evaluators are members of the Speech

Department of Central Michigan College of Education.

Fundamentals pf Speech. This class, bearing the col-
lege catalog designation of Speech 101, is a two semester-
hour course required of all freshmen on campus. They may
register for it either the first or second semester. Ap-

proximately six hundred students take this course each year.

 

8See Appendix A.

C . Summary

In the light of the defined terms the problem may be
regarded as an attempt to determine what statistically sig-
nificant relationships, if any, exist between the judgments
expressed by critic-judges on the Evaluator's Rating Scale
and: (a) their academic speech training, (b) their ac-
quaintance with the speaker, (c) their experience with the
rating scale, and (d) their sex in relationship to the sex
of the speaker. I

This study is of particular concern to the students
and faculty of Central Michigan College of Education where
speech competency is a prerequisite to candidacy for a
teaching certificate. However, it may also have implica-
tions for other comparable institutions that have similar

programs.

CHAPTER II
REVIEW OF THE LITERATURE

In order to explore the literature pertinent to the
present study, the writer not only read published and un-
published research in the Speech field,l’ 2’ 3 but also

examined psychological, sociological, and pedagogical writ-
ings.4’ 5’ 6

 

lLester w. Thonssen and Elizabeth Fatherson, Bib-
liography p; Speeph Education. New York: H. W. Wilson
and Company, 1939. 800 pp.

2Lester W. Thonssen, Mary Margaret Robb, and
Dorthea Thonssen, Bibliography pf Speech Education -
Supplement 1939-48. New York: H. W. Wilson and Company,

1950- 393 pp.

3Franklin H. Knower, Table pf Contents 92 phg
'ua terl' Journal pf Speech (1915-19223; speech Monographs
Wane Speech Teacher 1 2 wiph g Revised
Ipdex Compiled Through 1952. Columbia, Missouri: Speech
Association of America, 1953. 61 pp.

 

4Walter S. Monroe, editor. Encyclopedia %f Edg-
capional Research Revised Edition. New York: he
Macmillan Company: 1950. 1520 pp.

5Alice F. Moench and others, editors. Thg Int :—
ngtional Inde; pg Periodicgls Devoped Chiefly pp phg
Humanities and Sciencpg. New York: H. W. Wilson Company,
V018 o I-XII , 1913-19573 0

6Isabell Towner and Ross Carpenter, editors.
T e Educapion Index; A Cumulapive Auphor gpd Subjecp
Inde; pp p Selected Lisp p; Egucational Periodicals,
Books, gpd Pamphlets. New York: H. W. Wilson Company,
Vols. I-VIII, January, 1929 - June, 1953. Also Educg-
pion Index Monthly Check-Lisp, July, 1953 - April, 1954.

11
The bibliographies compiled by Rosenberg7 were con-

sulted to discover the masters' and doctoral theses completed
before 1945 which might have some bearing upon the problem.
More recent studies were listed by Knower8 and Auer.9 The

latter even included this study.10
The literature relating to this study will be re-

viewed under two categories: (1) speech rating in general,

and (2) studies which give Specific consideration to any of

the four factors selected for investigation in the present

study.

A. Speech Rgping ip Genezal

Although much has been written about the use of

11 little has been published on

rating scales in general,
the rating of oral performances in Fundamentals 9; Speech

classes.

 

7Ra1ph P. Rosenberg, "Bibliographies of Theses

in America," Bulle in p; Biblio h , 18:181-82, Septem-
ber-December, 1945.

8Franklin H. Knower, "Graduate Theses——An Index
of Graduate Work in Speech," Speech Monographs, 21:108-
35, June, 1954.

9J. Jeffery Auer, "Doctoral Dissertations in
Speech: Work in Progress, 1954," Speech Mono ra ha, 21:
136-41, June, 1954.

10Ibid., p. 141

110arter v. Good, A. s. Barr, and Douglas E. Scates,

The Mephodology pf Educapiongl Reseagch. New York: D.
Appleton-Century Crofts, 1941, pp. 424-37.

12

Symondsl2 points out that group rating is more reli—
able than is individual judgment:

A Single observation is unreliable, a single rating

is unreliable, a Single test is unreliable, a single
measurement is unreliable, a single answer to a question
is unreliable. Reliability is achieved by heaping up
observations, ratings, tests, questions, measures...

An adequate rating requires the judgment of several
raters in several situations at several different times.
Reliable evidence must be multiplied evidence.

Rugg13 recommends the use of pooled or averaged rat-
ings of not less than three independent raters. In each
instance it is assumed that the several raters are all com—
petent to rate and that the reliability of pooled ratings
tends to increase according to the Spearman—Brown formula.14

Holcombls found that, although most judges take care-
ful notes and have a number of definite points on which to
judge, they do have personal standards which vary widely

from one judge to another.

 

12Percival M. Symonds, Diggnosing Pepsonalipy gpd
Conduct. New York: D. Appleton—Century Company, 1931, p. 5.

13Harold C. Rugg, "Is the Rating of Human Character
Practicable?" Journal pf Educational Psychology, 12:425-38,
November, 1921.

14Joy P. Guilford, Psychometrip Methods. New York:
McGraw-Hill Book Company, Inc., 193 , p. ZZI.

15Martin J. Holcomb, "The Critic-Judge System,"
Quarperly Journal 93 Speech, 19:28-38, February, 1933.

13

who has done a great deal of research in the

16

Knower,

field of speech evaluation, says:

The objectivity of observational evaluation is en-
tirely a matter of the objectivity of raters. Although
the standards of evaluation in this process are osten-
sibly subjective, it remains a fact that such judgments
may be as accurate, or even more accurate than an

arbitrarily assigned score derived from items on an
objective paper and pencil test.

B. The Four Fact0pp

The almost complete lack of research in the area out-
lined by this study, namely the four factors which may affect
the rating of speech performances, indicates the need for
this work to be done. Furthermore, where studies have been
conducted, as in the area of sex influences, the evidence is
inconclusive and even contradictory.

1. Egg gcademic ppggph ppgining p: th evaluator.
West and Larsen17 experimented with students in the required
freshman course in speech in the State University of Iowa.
Students ranked their classmates, and their "class ratings"
were compared with ”grades" given by the instructor. They

reported:

 

16Franklin H. Knower, "What is a Speech Test?"
Quapteply Journal 9; S eech, 30:485-93, December, 1944.

17Robert West and Helen Larsen, "Some Statistical
Investigations in the Field of Speech," Quarterly Joppnal
pf Speech, 7:375-82, November, 1921.

14

The relation between the combined judgment of the
class on each speaker and the instructor's judgment on
each speaker, or the correlation between "class ratings"
and "grades" computed on about 300 cases, is .453.
Assuming that the instructor's grade is made upon a
reasonable basis, one would say that comparatively just
marks could be given a student of speech by getting a
rating from the class.

This conclusion embodies as one of its basic assump-

tions the acceptance of the standard postulated by Rugg18 as

a test of Significance:

The experience of the present writer in examining
many correlation tables has led him to regard correla-
tion as 'negli ible' or 'indifferent' when r (coefficient
of correlation is less than .15 to .20; as being 'preS—
ent but low' when r ranges from .15 or .20 to .35 or .40;
as being 'markedly present' or 'marked' when r ranges
from .35 or .40 to .50 or .60; as being 'high' when it is
above .60 or .70.

Knower19 investigated the extent of agreement between
students and instructors in their rating of student speakers.
He had instructors and students rate thirty-three speakers.
The ratings given by students were correlated, by the rank
order method, with the raw score given by the instructors.

In light of these correlations he concludes:

Since the correlations were consistently higher, with
one exception, between the students' ratings and the in-
structors' ratings than between the ratings of the in-
structors, we have a more objective criterion of effective
public speaking in the average of a number of student
scores than we have in the scores assigned by one in—
structor.

 

18Haro1d o. Rugg, Spatispical Methods Applied ;9
Education. New York: Houghton Mifflin Company, 1917, p. 256.

19Franklin H. Knower, "A Suggestive Study of Public

Speaking Rating-Scale Values," Quarteply Journgl pf Speech,
15:30-41, February, 1929.

15

20 made a study of the ratings given by 169

Anderson
students to their classmates in a basic communications course,
at that time called Written and Spoken En lish, at Michigan
State College. The student Speakers were in eight different
rooms. Each of these eight groups was rated by three faculty
members as well as by their fellow classmates. By comparing
these ratings she came to the conclusion that the students
were more in agreement as to the ratings the speaker should
get while the faculty varied more in their judgments. How-
ever, this is not necessarily a valid comparison of faculty
raters with student raters since the faculty used a rating
scale listing five traits while the students evaluated only
upon one of these five traits.

Gibbs21 investigated the degree of consistency in
evaluations made by faculty members and students listening
to recordings of three minute speeches. The students were
classmates of the speakers. According to this study student
evaluators place more Speakers in the below average classi-

fication than do faculty evaluators. However, the evalua-

tions were made only of voice and articulation; and, Since

 

2QMaryMargaret Anderson, "An Analysis of Some of
the Sources of Variation Involved in Rating Speeches,"
(unpublished Master's thesis, Michigan State College, East
Lansing, Michigan, 1945), 19 pp.

21David Elmore Gibbs, "A Study of Reliability and
Variation of Critical Rating of Speech by Trained and Un—
trained Observers," (unpublished Master's thesis, The
University of Washington, Seattle, 1948), 87 pp.

16
none of the students had any courses in this field, training
may have been the important factor here. There is no evi—
dence to indicate whether the results would be the same if
the students were juniors and seniors who had Speech training.

Andregg22 made an analysis of the ratings on Six
traits: "thinking, knowledge, initiative, cooperation,
organizing ability, and expression." These ratings were made
by students and instructors on the performance of student
officers attending the Air Command and Staff School at the
Air University. The officers participated in the planning
of the tactical and strategic air Operations; rated each
other's performances; and were rated by their instructors,
also officers, who devoted full time to observation and
rating.

The study Showed that students and instructors rated
most reliably on "expression." Also students rated their
fellow staff officers more leniently than did instructors.
However, the Situation of rating officers on general char-
acteristics in the Air Command and Staff School may not be
entirely comparable to rating freshmen in Eppggmpppglp pg
Speech by faculty members and upperclassmen who are Speech

majors and minors.

 

22Neal Berry Andregg, "A Critical Study of Graphic
Rating Scales," (unpublished Doctoral dissertation, Michigan
State College, East Lansing, Michigan, 1951), 138 pp.

l7

2. Thp gcquaintanceship 2f the evaluator,piph php
speaker. Seedorf23 conducted a study to find out how much
agreement there is among individuals in their response to an
oral interpretation of literature. Among other things she
answered the question: How does acquaintance with a fellow-
classmate's quality of work affect the amount of agreement
among judges? She states:

The correlation of the mean scores of each member
of the two groups of student-judges, the acquainted and
the unacquainted, for one group of readers, were .958
and .887 respectively, indicating that when evaluated
by fellow students of approximately the same degree of
training, the readers received about the same rank
whether given by fellow classmates or by students who
were not classmates.

Knight24 analyzed ratings of 1948 public school
teachers of one school system made by the supervisors under
whom the teachers were working. He concluded:

The factor of acquaintance operates to make ratings
more lenient, i. e., increases the over—rating, and to
make ratings less critical and less analytical, i. e.,
increases the influence of the halo of general estimate.
In a way it is literally true to say of a judge's
estimate: "His judgment is of doubtful validity be-
cause he has known this man too long."

 

23Evelyn H. Seedorf, "An Experimental Study in the
Amount of Agreement Among Judges in Evaluating Oral Inter-
pretation," Journal p; Egucational Research, 43:10-21,
September, 1949.

24Frederic B. Knight, "The Effect of the Acquaint-
ance Factor upon Personal Judgments," Journal pf Educa-
Liongl PS cholo , 14:129-42, March, 1923.

l8

HenricksonZS made a study of one hundred and seventy-
nine students in Fundamentals pf Speech courses, eighty-one
from three classes at the University of Montana and ninety-
eight from four classes at Iowa State Teachers College. He
asked them to rate their classmates at the end of a semester
or quarter on: (1) how well they knew the person; (2) how
well they liked the person; (3) how good they thought the
person was as a speaker. From a study of these data he came
to the conclusion that the better known students are appar—
ently liked better and are judged to be somewhat better

speakers.

3. Thg experience pf the evaluator with the gaping

l26 conducted an experiment in rating musical

gpglg. Carrol
selections played on phonograph records. He used two sec-
tions of a class in educational psychology as raters; and
each section rated the selections according to (1) volume,
(2) expression, (3) quality, (4) melody, (5) harmony, and
(6) rhythm. One section, the control group, rated three
records. Some weeks later they rated the same three

records again. The second section, the drill group, rated

the same three records at the same times as the control

 

25Ernest H. Henrickson, "The Relation Among Knowing
a Person, Liking a Person, and Judging Him as a Speaker,"
Speech Monographs, 7:22-25, 1940.

26Robert P. Carroll, "Practice in Rating," Joupnal
pf Experimentgl Psychology, 14:299-302, June, 1931.

19
group. However, between the first and second rating periods,
this second section was given new records to rate three times
a week. A great deal of discussion relative to method of
rating was done in the drill group between ratings. Carroll
criticized his research in that, due to the absences, there
were changes in each group. Also he did nothing to equate
the two groups. However, he concludes that "in general, the
results of the experiment seem to indicate that by training
in subjective ratings individuals may improve; that their
ratings may more nearly agree and that they may become more
reliable."

Thompson27 in his investigation to determine the
accuracy of typical speech rating techniques collected data
from eleven classes in Fundamentalg pf Spgpgh. Both speakers
and raters were members of these classes. The raters were
freshmen taking their first college course in speech and had
little or no formal training in rating speeches. The pro-
cedure was to divide the class into two groups of raters.
Both groups listened to the same speeches, but one used rat—
ing scales while the other used letter grades. The data were
analyzed to find the variance for each method. Then, on the

assumption that the system which produces the least variance

 

27Wayne N. Thompson, "An Experimental Study of the
Accuracy of Typical Speech Rating Technique," (unpublished
Doctor's dissertation, Northwestern University, Evanston,
1943), 204 pp.

20
is the best, the investigator concluded that a rating scale
is superior the first day of the experiment but that after
practice (or fatigue) letter grades are more accurate.

4. The sex pf the evaluator ip relation 39 the sex pf
the Speaker. There have been many studies conducted and much
has been written, often quite contradictory, in the field of
sex differences. A few of these are somewhat relevant to the
aspect of sex differences being considered in this study.

Anastasi,28 who has done considerable writing in the
field of sex differences says:

It is apparent that the effectual environment of the
two sexes are fundamentally diverse from an early age.
Under such conditions, we Should expect pronounced varia~
tion in the emotional and intellectual development of the
two sexes.

Lehman and Witty29 caution that in collecting data
from males and females to determine variability the same
age—levels only should be compared. He states:

One source of error is hasty generalization due to

the inclusion of all or nearly all age-levels in the form—
ulation of conclusions. Because of the irregular develop-
ment of numerous human characteristics, one may expect to
find that at a certain age—level girls will be more vari—

able than boys in some regards, and at an earlier or later
age the reverse may be true.

 

28Ann Anastasi, Differential Psychology: Individual
and Group Differences in Behavior. Macmillan Company,

New York, 1937. p. 386?

29Harvey G. Lehman and Paul A. Witty, "Sex Differ-
ences: Some Sources of Confusion and Error," American
Journal pf Psychology, 42:140-47, January, 1930, p. 143.

21

Symonds3O studied the differences of areas of interest
according to sex. Fifteen areas of human interest were
ranked by 784 high school boys, 857 high school girls, 276
college men, 387 college women, 73 men graduate students, and
111 women graduate students in order of interest for reading
or discussion. By taking average ranks for the various
groups he noted differences by sex for different maturity
levels. He found a greater difference in the area of inter-
est between college men and college women than he found be-
tween high school boys and high school girls or between men
and women doing graduate work.

Carter31 made an investigation of the assignment of
marks by teachers of beginning algebra to determine whether
or not teachers tend to favor one sex and whether the sex
favored tends to be determined by the sex of the teacher.
Nine classes, five taught by men and four taught by women,
were used. In these classes there were 135 boys and 100
girls. Intelligence, achievement, and personality of these
students were measured by standardized tests; and the rela-

tionship between teachers' marks and (1) intelligence,

 

30Percival M. Symonds, "Changes in Sex Differences
in Problems and Interests of Adolescents with Increasing
Age," Journal pf Genepic PS cholo , 50:83-89, iarch, 1937.

31Robert Scriven Carter, "Non—intellectual Variables
Involved in Teachers' Marks," Journal 91 Educapional
Reseapch, 47:81-95, October, 1953.

22

(2) achievement, and (3) personality was determined by com-
puting the correlation coefficient. The results showed that
the relationship between teachers' marks and intelligence was
higher for the boys than for the girls. Also that the rela-
tionship between teachers' marks and achievement scores was
higher for the boys than the girls. However, the relation-
ship between teachers' marks and personality was higher for
the girls than for the boys. Marks given by women teachers
correlated higher with personality test scores than did the
marks given by men. Conversely, marks given by men corre-
lated higher with achievement and intelligence than did
marks given by women.

Douglas and Newman,32 who made a study of achievement
and marks of 3366 students in four Minnesota high schools,
say:

In the light of the data of this and other investi-
gations, it seems probable that marks are determined by
factors other than achievement, especially marks assigned
by women teachers, and that these influences result in
the slight overrating of girls generally and the peculiar
underrating of boys by women teachers.

However, this refers to marks in English, history,

and mathematics given to high school students by faculty

members and may not be comparable to ratings given in a

college speech course by college students or faculty members.

 

32Harl R. Douglas and Olson E. Newman, "The Relation
of High School Marks to Sex in Four Minnesota Senior High
Schools," School Review, 45:481-88, April, 1937, p. 288.

23

Fifty members of the faculty at Northwestern Univer—
sity ranked one hundred and four students, fifty-three men
and fifty-one women, by classifying them into ten different
groups according to estimated intelligence. These were cor-
related with scores received by the students on a battery of
intelligence tests. As Webb33 reports:

Each group Showed some partiality to the opposite sex
in estimating intelligence; that is, the men gave evi-
dence of placing a slightly higher value on the intelli-
gence of women than they do that of men. The women appear
to do the same thing in regard to the men.

The writer34 made a survey of the attitudes of 227
intercollegiate debaters towards debate judges. He found
that, as far as the feelings of debaters were concerned, the
sex of the judge made a difference in more than half of the
cases. Only 39 per cent of the men and 49 per cent of the
women felt that the sex of the debate judge had no effect
upon the ratings they received. Both men and women debaters
preferred male over female judges. However, this preference
for male judges was slightly more pronounced among women

debaters (45 per cent) than among men debaters (40 per cent).

On the other hand 21 per cent of the men debaters thought

 

33L. w. Webb, "The Ability of Men and Women to Judge
Intelligence," School ppg Sociepy, 20:251-54, August 23,
1924.

34Emil R. Pfister, "A Survey of Attitudes Toward
Debate Judges," Forensic pg 2; Kappa Delta, 39:102—03, May,

1954.

24
female judges rated them higher while only 6 per cent of the
women debaters thought they were rated higher by female
judges.

Knower35 administered one form of the Smith and Thur-
stone Attitude towarg Prohibition Scale before a speech and
gave an equated form after the presentation. He concluded
that the delivery of speeches produces a change of attitude
statistically significant and that women speakers are more
influential with a male audience than are men speakers.
Similarly, a female audience is influenced more by men speak-
ers than by women Speakers.

Graunke36 found that female instructors gave higher
ratings than did the male instructors. However, these ratings
were not broken down to determine whether female instructors
rated both male and female students higher than did the male
instructors.

Penland37 made a study of the ratings given to eighty-
seven university sophomores, fifty-three women and thirty—four

men. These students read orally and were rated by both male

3SFranklin H. Knower, "Experimental Studies of Changes
in Attitudes," Journal 9: Social Psychology, 6:315-44,
August, 1935.

36Graunke, pp. p$p., p. 102.

37Virgil Darrell Fenland, "An Experimental Study to
Measure Effectiveness in Oral Reading by Means of a Rating
Scalme Technique," (unpublished Doctor's dissertation The
UHlVWErsity of Southern California, Los Angeles, 1948), 177 pp.

25
and female judges. He found one "probably significant dif-
ference," i.e. that female judges tended to be more "severe"

in rating women performers in this field of oral reading.
C . Summary

Speech rating ip genera . A survey of the literature
that deals with Speech rating in general indicates that:

(1) Group rating is more reliable than individual
judgment.

(2) Reliability of pooled ratings increases as the
number of competent raters is increased.

(3) Personal standards vary widely among judges.

122 four factors. Although no research identical to
this experiment has been conducted, studies have been made
that are related to the four factors being considered in
this study:

(1) Five studies compare ratings given by faculty
members with ratings given by students. One experiment con-
cludes that rating by a group of students is more accurate
than by a single faculty member. Two studies point out that
ratings given by students are more in agreement than are
ratings given by faculty. One study indicates that student
raters give more lenient ratings than do the faculty while
another indicates that faculty give the more lenient ratings.
However, all five of these studies use freshmen evaluators

who care the speakers' classmates. These ratings may not be

26
equivalent to those given by speech majors and minors in
their junior or senior year of college.

(2) Three studies consider the factor of acquaintance
with the Speaker. Here, too, the students used as evaluators
were classmates of the Speakers. All three studies agreed
that evaluators acquainted with the speakers were more leni-
ent than evaluators unacquainted with the speakers.

(3) Only two research studies could be found that were
concerned with the evaluator's experience with the speech
rating scale. The first of these studies found that training
improved the reliability of the rater. However, it should be
noted that this experiment was conducted by rating music on
phonograph records and that coaching as well as practice was
used. The second study used classmates to rate speeches.

It concluded that a rating scale is superior to letter grades
the first day, but after that letter grades are more accu-
rate.

(4) The studies regarding the relationship between
ratings given a speaker and the sex of the evaluator are in-
conclusive. This survey concurs with an earlier report by

8

McNemar and Terman3 regarding variability in mental traits

between sexes:

 

38Quinn McNemar and Lewis M. Terman, "Sex Differences
in Variational Tendency," Genetic Psychology Mono a hs,
18:8, February, 1936.

27

Research has not proved either the presence or
absence of a sex difference in variability with respect
to psychological traits. There are few problems in
psychology on which investigations that would appear to
be comparable have yielded results so discordant.

CHAPTER III
PROCEDURE

The data for this study were collected during the
1952-53 academic year at Central Michigan College of Educa-
tion, Mount Pleasant, Michigan.

Six hundred and four people cooperated to make this
experiment possible. They may be divided into two groups:

(1) There were the speakers, five hundred and forty—
nine of the five hundred and ninety-eight freshmen enrolled
in Fundamentals g; Spgggp classes.l Three hundred and five
of these were in the nineteen sections which were taught
during the first semester and two hundred and forty—four
were in the nineteen sections taught during the second se-
mester. Each of these students gave two different three
minute informational Speeches before audiences which aver-
aged about fifteen people, most of whom were unacquainted
with the Speaker. Thus the freshmen gave a total of 1098
three minute informational speeches. Furthermore, since
each of these speeches was given before two different audi-
ences, the students compiled a total of 2196 speech per-

formances.

 

lThe forty-nine freshmen excluded from this experi-
ment were those who, because of some reason such as illness,
were unable to participate in all four of the intergroup
Speech projects.

29

(2) There were the fifty-five evaluators, forty-six
students and nine faculty members.2 The student evaluators
were juniors and seniors who were speech majors or minors
while the faculty evaluators were members of the Department
of Speech. Each of the 2196 speech performances was rated
by at least one student and one faculty member. These raters
sat in the audience, worked independently, and used a stand-
ard rating scale.

Therefore, by having various pairs of evaluators, one
student and one faculty, rate the 2196 speech performances,
a total of 4392 ratings was collected. As will be explained
later in this chapter, precautions were taken so that these

ratings would be comparable.

A. Devising the Evaluapor's Raging SCQIC

The Evaluator’s Rating Scale3 that was used in this
experiment was devised by the writer who employed the fol-
lowing procedure:'

(1) A study was made of the Speech rating scales

 

28ee Appendix B.

3See Appendix A.

30

which have appeared in speech textbooks and periodicals pub-

lished in the United States.4-14

 

4Arthur W. Cable, "A Criticism Card for Class Use,"
Journal 9; Speech Education, 12:186-88, April, 1926.

5J. Stanley Gray, "Objective Measurement for Public
Speaking," Journal 9; Expression, 2:20-26, March, 1928.

6Wilmer E. Stevens, "A Rating Scale for Public
Speakers," Qparterly Journal p; Speech Education, 14:223-32,
April, 1928.

7Alice J. Bryan and Walter H. Wilke, "A Scale for
Measuring Speaking Abilities," Psychological Sulletin,
33:605-0 , October, 1936.

8Harry G. Barnes, "Appendix," S eech Handbook.
Iowa City: Privately printed, 1936. 13 pp.

9Helen L. Ogg and Ray K. Immel, "Speech Criticism
Chart," Speech Improvepent. New York: F. S. Crofts and

Company, 193 . 190 pp.

loElwood Murray, The Speech PegSonalipy. New York:
J. B. Lippincott Company, 1944. pp. 271-391.

llWilhelmina G. Hedde and William N. Brigance, "A
Score Sheet for Judging Speeches " Americ n S eech. New
York: J. B. Lippincott Company,’l946. pp. 581—8 .

12Alice Evelyn Craig, The S eech Art . New York:
The Macmillan Company, 1947, p. 252.

13A. Craig Baird and Franklin H. Knower, "Appendix
D," General Speech. New York: McGraw-Hill Company, 1949,
p. 294.

14Karl F. Robinson, Spaching Speech Sp Secondapy
School. New York: Longmans, Green and Company, 1951.
pp. 123-28.

31

(2) A survey was made of the literature regarding the
construction of speech rating scales.15—22

(3) Taking into consideration the conclusions from
previous research conducted in the field of rating scale con-
struction, the writer devised a rating instrument. This in-
strument incorporated the elements common to other speech
rating scales that had been used by others with some satis—
faction in the past. This was revised and refined in the
light of suggestions offered by faculty members of the Speech

Department as well as members of the advisory committee for

this thesis.

 

15J. B. Miner, "The Evaluation of a Method for Finely
Graduated Estimates of Ability," Journal p; Applied Psy—
cholo , 1:123—33, June, 1917.

16Max Freyd, "The Graphic Rating Scale," Journal p:
Educational Psychology, 14:83-102, February, 1923.

l‘7Percival M. Symonds, "Notes on Rating," Journal p:
Applied Psychology, 9:188-95, June, 1925.

18Paul H. Furfew, "An Improved Rating Scale Tech-
nique," Journal pi Educational Psychology, 17:45-48, January,

19Percival M. Symonds, "Rating Methods," Diagnosing
Personality and Conduct. New York: D. Appleton-Century
Company, 1931. pp. 41-121.

2OLee Norvelle, "Development and Application of a
Method for Measuring the Effectiveness of Instruction in a
Basic Speech Course," Speech Monographs, 1:41-65, 1934.

21Alice J. Bryan and Walter H. Wilke, "A Technique

for Rating Speeches," Journal p; Consulting Ps cholo ,
5:80-90, March-April, 1941.

22Isabel Kincheloe, "On Refining the Speech Scales,"
English Journal, 34:204-07, April, 1945.

32

(4) The writer presented this speech rating scale to
his colleagues at a departmental staff meeting of the speech
faculty of Central Michigan College of Education where it was
discussed and received unanimous approval.

(5) The last two steps, (a) that of introducing the
rating scale to the evaluators and (b) that of investigating
its reliability and validity, will be discussed later.

B. Collecting the Data

Before conducting the experiment it was essential to
secure the cooperation of the faculty members of the speech
department. During September, 1952, several Speech Depart-
ment staff meetings were held previous to the registration
of students. At one of these the writer outlined a plan
for conducting this experiment in evaluating oral perform-
ances of students in Epndamentals p; Speech classes. The
members of the speech faculty were not only willing to
cooperate but they also liberally contributed ideas, time,
and effort.

The project also required the cooperation of the
juniors and seniors who were on either a Speech major or
Speech minor curriculum. When asked if they would be will-
ing to serve as evaluators, their response indicated that
in general they were eager to have the experience as a back-

ground for preparation as future teachers of speech.

33

Scheduling the evaluatorp. Scheduling the evaluators
from the Speech faculty was accomplished with little diffi—
culty since the instructors of_Fundamentals p; Speech agreed
that no class sessions of the course were to be held during
the weeks that the intergroup speech projects were scheduled.
This policy freed the faculty to serve as evaluators. The
fact that each student missed two class sessions during that
week could be justified because each student was having the
experience of giving the same speech before two different
audiences. Furthermore, he was getting the evaluations of
four well qualified evaluators.

Securing qualified student evaluators required more
effort. The first step was to compile a list of the seventy
students who were either speech majors or minors and who
were also either juniors or seniors.23 This list was dupli—
cated and copies were sent to each member of the speech
faculty in order to determine (1) the number and type of
speech courses that each student had taken, (2) professors
with whom he had done speech work, and (3) whether the pro-
fessor regarded the student as qualified to serve as an

evaluator.24

 

23See Appendix C.
24See Appendix C.

34

Meanwhile, letters signed by the Head of the Depart-
ment of Speech and Drama were sent to all juniors and seniors
who were Speech majors and minors.25 These letters explained
the intergroup speech project, solicited student cooperation,
and included a student evaluator's preference report blank.26

When these blanks were filled out and returned, the
juniors and seniors were assigned groups to evaluate. These
assignments were made according to the student's availability
and preference. Then each student was sent a letter inform-
ing him of the time or times that he was scheduled to serve
2'7

as an evaluator.

Preparing the Speakers. At the sixth meeting of the
Fundamentals p; Speech classes during the fall as well as
during the spring semester, the speech sections were given
the following assignment:

You are to prepare a three minute informative Speech
and deliver it on the week of . You will be
scheduled to speak before two different audiences com—
posed of students from other Speech 101 classes. You
will be rated in each of the performances by a student
who is a Junior or Senior and a Speech major or minor.
You will also be rated in each of the performances by a
member of the speech faculty.

 

258cc Appendix D.
26See Appendix E.
27See Appendix F.

35

At the seventh meeting of the class a sheet was given
to each student on which he could list the dates and times
that he preferred as well as those when he could not speak.28
This helped in scheduling students for speech performances.
Each instructor was given a schedule on which were listed the
date, time, and room that each student was assigned. After
the teacher read this aloud and the student wrote down his
speech schedule for that week, the sheets were posted outside
the speech secretary's office so that any student might
double check his speaking assignments.

During each semester every student enrolled in Epppp-
menpals p; Speech classes was expected to participate in two
Intergroup Speech Projects. First semester students gave
their first project Speeches the week of October 27-31, 1952,
and their second project performances the week of January 12-
16, 1953. The second semester students gave their first
project speeches the week of March 9-13, 1953, and their

second series of speeches the week of May 11-15, 1953.

Eyeparing the evaluators. The first problem in pre-
paring the evaluators was to familiarize them with the rating
scale without any indoctrination that would make this experi-

ment sterile. However, the evaluators had to have verbal

 

28See Appendix G.

36
agreement regarding what was to be rated. As Wilke29 says:

The first difficulty which anyone attempting to rate
individuals runs up against is the matter of attaching
exact meanings to the terms used on the rating scale.
Many previous users of rating devices have urged the use
of careful definitions to establish unequivocal meaning.

According to Monroe,30 research upon the problem of
increasing the agreement among judges when rating scales are

used discloses that an adequate definition of what is being

rated is crucial.

Symonds31 attempts to outline the method by which

such definition is attained:

Particular attention must be paid to the definition
of the items in the scale. On this hinges much of the
success or failure of ratings in general. One of the
most potent factors causing unreliability of ratings is
ambiguity in the meaning of items on the scale. Thus
in every rating scale the items should be defined in
some way. There are several possible ways of doing this.
One, perhaps the least satisfactory, is to give synonyms
of the original term. Another is a short paragraph
amplifying the descriptive title. Another method is to
ask a question which not only limits the meaning of the
term but somehow helps the rater to see the problem of
rating more clearly.

Furfey32 conducted research which indicated that re-

liability could be increased not only by increasing the

 

29Walter H. Wilke, "A Subjective Measurement in
Speech: A Note on Method," Quarterly Journal p; S eech,

21:55, February, 1935.
30Monroe, pp. pip., p. 961.
318ymonds, pp. plp., p. 84.
32Furfey, pp. pip., p. 92.

37
number of judges but also by increasing the number of judg-
ments which each judge makes. He explains:

This is easily accomplished by analyzing the trait
to be rated into several sub—traits, by having the judges
rate all these sub-traits separately and then combining
these separate ratings into a final score. This is quite
comparable to the process of measuring intelligence by
measuring separately a number of abilities which are be-
lieved to correlate highly with intelligence and then com-
bining the separate results into a final score.
This subdividing of traits may be overdone, of course,
but the need for more specific items cannot, according to
Freeman,33 be ignored:

Frequently it is held that the uniqueness of the in-
dividual personality pattern renders futile any analysis
into elements which, when isolated and measured, lose
their meaning. Because this view is at variance with
canons of scientific procedure, it should be examined
very critically. There is a middle ground somewhere, and
this we must find before real progress in personality
assay is made.

The students and faculty who agreed to serve as eval-

uators were given copies of the Evaluator's Rating Scale34

to study two weeks before the intergroup speech projects were
scheduled to begin. The week prior to the projects the eval-
uators met twice and discussed the question: "What is meant

by the various items on this rating scale?" Student-faculty

committees were set up on each of the four major divisions:

(1) "Thought," (2) Language," (3) "Voice," and (4) "Action."

 

33Graydon LaVern Freeman, The Energetics of Human
ﬁghavioz. Ithaca, New York: Cornell University Press,
0 O l

194 7.
34See Appendix A.

38
Students acted as committee chairmen while faculty members
served as resource persons. Only those items that were sub-
mitted by the committee and agreed upon unanimously by the
evaluators were accepted as the official interpretation of
the criteria used. These criteria were then mimeographed
and distributed so that each evaluator would have a copy of
the interpretation of the rating scale.35 Thus an attempt
was made to reduce the variables inherent in interpreting
the rating instrument.

The speech faculty members who had considered and dis-
cussed the intergroup speech project early in the semester
compiled a list of instructions regarding how the project
should be carried out so that the procedure would be con-
sistent in all sections.36 These also were mimeographed and
sent to all the evaluators.

Thus an effort was made to prepare the evaluators so
that they would understand and appreciate the meaning, use,
and purpose of the rating scale. This was in accord with the
advice offered by Strang:37

Only by taking into consideration the way in which

the rating is used, the harm that may result from super—
ficial or inaccurate rating, and the service which the

 

35See Appendix H.
368ee Appendix I.

37Ruth Strang, "Seven Ways to Improve the Rating
Process," Occupations, 29:107-10, November, 1950.

39
rating may perform in preventing the individual from get-
ting into situations in which he is likely to fail, can
the rater appreciate the importance of the rating.

Description pf pp; pgperimental setting. Directions
given in "Instructions to Evaluators"38 were followed care-
fully since it was essential to the success of this investi-
gation that certain conditions be kept constant. To aid in
achieving this objective, precautions were taken to see that
several controls operated and that all of the speakers gave
the same type of speeches under similar conditions before
paired judges. No exceptions were considered.39

Accordingly, these procedures were followed:

(1) Only college freshmen enrolled in Fundamentals pi
Speech participated as Speakers. They gave the same length
speeches (approximately three minutes) with the same general
purpose (to inform).

(2) All audiences were similar, being composed of
approximately fifteen speakers from various sections of the
class, and two evaluators, one a member of the speech fac-
ulty and the other a college junior or senior majoring or
minoring in speech.

(3) Each pair of evaluators followed identical in-

structions, heard the same speeches at the same time, used

 

388ee Appendix I.

39Where an exception occurred, the rating scales
were kept separate and were not used in this study.

40
the standard rating scale, and had previously agreed upon the
interpretation of that rating scale.

(4) Each Speaker gave two intergroup speech project
information talks. Both of these were given in the same room
at the same time of day and the same day of the week, exactly

0 Each speaker also had the same audience

41

nine weeks apart.4

and the same pair of judges listen to both of his speeches.

Additional evaluators. When pairs of evaluators heard
the same speakers give their second series of information
speeches during the week of January 12—16, a third evaluator
was present in sixteen of the sections. Each of these ad-
ditional sixteen evaluators was also either a member of the
speech faculty or a junior or senior who was a Speech major
or minor. This was done in order to be able to study the
correlation of scores given by two different faculty mem-
bers, or two different upperclassmen, who heard the same

speech at the same time.

 

40During the first semester, 1952—53, the first
intergroup speech project was conducted during the week of
October 27-31, 1952, and the second project the week of
January 12-16, 1953. The second semester the first pro—
ject was March 9-13, and the second, May 11-15, 1953.

41Since the second series of intergroup speech pro—
jects were scheduled for corresponding times and days,
there was not a great deal of difficulty in securing the
same evaluators. However, in such cases where substitute
evaluators were necessary, the ratings were not considered
in this experiment.

41
The additional evaluators, sitting in with the paired
evaluators and rating the Speakers, made another comparison
possible. Ratings given by evaluators hearing the speakers
at the same time could be compared with the ratings given by
the evaluators hearing these speakers give the same speech

at a different time.

Checking rater—speaker acquaintance. Immediately
preceding the fourth intergroup speech project, May 11-15,

42 was added to the rating scale in order to

an extra form
determine whether or not the rater was acquainted with the
speaker. A Similar form43 was given to each speaker so that
he could indicate the extent of his acquaintance with the

raters.

C. Tabulating ppg Data

 

The fact that this experiment was designed to study
four separate factors and that the data consisted of nearly
forty-five hundred rating scales, each filled out with
twenty-five specific items of information, made machine
tabulation a practical necessity. This need was met by the

use of the IBM equipment.44

—.__._

42See Appendix J.

43See Appendix K.

44IBM equipment, manufactured by International
Business Machines, 590 Madison Avenue, New York City, New

York, is available for research at Michigan State College.

42

Mechanical tabulation. Mechanical tabulation was fa-
cilitated by the use of a special punch card.45 This punch
card made it possible to record sixty separate items on each
card by use of a zero through nine code.46

Four steps had to be taken in order to convert the raw
data on the rating scales into a form which could be handled
by IBM methods:

(1) Data on the rating scales were reduced to a numer—
ical code.47

(2) The data were then entered on the punch cards by
a trained IBH Operator.

(3) The cards were sorted by a mechanical sorter.

(4) The data were then assembled by an electronic

tabulator.

Organizing the data. First the data were arranged in
tables designed to facilitate determining how well each stu—
dent performed in comparison with his fellow classmates as
well as how much improvement he had shown during the nine

weeks between the first and second series of intergroup

 

45This punch card was designed by Francis B. Martin,
Supervisor of Tabulation, Michigan State College.

46See Appendix L.

47See Appendix M.

43
speech projects.48 This arrangement of the data, although
useful in computing grades for the students, could not be
used to answer the four primary questions being considered in
this study.

Secondly, the data were arranged into sixteen cate-
gories, taking into consideration the academic speech train-
ing of the raters and their sex in relationship to the sex
of the speaker. These categories consisted of two major
divisions, male speakers and female Speakers, each broken
down into eight sub-groups:

(1) Scores given by male faculty evaluators serving
with male student evaluators.

(2) Scores given by the male student evaluators serv-
ing with the above male faculty evaluators.

(3) Scores given by female faculty evaluators serving
with female student evaluators.

(4) Scores given by the female student evaluators
serving with the above female faculty evaluators.

(5) Scores given by male faculty evaluators serving
with female student evaluators.

(6) Scores given by the female student evaluators

serving with the above faculty evaluators.

 

48These tables, consisting of forty-two pages, have
not been included in the appendix of this thesis because of
their bulkiness. The writer has a duplicate copy available
for anyone's use.

44

(7) Scores given by female faculty evaluators serving
with male student evaluators.

(8) Scores given by the male student evaluators serv-
ing with the above female faculty evaluators.

Comparisons of these scores and their statistical //
significance are presented in the following chapter.

Thirdly, in order to determine the influence which
experience with the rating scale had upon the ratings, the
data were arranged so that the scores given by pairs of
evaluators during the first intergroup speech project could
be compared with the scores these pairs of evaluators gave
the same speakers during the second intergroup speech pro-
ject. These were then treated statistically as will be
explained later.

Lastly, in order to consider whether evaluators who
were acquainted with the speakers whom they rated tended to
give scores significantly higher or lower than the evalua-
tors who were not acquainted with these same Speakers, the
Evaluator's Acquaintanceship Check Sheet was used.49 Scores
given by evaluators who indicated that they were unacquainted
with the speakers were compared with scores given by evalua-

tors who were acquainted with these same Speakers.

Statistical procedure. The available literature de-

scribing the principles and methods of population parameters

 

49See Appendix J.

45
and sample statistics is far too extensive to summarize in
this study. However, certain citations are included in an
attempt to provide examples of typical authoritative support
which is available concerning the mathematical methods used
in this study.

An example of calculation of the standard deviation
from original scores is given by Garrett.5O He also demon-
strates how to find the limits in any normal distribution
which will include a given percentage of cases.51 This was
especially useful in allocating grades according to the
normal probability curve.

In order to determine the significance of the dif-
ference in the means of scores given by evaluators influ—
enced by one factor compared with judges influenced by
another factor, the "Student‘s t" test was used.52

Coefficients of correlation, symbolized by "r," were

computed by the product—moment method.53 This is described

 

50Henry E. Garrett, Statistics ip Psychology apd
Education. New York: Longmans, Green and Company, 1947.

p- 3-
511bid., pp. 197-208.

52A full account of this test and the table for its
use will be found in Ronald A. Fisher's Statistical Methods
for Research Workers, London: Oliver and Boyd, Ltd., 1941,
pp. 116—17. Its originator published anonymously under the
pseudonym "Student."

 

53The coefficient of correlation, "r," is often
called the "Pearson r" after Professor Karl Pearson who
developed the product—moment method.

46
in detail by Snedecor.54 Its importance in the determination
of the reliability of an evaluation instrument (such as a
rating scale) was pointed out by Good and others:55

Correlation has an extensive use in connection with

the critical study of tests and other instruments. The
correlation of two series of measure that are supposed
to represent the same thing (such as two applications of
a standard test, or of comparable forms of it), is known
as the coefficient of reliability.

The writer was fortunate to have at his disposal
electric calculators.56 These were most useful when comput—
ing correlation coefficients.

Further references to statistical methods are made in
the chapter presenting the analysis of the data obtained dur-

ing the course of the investigation.

D. Summary

After a year of planning and experimentation the
writer devised an instrument to measure speech proficiency,
the Evaluator's Rating Scale. Then during the academic
year, 1952-53, with the c00peration of Central Michigan

College of Education's juniors and seniors who were speech

54George W. Snedecor, Statistical Methods. Ames,
Iowa: The Iowa State College Press, 1946, pp. 123-41.

55Good, Barr, and Scates, pp. pit., p. 607.

56rechniques for the efficient operation of these
machines are given in Katharine Pease's Machine Computa—
tions 91 Elementary Statistics. New York: Chartwell
House, Incorporated, 1949, 203 pp.

47
majors or minors, the speech faculty, and the freshmen in
Fundamentals 9: Speech classes, the experiment was conducted.
Five hundred and forty-nine freshmen prepared two speeches and
gave each speech twice. Each speech was approximately three
minutes long and its general purpose was expository.

Two evaluators, one an upperclass student and the
other a faculty member, were in each audience and rated each
speech performance. Thus 4392 ratings were collected.
Furthermore, a check was made of rater—speaker acquaintance-
ship.

Although some variability was unavoidable, every pos—
sible effort was made to have sufficient controls operating
regarding speaker, speech, audience, and occasion so that
the ratings would be comparable.

The data were transferred from the rating scales onto
punch cards and IBM methods for sorting and tabulating were
employed. Then with the use of electric calculators the
data were dealt with statistically. The formulas used and

the organized presentation of the findings will appear in

the next chapter.

CHAPTER IV
ANALYSIS OF THE DATA

The analysis of the data gathered in this experiment
considers (l) the rating scale itself, particularly its va—
lidity and reliability, (2) the distribution of scores and
their practical application to a marking system, and (3) the
statistical relationships between each of the four factors

and the ratings given by the evaluators.
A. The Rating Scale

Two important considerations of any measuring instru-
ment are its validity and reliability. Validity means the
degree to which any device or technique measures that which
it is designed to measure. As stated by Cook:1

A test is said to have high validity when it measures

effectively the property it purports to measure. A meas-
ure of validity of a test is secured by computing a co-
efficient of correlation between scores on the test and
an outside criterion.

Reliability refers to the consistency with which an
instrument measures whatever it does measure. It is defined

by Thorndike:2

 

1Walter W. Cook in the Encyclopedia 92 Educational

Research (Walter S. Monroe, ed. , New York: The Macmillan
Company, 1950, p. 1473.

2Robert L. Thorndike in the Enc clo edia p; Educa-
tional Research (Walter S. Monroe, ed.§, New York: The

IMacmillan Company, 1950, p. 1016.

49

The reliability of measurement has to do with the pre-
cision of a measurement procedure. Measurement in educa-
tion is a process of estimating the amount of some quality
or attribute possessed by individual objects or specimens.
These estimates are usually expressed in numbers (scores)
which correspond more or less accurately to the amount of
the quality or trait in question.

Validity. In this research validity is held to be the
degree to which the rating scale actually measures speaking
skill. However, it is difficult to determine the validity of
a speech rating scale because this requires some acceptable
measure of the trait being rated as a basis for comparison.
One of the commonly accepted measures of speaking skill is
the critical response of the listener.

In discussing speech rating methods Monroe and others3

point out:

The problem of validity may be viewed first of all
qualitatively. On logical grounds the audience response
constitutes the ultimate practical criterion of the effec-
tiveness of any speech. This granted, it follows that to
the extent to which the judgments recorded by means of a
rating scale are reliable, they are also valid.

Remmers,4 who made a study of students' ratings of

their teachers, states:

 

3Allan Monroe, Hermann H. Remmers, and Elizabeth
Venemann-Lyle, "Measuring the Effectiveness of Public Speech
in a Beginning Course," Studies in Higher Education, XXIX,

Bulletin pf Purdue University, 37:14, September, 1936.

4Hermann H. Remmers, "Reliability and Halo Effect of
High School and College Students' Judgments of Their Teach-
ers," Journal pf Applied Ps cholo , 18:621, October, 1934.

50

The problem of validity of judgments is hardly perti-
nent. While reliability may be defined as the accuracy
with which a measuring instrument measures whatever it
does measure, validity is defined as the extent to which
the instrument measures what it purports to measure.
Since it is student judgments that constitute the cri-
terion, reliability and validity are in this case synony-
mous.

While the writer believes that this use of the word
"synonymous" is inaccurate, he does agree with Carps who, in
discussing the validity of a speech rating form, points out:

Agreement by experts is an accepted technique in es-

tablishing validity and it is therefore plausible to
maintain that validity and reliability may be derived
from the same index of agreement among judges.

Kelley6 has treated validity of rating scales simi-
larly saying:

If competent judges appraise Individual A as being as

much better than Individual B as Individual B is better

than Individual C, then it is so, and there is no higher
authority to appeal to.

Symonds7 in discussing the validity of ratings says,
"In a certain sense there is nothing more valid than a judg-
ment." He goes on to point out that all our knowledge has

its origin in observation and in interpretations made of

observations.

 

5Bernard Carp, A Study 9; the Influence pf Certain
Personal Factors pp p Speech Judgment. New Rochelle, New
York: The Little Print, 1945, p. 113.

6Truman Lee Kelley, The Influence p: Nurtppe Upon
Individu 1 Differences. New York: The Macmillan Company,
1923, p- 9

7Symonds, pp. ppp., p. 108.

51

In the present study, since the student and faculty
evaluators had discussed and agreed upon the speaking skills
being evaluated, their ratings of the speakers were used as
the criterion. Hence the validity of these ratings is deter-
mined by inference from the reliability of the ratings.

However, a second method, that of comparison with some
other measure of speaking Skill, was possible. All of the
speakers were members of Fundamentalp pf Speech classes at
the time that they participated in the intergroup Speech pro-
jects. At the end of the course each student was given a
mark (A, B, C, D, or E) by his speech instructor. This mark
was to be regarded as indicative of the student's speech
effectiveness in general. Each student also was given a let-
ter mark derived from the total score received by adding the
ratings given by the four evaluators. By using the method of
random sampling,8 a comparison was made of the marks that the
students received in the intergroup speech projects with the
marks they received for general effectiveness of speech.

As indicated in Table I, eighty-four per cent of the
students received the same mark from their speech teacher as
from the evaluator. Thus they disagreed on the marks of only

sixteen per cent of the students. Of this sixteen per cent

 

8The method of random sampling used was that described
by Everett F. Lindquist in "The Technique of Random Selec-

tion," Statistical Analysis pp Educational Research, Boston:
lHoughton Mifflin Company, 1940, pp. 24-29.

52
TABLE I
A COMPARISON OF MARKS GIVEN BY EVALUATORS
WITH MARKS GIVEN THE SAME
STUDENTS BY TEACHERS

 

 

 

Mark given Mark given Percentage
by Evaluator by Teacher of Students
of Speech of Speech with These
Project Class Marks
Students
with A A 2
Identical B B 24
Marks from C C 38
both Eval- D D 19
uator and E E _1_
Teacher
Total 84
Students
who were A B 0
Marked B C 2
Lower by C D l
the Teacher D E _;_
than by
Evaluator Total 4
Students
who were
Marked B A 1
Lower by C B 3
the Eval- D C 7
uator than E D _;_
by the
Teacher Total 12

 

NCDte: In no case did the mark given by the evaluators of
1319 intergroup speech project differ two degrees (i.e., A
tC> C or C to A, etc.) from the mark given by the teacher.

53
the speech teacher gave four per cent of the students lower
marks and twelve per cent of the students higher marks than

did the evaluators of the intergroup speech projects.

Reliability. Reliability may be expressed by the ex-
tent to which two independent measurements will yield the
same quantitative score. In the present study it was assumed
that if the Speech rating scale is a trustworthy device it
should give approximately the same results when employed by
evaluators having a certain minimum background of speech
courses. Hence a calculation of the coefficient of correla-
tion of the rating by pairs of judges Should furnish an index
of reliability of the scale.

In order to determine this coefficient of reliability
the writer computed the correlation of the scores given by
each pair of judges who rated a group of speakers. This cor-
relation was extended by the use of the Spearman-Brown
formula to include all judges.9

Since no machine exists for measuring speaking skills,

any evaluative system involves some sort of human fallibility.

This is substantiated by Shen:10

9Guilrord, pp. pip., p. 421. Also: E. L. Clark,
"Spearman-Brown Formula Applied to Ratings of Personality

frraits," Journal pf Educationpl Ps holo , October, 1935.
pp. 552-55°

10Eugene Shen, "The Reliability Coefficient of Per-
S<1na1 Ratings," Journal p; Educational Psychology, 163232,
April, 1925.

54

The reliability of mental tests is usually measured by
correlation between results of two comparable tests. By
analogy, the reliability of personal ratings may be evaluated
by a correlation between ratings by two comparable judges.
By pairing elements of two tests such that they are similar
in difficulty and type, an author can to a certain extent
insure the comparability between his tests. But the compar-
ability of judges is much more precarious; it is entirely
beyond the control of the investigator except by a meager
selective function that he may fallibly exercise. On
account of this uncontrollable variability of judges, a
correlation between two judges is a very crude approxima-
tion of the reliability of either. The reliability of a
judge thus crudely evaluated often varies considerably
according to the judge with whom he happens to be correla-

ted.

In this study the coefficient of reliability, when
correlating ratings given by student evaluators with those
given by faculty evaluators, was .61 for the first semester
and .62 for the second semester. However, when additional
evaluators participated, the coefficient of reliability was
.68 for the student evaluators and .72 for the faculty eval-
uators. This, according to Slawson,ll indicates very high

reliability for the use of a scale rating personal traits.

B. The Distribution pf Scores

*‘

A study of the distribution of scores is not only

essential in order to measure the speech proficiency and

llJohn Slawson, "The Reliability of Judgments of

.Personal Traits," Journal 9; Applied PS cholo , 6:161-71,
April, 1922. Also Symonds, pp. cit., p.95

55

degree of improvement made, but also worthwhile to provide
background material for understanding the factors affecting
these scores.

The scores of the speakers in each of the four inter-
group speech projects were treated separately. Since the rat-
ing scale was constructed with a hundred points maximum, the
highest possible total score that could be given to a student
by adding the four evaluations given in any intergroup speech
project would be four hundred. Actually during the 1952-53
academic year no one received a score over 371 or under 115.
The range, mean, and standard deviation of the total scores
in each of the intergroup speech projects may be seen in
Tables II and III.

Since Fundamentals pf Speech is a required course for
all freshmen on the campus of Central Michigan College of

Education, academic grades for each intergroup speech project

12

were computed according to institutional policy. This was

done by plotting a curve and assigning marks as follows:
(1) "C's" were given to all of the scores within the
range of a point one-half standard deviation below the mean

and a point one-half standard deviation above the mean.

 

12Central Michigan College of Education, Faculty
Handbook; A Summary pf the More Important Policies, Regula—
tions, and Procedures. Mt. Pleasant, Michigan, 1953, p. 58.

TABLE II

56

RANGE, MEAN, AND STANDARD DEVIATION OF TOTAL SCORES

RECEIVED BY STUDENTS PARTICIPATING IN THE

INTERGROUP SPEECH PROJECTS

GROUP 1*
1952-53

 

 

First Inter-
group Speech

Project
Range 166-348
Mean 259
Standard
Deviation 36

Second Inter-
group Speech
Project

181-371

287

35

 

 

*Group I is comprised of the 305 students who partic-

ipated in the intergroup speech projects first semes-

ter.

57

TABLE III

RANGE, MEAN, AND STANDARD DEVIATION OF TOTAL SCORES
RECEIVED BY STUDENTS PARTICIPATING IN THE
INTERGROUP SPEECH PROJECTS

 

 

GROUP 11*
1952-53

First Inter- Second Inter-

group Speech group Speech

Project Project
Range 115-334 200-355
Mean 257 279
Standard
Deviation 28.5 27.8

 

 

*Group II is comprised of the 244 students who par-
ticipated in the intergroup speech projects second

semester.

58

(2) "B's" were given to all of the scores between one
half and one and a half plus standard deviations from the
mean.

(3) "D's" were given to all of the scores between one
half and one and a half minus standard deviations from the
mean.

(4) "A's" were given to all of the scores on the plus
end of the curve beyond one and a half standard deviations
from the mean.

(5) "E's" were given to all of the scores on the minus
end of the curve beyond one and a half standard deviations
from the mean.

In a normal bell shaped curve this method would mean
38.30 per cent "C's," 24.17 per cent each for the "B's" and
"D's," and 6.68 per cent each for the "A's" and the "E's."13

Qpppp I. (First Semester Students). In order to
avoid confusion of terms "first semester" and "second semes-
ter" with "first intergroup speech project" and "second in-
tergroup speech project," first semester students will be
referred to as Group I and second semester students as Group
II. As indicated earlier in this study, Group I as well as

Group II had two intergroup speech projects. There are three

 

13Harry‘W. Sundwall, "Normal Curve Score Probabili-
ties." East Lansing: Michigan State College, 1950.
(Mumeographed).

59

significant facts to notice regarding the distribution of
scores of this Group I.

(1) From Table II it may be noticed that the range of
scores in the second intergroup speech project, 181-371, was
greater on both the lower and upper level than the range of
scores in the first intergroup Speech project. The mean rose
from 259 in the first intergroup speech project to 287 in the
second intergroup speech project with a mean improvement of
28 points per student. By applying the "t test" this was
found to be statistically significant at the .01 level of
confidence.14 0n the basis of this calculation the differ-
ence of 28 points between these mean gains could happen by
chance less than once in a hundred times.

(2) Inspection of Table IV shows that 305 students in

Fundamentals g§_Speech class participated in all of the per-

 

formances in the experiment during the first semester. As
indicated in this table, 80.7 per cent of these received a
higher score while 19.3 per cent either received the same or
a lower score in the second intergroup speech project. The
greatest gain made by a student was 115 points while the
greatest loss was 54 points.

(3) Table V demonstrates that it was necessary for a

student to have a higher score in the second intergroup

 

l4Garrett, pp. cit., pp. 189—93.

 

TABLE IV

DIFFERENCES BETWEEN FIRST AND SECOND INTERGROUP

SPEECH PROJECT SCORES

 

 

GROUP I
Point Differences Number of Per Cent
between Scores in Students of the
First and Second Making This Total
Speech Project Gain (or Loss) Group

 

111 through 120

101

H
H
I!
II
N
N
1|
N
n
tl
"

110
100

go
0
7O
6O
50
4O
3O

2O
10

Total with higher scores
in second speech project 246

- 9
-29
-39
-49
-59

through
I!

H
N
N
H

O
-10
-2O
-30
-4O
-50

28
13
9
6

O
__3

Total with lower scores
in second speech project 59

O O O O

HNUINOWUJOOl—‘O
O O O
H-F-FF-‘(DNOQCﬁVIWOUU

HHHHH

00
O
V

l—‘OHme
OO\O\OLAJO\

 

l-‘
\O
O

U)

 

 

60

61

TABLE V

TOTAL SCORES AND ACADEMIC RATINGS RECEIVED BY STUDENTS
PARTICIPATING IN THE INTERGROUP
SPEECH PROJECTS

 

 

 

GROUP I
Academic First Inter- Second Inter—
Rating group Speech group Speech
Given Project Project
A 331 or above 345 or above
B 278 - 330 304 - 353
C 241 - 277 270 - 303
D 187 - 240 220 - 269

E 186 or below 219 or below

 

 

62
speech project in order to receive the Same mark as was given
in the first intergroup speech project. This follows because
grades were determined on the basis of the normal curve and
the overall group improved. Thus:

(a) Scores between 331 and 344, which were equal
to "A's" in the first project, were valued as "B's" in the
second.

(b) Scores of 278-303, which were "B's" in the
first project, were "C's" in the second.

(c) Scores in the "C" range in the first project,
241-269, were given "D" grades in the second project.

(d) Scores of 187 to 219 that had been "D" scores
became "E's."

This was true because, as indicated earlier in this chapter,
the grades were allotted according to standard deviations
from the mean; and the mean of the second intergroup speech

project was 28 points higher than the first.

0

Qpppp II. (Second Semester Students). The findings
that were evidenced by the data on Group I were substantially
the same for Group II. They are as follows:

(1) There was a statistically significant improvement
shown in the total scores received by Group II speakers in
the second intergroup speech project when compared with the

scores received in the first project. By comparing data in

Table II with that in Table III one can see that the mean

63
score of the first intergroup speech project of Group II was
257 compared with 259, the mean score of Group I. The mean
score of the second intergroup Speech project of Group II was
279 compared to Group 1'5 287. Thus the mean improvement in
points was 28 for Group II and 22 for Group I. Furthermore,
it may also be noted from Table III that the range of scores
was 115-334 in the first project and 200-355 in the second
project. The same phenomena occurred with both groups,
namely, the scores in the second intergroup speech projects
were consistently higher than those in the first intergroup
speech projects.

(2) As presented in Table VI, 195 (79 per cent) of the
244 Group II speakers who participated in the experiment had
a higher score in the second project than they had in the
first. This 79 per cent is comparable to the 80.7 per cent
of Group I students who also made higher scores in the second
projects.

(3) The results in Group II (presented in Table VII)
were similar to those found in Group I, i.e., the over-all
improvement gains necessitated higher scores by the speaker
in the second intergroup speech project in order to maintain
the same letter mark he earned in the first intergroup speech

project.
C. Findingp Regarding the Four Factors

The academic speech training pf the evaluator. There

was a statistically Significant difference between the rating

TABLE VI

DIFFERENCES BETWEEN FIRST AND SECOND INTERGROUP
SPEECH PROJECT SCORES

 

 

 

GROUP II
Point Differences Number of Per Cent
between Scores in Students of the
First and Second Making This Total
Speech Project Gain (or Loss) Group
1 through 100 2 0.8
l " go 1 0.4
71 " 0 5 1.6
61 " 70 5 1.6
51 " 60 15 6.1
41 " 5O 25 10.3
31 " 4O 31 13.1
21 " 30 33 1§.5
11 " 2O 44 1 .0
l " 10 .3& 1312
Total with higher scores
in second speech project 195 79.0
— 9 " 0 21 8.6
'19 II “‘10 15 6 01
—29 " -20 7 2.9
-39 " -30 4 1.6
-49 " -4O __2 0.4

Total with lower scores
in second speech project 49 21.0

 

 

65

TABLE VII

TOTAL SCORES AND ACADEMIC RATINGS RECEIVED BY STUDENTS
PARTICIPATING IN THE INTERGROUP
SPEECH PROJECTS

 

 

 

GROUP II
1952-53
Academic First Inter- Second Inter-
Rating group Speech group Speech
Given Project Project
A 315 or above 355 or above
B 270 - 314 294 - 334
C 243 - 269 265 - 293
D 200 - 242 225 - 264

E 199 or below 224 or below

 

 

66
given by students, who were in either their third or fourth
year of college, and the ratings given by faculty members
with advanced degrees. This is indicated in Table VIII.

In rating 305 speakers (called "Group I" in this
study) during the first semester of the 1952-53 academic
year, the faculty evaluators gave a mean score of 62.08.

The student evaluators, hearing the same speeches in the same
room at the same time as the faculty evaluators, gave a mean
score of 68.94. Thus the students' ratings averaged 11.1

per cent higher than the instructors' ratings. This is sta-
tistically significant at the five per cent level of con-
fidence. In other words, this phenomenon of student evalua-
tors rating 11.1 per cent higher than the faculty evaluators
could happen by chance only five in a hundred times.15

In rating the 244 speakers (called "Group II" in this
study) during the second semester of the 1952-53 academic
year, the faculty evaluators gave a mean score of 58.89 while
the student evaluators gave a mean score of 69.95. Thus,
the students' ratings averaged 19.9 per cent higher than the
instructors' ratings. This is statistically Significant at

the one per cent level of confidence.

The acquaintanceship pf the evalgator with the speakep.

Each of the twenty-one pairs of evaluators participating in

 

15Everett Franklin Lindquist, Statistical Analysis pp
Educational Research. Boston: Houghton Mifflin Company,
1940. p. 72.

TABLE VIII

A COMPARISON OF THE MEAN SCORES GIVEN
BY FACULTY AND STUDENT EVALUATORS
RATING THE SAME SPEAKERS

 

 

Group I*

Mean Score Given by Student Evaluators

Mean Score Given by Faculty Evaluators

Difference between Student and Faculty Rating

Group I;**

Mean Score Given by Student Evaluators

Mean Score Given by Faculty Evaluators

Difference between Student and Faculty Rating

 

69.95

58.89
11.06

 

 

*N

**N

305
244

67

68
the fourth intergroup speech project, May 11-15, filled out
an Evaluator's Acquaintanceship Check Sheet.16 However, only
eight cases occurred where one speaker was well known by a
faculty evaluator and not known by a student evaluator while
another Speaker in the same group was well known by this stu-
dent evaluator and not known by the faculty evaluator. These
eight cases are presented in Table IX. It can be seen here
that the students who knew the speakers gave a mean score two
points higher than did the faculty members who did not know
the speakers. However, when these same faculty members knew
a speaker, they gave him a mean score seven points higher
than did the student evaluators who were unacquainted with
these speakers. Thus in both cases evaluators who knew the
speakers rated them higher than did evaluators who did not
know the Speakers. This was especially true of faculty mem-
bers.

FUrthermore, as indicated in Table I speech teachers
gave the same mark to eighty-five per cent of their students
as was given by the evaluators. However, when these marks
did differ, the teacher gave a higher rating than did the
evaluators in three of every four cases. Since the teacher
was acquainted with all of his students and the evaluators
were acquainted with only about ten per cent of these speak-

ers, one may assume that acquaintanceship was a positive

 

16See Appendix J.

69

.No. we maoxmmmm on» Socx pom ow on: enema can mummmmmm one Roux
one mnouwsaw>m may moospon “av downwaonaoo Ho psoHOHhmmoo anew

 

 

 

 

 

m m
NI: mm mm mu: mm mm «Heuaao
m Hm mm mu mm mm mmanmmo
m we om m mm Hm Hemnmeo
NH mm mm Ha- am on mmeuaeo
ma mm mm mu me me mmcuauo
ma- em we m mm mm moaimeo
can mm mm a Hm mm oeeummo
m we 45 e no He mmmnaeo
umpcﬁwswow mnmxmomm mnoxmomm wopuﬁmsuow mnmxmoqm whoxmmgm mnopmsam>m
lab madam 302M poz Scam nab mscaﬁ Roam poz Roam Ho mama
condemnaow om mummuzpm haddowm nopqﬁm5do< on spaswmﬂllmmmmmmwm you ocoo
mononommﬁa "Coma Go>ﬁw monoom oodonwmmam ”awn; mmeﬁo monoom

 

 

mmmwmmm mma 39H; QMHZH¢DGU«ZD mH moadqu>m Bzmmbam
mma Q24 amaszboow mH mOH<DA¢>m qubodm was 2mm; zm>Ho
mmmoom Mme mHHS ammdmsoo mmm<mmm Mme mHHB QWHZHdboo<ZD mH mOH¢DA¢>m
Magbogm Ema Gad QmBZH<DGo< wH moa<DA<>m Hzmmbam mma zmm; zm>Ho mmmoom

NH mqmwa

7O

factor in securing a higher rating.

Tpp experience pf the evaluator with the rating scale.
In order to determine whether experience with the rating
scale improved the reliability of the ratings, coefficients
of correlation were computed on (1) the scores given by the
raters when they first used the rating scale in October, and
(2) the scores given by the same raters when, after some ex-
perience, they used the rating scale again in January. These
coefficients of correlation of ratings given by pairs of
judges in October were compared with those of ratings given
by the same pairs of evaluators rating the same speakers
three months later. As indicated in Table X, there was no
evidence of a Significant difference between the experienced
and inexperienced evaluators. Ten pairs of evaluators showed
a mean increase in correlation of fourteen points. However,
nine pairs of evaluators showed a mean decrease in correla-
tion of fifteen points. The correlations of the ratings
given by two pairs of evaluators remained substantially the

same 0

Tpp pp; pf tpp evaluato; 2p relation tp thg pp; pf thp
speaker. A comparison was made of the mean scores given
female speakers and male speakers by four combinations of
judges, i.e., (l) a male faculty member judging with a male
student evaluator, (2) a female faculty member judging with a

female student evaluator, (3) a male faculty member judging

71

.QOprHmupoo no pcmﬁoammooo esp opmcmﬂmmc on com: ma en: Honamm chew

.mmwsmwh
ma .mocooomm ucoaowmﬂo mmw>ﬁm .mpGoUSPm mo agonw mean on» copay
menu was nonopoo ca mnoxmoam mo macaw a couch one «panama anamomm

mmo use pmovdpm oao .mnopmSHw>o no pawn a now mcdwpm Amppoa nommx

 

 

ov.

me.

 

mo.

em.

 

no.

ww.

 

Ho.

no.

 

HO.

 

OH.

ma.

He.

Ho.

mm.

Hm.

 

 

HH.

 

aw.

mm.

 

ON.

mm.

5H.

ow.

mm.

Nw.

*n ma
omwmmoom

*h QH
mmwonocH

mmnﬂpwm
hawSGmh

9H0 *.H
mmdapwm
monopoo
.HO *H

Noooo

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Bm¢m< mmazoﬁ mmmme mazmmbam mo abomo m2¢m mma GZHUQDW mmOB¢DA¢>m

Neabodm 02d HZMQDHm Mm zm>HU mozHB<m mo ZOHH¢qmmmOU mme mo zomHm<m§oo 4

N mqm4a

 

72
with a female student evaluator, and (4) a female faculty
member judging with a male student evaluator. This compari-
son was made for both first semester (Group I) and second
semester (Group 11) speakers. The data are outlined in
Tables XI, XII, XIII, and XIV.

The greatest difference between ratings given by pairs
of evaluators occurred when a female faculty evaluator and a
female student evaluator rated together. This was true in
rating both male and female speakers and occurred in Group II
as well as in Group I. Rating female speakers, the female
student evaluators gave an average of 9.66 points higher the
first semester (Table XI) and 16.06 points higher the second
semester (Table XIII) than did the female faculty evaluators.
Rating male Speakers, the female student evaluators gave an
average of 10.61 points higher the first semester (Table XII)
and 14.22 points higher the second semester (Table XIV) than
did the female faculty evaluators.

The least difference between ratings given by pairs of
evaluators occurred when a female faculty evaluator and a
male student evaluator rated together. In rating first se-
mester students the female faculty and male student evalua—
tors differed 3.15 points in rating female speakers and only

.21 in rating male speakers.

73

TABLE XI

A COMPARISON OF THE MEAN SCORES GIVEN
BY FACULTY AND STUDENT EVALUATORS
RATING FEMALE SPEAKERS

 

 

 

GROUP I

Sex of Sex of Mean Score Mean Score Difference
Faculty Student Given by Given by of the
Evaluator Evaluator Faculty Students Mean Scores
Male Male 62.40 67.85 5.45
Female Female 63.66 73.22 9.66
Male Female 63.00 69.95 6.95
Female Male 65.57 68.72 3.15

All Four Combinations
of Judges Listed Above 63.66 69.94 6.28

 

 

74

TABLE XII

A COMPARISON OF THE MEAN SCORES GIVEN
BY FACULTY AND STUDENT EVALUATORS
RATING THE MALE SPEAKERS

 

 

 

GROUP I

Sex of Sex of Mean Score Mean Score Difference
Faculty Student Given by Given by of the
Evaluator Evaluator Faculty Students Mean Scores
Male Male 57.20 62.50 5.30
Female Female 63.06 73.67 10.61
Male Female 58.06 67.89 9.83
Female Male 64.68 64.89 .21

All Four Combinations
of Judges Listed Above 60.75 67.24 6.49

 

 

75

TABLE XIII

A COMPARISON OF THE MEAN SCORES GIVEN
BY FACULTY AND STUDENT EVALUATORS
RATING FEMALE SPEAKERS
GROUP II

 

 

 

Sex of Sex of Mean Score Mean Score Difference
Faculty Student Given by Given by of the
Evaluator Evaluator Faculty Students Mean Scores
Male Male 64.39 70.43 6.04
Female Female 56.98 73.04 16.06
Male Female 57.14 68.84 11.70
Female Male 59.47 71.46 11.99
All Four Combinations
of Judges Listed Above 59.50 70.94 11.44

 

76

TABLE XIV

A COMPARISON OF THE MEAN SCORES GIVEN
BY FACULTY AND STUDENT EVALUATORS
RATING MALE SPEAKERS

 

 

 

GROUP II

Sex of Sex of Mean Score Mean Score Difference
Faculty Student Given by Given by of the
Evaluator Evaluator Faculty Students Mean Scores
Male Male 62.26 66.39 4.13
Female Female 55.09 69.31 14.22
Male Female 57.14 68.84 11.70
Female Male 58.81 67.26 8.45

All Four Combinations -
of Judges Listed Above 58.33 67.95 9.62

 

 

77
D. Summary

An analysis of the data indicates that:

(l) The coefficient of reliability of the rating scale
ranged from .61, when used by pairs of evaluators comprised
of one student and one faculty member, to .72, when used by
pairs of faculty members exclusively.

(2) A degree of validity was evidenced in that eighty-
four per cent of the students received the same marks from
their speech teacher as from the evaluators.

(3) Although the distribution of scores showed a range
of 115 to 371 points, a consistent pattern of higher scores
during the second intergroup speech projects was evidenced
both semesters.

(4) The students' ratings averaged 11.1 per cent higher
than the instructors' ratings the first semester and 19.9 per
cent higher than the instructors' ratings the second semester.

(5) In general, when faculty members knew a Speaker,
they gave him a mean score seven points higher than did the
student evaluators who were unacquainted with the speaker.

(6) When the marks that the speech teachers gave their
students differed from the marks given these students by eval-
uators, the speech teacher who knew the speakers involved
gave them higher ratings seventy-five per cent of the time.

(7) Ten pairs of evaluators showed a mean increase in

correlation of fourteen points between October and January

78
ratings. However, nine pairs of evaluators Showed a mean
decrease in correlation of fifteen points during the same
period.

(8) The greatest difference between ratings given by
pairs of evaluators occurred when a female faculty evaluator
and a female student evaluator rated together.

(9) The least difference between ratings given by
pairs of evaluators occurred when a female faculty evaluator

'and a male student evaluator rated together.

CHAPTER V
CONCLUSIONS AND RECOMMENDATIONS

In this concluding chapter the principal findings of
the research will be summarized and the implications for edu-
cation noted. Although these will be pertinent primarily to
the evaluation of students in Fundamental; pf Speech at
Central Michigan College of Education, they may also have
significance in comparable appraisal situations elsewhere.

In the light of this investigation additional problems which

merit study have been disclosed.
A. Principal Findings

In Chapter IV of this thesis some of the interesting
facts regarding the rating scale and the distribution of
scores have been presented as incidental to the main objec-
tives of the investigation.

The principal findings relate to the questions raised
in Chapter I regarding the role of the four selected factors
in the evaluation of speech performance: the academic speech
training of the evaluator, the acquaintanceship of the evalu-
ator with the rating scale, the experience of the evaluator
with the rating scale, and the sex of the evaluator in rela—

tion to the sex of the Speaker. Taking these up one by one

the results are as follows:

80

Tpp academic ppeech training 9; the evaluator. Rat-
ings given by college juniors and seniors who were speech
majors or minors differed significantly from those given by
members of the faculty who had advanced degrees in speech.
Pairs of student evaluators as well as pairs of faculty eval-
uators had a higher correlation of scores than did pairs of
evaluators comprised of one student and one faculty member.
Furthermore, the fact that students gave a mean score 15.5
per cent higher than did the faculty evaluators cannot be
easily attributed to chance when one considers: (1) that a
total of 4392 comparable ratings were studied, (2) that in
all cases student and faculty evaluators heard the speeches
at the same time, and (3) that there was a rather consistent
pattern of student evaluators giving higher ratings than
faculty evaluators, ranging from a mean difference of 11.1
per cent the first semester to 19.9 per cent the second se-

mester.

Tpp acquaintanceship pf the evaluator pith the ppeaker.
Ratings given by the evaluators acquainted with the speaker
differed somewhat, but not Significantly, from those given by
the evaluators unacquainted with the speaker. Although the
results are not statistically Significant, they are in agree-

ment with the results of published research by Knight1 and

 

lKnight, pp. cit., p. 141.

 

81

Henrickson,2 i.e., that evaluators who are acquainted with
the speakers give them higher ratings than do evaluators who
are unacquainted with these speakers:

Thus, although in general student evaluators gave
higher ratings than did faculty evaluators, this relation
was reversed when the faculty evaluators were acquainted with
the Speakers. Furthermore, when the mark given by the teacher
of the Fundamentals p: Speech class differed from the mark
computed from the four evaluations, the teacher's mark was
higher than the evaluators' mark seventy-five per cent of the
time. Since the teacher was better acquainted with most of
his Speech students than were the evaluators, one may assume

that the factor of acquaintanceship influenced these ratings.

The experience p: the evaluator with the rating scale.
Ratings given by the evaluators before they had experience
with the rating scale did not differ significantly from those
given by the evaluators after they had experience with the
rating scale. A possible explanation of this may be:

(1) All of the evaluators had some speech training
before using the Evaluator's Rating Scale for the first time.
This scale evaluated on the customary criteria so there was
little new or different even when using the rating scale for

the first time. Furthermore, the interpretations of the

 

2Henrickson, pp. cit., p. 124.

82

criteria were agreed upon at two discussion meetings previous
to the first intergroup speech project.

(2) Student and faculty evaluators did not compare
ratings at any time. Thus an evaluator did not know whether
he was giving higher or lower scores than his colleague and
could not, of course, profit by experience and adjust to his

fellow evaluator's ratings.

Tpp pp; pf the evaluator 1p relation tp tpp pp; p: tpp
speaker. Ratings of Speakers of each sex differed signifi-
cantly according to the sex of the evaluator. Specifically
these differences are:

(1) Female student evaluators gave higher ratings to
both male and female Speakers than did male student evalua-
tors.

(2) Female student evaluators gave relatively higher
ratings to male speakers than they gave to female speakers.

(3) Male student evaluators gave relatively higher
ratings to female speakers than they gave to male Speakers.

(4) Male and female faculty evaluators gave ratings
substantially the same. However, female faculty evaluators
slightly favored male speakers, and male faculty evaluators
slightly favored female speakers.

Furthermore, when two female evaluators rated together,
they differed from each other more than did any other com-

bination of evaluators. This was true probably because, in

83
general, the most lenient ratings were given by female stu-
dent evaluators and the most severe ratings by the female

faculty evaluators.
B. Educational Implications

Significance for Central Michigan Collegp pf Education.
This study has important significance for Central Michigan
College of Education. From the analysis of the data it is
evident that, if one wishes to have comparable scores when
rating speech performances, evaluators ought not be chosen at
random. This is true even when the raters have had training
in college speech courses.

Two rules may be formulated for the selection of eval-
uators for future intergroup Speech projects.

(1) Since ratings given by upperclassmen, who are
Speech majors and minors, tend to differ Significantly from
the scores given by members of the speech faculty, speakers
who are to be compared ought to be rated exclusively by stu-
dents, exclusively by faculty members, or by a like number of
each.

(2) Since the student evaluators of both sexes tend to
evaluate male speakers differently than they evaluate female
speakers, this factor must be taken into consideration when
assigning evaluators if the scores are to be comparable. In
order to get more meaningful scores, an equal number of male

and female student evaluators Should evaluate each speaker.

85

Significance for education ip general. To make broad
general statements from a Specific study often results in in-
accuracies. However, one available criterion for the accept-
ance of a conclusion is the consistency of results from
different investigations. A survey of the experimental lit-
erature on differences of evaluators according to the factor
of sex reveal certain major findings. These findings are so
frequently repeated by different investigations as to suggest
a valid basis for a generalization. The conclusion that
Carter3 arrived at in studying marks assigned by teachers of
beginning algebra, the findings that Douglas and Newman4
announced after studying the marks of students in four Min-
nesota high schools, the results of the study by Webb5 on
rating intelligence, and the conclusions of the study made
by the writer6 regarding the attitudes of intercollegiate
debaters towards debate judges, all substantiate the find-
ings of this study, i.e., that female evaluators rate differ-
ently than do male evaluators. Thus it may be concluded that
the sex of the evaluator is a factor to be taken into consid—
eration when attempting to make objective and valid evalua-
tions not only in Speech but in education in general.

3Carter, pp. plt., p. 93.

4Douglas and Newman, pp. ptt., pp. 86-87.

5Webb, pp. _ip., pp. 252-53.

6Prister, pp. pip., pp. 102-03.

86

Furthermore, the fact that juniors and seniors who
were speech majors or minors gave significantly higher rat-
ings than did faculty evaluators may quite logically have
implications for other areas of education. However, these

cannot be stated conclusively until further research is done.
C. Suggestions for Further Study

The possible means of analyzing the data collected in
this study have not been exhausted. Since the pertinent
items have been recorded on IBM punch cards, they could
easily be sorted, arranged, and tabulated in various ways to
furnish evidence in answer to other questions.

One of these questions might be: Is speech skill
evaluated by techniques describing specific factors more re-
liable than a Gestalt "general effectiveness" rating? This
could be partially answered by further statistical study of
the 4392 ratings secured in this experiment. The scores
given on the four traits (thought, language, voice, and
action) could be compared to the scores given the same speak-
ers on "general impression and communicativeness."

Another question might be: What role, if any, does
the "halo effect," described by Thorndike,7 have on these

ratings of speech performances? This could be answered by a

 

7Edward L. Thorndike, "Constant Error in Psychological
Rating," Journal pf Applied Psychology, 4:25-29, March, 1920.

87
study of the sub-traits to determine the extent of this con-
stant error.

Although this study found a significant difference be-
tween ratings given by student and faculty evaluators, it did
not attempt to make a study of the complexity of factors in-
herent in the labels "student" and "faculty." In an insti-
tution such as Central Michigan College of Education some
"students," especially juniors and seniors who are speech
majors or minors, do supervised teaching in the secondary
schools while some "faculty" are often students in graduate
courses in speech and education leading to a higher degree.
Although the students and faculty in this study differ in
their amount of academic speech training, a further study
might examine whether this is the most distinguishable char—
acteristic.

Furthermore, in a few years an interesting follow-up
study on the evaluators might be made. Since most of the
evaluators either are or will be teachers, the factor of age
and teaching experience could be considered and perhaps find
answers to such questions as: (1) Do teachers who rate
severely become more lenient as they get older and more ex-
perienced? and (2) Do teachers who give lenient ratings be-
come more severe raters as time goes on?

More data regarding the student evaluators are avail-
able in the files of the Student Personnel Division at
Central Michigan College of Education. This could be used

88
with the data already tabulated on the IBM cards to determine
whether there is any substantiation for the hypothesis that
students who make better semester grades in speech courses
are more reliable raters than their fellow students.

If it is at all possible, a study making use of some
other experimental design should investigate specifically the
role that the factor of acquaintanceship has upon the ratings
of speech performances. There is, of course, a great deal of
difficulty involved in conducting accurate research where
emotional reactions are concerned. Nevertheless, if the re-
action (favorable or unfavorable) of the acquaintanceship

could be considered, results may be more meaningful.
D. Summapy

Although there was much concomitant learning incident
to this study, the research was primarily concerned with
determining the influence of four selected factors on the
ratings of speech performances.

The findings of this research led to the conclusion
that the academic speech training of the evaluator influences
the ratings that he gives. Undergraduate evaluators with
majors or minors in speech gave significantly higher ratings
than did evaluators with advanced degrees in speech. Further-
more, scores given by pairs of undergraduate evaluators had a

.higher correlation than did scores given by student-faculty

lDairs of evaluators. Pairs of evaluators with advanced

89
degrees in speech had the highest correlation.

The investigation of the influence that the factor of
acquaintanceship has in rating Speech performances was incon-
clusive. However, the results of this study tend to sub-
stantiate the findings of previous research, i.e., that eval-
uators who are acquainted with the speakers give them higher
ratings than do evaluators who are unacquainted with these
speakers.

This study found that the experience of the evaluator
with the particular rating scale did not significantly in-
fluence the ratings. However, all the evaluators had a cer-
tain minimum of speech training and had rated speeches
previously.

The literature and data of this study support the con-
tention that male and female evaluators rate male and female
speakers differently. This is evidence that the factor of
sex does influence the ratings of speech performances.

Probably the most important conclusion that the writer
has arrived at is a philosophic one. Through the experiences
gained from planning, executing, and evaluating this study,
he realizes that by attempting to secure answers to specific
questions one raises many more questions that need to be

answered. Thus the research started in this thesis may con-

tinue indefinitely.

BIBLIOGRAPHY

BOOKS

PERIODICALS

BULLETINS AND MONOGRAPHS
UNPUBLISHED MATERIALS

BIBLIOGRAPHY
A. BOOKS

Anastasi, Ann. Differential Psychology; Individual and Group
Diffprences in Behavior. New York: The Macmillan Com-

pany, 1937. ”515 pp.

Baird, A. Craig, and Franklin H. Knower. General Speech: Ap
introduction. New York: McGraw-Hill Book Company, 1949.

500 pp.

Barnes, Harry G. Speech Handbook. Iowa City: Privately
printed, 1936. 138 pp.

Beach, Ann F., and others. Bibliography pp the Use p; IBM
Machines Ln Science, Statistics, and Education. New York:
International Business Machines Corporation, January,

1954. 60 pp.

Brownell, William A., and others. The Forty-FAfth Yearbook
of the National Society for the Study of Educatipp:

Part I, The Measurement of Understanding. Chicago:
University of Chicago Press, 194 . 33 pp.

 

 

Campbell, William Giles. Form and Style Ln Thesis Writing.
New York: Houghton Mifflin Company, 1954.114 pp.

Carp, Bernard. A Study pf the influence p: Certain Personal
Factors pp A Speech Judgment. New Rochelle, New York:
The Little Print, 1945. 122 pp.

Craig, Alice Evelyn. 12$ Speech Artg. New York: The Mac-
millan Company, 1947. 499 pp.

Fisher, Ronald Aylmer. The Design of Experiments. New York:
Hafner Publishing Company, 1951.244 pp.

Freeman, Graydon LaVern. The Energetics p; Human Behavior.
Ithaca, New York: Cornell University Press, #1948. 344 pp.

Friederich, Willard J. and Ruth Wilcox. Teaching Speech Ln
Hi h Schools. New York: The Macmillan Company, 1953.

4 7 PP-

Garrett, Henry E. StatApticp Ap Psychology and Eppca ion.
New York: Longmans, Green and Company, 1947. 4 7 pp.

92

Good, Carter V., A. S. Barr, and Douglas E. Scates. The

Methodology p: Educational Research. New York: D.
Appleton-Century Crofts, 1941. 890 pp.

Guilford, J. P. Psychometric Methods. New York: McGraw-
Hill Book Company, 1936. 556 pp.

Hedde, Wilhelmina G., and William Norwood Brigance. American
Speech. New York: J. B. Lippincott Company, 1947.

59 pp-

Hollingworth, Harry L. The Psychology p; the Audience. New
York: American Book Company, 1935. 232 pp.

Jordan, Arthur Melville. Measurement pp.Education. New
York: McGraw-Hill Book Company, 1953. 533 pp.

Kelley, Truman Lee. Scientific Method: Its Function pp
Research and 1p Education. New York: The Macmillan Com-

pany, 1932- 233 pp-

. The Inflpence p; Nurture ppon Individual Differ-
ences. New York: The Macmillan Company, 1926: 49 pp.

Knower, Franklin H. Table pi Contents p; thp Quazteply
Journal pg Speech {1915—1952}; Speech Monogzaphs $1934-
193253 and the ppeech Teacher (19522 with 5 Revised Lpdpx
Compiled Through 195 . Columbia, Missouri: Speech
Association of America, 1953. 61 pp.

Laird, Donald A. The Psychology p; Selecting Men. New York:
McGraw-Hill Book Company, 1927. 2 9 pp.

Lindquist, Everett Franklin. Statistical Analysis ip Educa-
tiona; Research. Boston: Houghton Mifflin Company,

1940. 256'pp.

Monroe, Walter 8. (ed.). Encyclopedia p; Educational Ep-
search. Revised edition. New York: The Macmillan Com-

pany, 1950- 1520 pp-

Murray, Elwood. The Speech Personalit . New York: J. B.
Lippincott Company, 1944. 5 5 pp.

 

Ogg, Helen L. and Ray K. Immel. Speech Improvement. New
York: F. S. Crofts and Company, 1936. 190 pp.

Pease, Katharine. Machine Computations p; Elementary Statis-
tics. New York: Chartwell House, Inc., 1949. 20 pp.

93

Reid, Loren D. Lgaching Speech 1p High School. Columbia,
Missouri: Artcraft Press, 1952. 301 pp.

Robinson, Karl F. Teaching Speech ip Secondagy School. New
York: Longmans, Green and Company, 1951. 438 pp.

Rugg, Harold 0. Statistical Methods Applied pp Education.
New York: Houghton Mifflin Company, 1917. 410 pp.

Seward, Georgene H. Sex and the Social Order. New York:
McGraw-Hill Book Company, 1946. 301 pp.

Snedecor, George W. Statistical Methods. Ames: The Iowa
State College Press, 1946. 485 pp.

Symonds, Percival M. Diagnosing Personalipy and Conduct.
New York: D. Appleton-Century Company, 1931. 602 pp.

Terman, Lewis M., and Catharine Cox Niles. Sex and Fe son-
alit . New York: McGraw-Hill Book Company, 1935.
500 pp.

Thonssen, Lester W., and Elizabeth Fatherson. Bibliography
of Speech Education. New York: H. W. Wilson and Com—

Tanv. Ivy. 800 pp.

Thonssen, Lester W., Mary Margaret Robb, and Dorthea Thonssen.

Bibliography 9; Speech Education - Supplement 1932- 8.
New York: H. W. Wilson and Company, 1950. 393 pp.

 

Weaver, Andrew T., Gladys Borchers, and Donald K. Smith. The

Teaching of Speech. New York: Prentice-Hall, Incorpor—

B. PERIODICALS

Anderson, Gordon V., and Royal B. Embree. "Appraisal of the
Individual " Review 9; Educapional Reseagch 18:157—74,
April, 194 .

Bartlett, Neil R. "A Punched-Card Technique for Computing
Means, Standard Deviations, and the Product-Moment
Correlation Coefficient and for Listing Scattergrams,"
Science, 104:374—75, October 18, 1946.

Bingham, Walter V. "Halo, Invalid and Valid," Journal 9;
Applied Psychology, 23:221—28, April, 1939.

94

Brownell, William A. "Some Neglected Safeguards in Control-
Grou Experimentation," Journal 9; Educational Research,
27:9 —lO7, October, 1933.

Bryan, Alice J., and Walter H. Wilke. "A Scale for Measur-
ing Speaking Abilities," Psychological Bulletin, 33:605-
06, October, 1936.

______. "Audience Tendencies in Rating Public Speeches,"
iguana; 2; Applied Psychology, 26:371-81, June, 1942.

______ "A Technique for Rating Speeches," Journal p; Con—
gulting Psychology, 5:80-90, March-April, 1941.

Cable, Arthur W. "A Criticism Card for Class Use," Quarteply
Journal pi Speech Education, 12:186-88, April, 192 .

Carroll, Robert P. "Practice in Rating," Journal 93 Experi-
mental Psychology, l4:299-302, June, 1931.

Carter, Robert Scriven. "Non-intellectual Variables Involved
in Teachers' Marks," Journal 9; Educational Research,
47:81095, October, 1953.

Cason, Hulsey. "An Annoyance Test and Some Research Prob-
lems," Journal 9; Abnormal and Social Ps cholo , 25:
224—36, July—September, 1930.

Castore, George F., and William S. Dye. "A Simplified Punch
Card Method of Determining Sums of Products,".£syghgme-
trika, 14:243-50, September, 1949.

Champney, Horace, and Helen Marshall. "Optimal Refinement
of the Rating Scale," Journal 9; Applied Psychology,
233323-31, 1939-

Clark, Edward L. "Spearman—Brown Formula Applied to Ratings
of Personality Traits," Journal 9; Educational Psychology,

OCtOPBP: 1935: PP. 552-55-

DiVesta, Francis J., and James H. C. Roach. "An Analysis of
a Procedure Used in the Teaching and Rating of Oral Ex-
pression," Journal 93 Educatipnal Psychology, 42:468-78,

December, 1951.

Dorcus, Roy M. "Some Factors Involved in Judging Personal
Characteristics," Journal 9; Applied Ps cholo , 10:502—
18, December, 1926.

95

Douglas, Earl R., and Olson E. Newman. "The Relation of
High School Marks to Sex in Four Minnesota Senior High
Schools," School Review, 45:481—88, April, 1937.

Drushal, J. Garber. "An Objective Analysis of Two Techniques
of Teaching Delivery in Public Speaking," Quarterly
Journal pf S eech, 25:561-69, December, 1939.

Dudycha, George J. "A Note on the 'Halo Effect' in Ratings,"
Journal pf Social Psychology, 15:331-33, May, 1942.

Englehart, Max D. "Suggestions with Respect to Experimenta—

tion under School Conditions," Journ l 9; Experimental
Education, 14:225-44, March, 1946.

Freyd, Max. "The Graphic Rating Scale," Journal 9_ Educa—
tional Psychology, 14:83—102, February, 1923.

 

Furfey, Paul H. "An Improved Rating Scale Technique,"
Journal 9: Educational Psychology, 17:45-48, January,
1926.

Gauger, Paul W. "Comparison of the Abilities of High School
Speech Students and Speech Experts in Rating a Speech
Performance," Journal 9; Educational Research, 42:209-17,
November, 1948.

Gilkinson, Howard, and Franklin H. Knower. "A Study of
Standardized Personality Tests and Skill in Speech,"
Journal pg Educationgl Psychology, 32:161-75, 1941.

Gray, J. Stanley. "Objective Keasurements for Public Speak-
ing," Journal 9; Expression, 2:20-26, March, 1928.

Hart, Hornell N., and Elmer Olander. "Sex Differences in
Character as Indicated by Teachers' Ratings," School and
Societ , 20:381—82, September 6, 1924.

Hawthorne, Joseph W. "An Attempt to Measure Certain Phases
of Speech," Journal 9; General Psychology, 10:399-414,
April, 1934.

Hayworth Donald. "Tests and Measurements in Public Speak-
ing,“ Quarterly Journal 9; s eech, 21:272-75, November,

1935-

Holcomb, Martin J. "The Critic-Judge System," Quarterly
Journal 9; Speech, 19:28-38, February, 1933.

Hollister, Richard. "Faculty Judging," Quarterly Journal
9; Public S eakin , 3:235-41, April, 1917.

96

Holm, James N. "A Progressive Form of the Speech Criticism
Blank," Speaker 9; Tau Kappa Alpha, 22:12—13, March, 1938.

Hudgins, Clarence V. "Concerning the Validity of Speech
Tests," Volta Review, 45:271-72, May, 1943.

Jurgensen, C. E. "Intercorrelations in Merit Rating Traits,"
Journal 9: Applied Psychology, 34:240—43, August, 1950.

Kincheloe, Isabel. "On Refining the Speech Scales," English
Journal, 34:204-07, April, 1945.

Klein, Ruth, and William McLamers. "Standards in Public

Speaking " ngzteply Jougnal 9; Speech 22:439-42
October,’193 . ’ ’

Knight, Frederic B. "The Effect of Acquaintance Factor upon

Personal Judgments," Joppnal 9; Educational Psychology,
14:129-42, March, 1923.

Knower, Franklin H. "A Suggestive Study of Public Speaking
Rating-Scale Values," Quarterly Journal 93 Speech, 15:30-
41, February, 1929.

_______ "Experimental Study of Changes in Attitudes,"
lgugpal pg Social Ps cholo , 6:315—44, August, 1935.

. "Psychological Tests in Public Speaking," Quarterly
Journal p; Speech, 15:216-22, April, 1929.

. "What Is a Speech Test?" Quarterly Joppnal 91
S eech, 30:485-93, December, 1944.

Kornhauser, Arthur W. "A Comparison of Raters," Journal 9;
Personnel Research, 6:338-44, January, 1927.

_______ "A Comparison of Ratings on Different Traits,"
iggrnal 9; Personnel Research, 6:440-46, March, 1927.

_______ "Reliability of Average Ratings," Journal pt Per-
sonnel Research, 5:309-17, December, 1926.

Lehman, Harvey C., and Paul A. Witty. "Sex Differences:
Some Sources of Confusion and Error," Amepngn Joppnpl
9: Ps cholo , 42:140-47, January, 1930.

Lewis, Helen Bruck. "Studies in the Principles of Judgments
and Attitudes," Journal pt Social Ps cholo , 11:121-46,

February, 1940; and 14:229-55, August, 1941.

97

Lorge, Irving. "Tabulating and Test-Scoring Machines:
Applications of International Business Machines to Edu-
cational Research," Review 9; Educational Research, 12:
550-57, December, 1942.

Mahler, W. R. "Some Common Errors in Employee Rating Prac-
tices," Personngl Joupnal, 26:68-74, June, 1947.

Marsh, S. E., and F. A. C. Perrin. "An Experimental Study
of the Rating Scale Technique," Journal pg Abnormal and
Socigl Ps cholo , 19:383-99, January—March, 1925.

Miner, James Burt. "The Evaluation of a Method for Finely
Graduated Estimates of Ability," Journal 9; Applied
Ps cholo , 1:123-33, June, 1917.

Moore, Wilbur E. "Factors Related to Achievement and Im—
provement in Public Speaking," Quarterly Journal 9;
Speech, 29:213-17, April, 1943.

Newcomb, Theodore. "An Experiment Designed to Test the Va-
lidity of a Rating Technique," Journal p; Educational
Psychology, 22:27—89, April, 1931.

N011, Victor H. "Measuring the Scientific Attitude," Jour-
nal pt Abnormal and Social Ps cholo , 30:145-54, July-
September, 1935.

Pelsma, J. R. "Standardization of Grades in Public Speaking,"
Quarterly Joupnal 9; Public Speaking, 1:266-71, October,
1915.

Pemberton, H. Earl. "A Technique for Determining the Optimum
Rating Scale for Opinion Measures," Sociology and Socigl

Researph, 17:470-72, May-June, 1933.

Pfister, Emil R. "A Survey of Attitudes Toward Debate
Judges," Forensic p: Pi Kappa Delta, 39:102-03, May,
1954.

. "Ratings Rankings or Both?" Speech Activities,
8:25, Spring, I952. ’

Remmers, Hermann H. "Reliability and Halo Effect of High
School and College Students' Judgments of Their Teachers,"

Journal 9; Applied Psychology, 18:619-30, October, 1934.

Richards, T. W., and Willis Ellington. "Objectivity in the
Evaluation of Personality," Journal 9; Experimental
Education, 10:228-37, June, 1942.

98

Rose, Forest H. "Training in Speech and Changes in Person-
ality: A Study of the Effects of Beginning Speech
Courses upon Personality Traits," Quarterly Journal 9;
Speech, 26:193-96, April, 1940.

Rugg, Harold 0. "Is the Rating of Human Character Practi—
cable?" Journal of Educational Psychology, 12:425-38,
November, 1g21; 485-501, December, 1921; 13:30—42, Janu-
ary, 1922; 1-93, February, 1922.

Sapir, Edward. "Speech as a Personality Trait," American
Journal pf SociologY, 32:892-905, May, 1927.

Schneider, Dorothy E., and A. F. Bayroff. "Relationship be-
tween Rater Characteristics and Validity of Ratings,"
Journal 9: Applied Psychology, 37:278-80, August, 1953.

Seedorf, Evelyn H. "An Experimental Study in the Amount of
Agreement Among Judges in Evaluating Oral Interpreta-
tion," Journal 9; Educational Research, 43:10-21,
September, 1949.

Shen, Eugene. "The Influence of Friendship upon Personal
Ratings," Joppnal 9; Applied Psychology, 9:66—68, March,

1925.

_______. "The Reliability Coefficient of Personal Ratings,"
Journal 9; Educational Psychology, 16:232-37, April,

1925.

Slawson, John. "The Reliability of Judgments of Personal
Traits," Journal pi Applied Psychology, 6:161-71, April,

1922.

Smith, Marpheus. "Group Judgments in the Field of Person-

ality Traits," Journal 9; Experimental Psychology, 14:
562—65, October, 1931.

Stagner, Ross. "Judgments of Voice and Personality," Journal
2i Educational Psychology, 27:272-77, April, 1936.

 

Stevens, Wilmer E. "A Rating Scale for Public Speakers,"
ngpterly Joupnal 9; Speech, 14:223-32, April, 1928.

Stinchfield, Sarah M. '"The Standardization of Speech Testing
Material," Quarterly Journal 9: Speech Education, 7:360-
69, November, 1921.

Strang, Ruth. "Seven Ways to Improve the Rating Process,"
Occupations, 29:107-10, November, 1950.

99

Strong, Edward K., Jr. "Weighted vs. Unit Scales," Journal
9: Educational Psychplogy, 36:193-216, April, 194 .

Symonds, Percival M. "Changes in Sex Differences in Problems
and Interests of Adolescents with Increasing Age," Jour—
nal 9; Genetic Psychology, 50:83-89, March, 1937.

. "Notes on Rating," Journal 9: Applied Psychology,
9:188-95, June, 1925.

. "On the Loss of Reliability Due to Coarseness of
the Scale," Journal 9; Eyperimental Psychology, 7:456-60,
December, 1924.

Taylor, Harold C. "Social Agreement on Personality Traits as
Judged from Speech," Journal 9; Social Psychology, 5:244-
48, May, 1934.

Thompson, Wayne N. "Is There a Yardstick for Measuring
Speaking Skill?" Quarterly Journal 9; S eech, 29:87-91,
February, 1943.

Thorndike, Edward L. "A Constant Error in Psychological Rat—
ing," Journal 2; Applied Psychology, 4:25-29, March, 1920.

Tiffin, Joseph, and Wayne Musser. "Weighting Merit Rating
Items," Journal pf Applied Ps holo , 2 :575-83, October,
1942.

Webb, Louis W. "The Ability of Men and Women to Judge Intel-
ligence," School and Society, 20:251-54, August 23, 1924.

Weber, Pearl L. "Judgment Today," Psychological Review, 44:
264-66, May, 1937.

West, Robert, and Helen Larsen. "Some Statistical Investi-
gations in the Field of Speech," Quarterly Journal pi
Speech Education, 7:375-82, November, 1921.

Wilke, Walter H. "The Reliability of Summaries of Rating
Scale Evaluation of Student Personality Traits," Journal

9; Genetic Psychology, 53:313-20, February, 1938.

_______ "A Subjective Measurement in Speech: A Note on
Method," Quarterly Journal of Speech, 21:53—59, February,

1935-

 

100
C. BULLETINS AND MONOGRAPHS

Auer, J. Jeffery. "Doctoral Dissertations in Speech: Work
in Progress, 1954," Speech Monographs, 21:136-41, 1954.

Carter, Gerald G. "Student Personalities as Instructors See
Them," Bulletin 9; Purdue University: Studies tp Higher
Education. Lafayette: Purdue University, 1945. 4 pp.

Central Michigan College of Education. Bulletin, 195g353
Spssions. Mount Pleasant, Michigan: Central Michigan
College of Education, 1952. 256 pp.

Henrickson, Ernest H. "The Relation Among Knowing a Person,
Liking a Person, and Judging Him as a Speaker," Speech
Monographs, 14:22—25, 1940.

Knower, Franklin H. "Graduate Theses-~An Index of Graduate
Work in Speech," Speech Monographs, 21:108-35, 1954.

McNemar, Quinn, and Lewis M. Terman. "Sex Differences in
Variational Tendency," Genetic Psychology Mono ra hs,
18:1-65, February, 1936.

Monroe, Allan H. "Evaluation in Speech Education," Bulletin
9; the National Association 9: Secondary School Princi-
pals, 29:156-64, November, 1945.

"The Measurement and Analysis of Audience Reaction
to Student Speakers-~Studies in Attitude Changes,"
Bulletin pt Purdue University: Studies pp Hi her Educa-
tion. Lafayette: Purdue University, 1937. 0 pp.

Monroe, Allan H., Hermann H. Remmers, and Elizabeth Venemann-
Lyle. "Measuring the Effectiveness of Public Speech in a
Beginning Course," Bulletin 9; Purdue Univepsity:
Studies in Higher Education. Lafayette: Purdue Univer-

sity. 19§6. 29 pp.

Norvelle, Lee. "Development and Application of a Method for
Measuring the Effectiveness of Instruction in a Basic
Speech Course," Speech Monographs, 1:41-65, 1934.

Rosenberg, Ralph P. "Bibliograghies of Theses in America,"
Bulletin p; Bibliography, 1 :181-82, September-December,

94' 0

Thompson, Wayne N. "A Study of the Characteristics of Stu-
dent Raters of Public Speaking Performances," Speech
Monographg, 13:45-53, 1946.

101
D. UNPUBLISHED MATERIALS

Anderson, Mary Margaret. "An Analysis of Some of the Sources
of Variation Involved in Rating Speeches." Unpublished
Master's thesis, Michigan State College, East Lansing,

1945. 19 pp.

Andregg, Neal Berry. "A Critical Study of Graphic Rating
Scales." Unpublished Doctor's dissertation, Michigan
State College, East Lansing, 1951. 138 pp.

Brandenburg, Earnest. "Evaluation in Speech Training Pro-
grams of the Armed Forces." Seattle: Washington Univer-

sity, 1952. 7 pp. (Mimeographed.)

Case, Keith. "An Investigation into the Backgrounds for the
Study and Measurement of Personality in Speech Communi-
cation." Unpublished Doctor's dissertation, Denver Uni-
versity, Denver, 1948. 361 pp.

Central Michigan College of Education. "Faculty Handbook;
A Summary of the More Important Policies, Regulations,
and Procedures." Mount Pleasant, Michigan, 1953. 98 pp.
(Mimeographed.)

Gibbs, David Elmore. "A Study of Reliability and Variation
of Critical Rating of Speech by Trained and Untrained
Observers." Unpublished Master's thesis, University of
Washington, Seattle, 1948. 87 pp.

Graunke, Dean F. "The Use of Student Opinion in the Improve-
ment of Instruction in Speech Fundamentals." Unpublished
Master's thesis, University of Nebraska, Lincoln, 1951.

187 pp-

Jones, Horace Redman. "The Development and Present Status
of Beginning Speech Courses in the Colleges and Univer-
sities in the United States." Unpublished Doctor's
dissertation, Northwestern University, Evanston, 1952.

216 pp.

Martin, W. L. "An Experimental Study of Seventeen Terms Used
in Speech Evaluation." Spokane: Whitworth College,

1953. 16 pp. (Mimeographed.)

Moses, E. R., Jr. "A Survey of Speech Tests in Thirty Ameri-
can Universities and Colleges." Columbus: Ohio State
University, 1942. 15 pp. (Mimeographed.)

Mueller, Henry L. "A Study of Speech Examiners' Ratings."
Unpublished Doctor's dissertation, Teachers' College,
Columbia University, New York City, 1952. 114 pp.

102

Penland, Virgil Darrell. "An Experimental Study to Measure
Effectiveness in Oral Reading by Means of a Rating Scale
Technique." Unpublished Doctor's dissertation, Univer-
sity of Southern California, Los Angeles, 1948. 177 pp.

Sundwall, Harry W. "Normal Curve Derived Score Probabili—
ties." East Lansing: Michigan State College, 1950.
(Mimeographed.)

Thompson, Wayne N. "An Experimental Study of the Accuracy of
Typical Speech Rating Techniques." Unpublished Doctor's
dissertation, Northwestern University, Evanston, 1943.
204 pp.

APPENDIX

EVALUATOR'S RATING SCALE

NAMES AND CODE NUMBERS OF EVALUATORS
LETTER TO SPEECH FACULTY

INVITATION TO STUDENT EVALUATORS
STUDENT EVALUATOR'S PREFERENCE REPORT
LETTER ASSIGNING STUDENT EVALUATORS
SPEAKER'S PREFERENCE SHEET
INTERPRETATION OF RATING SCALE CRITERIA
INSTRUCTIONS TO EVALUATORS

EVALUATOR'S ACQUAINTANCESHIP CHECK SHEET
SPEAKER'S ACQUAINTANCESHIP CHECK SHEET
PUNCH CARD USED FOR TABULATING THE DATA
CODE USED PREPARATORY TO IBM TABULATION

,DATA SHEET USED TO COMPUTE CORRELATIONS

APPENDIX A 104

EVALUATOR'S RATING SCALE

 

 

Speaker
Regular class section ,

day hour room
This Speech: Date Time Room

 

 

 

Directions: Circle the number in the appropriate column
and also write that number in the right hand
margin. Then add the column. Please double
check for accuracy.

 

SCORE
Excel-
Poor--Fair--Average——Good-- lent
I. THOUGHT
Content 1 2 3 4 5 6 7 8 9 10

Organization 1 2 3 4 5 6 7 8 9 10

II. LANGUAGE
Vocabulary 1 2 3 4 5 6 7 8 9 10

Sentence
structure 1 2 3 4 5 6 7 8 9 10

III. VOICE
Enunciation 1 2 3 4 5 6 7 8 9 10
Adequacy 1 2 3 4 5 6 7 8 9 10
IV. ACTION
Posture l 2 3 4 5 6 7 8 9 10
Gesture 1 2 3 4 5 6 7 8 9 10
V. GENERAL

Impression 1 2 3 4 5 6 7 8 9 IO

Communica-
tiveness 1 2 3 4 5 6 7 8 9 10

 

TOTAL

 

(Evaluator's SignatureI

01(1)
02(1)
03(2)
04(2)
95(2)
06(2)
05(1)
0 (2)
09(2)
10(2)
11(1)
12(1)
13(1)
14(2)
15(2)
10(1)
1g(2)
1 (2)
19(2)
20(2)
21(1)
22(2)
23(2)
24(1)
25(2)
26(1)
2 (1)
2 (2)

APPENDIX B

NAMES AND CODE NUMBERS OF

Twyla Jo Newhouse
Jean Mayhew

Fred Bush

Emil Pfister
Elbert Bower
Herbert Curry
Ruth Fox

Don Kilbourn
Wilbur Moore

Neil Soumela
Shirley Clark
Doris Whitcomb
Phyllis Eichhorn
J. D. Shuttleworth
James Jaska
Vivienne Jack
Don Kemp

Raymond Page
Richard Balwinski
James McLennan
Sheila Maule
David West

James Prough
Barbara Moore
Clyde Hatter
Josephine Nickora
Alma Nevins
Arthur Rice

First two numbers:
01 - 09 Faculty evaluators
10 - 55 Student evaluators

Last

number:

Key

NHO\O

hbebwwwwwwwwwwm
AAAAAAAAAAAAAAA
NNNHHNNNHHHNNHN

iwmwomooqownew

####
AAA/\AAAAAAAA

bwwwwwwwnwpm
VVVVVVVVVVVVVVVVVVVVVVVVVVV

\nxmnxnmm -¥>
\J'l #LJU N H CO (D'Q O\\J\

105

EVALUATORS

Richard Torongo
Petrine Churchill
Robert Beckley
John Kirn

Alice Wagner
Jacqueline Robinson
Marian Sanborn
Herbert Sanford
Kenneth Downing
L. D. Foster
Anita Hoag

Betty Borman
Robert Gravelle
Virgil Scott
Royal Riggs

Jack Clary
Bernard Randolph
Jean Conklin
Jack White

Jean Caldwell
Norma Levi
Martha Fuce
Patricia Thwaits
Jean Detzur
Carol Clark
Donna Clapp
Keith Birdsall

(1) Females who judged both semesters

(2) Males who judged both semesters

(3) Females who judged second semester only
(4) Males who judged second semester only

LETTER TO SPEECH FACULTY

Dear Colleague:

APPENDIX C

106

Below is a list of the juniors and seniors who are
Speech majors and minors and who are not on academic pro-

bation.

Intergroup Speech Projects.

They are being considered as evaluators for the
Will you (1) please put our

initials after each student you have had in class; (2 add
a question mark after anyone who you feel may be an incom-

petent judge.

MAJORS

Richard Balwinski
Betty Borman

Jean Caldwell
Shirley Clark
Jack Clary

Jean Conklin
Jean Detzur
Kenneth Downing
Phyllis Eichhorn
L. D. Foster
Phyllis Gordon
Lonna Rae Hall
James Jaska

Don Kemp

John Kirn

Norma Levi

Joyce A. McNamara
Josephine Nickora
Richard Powell
Jacqueline Robinson
Marian M. Sanborn
Carla Snow

Neil Suomela
Loyal Thornton
John Trask

Alice Wagner

Jack White

Emil R. Pfister

MINORS

Brian Beckley
John Bilsky
Keith Birdsall
Dale Brown
Petrine Churchill
Carol Clark
Donna Clapp
Dale Edgerle
Martha Fuce
Robert Gravelle
Clyde W. Hatter
Anita Hoag
Vivienne Jack
Jean Klozik
Betty LaLone
James McLennan
Lorna Lesnick
Robert Lucas
Sheila Maule
Barbara Moore
John Murchie
Alma Nevins
Raymond Page

MINORS

Roger Parrish
James Prough
Bernard Randolph
Arthur Rice
Royal Riggs
Herbert Sanford
Virgil Scott

J.D. Shuttleworth
Thomas Simpson
Art Stinchcomb
Patricia Thwaites
Richard Torongo
Paul Totzke
Everett Vincent
Russell Ward
Mary Weber

Jack Weir

Joyce Wells

David West

Doris Whitcomb

107
APPENDIX D

INVITATION TO STUDENT EVALUATORS

Central Michigan College of Education
Mount Pleasant, Michigan
October 13, 1952

Dear ,

You have been recommended by members of the Department of
Speech and Drama to serve as one of the Evaluators in the
Intergroup Speech Projects this year. In these projects
Freshmen in Speech 101 give three minute speeches for
speech majors and minors to evaluate on a rating blank.

We should appreciate it very much if you could serve at
two or three of the possible thirteen times.

If you are willing to cooperate in conducting this speech
project, please answer on the enclosed preference blank.
Please return this to the Speech Office (W261) no later
than 5:00 p.m. this Friday, October 17.

The speech majors and minors will hold two meetings next
week, Thursday, October 23, in Keeler Dining Room C to
discuss the rating blank to be used. One will be from
4:00 - 5:00 p.m.; the other from 6:30 — 7:30 p.m. Since
the worth of the project depends so much upon the coop-
eration and mutual understanding among the evaluators,
you are requested to attend both of these meetings.

Cordially yours,

Wilbur E. Moore, Head
Department of Speech and Drama

108

APPENDIX E

STUDENT EVALUATOR'S PREFERENCE REPORT

I am willing to serve as an evaluator at two of
the Speech 101 Intergroup Projects. I will be avail-
able at the following times:

(Put "P" in two or three of the blanks below to

indicate preference; "X" for those times that are im-

possible.)

12:05-1:55 1:05-1:55 4:05-4:55

 

Eonday, Oct . 27

 

Tuesday, Oct. 28

 

Wednesday, Oct. 29

 

 

 

Thursday, Oct. 30

 

 

 

 

Friday,Oct. 31

 

 

(signed)

Plegse geturn tp Speech Offtpp pp p; befoge Frid ,
October 12.

APPENDIX F 109

LETTER ASSIGNING STUDENT EVALUATORS

Department of Speech and Drama
Central Michigan College of Education
Mount Pleasant, Michigan

October 20, 1952

Dear :

Thank you for filling out and returning the Stu-
dent Evaluator's Preference Report. Please report to
the Speech Office, W261, at the following times to be

available as an evaluator of Intergroup Speech Pro-

 

 

 

 

jects:

(1) Time: Date:
(2) Time: Date:
(3) Time: Date:

 

 

You will probably be asked to evaluate only twice
and be an alternate the third time.
Thank you for the cooperation you have given us.

Sincerely,

Emil R. Pfister, Director
Intergroup Speech Projects

110

APPENDIX G

SPEAKER'S PREFERENCE SHEET
FOR
INTERGROUP SPEECH PROJECT

 

 

Name Instructor

 

Class time and days

Please mark with "P" the two dates you prefer
to give your speech for the Intergroup Speech Project.
Mark with an "X" the times that you are unable to

participate.

 

12:05-l:55 1:05-1:55 4:05-4:55
Monday, Oct. 27 A

 

Tuesday, Oct. 28

 

Wednesday, Oct. 29

 

 

 

 

 

 

Thursday, Oct. 30

 

111
APPENDIX H

INTERPRETATION OF RATING SCALE CRITERIA

Below is a list of the specific questions agreed upon by the

evaluators. These may make for a better interpretation of

the criteria used in the "Evaluator's Rating Scale for the

Intergroup Speech Project."

I.

II.

THOUGHT
(A) Content
1. Is there enough material to cover the subject
well?
2. Is subject and the material interesting enough
to hold the attention of the listener?
3. Is the information accurate?
(B) Organization
1. Is there an adequate introduction?
2. Is the body of the speech well planned?
3. Is there continuity?
4. Is there an adequate conclusion?
LANGUAGE

(A) Vocabulary

(B)

1. Is pronunciation correct?

2. Is there an adequate variety?

3. Are words used correctly?

4. Is there an overuse of slang?

5. Is vocabulary suitable to audience?

Sentence Structure

1. Are sentences grammatically correct?

2. Is there an adequate variety of sentences?
3. Is there an overuse of the word "and"?

III.

IV.

112

APPENDIX H
(CONCLUDED)

VOICE
(A) Enunciation
1. Can you understand the speaker?
2. Does the speaker slur his words?
3. Is the speaker over-articulate?
(B) Adequacy
. Does the speaker speak too loudly or too
softly?
2. Is the voice pleasant?
3. Is there enough variety?
4. Is there good timing and use of pauses?
ACTION
(A) Posture
1. Does he lean against anything?
2. Are his hands and feet in a comfortable,
natural position?
3. Does he hold his head properly?
(B) Gesture
1. Are the speaker's actions distracting?
2. Are the Speaker's actions suitable and
meaningful?
3. Is there either insufficient or too much

action?

I10.

113

APPENDIX I

INSTRUCTIONS TO EVALUATORS
OF THE
INTERGROUP SPEECH PROJECT

Do all judging independently. Select seats near the
center of the room.

Try to get the meeting started as soon as possible.
There will be a few participants who are on leave from
other classes so please permit them to give their
speeches at the beginning of the session.

Please furnish all information called for at the top
of the rating scale.

All speeches are to be limited to three minutes. Do
not permit any contestant to exceed this time limit.

If you have failed to complete the rating on a Speaker
by the time he has concluded, complete the scale before
calling another speaker.

Be impersonal in your judgment. Do not let personalities
temper your evaluation.

Rate the speakers on all ten items on the rating scale,
and total the ratings.

You can aid the speakers by being a considerate judge.
Try to refrain from showing disagreement, disgust, or
disinterest in the speeches. Be an attentive listener.

Above all, remember that these meetings have been
arranged to furnish additional speaking experience for
the Speech Fundamentals students. For that reason, it
is imperative that you do the best job of evaluating
that you are capable of rendering. Do your judging
fairly and conscientiously.

At the end of the session return the rating scales to
the Speech Office (W261).

114

APPENDIX J

EVALUATOR'S ACQUAINTANCESHIP CHECK SHEET

This speech: Date Time Room

Name of Speaker

 

Please circle the number below which is the most

nearly correct answer:

I. To what extent do you know the speaker?
1 2 3 4 5
Not at all Casually Moderately Very well Intimately

II. I converse with this person:

1 2 3 4 5
Never About once Once a Once a Almost
a semester month week daily

 

Evaluator's Signature

115

APPENDIX K

SPEAKER'S ACQUAINTANCESHIP CHECK SHEET

This speech: Date Time Room

Name of Evaluator

 

Please circle the number below which pp the most

nearly correct answer:

I. To what extent do you know the evaluator?
1 2 3 4 5
Not at all Casually Moderately Very well Intimately

II. I converse with this person:

1 2 3 4 5
Never About once Once a Once a Almost
a semester month week daily

 

Speaker's Signature

116

«:2 0::

8 2 a... 2 2 a : 2 2 £2 2 3 m E 3 3 31.3%

ammwmwmmwmammwmmmmwm

Bogzhnﬁmhzahzlgsyeﬂieinsteaala

_
823232.122 :52235333335

NNNNNNNNNANNNNNNNNNN

Bahzghznhznﬁa.gemosawsnoznaulw

___—_—___—____—_____

9222223222288383383;

 

aaaaaaaaaaaaaaaaaaaa

 

732229. 2: 2252333333333.333533333.233333933333.33Baﬂﬁnnnn—ndanﬂouﬁunﬂvnnaa

ommwmmwmmmommuaomomm.mmmmmmmmommmmmmammmam immmmmmmmmmmmmmmmm

oommmmsnwmonvnnnmjp360932333 v9.13mmxmﬁmmnnvnnmmr Eommwmmﬁgﬂaannﬁ

$9323,“ 3m...

oommmnnm3mn3nn~n$

NNNNNNNNNN

3333333333

ﬁ______—__

3333333333

cceoocceca

 

rah-pnnﬁmbznsﬁ52833833333

Easssmxaa;

_

Savawnv'nvvvng;

mmmnmnnnmn

893339333;

NNNNNNNNNN
—_____.___

333339393398838333ﬁ.n

szagnaaxzai

9333.381. mug;

mmmmmmnnmn

9%33393333

vvqcvvvvvv

Snnzxnxawa

mannnmnnnm

Son-nunonnnxnnnun;

NN---~N
______——._

 

 

3933'9399;

39333939:SuvmngsnﬂﬂxuﬂﬁS

aoccceocococceoacoca

lemmaseesawwaawmammamawmmaaaaemaeaaaaeaaa

i3 8 3 : S 9 3 9 2 :

----~i-NN--~N~snshm--:RN--------ms--ﬁss~h

Balmstwnmhizﬁ 1:3 monoamoanwwwaorm 53mm3$m 31.3% 3.?93399399 3.92.33»meva 333333333232

mmmmmmom¢m:owm mmmmmccmmmmomowmmmommmomoammomammmm

322333: 2N2 ZimuS “33333—333 333333?3333339373:

mammmmmnmm:mmmmmmmmmmmmmmmmmm Emnmmmmmmm

3m.2:o~mn:22 Fobcsgﬁcmouwwonoww £39333?!an 33933393991

quva¢v<v_evvevvevee;vvvvvvquvvvvvvvevv

Sasonﬁnmnmn :2;Lowmwmmkwwwnaimwmo53333333335

annmnawmmgmmmmmannnmznnnmnnnmm

Pﬁmaﬁﬁﬁzgﬁz

momcwmmmmw

3a~n~n~w~n~v~n-~ﬁ

mmmmmmmmmm

ﬁﬂzsmﬁﬂanﬁ—n

vvovvvcvcv

aazzznzaaa

mnnnnnnnnn

Sagzﬁﬂzgﬁg

----~N

RRRRﬂﬂERR:

_—___—____

ﬁﬂﬂﬁﬁﬂaﬂﬁﬁ

 

aaocoooaooaecoaeaooo

$358339”SEnouRﬁﬁﬂvnQﬁ-«B!2222223.—

 

82 2 :29! 2 u— 3.2 a a n

mmmmmamwdemmm

Rm. 3 2 o. 9. Z n. n— 2

can. 2 2 o. 222 ~— 2

----h~

82 2 2 a. n. 3 2 .2 2

macaommmmm

89— o. 2 a. £2 2 2 Z

mmnmmmmmmm

can. 2 2! 9293 :

vvvvvevvcc

can. 2 C 2 9: a” S

nnmmnnnnn"

on! 2 2 o. 2: 22 2

N---N-

2o. 2 2 a. 22 n. n. .—

____—___—_

a. 2 2 3 n. 2 2 3 Z

1.23:
.a.n~_
maamam

2....n.n~_

mmcammawmm
s.._.o.n~_
hﬁhﬁshhssh
2..~...n~.
moomummumo
2..~...n~_
mmmmmnmnmm
2....a.n~.
vvvecve<¢v
2....n..~.
nannnnmnnn
2......n~_
Nauuwmuuuu
2....n.n~.
—_————————
s..~.a.a~.
ooooaccoeo
2....»..~.

 

 

HzmzmHDom EmH mo mz<ma Mm

dada Ema uZHB¢ADm«B moh Qmmb Qm¢o mOZDm

Q XHszmmd

 

 

117

 

 

 

 

.hoxwmgm map an co>Hm nommmm ccoomm can :o mpr*

M Una h moxwocoagd mom mIH *nopoww aHnmoocmchwsuo< mm: m

Hmnom mQEsHoo mo oawm no: son; ¢mmIHoo *AmQSSG mooo m.om©:h mmtmm
mmuwm «menace now; mananwasoo mmuoa *onoom proa wmumm
uNIwH mGSSHoo sza mHnwnmoaoo ONIQH *mwnmpHno m>Hw mzp do monoom Nmnmo
uHIw massHoo :pHa oHnwnwmaoo OHIHo *mpwwnp sop mop do monoom qumm
m xﬁccmag« mmm ¢mmuaoo ampssq muoo m.mmusa mm-om
so>Hm mpma m.OOH oz mmIOH whoom pros mNumN

< xHocmamd mom ONIOH mempHno o>Hw mnu no monoom uNlmH

¢ NHonao< mom OHIHo muwmnp cop map so monoom mHI
mGHoHHsn mama onp :H HH< mIH am>Hm mm; nommqm muons aoom m

.opo .oouH now N “Goo: NH now H mIH Go>Hw mas noomam pomp oSHa o
.opm ..mosa now N m.:oz now H mIH Go>Hw was Summon pan» wan m
uncomm now No mpmnww now Ho ONIHo noopo mqwxwmam vim
on5 now N mmHmamw pow H NIH nmxmem wo xmm N
ocoomm now N mpmnww pow H NIH am>wm mm: nooomm nopmmamm H

mcoo Amnssz ammo so ammo :o ooonooom ammo

on» wo omzoqsm meom wcwpwm smH no

GOpranmxm muonssz mop Eopm wpwm :EsHoo

 

 

ZOHB¢HDm<B mszo<ﬁ EmH mom

mmq¢om ozHH<m mma osz<mmmm 2H QMmD mooo

E NHszam¢

Formula use

APPENDIX N

DATA SHEET USED TO COMPUTE CORRELATION

d:*

r:

NZXY - (ix x (X)

 

 

W)? - (102] [N Y2 - ( m2]

 

 

 

 

118

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

N x {XY or x =
(X x (Y or x =
Numerator =
N x (X2 or x :
(2X)2 or x : ___
Ans. I
N x 2Y2 or x =
(£Y)2 01‘ x = __
Ans. II
Ans. I x Ans. II or x =
AV/*_ Demoninator
Numerator
__. or = r
Denominator

*Henry E. Garrett, Statistggg in Psychology gag

Education.
p. 292.

New York:

Longmans, Green, and Company, 1947,

 

«w
1;. " ..
'CAL‘. .

0' U 9
~ ”(,1 L, I

_1Ut“msce\

d

 

L‘M

h—r

'~. - 47—378