'C \ I , 1"," l h .’. ~ I ' 2N. ‘ ”\ul . | ‘.|‘E\°éiéj" -‘K:A:A 0 .'.HI... ‘P‘HII ~- .‘ _ 1:.I be ' ' ' ‘ .7! .'. . I" 15' -~“ .. " W‘*"~\" CK. -- ‘ lg"; A”. CV‘ ‘ \ VII - " ‘ ' h '4. ' 93W . - ":cfl 'UI'I fiI' It”! Fr? 3% fie n’}:“ u“ {Ln ‘ ‘53:: uygmfi‘f .a ‘ ' ..I a. Ema - 5% ' ‘n 53‘: 15 ”3 r ' «c \ - “A..- ) ._ . . :~ 3:? :1- M- 5‘ A 1E1 , ,fix ..‘ ”Iv“ a??? . .. .: . "0. Auw’axu'm 7%: ‘1. 15“ 'I ‘I'!'- I v, . 1h; 3:.jimr' ”Edit-1: , I . 1%“_." :7“ _ I ‘. ‘I‘ L 9H3“ “ PIVE‘.” 3 v‘fi 33:: {{SS"; w. . . .3; ~13? 311.1% *2de ' _ .. -. ~ ‘ ‘-.L' . :L‘v-v 1-1.. .L' _ ‘ ‘ ', ‘ 1. ’ \ ‘n .'.I‘. ..~ w 'I‘ ' .- 'v - ' 10‘ I .1: air-.’. -’ .j.'. . o o’ A IE I". . - I‘llf‘k” 9‘ .‘..‘g: . ‘I ‘ I .'. .'.. “my." : . 1‘ 1",»; - I 'l . nu ‘_ .<;-;I.‘ 1.10:1”; I C. I‘d}? 'I' r-.-A I ~ I q‘ ‘ I I‘m 9"1 83:3 '.‘N. '; ._~9~;‘.¢‘ 1 S :III 99" I ". H I on“ .'. . j . I M I I I . I ._ 'I' '3 ' ‘ ' I - a - . I .. -~ «MIMI;- -....1-\.v“ . . I, w I II 'I - t." ‘:_“- .A" 4&3“- l I \- 1“ .-.I . . L .981 ."A‘u'r 4'?) ‘14". III)!"- “knit “Lima. Ir? " V" ‘fh'fiJ w" I {abi- I .. .; I45 ,I, TH :81. LIBRARY Michigan State University will; lllll LII! ll Ill] ll "@11th ll This is to certify that the thesis entitled A COMPARISON-OF TWO STUDENT INSTRUCTIONAL RATING FORMS presented by PAMELA WEAVER WILSON has been accepted towards fulfillment of the requirements for SERVICES AND EDUCATIONAL PSYCHOLOGY flag/4f. EM Major professor Datej/W z) /?78 0-7 639 OVERDUE FINES ARE 25¢ PER DAY _ PER ITEM Return to book drop to remove this checkout from your record. © 1978 PAMELA ANN WEAVER WILSON ALL RI GHTS RESERVED A COMPARISON OF TWO STUDENT INSTRUCTIONAL RATING FORMS by Pamela Weaver Wilson A DISSERTATION Submitted to Rflchigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Counseling, Personnel Services, and Educational Psychology 1978 ABSTRACT A COMPARISON OF TWO STUDENT INSTRUCTIONAL RATING FORMS By Pamela Weaver Wilson With the advent of teacher accountability, student ratings have become a greater concern in recent years. It has become necessary for administra- tors to have normative data for making unbiased decisions regarding teach- ing staff. However, the question of whether these normative items give enough information for evaluation purposes still remains. The purpose of this study was to build an instructional rating scale that would contain items not only general in nature, but items specific to a class of interest. These items would be useful for instructor evaluation as well as for instructor self diagnosis and self improvement. This class specific instrument was compared to a standard general instrument in use at the university. To conduct the study, five undergraduate classes were chosen from a specific department at Michigan State University. The classes were chosen because of their diverse nature. This diversity was necessary in develop- ing specific class items. Data was collected on the class specific and general instrument during the last day of classes, Spring term 1978. The major hypothesis of the study concerned item variability. It was expected that the five original instruments would have less variability on a particular item within a class and have a larger between class Pamela Weaver Wilson variability on a particular item than the general instrument. It was also hypothesized that an index of rater reliability would be larger for the class specific form than the rater reliabilities of the general instrument. 
In order to test the hypotheses of item variabilities, the MannéWhitney U Statistic was calculated. The hypotheses concerning the rater reliabili- ties was tested by use of an F statistic. Although many of the tests were not statistically conclusive, the results indicated that the class specific instrument was a viable alterna- tive for use in student rating forms. In four out of five of the classes, the average item.variance of the class specific form was equal to or less than the average item variance of the general instrument. The average between class variability for Specific items on the class specific instru- ment was larger than the average between class variability for the items on the general instrument. These results were in the anticipated direction. On the whole, there did not appear to be any difference between the rater reliability on the class specific instrument compared to those on the general instrument. In conclusion, it is imperative to mention that the class specific instrument was very exploratory in nature, while the comparison instrument was in a highly developed state. This lends much credibility to my point of view, the results of this thesis favorably support the use of student rating forms containing both class specific and general items. DEDICATION In memory of my grandmother, Elsie Putnam Warr who taught me the value of an education 11 ACKNOWLEDGMENTS There are many people who have contributed to this thesis. First, I want to extend my deepest appreciation to my family. My husband, Terry has given me both emotional support and technical advice. Without his many editorial comments, the final thesis would have lost mmch of its readability. My son, B. J. has been very patient and understanding about a project he.bare1y comprehends. I would also like to especially thank my dissertation committee. Dr. Robert Ebel, the Chairman of my Guidance Committee provided kindness, emotional support, technical advice, and personal presence whenever needed. Dr. LeRoy Olson contributed greatly to the design of my instru- ment with his ideas and personal experiences. Dr. Kenneth Arnold contri- buted his statistical expertise which helped with the data analyses. Dr. Edward Smith provided ideas about the data analyses that opened new avenues of exploration. I would also like to thank Dr. Dennis Gilliland for his thorough review of all mathematical calculations and statistical expertise. It is also necessary to extend thanks to the Marketing and Transpor- tation Department for both secretarial and faculty support. Special thanks must go to Dr. Leo Erickson, Dr. Frank Mbssman, Mr. John.Henke, Mr. Mark Bennion, and Mr. Robert Krapfel for letting me use their classroom for data collection purposes. Lastly, it is necessary to mention.Mrs. Diane Scribner who contributed her technical expertise in the typing of both the preliminary and final draft of this thesis. iii Chapter I. II. III. IV. V. THE PROBLEM O O O O O O O I I 0 Introduction . . . . . . . . Considerations in Instrument TABLE OF CONTENTS Current Practices in Student Rating of at Ten Universities . Impetus for the Study Experiences Purpose . . Summary REVIEW OF THE LITERATURE Introduction . . . . . Reliability Experience Outcomes Hypotheses . . . . . . Validity . . . . Comparative Data . . . . Differences in Item Types Format . . . . . . . . . . Summary PROCEDURES AND DESIGN . . . . Introduction . . smple O O O O O O O I Instrument Development Design . . . Hypotheses . Analysis . . Summary . . RESULTS . . . . 
Introduction Overview . . . . . . . Results Concerning Results Concerning Results Concerning Results Concerning Other Interesting Results . . . . . . Summary of Results of Study . . . . . SUMMARY AND CONCLUSIONS Summary Conclusions Further Research BIBLIOGRAPHY . . . Discussion . . . Like Items on Differing Item variability . Student Satisfaction Rater Reliabilities Construction Instruction Instruments 105 116 119 121 121 124 126 132 134 Table 1.1 1.2 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 3.15 4.1 4.2 LIST OF TABLES Uses Made of Students' Evaluations of Instruction Frequency of Responses to the Student Rating Opinion Questionnaire . . . . . . . . . . . smple Item I I I I I I I I I I I I I I I I I Final Pretest Items for the General Instrument Final Class Specific Pretest Items, MTA 311 Section Final Class Specific Pretest Items, MTA 311 Section Final Class Specific Pretest Items, MTA 313 . . Final Class Specific Pretest Items, MTA 317 . . Final Class Specific Pretest Items, MTA 341 . . General Instrument, Pretest Statistics . . . . General Instrument, Frequency Distributions for question 1 I I I I I I I I I I I I I I I I I Class Specific Instrument, MTA 311, Section 1, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 311, Section 2, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 313, Frequency Distribution for Question 1 . . . . . . . . Class Specific Instrument, MTA 317, Section 3, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 317, Section 5, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 341, Frequency Distribution for Question 1 . . . . . . . . Chi Square Tabled values . . . . . . . . . . . Chi Square Calculated Values . . . . . . . . . 12 49 51 53 54 55 56 57 6O 61 62 63 64 65 66 67 98 99 Table 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 Comparison of Variance Distributions, Specific vs. General Instrument, Mann-Whitney U . . . . . . Average variances for Class Specific and General Instruments . . . . . . . . . . . . . . . General SIRS Form, Index of Between Class Variability Class Specific Instrument, Index of Between C1888 variability I I I I I I I I I I I I I I I I Frequency of Responses to the Satisfaction Item in the Class Specific Instrument . . . . . . . . . Statistical Decision Concerning the Ho: In the Satisfaction Question . . . . . . . . . . . . Analysis of variance - Complete Sets . . . . . . . . Intraclass Reliability Coefficients for Average Ratings . . . . . . . . . . . . . . . . . Intraclass Reliability Coefficient for An Indifldual kter I I I I I I I I I I I I I I I I I_ Confidence Intervals Around Reliability Estimates of an Individual Rater . . . . . . . . . F Test I I I II I I I I I I I I I I I I I I I I I I I Table of Grand Haans . . . . . . . . . . . . . . . . vi Page 101 102 104 106 107 109 111 112 113 115 117 118 LIST OF FIGURES Figure Page 1.1 Instructor Opinion Items . . . . . . . . . . . . . . . . . 
11 3.1 Final Student Instructional Rating Instrument, m 311 section 1 I I I I I I I I I I I I I I I I I I I 71 3.2 Final Student Instructional Rating Instrument, MTA 311 sec tion 2 I I I I I I I I I I I I I I I I I I I 74 3.3 Final Student Instructional Rating Instrument, m 313 I I I I I I I I I I I I I I I I I I I I I I I I 77 3.4 Final Student Instructional Rating Instrument, m 317 I I I I I I I I I I I I I I I I I I I I I I I I 80 3.5 Final Student Instructional Rating Instrument, m 341 I I I I I I I I I I I I I I I I I I I I I I I I 83 3.6 Student Instructional Rating System Form, Form B . . . . . 85 3.7 Design for MannéWhitney U Test, MTA 313 . . . . . . . . . 90 4.1 Chi Square Matrix for MTA 311 Section 1, Specific Instrument - Item 1, SIRS - Item 2 . . . . . . . . . . 96 vii CHAPTER I THE PROBLEM INTRODUCTION With greater emphasis being placed on teacher accountability, student ratings of professors have become a greater concern in recent years. It has become necessary for administrators to have normative data to aid them in making unbiased decisions regarding the teaching staff. However, student evaluation instruments are often developed and piloted on a very specific population (e.g., education, business, etc.) as a sample of convenience. Once the instrument has been refined and administered, the administrator may follow one of several actions along a continuum in regard to the results. Because of the need for some type of normative data, at one end of the continuum the administrator would perceive the results as absolute truth. Or his suspicions about the data may lead him to reject the results because he feels they are invalid. The question becomes one of which decision is correct. Are there inherent biases in the development sample? Is one instrument valid across colleges within the university? Or at an even more basic level, is one instrument valid across classes within a department? The primary purpose of the present study is to gain insight into the latter question posed. CONSIDERATIONS IN INSTRUMENT CONSTRUCTION It has been noted (Baril & Skaggs, 1976) that two major issues should be addressed prior to data collection for a teacher evaluation instrument. The first issue that arises addresses the question of whether different items or forms are necessary for different instructors, courses, departments or colleges within a university. This question is prompted by the underlying differences between courses in content, instructor emphasis, student char- acteristics and general academic discipline. The second issue which has been addressed on many occasions concerns the different uses of the data from evaluation instruments. Gillmore (1972) lists three uses under the descriptive titles of normative, diagnostic, and informative. Others (Baril & Skaggs, 1976; Wotruba & Wright, 1975) separate this trichotomy into user oriented terms; administrator information, instruc- tor information and student information. The normative purpose is made use of by the administrators who are responsible for counseling faculty members and for evaluating them with respect to retention, tenure and promotion. Normative refers to evaluations being used in a comparative mode. The re- sults for different instructors are compared so that administrators can place instructors in a hierarchy with respect to classroom performance. Teacher accountability as well as merit raises and tenure decisions have made it necessary to institute some type of reliable and valid process for administrative decisions. 
It is this usage of student evaluations that has created the most negative feelings about student evaluations. The diagnostic purpose is for instructors to gain feedback on their own teaching abilities thus facilitating self improvement. Diagnostics has generally been thought of as the primary use and is potentially the lowest risk situation, especially if administered on an optional basis. The informative purpose is served when the students, for whatever reasons, seek information which will help them select instructors and courses. Dissemination is sometimes accomplished by a group of students via campus publication of evaluation results. The student government may take part in such a procedure. With reference to the above issues, authors have suggested that differ- ent evaluative items may be appropriate for different purposes. However, it has been noted by these same authors that few, if any, published reports have taken these differences into account (Baril & Skaggs, 1975). CURRENT PRACTICES IN STUDENT RATING OF INSTRUCTION AT TEN UNIVERSITIES In an attempt to ascertain the mood of universities in relation to the above issues a review of student instructional rating systems currently in use at ten universities were reviewed. The universities included.were the University of Illinois at Urbana-Champaign, the University of Iowa, Purdue University, university of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, University of Indiana, and University of Wisconsin. To obtain the necessary updated information, personal telephone interviews were made with those universities involved.1 The points of interest were whether different items or forms were used across instructors, departments and universities; and whether different forms were instituted for different users, i.e., administrator, instructor and student. Also noted was whether student evaluation instru- ments were optional or required. Purpose of the Instructional RatinggForm A perusal of Table 1.1 shows the marked differences in uses of instruc- tional rating forms. All ten of the universities selected made use of ratings on a diagnostic basis to improve instruction. In all cases except Michigan State and Ohio State University, the diagnostic use of student rating forms was made on an optional basis. Four (Michigan State University, University of Minnesota, Ohio State University, University of Indiana) of the ten universities had a university wide policy in effect that required the use of Student Rating Forms in both tenure and promotion decisions. The university of Wisconsin has a state wide system policy that requires the twenty-six state supported educational institutions (14 four year and 12 two year) to use student evaluations in tenure and promotion considerations. Although the other five universities had no university or system policy requiring student evaluations in the normative mode, several departments within the universities had their own requirements. For example, even though there is an absence of any top policy at Purdue University many 1Special thanks needs to go to Latricia Turner, University of'Michigan; Peter Fry, Nbrthwestern University; Kenneth Doyle, University of Minnesota; James Deerie, Purdue University; Dale Bradenburg, University of Illinois; Mary Rouse, University of Wisconsin; Ms. Johnson, University of Indiana; LeRoy Olson, Michigan State University; Larry Jones, Ohio State University; and Rena weets, University of Iowa. 
USES MADE OF STUDENTS' EVALUATIONS OF INSTRUCTION Normative Diagnostic Informative University of Illinois 0 O O Urbana-Champaign University of Iowa 0 O 0 Purdue University 0 O NU University of Michigan 0 O 0 Michigan State University R R 0 university of Minnesota R O 0 Northwestern University 0 O 0 Ohio State University R R NU University of Indiana R O 0 University of Wisconsin R 0 NU O . Optional R = Required NU = Not in use at this time departments have created their own policies. The Psychology Department has made the use of student rating forms mandatory, while the Pharmacy De- partment has mandated their use by every instructor at least once a year. Other departments, such as the Management Department strongly encourages their faculty to make use of student rating forms. The University of Illinois has a system labeled by many as voluntary coersion. Although there is no university requirement concerning student evaluations of teaching, there is a university policy requiring some evi- dence of teaching performance. This evidence is to be contained in an instructor's personal file for tenure and promotion decisions. The universities differ widely with regard to the informative mode. At one time or another all the universities have had some action by differing student bodies to obtain information at this level. Michigan State Univer- sity appears to be the only educational institution in this sample that has declared that the optional use of this type of instrument is a function that may be taken on by the Student Government. However, an optional form de- vised by the Student Government Association in 1976 proved to be so time consuming for the student participants that no further attempt at such a large scale procedure has been administered again. Northwestern University and the Universities of Iowa, Michigan and Illinois have special options on universitydwide evaluation forms that allows the instructor to release information for student dissemination. If allowed, partial information is then released to the student body. While the university of Indiana and Minnesota have no continuing release of information, student groups have, from time to time, collected and distributed student rating information to the student body. Again, this has been on a voluntary basis for the instructors. During the period 1969-72, Purdue University made an attempt at student rating release by student organizations but found it to be too much work and has since been dropped. Ohio State University has taken the stand that student ratings are personal documents and no release of the information has been made to the student body. With reference again to Table 1.1, there are a total of 30 cells. It is interesting to note that in only seven instances (23%) are ratings re- quired, in 20 instances (67%) are ratings optional, and in only three instances (102) ratings do not appear to be in use at this time. Number of Instructional Forms in Use Michigan State University was the only university to have three separ- ate forms for the three different purposes (normative, diagnostic, informa- tive). The other nine universities used one form to accumulate data for all three purposes. At Michigan State university an individual form developed by the Office of Evaluation Services is administered to about 15,000 of the 44,000 students each term. The rest of the faculty use a departmental or self made instru- ment. At Northwestern University, two general forms exist; one for lecture classes and one for small class situations. 
The University of Wisconsin uses a myriad of student rating forms that have been developed for depart- mental use or for individual classes. The other seven universities surveyed use some type of cafeteria system. This type of system involves the use of an evaluation item.bank. This item bank allows the instructor to choose items that are tailor made for the course he is teaching. The form also consists of a few common items that would appear on all student rating forms. However, it should be noted that the above descriptions embody the general mode of student ratings of instruction at the sampled colleges. Within each college there are various forms developed and used by individual departments. Again, it appears that the universities reviewed are not in agreement on the use of instructional rating forms. There is no agreement on the question of whether one general instrument may be used across colleges within a university. Seven of the ten universities are gravitating toward a cafeteria style rating form. This lends evidence to support the opinion that a given set of items are incapable of being used to compare all in? structors across the university. Philosophically, the question of whether one form should be used for all three situations, i.e., diagnostic, normative, informative, becomes a question of whether rating forms are justified in being used for these three purposes at all. Everyone involved appears to feel very comfortable with the diagnostic purpose, is learning to live with the normative purpose but are undecided about the informative purpose. To release to students any information contained on the student rating forms, it is necessary to receive permission from the instructor. This attitude prevails over all ten of the universities under discussion. This leads one to believe that the position of the universities is based on the premise that the informa- tion is personal information belonging to the instructor. None of the universities have yet taken the stand that the information belongs to the student. As long as this is true, the limited results of the informative purpose can only give a biased view of faculty performance to interested students. IMPETUS FOR THE STUDY The interest in pursuing the present study developed from previous personal research and experience. A description of these experiences are presented below. Following the description of these experiences is a dis- cussion of how these experiences molded the author's thinking in terms of this dissertation topic. EXPERIENCES Personal Research 1. In a paper presented at the American Educational Research Associa- tion (Wilson & Wilson, 1978) an attempt is made to compare the factor stability across colleges of the Student Instructional Rating Forms (SIRS) at Michigan State University. Three separate samples were selected from the Education, Business and Engineering colleges. The study consisted of performing both orthogonal and oblique factor analyses on the three separate colleges and the combined sample of colleges. Three separate units of analyses were considered; the individual student response, the class mean for each item, and the individual score minus the class mean of each item. No matter what type of analysis or unit of analysis was attempted, one aspect that remained consistent was the factor loadings. Factors remained stable across colleges but the percent of variability accounted for by the factors was altered. 10 2. 
An attempt was made to ascertain the mood of the Marketing and Transportation Department at Michigan State University concerning student ratings of instruction. In order to meet this objective, a questionnaire was distributed to all professors and teaching assis- tants in the department. A list of the items presented in this questionnaire is located in Figure 1.1. Respondents were directed to check as many options as applicable. The response rate was approximately 80%. Eleven faculty members and twelve teaching assistants responded to the questionnaire. There was no difference in the way the faculty and teaching assistants responded to the items. Table 1.2 displays the frequency of response to each item in the questionnaire. Personal Experience 1. Personal experience has also played a motivating role in the desire to research the area of using more department, college, or possibly even class oriented evaluation instruments. The admini- stration of some type of rating form is mandatory at Michigan State University. The use or misuse of this requirement becomes evident in an example of how it is fulfilled.2 Each faculty member is re- quired to evaluate each class every term in the School of Business. Although a specific form is not stipulated, most faculty members give the Level Two, Student Instructional Rating Forms (SIRS) 2The information contained in this paragraph was obtained from a personal interview with Associate Dean, Gardner Jones, School of Business, Michigan State University. 11 Your present position is a. Administrator b. Faculty c. Graduate Assistant My response to student evaluations of instruction is to a. toss them out b. read the comments c. leaf through the items d. have the items computer analyzed I find the student rating forms a. useful for instructional improvement b. useful for self evaluation c. useful for personnel and tenure decisions d. a.waste of time An ideal student rating form should have a. items that instructors could be compared on b. items selected for specific classes Do you feel that all of the items on the Student Instructional Rating Form now in use by the Marketing and Transportation Department are appropriate for your classes? a. most are appropriate b. some are appropriate c. none are appropriate Any comments specific to your type of class that may be incorporated in a student rating form would be appreciated. INSTRUCTOR OPINION ITEMS FIGURE 1.1 12 TABLE 1.2 FREQUENCY OF RESPONSES TO THE STUDENT RATING OPINION QUESTIONNAIRE Frequencies Item 1: Present Position Options: a. Administrator 1a. 0 b. Faculty lb. 11 c. Graduate Assistant 1c. 12 Item 2: Response to Student Evaluation Options: a. toss them out 2a. 1 b. read the comments 2b. 22 c. leaf through the items 2c. 6 d. have the items computer analyzed 2d. 10 Item 3: Student Ratings Forms are Options: a. useful for instructional improvement 3a. 17 b. useful for self evaluation 3b. 21 c. useful for personnel and tenure decisions 3c. d. a waste of time 3d. Item 4: An Ideal Student Rating Form Options: a. items that instructors could be compared on 4a. 10 b. items selected for specific classes 4b. 15 Item 5: Appropriateness of Items Options: a. most are appropriate 5a. 7 b. some are appropriate 5b. 15 c. none are appropriate 5c. 1 13 developed by the Office of Evaluation Services. The forms are collected by a student and sent to the Dean's Office. The Associate Dean reads all comments made by the students in an area provided on back of the form. 
If the evaluations are of an average nature, a one line summary comment is made by the Associate Dean concerning the instructor for that particular class. This comment is put in the instructor's per- sonal folder, the comment and forms are sent to the department chair- man. At this point, the department chairman reads the one line summary and student comments. The forms are then returned to the instructor. The instructor decides whether to make any further use of the infor- mation. Often, no analysis is made of the instrument itself. Further actions are taken in cases where student comments lend themselves to being either extremely positive or negative. If the comments are extremely positive, the Associate Dean composes a letter of recommendation to the instructor. If the comments are extremely negative, the rating forms are forwarded to the Dean, who may call in the Instructor or the Department Chairman, or both, for consultation. 2. The second point from personal experience is based on the factor composition of the SIRS. Previous factor analysis (Office of Evalu- ation Services, 1971) has indicated the SIRS has five predominant factors: 1) student interest, 2) course demands, 3) student-instructor interaction, 4) course organization, and 5) instructor involvement. Since the SIRS is a general form, it is used for several different classroom types, for example, large lecture, small discussion, quantitative, non quantitative. 14 EXPERIENCE OUTCOMES Because of both personal research and experience, the author believes there are several implications for using different rating forms for either personnel evaluation, or teacher self evaluation. Personal Research 1. In reference to the study by Wilson and Wilson (1978), the unstable percent of variability accounted for by the different factors across colleges lends credence to the idea that students place more importance on some factors than others. It may therefore, be unwise to compare teachers' evaluations without some sort of differential.weighting scheme. Each teacher would be well advised to compare his own per- ception of factor importance with that of the students. 2. With reference to faculty opinion, a glimpse of the results of the questionnaire distributed to the Marketing and Transportation Department may be reviewed in Table 1.2. The results of item.two show that a large portion of the instructors read the comments on the back, less than half of the sample leaf through the items or have the items computer analyzed. Responses to item four show that 652 of the instructors feel that student rating forms should include items specific to the particular class, 43% feel items should be selected on which instructors could be compared. Responses to item five show that 64% of the instructors feel that some of the items on the Level II, Student Instructional Rating Form are appropriate for their classes. 15 It is the opinion of the author that the results of this faculty questionnaire give support to the need for more specific rating forms. Possibly instructors are not, on the whole, leafing through the items or having them computer analyzed because they are too busy with other tasks. Or possibly the instructors do not feel the information obtained from this process is worth the minimal amount of time invested in the project. 
The fact that a large per- centage of instructors feel that an evaluation instrument needs items specific to individual classes gives evidence to support the opinion that more class specific instruments are necessary to evalu- ate individual classes. Personal Experience l. The process used by the School of Business to evaluate teacher performance further supports the need for more specific student rating forms. Since the administrator only reads the comments, it is the contention of the author that the administrator is dealing with incomplete information. How the instructor approaches his class in terms of written comments could have an effect on how the student responds. If the instructor asks the student specific questions, then it is likely that the student will only find time to address himself to these specific questions. If the instructor asks for possible course improvements he may set the stage for many student gripes. If the instructor tells the students he wants to make sure he repeats good points he may open the door for reinforcing commen t8 0 16 It is proposed that the process of reading comments only has been devised because the faculty and administration do not feel comfortable with any further analysis. 2. The last point concerns the factor composition of the Level II, Student Instructional Rating Forms. From a basic sense of fair play it does not appear appropriate to compare instructors on all five factors. For example, it seems that major injustice would be done if the instructor of a large lecture course had his evaluations of factor 3, student-instructor interaction compared to those of an instructor of a small class. In summary, ample evidence is available to justify further research in the area of student ratings of instruction. Evidence also supports the need for rating instruments that are of a tailor made nature for individual classes. PURPOSE The purpose of this study is to build an instructional rating scale that would discriminate between good and poor instruction, and have unam- biguous questions on which raters could be in agreement for each instructor. In terms of item variability, the better of two evaluation instruments would have less variability on a particular item in a given class. It would also be expected that variability exists on a particular item between classes. This latter variability could only be computed for items that appear on more than one specific evaluation instrument. 17 An index of rater reliability could also be computed for a specific evaluation instrument in each class. If a class specific instrument and SIRS form was administered to five classes, there would be a total of ten rater reliability indexes. This indeX‘would be concerned with inconsis- tencies, to what extent do the students give the same information about the instructor. It would be expected that the class specific formwwould have higher rater reliabilities than the SIRS. Such a scale would have to be developed from.items selected for both appropriate content for classes of interest and their psychometric charac- teristics. It would also be necessary to obtain some type of satisfaction index on the new instrument in order to assess the students feelings con- cerning the new instrument. A comparison will be made between a generally accepted instrument that has been used by several departments in a large university and an instrument created for specific classes within a department. 
This newly created in- strument will have a set of five to ten core items used for all classes. These core items will be drawn from the Level II, Student Instructional Rating Forms (SIRS) now in use at Michigan State University. These items are of a high inference, general nature. The generality of these core items will allow them to be appropriate for any classroom situation. The new instrument will also consist of ten to twenty items specifically de- signed to meet the needs of the individual class. These class specific items may be common to some of the specific class evaluation instruments. The last item on the specific class instrument will be a general satis- faction item. II. III. IV. 18 HYPOTHESES For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS. For each class, the responses to the core items in the class specific instrument do not come from the same distribution as the responses to corresponding items in the SIRS. For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS. For each class, the item variance of the tailored items in the class specific instrument is less than the item variance of the items in the SIRS. The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS. The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS. The proportion of the students that are satisfied with the class specific instrument is equal to .50. The proportion of students that are satisfied with the class specific instrument is greater than .50. 19 V. H : There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained from the SIRS. SUMMARY In summary, it appears that no major consensus has been made among uni- versities on the exact use of student rating forms. Most universities, if not explicitly, are at least implicitly using rating forms for tenure and promotional decisions. The major problem lies in the lack of control exr hibited so far in how the rating forms are to be used in these decisions. The questionnaires distributed to the Marketing and Transportation Department at Michigan State University seem to indicate that instructors would prefer evaluation instruments tailor made for their classes. Possibly it is a distrust of a general instrument that leads to relaxed guidelines involving tenure and promotion decisions. Whatever the underlying reason for the lack of a standard university policy, or a standard type of evaluation instrument used across universities, one point stands alone. This point is that the issue of student ratings of instruction needs more consistency. OVERVIEW In Chapter II, the literature on the validity and reliability of student evaluations of instruction will be reviewed. Also the literature available 20 on differences in student evaluations across colleges, departments and classes within a university will be reviewed. The procedures and design will be discussed in Chapter III. 
The main thrust of Chapter III will revolve around the steps taken in building an evaluation form that con- tains both general high inference items as well as items tailored to individual classroom situations. The results from the computer analysis will be presented and analyzed in Chapter IV. Conclusions, a discussion of the results and any considerations for further research will be pre- sented in Chapter v. CHAPTER II REVIEW OF THE LITERATURE INTRODUCTION Chapter I stated the problem proposed in this study. This problem was operationalized in terms of a set of hypotheses. The essence of these hypotheses is that two different item types are valuable in a student rating form. One item type is of a general nature that would be utilized in evaluating any classroom situation. The second more specific item type would be designed for particular classroom settings, for example, large lecture versus discussion situations. Part of this chapter will review the literature associated with varying item types. Secondly, the item format is a crucial part of any student rating form. Because the comparison instrument will be the SIRS, Level II, the format of the proposed instrument will be comparable to that of the SIRS. It will therefore be necessary to review the literature associated with the format of the present SIRS. Thirdly, in reviewing the literature, it becomes obvious that two prevalent concerns involving student rating forms are their reliability and validity. It would be unwise to construct an instrument without reviewing the available literature in these areas. 21 22 One final area, that has to do with both reliability and validity of evaluation forms that researchers have not addressed to a great degree, is the differences between departments, colleges and classes within a univer- sity. In order to support the need for class specific evaluation items, it is necessary to take a look at this literature. For completeness, this chapter will therefore review the literature in the following areas: 1) the reliability of student ratings, 2) the validity of student ratings, 3) comparative data, 4) differences in item types, 5) the present format of the SIRS, Level II. RELIABILITY With relation to the test and measurement theory, Ebel (1972) defines the term "reliability" to be the consistency with which a set of test scores measure whatever they do measure; which is the extent to which an instrument consistently measures a construct. The reliability of a student rating in- strument would then refer to the ability of students to make unbiased judg- ments of a teacher's performance. A reliability coefficient for a set of ratings from a particular group of students is the correlation coefficient between that set of ratings and another set of ratings, on an equivalent rating form obtained independently from.the same group of students. At least four methods have been used in testing theory for obtaining reliability estimates, namely; test-retest equivalent forms, split halves, Kuder- Richardson, and rater reliability techniques. 23 Stability Over Time Research in the student rating field has been done using all of the above specified methods of obtaining reliability estimates. It is commonly accepted that student evaluations achieve reliable results (Costin, Green- ough 8 Manges, 1971). As early as 1954, Guthrie (1954) found correlations of .87 and .89 between students' ranking of the quality of their teachers from.one year to the next. 
Lovell and Bauer (1955) found the correlations between ratings made two weeks apart to be .89, while later Costin (1968) found correlations ranging from .70 to .87 between midesemester and end-of- semester ratings. Recently, in a study using medical students, no differences in student ratings were found when students filled out evaluation forms both before and after final examinations (Canaday, Mendelson & Hardin, 1978). Internal Consistency There have been many research articles expounding the need for internal consistency of an instrument as well as the consistency or stability across time. The early 1950's reported correlations ranging from .77 to .94 when the ratings of students in a given class were randomly paired (Guthrie, 1954; Maslow & Zimmerman, 1956). Lovell and Haner (1955) found that the mean odd- item ratings on a forced-choice instrument correlated .79 with the mean even- item ratings. Internal consistency correlations for the Illinois Course Evaluation Questionnaire (Spencer, 1968; Spencer & Aleamoni, 1970) averaged .93 for 16 different courses. Some forms include an item asking the student to give a global rating of the course. Correlations have ranged from .69 to .93 when this global 24 item.bas been correlated with the remaining specific items (Harvey 8 Barker, 1970). However, the concept of halo effect becomes intertwined with that of internal consistency when considering studies involving the use of rating scales. Halo effect is defined as the tendency to be influenced in making a specific judgment by a general impression of the individual being judged. Unfortunately, while the existence of this tendency is generally recognized, its measurement is extremely complex and involves the correlation, for each instructor, of each trait with each of the other traits. Stalnaker and Remmers (1928) recognized this problem as early as 1928 and calculated such a set of intercorrelations. The average intercorrelation was .45, which indicates no large presence of halo effect. Obviously, the problem with internal consistency and halo effect re- volves around a causality factor. Could a halo effect be causing internal consistency? Is internal consistency a desirable characteristic in student rating forms? The answer to the first question could often times be yes. The latter question is more complex. If an instrument is unitrait in nature, then internal consistency would be very desirable. However, if the instru- ment is of a multitrait nature, then internal consistency, as has tradition- ally been defined in testing theory, is not a desirable characteristic. Analysis of variance Techniques Recently, work has been done using an analysis of variance technique when estimating reliability coefficients (Kane, Gillmore 8 Crooks, 1976; Gillmore, Kane 8 Naccarato, 1978). The basis for this technique was 25 explicated by Ebel (1951) in "Estimation of the Reliability of Ratings." The mean square estimates are used from the analysis of variance (ANOVA) table in a formula to obtain the reliability of ratings. There are several advantages to this technique: 1) It permits the investigator to decide if he wants to include the between rater variance as part of the error variance; 2) It is possible to use incomplete ratings; 3) It is possible to use unequal sets of ratings. 
Kane, Gillmore and Crooks (1976) use the ideas of ANOVA and generali- zability theory to present what they consider a more comprehensive view of the dependability of student evaluations of instruction (Cronbach, Gleser, Nanda 8 Rajaratnam, 1972). The use of generalizability theory leads to three different estimates that partition different error variances and.have three express purposes: generalizing over both students and items; generalizing over students only; generalizing over items only. Each of these three coefficients is a legitimate estimate of the dependability of student course evaluations. The coefficient which should be used depends upon the purpose of the study and the desired universe of generalization. Generally, when evaluating teaching effectiveness, the most appropriate coefficient to use in analyzing course evaluation questionnaire data is the error variance associated with students and items, namely, generalizing over students and items. However, in a 1978 article, Gillmore, Kane and Naccarato (1978) found that generalizing over students and items but not over courses or teachers yields highly dependable results. Generalizing over courses as well as items and students, with the teacher as the unit of analysis, yields 26 moderately dependable results. Generalizing over teachers, items, and students with the course as the unit of analysis yields very low dependa- bility. There are many implications here for tenure and curricular decision makers. VALIDITY The broad issue of validity includes all factors which may contribute to or detract from the usefulness of student opinions about instructional effectiveness obtained through ratings. The issue of the validity of student rating forms is much less precise than the reliability issue. Generally defined, validity is the extent to which an instrument measures the construct it purports to measure. An instrument must be reliable if it is to be valid. However, reliability is not a sufficient condition for validity. Obtaining evidence to support the validity of student rating forms is difficult and the results are usually tenuous at best. Validity can be broken down into many facets of interest (Ebel, 1972). Mehrens and Lehmann (1973) delimit only three kinds of validity: 1) content validity-related to how adequately the content of the instrument samples the domain about which inferences are to be made, 2) criterion-related validity - pertains to the empirical technique of studying the relationship between the responses made to the instrument and some independent external criteria, 3) construct validity - the degree to which the instrument responses can be accounted for by certain explanatory constructs in psychological theory. The above definitions are very encompassing, but actual measurement of any type is difficult. There is more and more demand for quantitative measure- ment of validity by students, faculty and administration. 27 Correlational Studies Several studies attempting to quantify the validity of measurements utilize correlations between the qualities an instrument purports to measure, and certain explanatory constructs that at least partially explain performance on an instrument. Correlations between many sets of variables have been found useful. Following are six areas that correlational results have fallen into: I. II. supervisors' or Colleagues' Ratings and Student Ratingg The results have not been consistent over studies with regard to this area. 
Costin (1966) found a significant correlation of .49 between students' and supervisors' ratings. Both Guthrie (1949, 1954) and Maslow and Zimmerman (1956) found correlations of +.30 to +.63 between students' ratings and colleagues' evaluations of the same teachers. Breed (1927) at the University of Chicago reported a high correlation between fifty-six faculty members and a hundred students on the qualities of good teaching. Tolor (1973) found student ratings correlate moderately well with ratings by colleagues. Tang and Feld- husen (1974) and Starrack (1934) found student ratings correlate moderately well with expert evaluators. On the other hand, Webb and Nolan (1955) found no significant correlation between student ratings and supervisors' judgments of teaching performance in a military school. Student Ratings and TeachingLExperience Again, several studies have shown results varying from significant negative correlations to significant positive correlations with regard III. 28 to student ratings and teaching experience. Rayder (1968) found that younger faculty, with less rank and fewer years teaching experience received higher student ratings than older faculty, with more rank and more years teaching experience. Both Heilman, Armen- trout (1936) and Guthrie (1954) failed to find any difference due to the experience of the instructor. In contrast, Clark and Keller (1954), Guthrie (1949, 1954) and Walker (1969) found a significant positive relationship. Student Ratings and Student Achievement Several studies have attempted to ascertain the presence or absence of a correlation between student achievement and student ratings. Rayder (1968) and Blum (1936) found no correlation between grade point average and student ratings. However, in a review article (Costin, Greenough 8 Manges; 1971), a substantial number of investigations found significant positive relationships, although typically weak, between students' grades and their ratings of instructors or courses (Cohen 8 Burger, 1970; Lathrop 8 Richmond, 1967; Lathrop, 1968). Riley (1950) found that students of low academic standing rate their instructors more rigorously than those of a relatively high academic average. Aleamoni (1972) in a review of studies on the Illinois Course Evaluation Questionnaire found a positive correlation between grades and course evaluations. More recent research has found results similar to the previous research in this area. It was found that three factors, student accomplishment, presentation IV. 29 clarity and organization in planning, correlated highly with the final exam score (Fry, Leonard, Beatty, 1975). Marsh (1975) found a positive correlation between the average evaluation and grade point average in a large multi-section course. This course had several sections using different instructors, but all sections were tested with a common examination. Centra (1977) found a significant positive correlation between examination performance and many variables on a student instructional rating form. Another direct significant relationship was found by Canaday, Mendelson and Hardin (1978) between the course ratings of medical students and their achievement in that course. Student Ratings and Instructor Self—Ratings Webb and Nolan (1955) found a significant positive correlation between student ratings and the instructors' self-ratings. Student Ratings and Gains in Knowledge This area probably has intrinsic appeal to those who use mastery learning techniques and test by use of objectives. 
However, because of the difficulty in carrying out such a project, little research has been done. One of the few studies found that gains in information and practical "job sample" performance were positively and signifi- cantly correlated with their overall ratings of the course (Morsh, Burgess 8 Smith, 1956). Student Needs and Teacher Orientation Researdh in this area addressed the issue of viewing the act of student rating as an instance of person perception in which the 30 needs of students were held to affect their perception of teachers (Tatenbaum, 1975). It was found that specified student needs were significantly related to ratings of specified teacher orientations congruent with those needs. VII. Student Ratingsfiand Academic Dsgree of the Instructor Studies dealing with the academic degree of an instructor are some- what harder to find. Riley (1950) reported that those instructors who possessed the doctorate were rated higher than those instructors who did not possess the doctorate. A study by Downie (1952) also gave higher ratings to those instructors who held the doctorate degree. Extrinsic Variables Many colleges routinely obtain information on variables extrinsic to the instructional process, such as class size, level of course, student's major, year in college, sex of student, and so forth. Again, the results have not been consistent with respect to any one of these variables. The review article by Costin, Greenough, and Manges (1971) gives many references for the inconsistent results: I. Sex of Student A number of studies found no significant differences in the overall ratings of teachers made by male and female students, or in the ratings received by male and female teachers (Downie, 1952; Heilman 8 Armentrout, 1936; Rayder, 1968). On the other hand, there are studies that report a slight tendency for female students to be more II. III. 31 critical of their male instructors than male students. Also, it was noted that female students rated their female instructors significantly higher than their male instructors (Walker, 1969; McKeachie, Lin, 8 Mann, 1971). Required Courses Riley (1950) found differences between departments with regard to whether a course was required or an elective. He found that in the arts, whether a course is taken as required or as an elective makes little difference in student evaluation. However, in the sciences there is a tendency for students taking required courses to rate their instructor higher than in the case of those taking elective courses. In contrast, in the social sciences, higher ratings are more often given by those taking the course as an elective. Cohen and Humphreys (1960) found that students required to take psychology courses tended to rate them lower than students who elected to take them. Gage (1961) found teachers of required courses received significantly lower student ratings than did teachers of elective courses. In contrast, Heilman and Armen— trout (1936) reported no differences between the ratings of students in required courses and those in elective courses. College Year Heilman and Armentrout (1936) reported no significant relation- ship between college year and ratings assigned teachers. In contrast, Villano (1978) and Downie (1952) found advanced level 32 courses tended to be rated higher than low and mid level courses. IV. Class Size It has often been suggested that instructors of large classes receive lower evaluations than teachers of small classes. This opinion is supported by Villano (1977) and Downie (1952). 
Downie (1952) found that instructional procedures were rated more harshly in large classes. Villano (1977) found small classes (thirty students or less) fared significantly better than medium and large sized classes with respect to student ratings. In contrast, Soloman (1966) and Heilman and Armen— trout (1936) did not find class size to have any effect on student ratings. In the past few years, several of these extrinsic variables have been analyzed using multivariate designs. Studies using multiple regression have included from three to six extrinsic variables in the list of depen- dent variables (Danielsen 8 White, 1976; Rose, 1975; Wood 8 DeLorme, 1976). Rose (1975) found that class level did contribute significantly to the multiple R,in stepwise multiple regression. While Danielsen and White (1976) felt the extrinsic variables make little systematic contribution to the overall rating of the instructor. In any event, the results lend support to the premise that it may be deceiving to postulate a simple relationship between teaching effectiveness and extrinsic or intrinsic variables. Pohlman (1975) used canonical correlation in a manner similar to the use of multiple regression in the above three studies. A relationship 33 'was found to exist between the number of outside-of—class study hours required of students and the student ratings. The percentage of students taking the course as an elective was also found to be positively related to student ratings. Student-Teacher Pairings To confuse the validity issue even further, it has been suggested that an aptitude treatment interaction may exist. Cunningham (1975) found that certain types of teachers interact better with certain types of students. Entertainment A frequent argument against the validity of student ratings is that students may judge an instruction on the basis of how entertaining the instructor was. Here again, research has come up with differing conclu- sions with respect to this issue (Costin, Greenough 8 Manges, 1971). Recent research by Williams 8 Ware (1977) has used an actor to deliver to equivalent groups of students two lectures. The lectures vary in sub- stantive teaching points covered and the expressiveness of the presenta- tion. There were three levels of substantive teaching points; high, medium, and law. There were two levels of expressiveness of presentation, high and low. The results of this study suggest that student ratings of highly expressive instructors are always higher than those of the low expressive instructor, regardless of content coverage. Meier and Feld- husen (1978) replicated part of the above Williams and Ware research using two levels of expressiveness of presentation (high and low), and two levels 34 of substantive teaching points (high and medium). It was found that in- structor expressiveness had a major impact on student ratings of global satisfaction. Student ratings on the global satisfaction scale were much higher for the high expressive instructor than the low expressive instructor. The manipulation of lecture content (high vs. medium) did not significantly influence most of the items on the student rating scales. Classroom Seating Patterns 0 Classroom seating patterns are a relatively underexplored topic of student behavior. Gur, Gur and Marshalek (1975) found seating preference of college students apparently related to cerebral dominance (left or right hemisphere) and to handedness. 
They also noted that students would change from one side of the room to the other depending on the difficulty of the subject. Beyond this type of study, there has been little but conjecture as to how seating patterns relate to student ratings. However, Owen (1978) recently investigated seating patterns with regard to student evaluations. He found that seating patterns had a slight influence on student ratings of instructor effectiveness. It should be mentioned that Owen feared the results were idiosyncratic to the composition of the particular class of students under study.

Use of Discriminant Analysis

In an attempt to consider as many aspects of validity as possible, it is fitting to include a study utilizing discriminant analysis. Marsh (1977) used data from graduating seniors who nominated instructors as "most outstanding" or "least outstanding" in conjunction with student evaluations from the following year. The validity of student evaluations of instruction was upheld in this study: the statistically significant difference between "most outstanding" and "least outstanding" instructors was reflected in both data sources.

COMPARATIVE DATA

One area that has to do with both the reliability and validity of evaluation forms, and that researchers have not addressed to a great degree, is the differences between departments, colleges and classes within a university. More than likely, the lack of published research on the topic of differences between colleges is somewhat politically motivated. After all, who is going to state that the professors from one college are inferior to those of another college in terms of teaching skills?

Between Colleges Within A University

Centra and Linn (1976), in an attempt to investigate student points of view in their ratings of specific courses and instructors, utilized a sample consisting of a natural science, a social science and a humanities class. A discriminant analysis was run on the three classes separately. Each item on the student evaluation instrument was correlated with the four discriminant functions for the three separate classes. The correlations varied across the three classes. These varying correlations give evidence to the premise that students' perceptions vary across these three classes, which were from different colleges.

An alternative technique of comparing colleges would be to run individual factor analyses on colleges within a university and compare what items comprise predominant factors and the percent of variance accounted for by different factors. A study using the student as the unit of analysis compared the factors of three colleges: business, engineering, and education (Wilson & Wilson, 1977). The results obtained showed a consistency between colleges in the factors. However, the percent of variance accounted for by each of these factors differed considerably between colleges.

In the past, the individual student response to each item has been used as the unit of analysis in the factor analysis of student ratings of instruction. However, recent research has questioned this as the correct unit of analysis (Doyle & Whitely, 1974; Linn, Centra & Tucker, 1975; Whitely & Doyle, 1976). It was noted that there are three possibilities contending for the unit of analysis: the individual student response for each item, the classroom mean on each item, and the within classroom deviations from the classroom mean on each item.
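To make the three contending units of analysis concrete, the following short sketch shows how each could be computed from raw ratings on a single item. It is illustrative only: the class labels, array shapes, and rating values are hypothetical and are not drawn from any of the studies cited.

import numpy as np

# Hypothetical ratings on one item, one array per class (1 = Superior ... 5 = Inferior).
ratings_by_class = {
    "class_a": np.array([1, 2, 2, 3, 2]),
    "class_b": np.array([3, 3, 4, 2, 3, 3]),
}

# Unit 1: the individual student response for the item (the raw values above).
individual_responses = ratings_by_class

# Unit 2: the classroom mean on the item.
class_means = {c: r.mean() for c, r in ratings_by_class.items()}

# Unit 3: the within-classroom deviations from the classroom mean on the item.
within_class_deviations = {c: r - r.mean() for c, r in ratings_by_class.items()}

for c in ratings_by_class:
    print(c, class_means[c], within_class_deviations[c])

Whichever of the three is chosen determines what a factor analysis of the ratings actually describes, which is precisely the question raised by the studies cited above.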
It was because of the question of the appropriate unit of analysis that further research by Wilson and Wilson (1978) attempted to run separate factor analyses on student rating forms from the colleges of business, engineering, and education. Each factor analysis was run separately using the differing units of analysis: 1) the individual student response for each item, 2) the classroom mean for each item, 3) the within classroom deviation from the classroom mean on each item. The results were similar to those found earlier by Wilson and Wilson (1977). The factors, no matter what unit of analysis was used, were consistent across colleges, but the percent of variance attributed to each factor varied.

Between Classes Within A University

This issue has not been attacked directly as yet. There are many validity studies, listed under validity in this chapter, that have looked at extrinsic variables and their effect on student evaluations of instruction (Costin, Greenough & Manges, 1971; Danielsen & White, 1976; Pohlmann, 1975; Rose, 1975; Wood & DeLorme, 1976). However, these are an attempt to look at individual differences rather than class differences. The author has been unable to locate any studies that compare instruments designed specifically for particular types of classes (i.e., large classes vs. small classes) with generally available instruments.

DIFFERENCES IN ITEM TYPES

The items appearing on rating forms were once characterized as broad, requiring much inference on the part of the observer and reader. Ratings on teacher warmth, overall effectiveness, clarity, or enthusiasm require high inference. However, with the advent of item banking systems in student ratings, items are no longer only high inference in nature. Items like "the teacher uses students' ideas," "teacher criticizes," and "teacher listens carefully to student" do not require a large inference on the part of the observer or rater.

Categorizing Items for Student Rating Forms

There have been two attempts to categorize different item types that would be useful in evaluating instruction. Below is a description of these differing systems:

1. As early as 1970, Rosenshine (1970, 1973) discussed high-inference and low-inference measures as a system that could be useful in evaluating instruction. He broke item types into these two categories following a convention previously used by Gage (1969). The convention uses "inference" to refer to the process intervening between the objective data seen or heard and the judgment concerning a higher order construct of cognitive or social interaction. High-inference measures are those which require considerable inferring from what is seen or heard in the classroom to the labelling of the behavior, such as ratings of the teacher on such scales as "partial-fair" and "dull-stimulating." Low-inference measures are those which require the observer to classify teaching behaviors according to relatively objective categories. Examples of these behaviors are very quantifiable, such as words per minute or movements per minute. Items somewhere between high-inference and low-inference would be labelled moderate-inference and refer to such items as "teacher listens carefully" and "teacher criticizes." Ratings on high-inference variables generally correlate highly with global items on student rating forms, probably because such measures allow a rater to consider more evidence before making a decision (Rosenshine, 1973).
The results of low-inference measures would be easier to use in teacher training programs because variables can be translated into specific behaviors. 39 2. Smock and Crooks (1973) utilized a system similar to the one described above. However, Smock and Crooks (1973) felt that the evaluative data being collected can and should vary according to the intended function of the evaluation and the people doing the evaluating. They identified three major types, or levels, of evaluation. The first (Level I) is general, summative evaluation, which will be concise and allow broad, general comparisons to be made across departments, but will give little or no specific information to guide instructional improvement. The second (Level II) is evaluation aimed at identifying success or failure in general areas or attributes of instruction. The third (Level III) is detailed course specific evaluation aimed at providing diagnostic information about instructional problems. Smocks and Crook (1973) specify that it should not be expected that all evaluative information would be available to all audiences. For example, some types of data would be available only to the faculty members involved, and information sent to administrators may be in summary form. validation of Varyingiltem Types In the past few years, several attempts have been made to identify the components of high-inference ratings of instructors by correlating a high- inference rating of teacher effectiveness with ratings obtained on items reflecting more specific instructor attributes. Pohlmann (1975) attempted to correlate twenty low-inference items with a gldbal item of teacher effectiveness. Cushman and Frederick (1976) in a 40 similar fashion found positive correlations existing when they identified twenty-eight specific teaching behaviors and correlated them with seven general evaluation items. Olson (1978) used another similar approach to correlate specific behavior type items with general high-inference items. All of the above studies found many existing high correlations. It was, therefore, felt that it is possible to develop sets of behavior-specific items for student instructional rating forms that are more useful in a diagnostic sense. It should be noted that the behavior specific items used in the above studies represent Rosenshine's moderate-inference items. Recent research at the University of Illinois at Urbana-Champaign has attempted to validate an item classification system.(Bradenburg, Derry, Hengstler; 1978). A prOposed classification scheme was developed for the Instructor and Course Evaluation System (ICES), a cafeteria-like system of student ratings. This system involves classifying items by content and specificity. Therefore, items vary from very general in nature to very specific in nature. Both general and specific items are available under each content area. The results of the above study confirmed the conclusions made by the previous studies. Correlations do exist between general and specific items on student rating forms. FORMAT Many formats have been attempted in developing student rating forms. A previous dissertation in this area has an extensive review of the litera- ture (Showers, 1973). Five conclusions obtained from the Showers' litera- ture review follow: 41 l. The optimal number of options for each question is five to seven. 2. The presence of a neutral point increases the ambiguity of the scale. 3. 
Reduction in leniency bias due to reversing the direction of the scale within a questionnaire may increase the errors in rating. 4. Numeric, sentence, or paragraph cue lengths may reduce leniency bias, if the cues are not too long, but cue length has no apparent effect on the rater reliability of untrained groups of raters. 5. Leniency bias may be reduced by the presence of more favorable than unfavorable response options. Her dissertation compared three different response cue formats in an attempt to discover which one was least susceptible to response bias. The three response cue formats considered were the Likert, evaluative and descriptive. The results showed that the evaluative format items in instructional rating scales were less prone to leniency bias and had rater reliabilities comparable to Likert and descriptive formats, making them the best choice of the three formats on an existing rating scale. SUMMARY The issues of reliability and validity of student rating forms have been examined from many angles. The reliability of such instruments has been shown to be both stable over time and internally consistent. However, the validity issue is a hodgepodge of conflicting information. The corre- lational studies are an excellent example of this phenomenon. In an attempt to tease out the consistent results from the inconsistent results, three separate lists follow. 42 Inconsistent Results 1. Even though many studies found a positive correlation between supervisors' or colleagues' ratings and student ratings, there is still some research to suggest no relationship exists. 2. A.weak positive correlation between student ratings and student achievement was found in some studies, while in other studies no correlation existed. 3. With reference to whether the course was required or not brought inconsistent results. At times, no relationship exists between this variable and student ratings. In some studies, courses are rated higher if they are electives, in other studies this relationship is reversed. 4. Some research leads one to believe advanced classes get higher ratings, while other research studies do not find this relationship to exist. 5. Class size has sometimes been found to effect student ratings and other times not. 6. The research on the sex of the student and student ratings has been highly inconsistent in nature. Consistent Results 1. Research appears to uphold the point of view that instructors that hold the doctorate degree fare better on student ratings than those instructors who have not earned the doctorate. 43 Inconclusive Results The following results, although not inconsistent, had not been researched enough to be considered consistent in nature. 1. The positive relationship between student ratings and instructor self ratings. 2. The relationship between student ratings and gains in student knowledge. 3. The relationship between student needs and teacher orientation. 4. The entertainment issue. 5. The relationship between seating patterns and student evaluations of instruction. Again, little research has been done with regard to the differences between departments, colleges and classes within a university. The avail— able literature supported the premise that students' perceptions do vary across colleges. However, no literature has been located comparing different classes within one department. 
Research in the area of varying item types strongly suggests that specific items (moderate-inference or Level II) correlate highly with general items found on the traditional high-inference type rating form. Because much inference is necessary on the part of the observer and the reader for high inference items, items of moderate inference could be more beneficial to the instructor in the diagnostic sense. The results and literature review of a previous dissertation (Showers, 1973) delineated what response cue formats are appropriate to minimize leniency bias. CHAPTER III PROCEDURES AND DESIGN INTRODUCTION The purpose of the present study is to compare two types of student evaluation of instruction instruments. One of the forms is the standard Level II, Form B student rating form given to approximately 15,000 students per quarter at Michigan State University. The Level II is a general form developed on a general population. The comparison instrument will be one developed for a specific department. The comparison instrument will con- tain five to ten core items that will be used on every instructor's student evaluation form, and 10-20 items specific to individual class situations. For example, questions directed at large classroom instruction, demographics, and instrument satisfaction. The purpose of the instrument satisfaction question is to obtain an index comparing students' perceptions of the two instruments. In order to make comparisons between the instruments, it is necessary to compute: (1) the average item variance for each instrument within a particular class, and (2) the average item variance between classes. The reason for these computations is based on the hypotheses that a good rating 44 45 instrument will have a small variability within one class on a particular item, e.g., students perceive the instructor's performance on a particular item similarly. There should be variability between classes on a particu- lar item, e.g., students can differentiate among instructors' performance on a particular item. The better evaluation instrument would have a smaller average item variance within classes. The better instrument would have larger item variability between classes. It will also be desirable to make direct comparisons between the core and class specific items in the proposed instrument. To make this compari- son, more variance calculations are needed. First, the average item variance of both the core items and class specific items for each indivi- dual class, and second, the average item variance of both the core items and the class specific items among classes. It is expected that the average item variances will differ with re- spect to item type on the proposed instrument. The class specific items will have a smaller average item variability than the core items within a specific class. However, the class specific items will have a larger average item variability than the core items between classes. SAMPLE The sample is taken from the Marketing and Transportation Department at Michigan State University. Generally, the School of Business appears to exhibit some skepticism toward student evaluation forms. This skepti- cism is operationalized by their haphazard use of student rating forms to 46 meet minimal university requirements. However, the Business School is also aware of the need for feedback on course improvement as well as information for tenure and promotion decisions. 
Therefore, it is a prime opportunity to approach them for some individual attention with regard to evaluating instruction. Four separate undergraduate classes from the Marketing and Transporta- tion Department are included in the study. A description of the four classes are: (1) Personal Selling (MTA 311): Two sections of this course were included in the study. Each section was taught by a different instructor utilizing varying techniques of instruction. Section 1 consisted of 96 Juniors and Seniors. The teaching format was mainly lecture with four days at the end of the term set aside for group project presentations. Section 2 consisted of 80 Juniors and Seniors. The course format included approximately 60% lecture and 40% conference leadership technique. The con- ference leadership technique involved the use of student groups having total reaponsibility for certain class presentations. (2) Sales Management (MTA 313): One section of this course was included in the study. This course is a case oriented class with only 39 Juniors and Seniors. The format of the sales management class consisted of both lecture and discussion. The main focus of this type of course is concerned with applying particular case solutions to more general real world situations. (3) anntitative Business Research Methods (MTA 317): This is a large quantitative lecture class consisting of 405 Juniors and 47 Seniors. This class meets twice a week in main lecture and twice a week in smaller recitation sections. These recitation sections spend time reviewing homework and clarifying points from the main lecture. (4) Transportation Planning and Policies (MTA 341): Two sections of this class are included in the present study. Both classes consist of Juniors and Seniors and utilize the same basic format. The format is basically one of lecture with some discussion. Section 1 contains 57 students while section 2 contains 28 stu- dents. Even though it was necessary for the instructors to volunteer their services, it should be noted that the selection was made so that the classes were diverse. A perusal of the course names and format show that the courses vary on several dimensions. These dimensions include class size, lecture versus discussion, qualitative versus quantitative, and inclusion or exclu- sion of recitation sections. This diversity was necessary in developing specific class items. .It should also be noted that the varying teaching techniques used in MTA 311 will necessitate separate forms for each section. Therefore, there will be five specific class instruments. INSTRUMENT DEVELOPMENT In order to develop five Specific instruments containing both common core and class specific items, it is necessary to develop a strategy for selecting items. The present literature deals with the development of general evaluation instruments. Therefore, the available developmental 48 strategies had to be re—evaluated and modified for selection of class specific items. A technique explained by Wotruba and Wright (1975) cone sists of four steps and is indicative of the literature. These steps are as follows: (1) Development of an item pool, (2) Screening of the item pool, (3) Surveying the item's importance for inclusion, (4) Choosing the items for the final instrument. These four basic steps will be used in the pre- sent study. 
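Before turning to Step I, it may help to make concrete the within-class and between-class item variance comparisons outlined in the introduction to this chapter. The sketch below is purely illustrative: the study's analyses were carried out in SPSS, and the class names, array shapes, and rating values here are hypothetical rather than taken from the study's data.

import numpy as np

# Hypothetical data: for each class, rows = students, columns = items,
# values = 1 (Superior) ... 5 (Inferior).
ratings = {
    "class_a": np.array([[1, 2, 2], [2, 2, 3], [1, 3, 2], [2, 2, 2]]),
    "class_b": np.array([[3, 4, 3], [4, 4, 5], [3, 3, 4]]),
}

# (1) Within-class: the variance of each item within a class, averaged over items.
avg_within = {c: r.var(axis=0, ddof=1).mean() for c, r in ratings.items()}

# (2) Between-class: for each item, the variance of the class means across classes,
#     then averaged over items.
class_item_means = np.vstack([r.mean(axis=0) for r in ratings.values()])
avg_between = class_item_means.var(axis=0, ddof=1).mean()

print(avg_within)   # smaller values: students within a class rate an item similarly
print(avg_between)  # larger values: items differentiate among instructors or classes

Under the hypotheses stated in the introduction to this chapter, the better instrument would show the smaller within-class values and the larger between-class value.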
STEP I - DEVELOPMENT OF AN ITEM POOL Initial Interview An initial interview was devised to ascertain the types of items that the instructor and students of a particular class felt should be included in an instrument specifically designed for that particular class. Each in- structor and six to ten student volunteers were interviewed from each class. The respondents were given copies of the items in use on the present Level II form. These items are reproduced in Table 3.1. The items were clustered by previously defined factors (Office of Evaluation Services, 1973). The interviewer was directed to respond to each general factor. The respondent was asked two questions in reference to each factor: Question 1: "Is this factor relevant in evaluating instruction in this specific class?" Question 2: "If an answer of yes is given to question 1, is this factor relevant in evaluating the instruction of all courses taken at Michigan State University?" After responding to both questions for each factor, the student and in? structor volunteers were asked to list any other items that would be helpful in evaluating this class in particular. II. III. IV. 49 TABLE 3.1 SAMPLE ITEMS Instructor Involvement (l) The instructor was enthusiastic when presenting course material. (2) The instructor seemed to be interested in teaching. (3) The instructor's use of examples or personal experiences helped to get points across in class. (4) The instructor seemed to be concerned with whether the students learned the material. Student Interest (1) You were interested in learning the course material. (2) You were generally attentive in class. (3) You felt that this course challenged you intellectually. (4) You have become more competent in this area due to this course. Student Instructor Interaction (l) The instructor encouraged students to express opinions. (2) The instructor appeared receptive to new ideas and others' viewpoints. (3) The student had an opportunity to ask questions. ~ (4) The instructor generally stimulated class discussion. Course Demands (1) The instructor attempted to cover too much material. “ (2) The instructor generally presented the material too rapidly. x (3) The homework assignments were too time consuming relative to ‘ their contribution to your understanding of the course material. (4) You generally found the coverage of topics in the assigned readings too difficult. A Course Organization (1) The instructor appeared to relate the course concepts in a systematic manner. (2) The course was well organized. (3) The instructor's class presentations made for easy note taking. (4) The direction of the course was adequately outlined. > X! 50 STEP II - SCREENING THE ITEM POOL From the initial interview, two outcomes unfolded. First, in terms of a general instrument it was found that factor III, Student Instructor Inter- action, was felt to be irrelevant by more than 25% of the sample. While the other four factors were considered appropriate for all classes by at least 90% of the sample. Secondly, from the class specific items the students had been asked to list, the makings of a class specific item.pool started to materialize. PRETEST - ITEM POOL SELECTION The final pretest instrument was administered to all the students pre- sent in MTA 313, MTA 341 and Section 2 of MTA 311. On the day of admini- stration only two recitation sections of MTA 317 were included and part of Section 2, MTA 311. The instrument was divided into two parts. 
The first part was a general part that was administered to every class (refer to Table 3.2). The general item list included all the items from Factor 1, 2, 4 and 5. This list was considered to be conclusive due to the previous re- search work in developing the Level II instrument (Office of Evaluation Services). Two questions were posed for each item: Question 1. If you were to construct a general student course appraisal sheet, would you include this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not 51 TABLE 3.2 FINAL PRETEST ITEMS FOR THE GENERAL INSTRUMENT (l) The instructor was enthusiastic when presenting material. (2) The instructor seemed to be interested in teaching. (3) The instructor's use of examples or personal experiences helped to get points across in class. (4) The instructor seemed to be concerned with whether the students learned the material. (5) You were interested in learning the course material. (6) You were generally attentive in class. (7) You felt that this course challenged you intellectually. (8) You have become more competent in this area due to this course. (9) The instructor attempted to cover too much material. (10) The instructor generally presented the material too rapidly. (11) The homework assignments were too time consuming relative to their contribution to your understanding of the course material. (12) You generally found the coverage of topics in the assigned reading too difficult. (13) The instructor appeared to relate the course concepts in a systematic manner. (14) The course was well organized. (15) The instructor's class presentation made for easy note taking. (16) The direction of the course was adequately outlined. 52 Question 2. Evaluate this course using this item: (1) Superior: exceptionally good instructor or course. (2) Above Aversge: better than the typical instructor or course. (3) Average: typical instructor or course. (4) Below Aversge: ‘worse than the typical instructor or course. (5) Inferior: exceptionally poor instructor or course. Question 1 was used to estimate how important the respondents felt the itemuwas, while Question 2 was developed to ascertain the within class variability in rating each item. The second part was a class specific part consisting of 25 to 30 items designed for individual classes. There were separate class specific parts for MTA 317, MTA 313, MTA.341 and each section of MTA 311. The items included for use on these class specific forms were chosen from a pool of items. This pool contained six separate categories: Grading and Exams, Instructional Assignments and Material, Student Outcomes, Recitation Sections, Instructional Environment, Instructor Characteristics and Style. These items were selected from several sources in conjunction with the initial interviews. These sources included: "SIRS Technical Bulletin" (Office of Evaluation Services, 1969), "Behavior Specific Items for Student Evaluation of Instruc- tion" (Olson, 1978) and "ICES Item Catalog" (university of Illinois, 1977). The class specific items for each class are presented in Table 3.3, 3.4, 3.5, 3.6 and 3.7. The students were asked to respond to each of the following four ques- tions for each class specific item. The questions follow: Question 1: If you were to construct a student course appraisal sheet for this course, would you include this item? 
(1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not 53 TABLE 3.3 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 311 SECTION 1 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) Use of humor Use of overhead Use of handouts General appearance Willingness to spend extra time with you Availability during office hours Relationship of course material to everyday life Maintenance of an informal classroom Integration of reading material THE COURSE'S CONTRIBUTION TO YOUR: (10) (11) (12) (13) (14) Understanding of the day to day workings of a field representative Obtainment of a general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Ability to solve real problems in the field FOR THIS COURSE: (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) Classroom atmosphere Interaction of your project group Appropriateness of text Appropriateness of instructional materials Use of time for assignment completion Appropriateness of emphasis placed on group project Beneficialness of homework assignments Appropriateness of exam format Appropriateness of case study Appropriateness of amount of time given to group projects Group projects were applicable to real life situations Organization of lecture material The group members shared the work equally 54 TABLE 3.4 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 311 SECTION 2 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) Encouragement to students to express opinions Receptiveness to new ideas and others' viewpoints Stimulation of class discussion Use of humor Use of overhead or chalkboard Use of handouts General appearance Willingness to spend extra time with you Relationship of course material to everyday life Maintenance of an informal classroom Integration of reading material THE COURSE'S CONTRIBUTION TO YOUR: (12) (13) (14) (15) (16) Understanding of the day to day workings of a field representative Obtainment of general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Ability to solve real problems in the field FOR THIS COURSE: (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) (28) (29) Appropriateness of the text Classroom atmosphere Appropriateness of instructional materials Beneficialness of homework assignments Appropriateness of exam format Appropriateness of case study Organization of lecture material ApprOpriateness of conference leadership technique Interaction among group members in conference leadership group Beneficialness of group discussion Appropriateness of amount of time spent on conference leadership technique Appropriateness of proportion of final grade accounted for by conference leadership technique Appropriateness of tests 55 TABLE 3.5 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 313 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Encouragement of students to express opinions Receptiveness to new ideas and others' viewpoints General stimulation of class discussion Use of humor Clarification of the relationship between course material and everyday life Maintenance of an informal classroom Help in improving my problem solving ability Effectiveness in preparing students for exams Grading procedure Use of visual aids FOR THIS COURSE: (11) (12) (13) (14) (15) (16) The case text The inter-relationship of the two texts Appropriateness of cases Readings on reserve The appropriateness of group discussion for further understanding of course concepts No 
particular group of students monopolized discussion THE COURSE'S CONTRIBUTION TO YOUR: (17) (18) (19) (20) (21) (22) (23) (24) General knowledge in the field Understanding of concepts and principles in the field Ability to apply principles to new situations Ability to communicate clearly on this subject Ability to solve real problems in the field Ability to communicate in class Ability to organize ideas Ability to apply particular case ideas to general situations 56 TABLE 3.6 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 317 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) Use of humor Use of overhead Willingness to spend extra time with you Availability during office hours Clarification of the relationship between the course material and the real world Maintenance of an informal classroom 'Maintenance of a formal classroom THIS COURSE CONTRIBUTED TO: (8) (9) (10) (11) (12) (13) (14) Improving my problem solving abilities An understanding of concepts and principles in the field My ability to communicate clearly on the subject My ability to solve real problems in the field Increasing my interest in the subject matter Preparing me for the material covered on the tests Developing a more favorable attitude toward the subject matter FOR.THIS COURSE: (15) (16) (17) (18) (19) (20) (21) FOR THE (22) (23) (24) (25) (26) (27) The atmosphere was conducive to learning The required text Readings on reserve Beneficialness of written homework assignments Beneficialness of supplementary texts Appropriateness of testing format Beneficialness of homework answers and calculations on reserve RECITATION SECTION: Clarification of course material Appropriateness of 202 of grade allotted to recitation Usefulness of quizzes for exam preparation Ability of recitation instructor to answer questions Maintenance of an informal classroom Adequacy in covering written homework assignments 57 TABLE 3.7 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 341 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (ll) (12) (13) Use of humor Use of blackboard Use of overhead Use of handouts Relation of course material to everyday experiences Maintenance of an informal classroom Encouragement of students to express opinions Receptiveness of new ideas and others' viewpoints General stimulation of class discussion Use of real life examples Emphasis of important points Willingness to spend extra time with you Availability during office hours THE COURSE'S CONTRIBUTION TO YOUR: (14) (15) (16) (17) Obtainment of general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Future goals FOR THIS COURSE: (18) (19) (20) (21) (22) (23) (24) (25) Appropriateness of two texts Appropriateness of material on reserve Integration of audiovisual and course material Classroom atmosphere Appropriateness of exam format Appropriateness of material covered on the exam Organization of lecture material Beneficialness of homework assignments 58 Question 2: Would you want to qualify your response to this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not Question 3 Do you believe you have enough information to evaluate those aspects of the course referred to by this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not Question 4 Evaluate this course using this item. (1) Superior: exceptionally good instructor or course. (2) Above Average: better than the typical instructor or course. (3) Average: typical instructor or course. 
(4) Below Average: worse than the typical instructor or course. (5) Inferior: exceptionally poor instructor or course.

Question 1 was asked to obtain information on the perceived importance of each item. Questions 2 and 3 were used to determine if it was necessary to reword the item before it could be used on a final evaluation instrument. Question 4 was used to ascertain the within class variability of each item. This variability would be used in item selection only if Question 1 retained too many items for the final instrument.

STEP III - SURVEY OF ITEMS' IMPORTANCE

To survey each item's importance, the data was coded and transferred onto IBM cards for further computer analysis. Analyses were made on all of the data combined for the general items. The sample was then decomposed into five separate groups for analysis of the class specific items. A general frequencies program was run utilizing SPSS (Nie, Hull, Jenkins, Steinbrenner & Bent, 1975). Frequencies, means and standard deviations were calculated for each item.

General Instrument

Means and standard deviations were calculated for both questions, item inclusion and course evaluation, for each of the sixteen items in the general form. Refer to Table 3.8 for this information. In order to get a better picture of the pattern of responses, frequencies were also considered for Question 1, item inclusion (refer to Table 3.9). When referring to the means for Question 1 (whether the item should be included), a low value represents high desirability while a high value represents low desirability. It was decided to include four to eight of these items in the final instrument.

Class Specific Instrument

A frequency distribution was tabulated for the responses to Question 1 (item inclusion) for each class specific form. These distributions are presented in Tables 3.10, 3.11, 3.12, 3.13, 3.14 and 3.15. The responses to Questions 2, 3 and 4 would only be used in an auxiliary manner to help improve item wording and make finer discriminations, if necessary.

STEP IV - CHOOSING THE ITEMS FOR THE FINAL INSTRUMENT

General Instrument

It can be seen by referring to Table 3.1 that items 1 through 4, 5 through 8, 9 through 12 and 13 through 16 on the final pretest represent four separate factors.

TABLE 3.8
GENERAL INSTRUMENT
Pre-test Statistics (n = 230)

        Question 1 (Inclusion of Item)    Question 2 (Course Evaluation)
Item    Mean      Std. Deviation          Mean      Std. Deviation
1       1.626     .723                    1.987     .733
2       1.483     .698                    1.996     .723
3       1.613     .843                    2.026     1.013
4       1.454     .716                    2.071     .836
5       1.913     .998                    2.415     1.042
6       2.113     1.009                   2.351     .980
7       1.991     1.000                   2.553     1.015
8       1.657     .920                    2.371     1.038
9       1.774     1.041                   2.991     .776
10      1.792     1.035                   2.950     .858
11      1.939     1.188                   2.911     .842
12      2.178     1.207                   2.937     .908
13      1.557     .744                    2.247     .946
14      1.407     .648                    2.259     .909
15      1.750     1.008                   2.344     1.063
16      1.763     .913                    2.213     .891

TABLE 3.9
GENERAL INSTRUMENT
Frequency Distributions for Question 1, Item Inclusion (n = 230)
[Counts for response options 4 and 5 are not legible in the available copy.]

        Response Option
Item    1      2      3
1       112    99     12
2       140    75     9
3       129    74     15
4       149    61     15
5       95     84     33
6       67     104    29
7       84     90     32
8       131    65     17
9       125    59     23
10      118    64     21
11      117    54     23
12      83     76     32
13      132    73     20
14      149    66     8
15      122    66     19
16      107    86     20

TABLE 3.10
CLASS SPECIFIC INSTRUMENT, MTA 311 SECTION 1
Frequency Distribution for Question 1, Item Inclusion (n = 24)
[The frequency counts for this table are not legible in the available copy.]
TABLE 3.11
CLASS SPECIFIC INSTRUMENT, MTA 311 SECTION 2
Frequency Distribution for Question 1, Item Inclusion (n = 54)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.12
CLASS SPECIFIC INSTRUMENT, MTA 313
Frequency Distribution for Question 1, Item Inclusion (n = 29)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.13
CLASS SPECIFIC INSTRUMENT, MTA 317 SECTION 3
Frequency Distribution for Question 1, Item Inclusion (n = 42)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.14
CLASS SPECIFIC INSTRUMENT, MTA 317 SECTION 5
Frequency Distribution for Question 1, Item Inclusion (n = 33)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.15
CLASS SPECIFIC INSTRUMENT, MTA 341
Frequency Distribution for Question 1, Item Inclusion (n = 44)
[The frequency counts for this table are not legible in the available copy.]

All sixteen of these general items appear desirable, with the mean desirability coefficient ranging from 1.407 to 2.178. The mean desirability coefficient can simply be defined as the average response to Question 1 for each individual item. Because of this high rate of desirability, it was decided to include two items from each factor. It was relatively easy to choose items 2, 4, 8, 9, 10, 13 and 14 because of the low means of these items in comparison to the other items in their factor. However, it was difficult to choose the second item from factor II. The means and standard deviations were very close for items 5, 6 and 7 in this particular factor. The distribution of responses was considered at this point. Item 5 was chosen because of the higher percentage of respondents choosing option 1 on Question 1 (refer to Table 3.9).

Class Specific Instrument

It was decided to use a general consensus approach in choosing specific class items. Items were included in this section of the instrument if 75% of the sample chose option 1 or option 2 in response to Question 1. This can be interpreted as meaning that 75% of the sample would definitely or probably include the item on a student evaluation instrument for that particular class.
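The consensus rule just described can be summarized in a few lines of code. The sketch below is illustrative only: the function name and the example counts are hypothetical and do not reproduce the study's pre-test frequencies, and only the 75 percent threshold is taken from the text.

def retained_items(freqs, threshold=0.75):
    """freqs: {item_number: [n_opt1, n_opt2, n_opt3, n_opt4, n_opt5]} for Question 1."""
    keep = []
    for item, counts in freqs.items():
        total = sum(counts)
        # Retain the item if options 1 ("Definitely") and 2 ("Probably")
        # together account for at least the threshold proportion of responses.
        if total and (counts[0] + counts[1]) / total >= threshold:
            keep.append(item)
    return keep

# Hypothetical pre-test counts for three items.
example = {1: [14, 6, 2, 1, 1], 2: [9, 5, 6, 3, 1], 3: [20, 3, 1, 0, 0]}
print(retained_items(example))  # -> [1, 3]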
Class Specific Instrument for MTA 311, Section 1 The percentages were collapsed for options 1 and 2 for each item from Table 3.10. Below is a listing of each item.nunber with the percent asso- ciated with the collapse of option 1 and 2: 69 Item Percent 1 46 2 50 3 61 4 38 5 92 6 87 7 84 8 33 9 71 10 75 11 92 12 79 13 79 14 80 15 54 16 62 17 92 18 83 19 61 20 61 21 74 22 87 23 66 24 75 25 75 26 88 27 83 Items numbered 5-7, 10-14, 17, 18, 21, 22, 24-27 were included in the final class specific instrument. Even though item 21 did not meet the percentage requirement, it was felt it should be included. No other items had a per- centage high enough to be worthy of inclusion. The results of Question 2 did not give any reason to reword the items chosen from the pre-test. The items were therefore worded in the final evaluation instrument basically the same as the pre-test instrument. The final class specific instrument for MTA 311, Section 1 can be viewed in Figure 3.1. Class Specific Instrument for MTA 311, Section 2 Again, the percentages were collapsed for options 1 and 2 for each item.from.Tab1e 3.11. Below is a listing: 70 FIGURE 3.1 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 311 SECTION 1 71 STUDENT INSTRUCTIONAL RATING FORM MTA 311 SBC’I‘Iw l 1. Superior: exceptionally good instructor or c0urse. 2. Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. to the closest description of your instructor or your course. If you items. KEY l. The instructor seemed to be interested in teaching. 1. 1 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ----------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. l 2 3 4 5 4. Your competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. l 2 3 4 5 8. The course organization. 8. 1 2 3 4 5 9. The instructor's willingness to spend extra time with you. 9. l 2 3 4 5 10. The instructor's availability during office hours. 10. l 2 3 4 5 11. The instructor's relationship of caurse material to everyday life. 11. l 2 3 4 5 THE COURSE'S C(NTRIBUTIM TO YOUR: 12. Understanding of the day to day workings of a field representative. 12. l 2 3 4 5 l3. Obtainment of a general knowledge in the field. 13. l 2 3 4 5 14. Understanding of concepts and principles in the field. 14. l 2 3 4 5 15. Ability to communicate clearly on the subject. 15. l 2 3 4 5 l6. Ability to solve real problems in the field. 16. l 2 3 4 5 FOR THIS COURSE: l7. Appropriateness of the text. 17. l 2 3 4 S 18. Appropriateness of instructional materials. 18. 1 2 3 4 5 l9. Beneficialness of homework assignments. 19. 1 2 3 4 5 20. Appropriateness of the exam format. 20. 1 2 3 A 5 21. Appropriateness of the amount of time given to group projects. 21. 1 2 3 4 5 22. Organization of lecture material. 22. 1 2 3 A 5 23. Appropriateness of material covered on the exams. 23. l 2 3 4 5 24. Applicability of group projects to real life situations. 24. l 2 3 4 5 25. Sharing of group work by group members. 25. 
1 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 26. Has this course required in your degree program? 26. Yes No 27. What is your sex? 27. M F 28. What is your overall GPA? (1) 1.9 or less, (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0—--- 28. l 2 3 4 5 29. what is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior 29. l 2 3 4 5 30. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior. (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 30. l 2 3 4 5 72 Item Percentage 1 82 2 90 3 81 4 71 5 69 6 62 7 42 8 79 9 81 10 64 11 82 12 85 13 91 14 93 15 83 16 81 17 95 18 61 19 82 20 84 21 80 22 73 23 91 24 79 25 82 26 84 27 96 28 91 29 - 95 The 75 percent rule was strictly adhered to and items numbered 1-3, 8, 9, 11-17, 19-21, 23-29 were included in the final evaluation instru- ment. The final evaluation instrument for MTA 311, Section 2 may be referred to in Figure 3.2. Class Specific Instrument for MTA 313 The percentages from Table 3.12, collapsed for options 1 and 2, for each item follow: 73 FIGURE 3.2 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 311 SECTION 2 74 STUDENT INSTRUCTIONAL RATING FORM MTA 311 SECTION 2 1. Superior: exceptionally good instructor or course. 2. Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. to the closest description of your instructor or your course. If you items. KEY 1. The instructor seemed to be interested in teaching. I. 1 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ---------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. YOur competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5 8. The course organization. 8. l 2 3 4 5 THE INSTRUCTOR'S: 9. Encouragement to students to express opinions. 9. l 2 3 4 5 10. Receptiveness to new ideas and other's viewpoints. 10. 1 2 3 4 5 ll. Stimulation of class discussion. 11. 1 2 3 4 5 12. Willingness to spend extra time with you. 12. l 2 3 4 5 13. Relationship of course material to everyday life. 13. 1 2 3 4 5 14. Integration of reading material. 14. 1 2 3 4 5 THE COURSE'S CNTRIBUTION TO YOUR: 15. Understanding of the day to day workings of a field representative. 15. l 2 3 4 5 l6. Obtainment of general knowledge in the field. 16. l 2 3 4 5 17. Understanding of concepts and principles in the field. 17. l 2 3 4 5 18. Ability to communicate clearly on the subject. 18. 1 2 3 4 5 l9. Ability to solve real problems in the field. 19. l 2 3 4 5 FOR THIS COURSE: 20. Appropriateness of the text. 20. l 2 3 4 5 21. Appropriateness of instructional materials. 21. l 2 3 4 5 22. Beneficialness of homework assignments. 22. 1 2 3 b 5 23. Appropriateness of exam format. 23. l 2 3 4 5 24. Organization of lecture material. 24. l 2 3 4 5 25. Appropriateness of conference leadership technique. 25. l 2 3 4 5 26. Interaction among group members in conference leadership group. 26. l 2 3 4 5 27. 
Beneficialness of group discussion. 27. l 2 3 4 5 28. Appropriateness of amount of time spent on conference leadership technique. 28. 1 2 3 4 S 29. Appropriateness of proportion of final grade accounted for by conference leadership technique.----- 29. l 2 3 4 S 30. Appropriateness of material covered on exams. 30. 1 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 31. Has this course required in your degree program? 31. Yes No 32. What is your sex? 32. M 33. What is your overall GPA? (l) 1.9 or less. (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3. (5) 3.4-4.0--- 33. l 2 3 4 34. Hhat is your class level? (1) Freshman. (2) Sophomore, (3) Junior, (4) Senior* 34. l 2 35. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average. (4) Below Average, (5) Inferior 35. l 2 3 4 5 75 Item. Percentage (n=44) 1 93 2 90 3 90 4 66 5 79 6 52 7 83 8 72 9 87 10 45 11 9O 12 72 13 83 14 43 15 83 16 34 17 90 18 97 19 87 20 76 21 83 22 62 23 86 24 83 Again, the 75 percent rule was adhered to and items numbered 1-3, 5, 7, 9, 11, 13, 15, 17-21, 23, 24 were included in the final evaluation instrument for MTA 313 (refer to Figure 3.3). Class Specific Instrument for MTA 317 The MTA 317 pre-test was given to two of the nine recitation sections. The responses to the pre-test can be referred to in Tables 3.13 and 3.14. The percentages from these two sections were collapsed separately for options 1 and 2. The results follow: 76 FIGURE 3.3 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 313 77 STUDENT INSTRUCTIONAL RATING FORM MTA 313 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or caurse. For each item, respond by circling the number in the key that corresponds 3. Average: typical instructor or course. . Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items. please omit the 5. Inferior: exceptionally poor instructor or course. NH &~ to the closest description of yOur instructor or your course. If you items. ‘EY 1. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material.-------------- 2. l 2 3 4 5 3. Your interest in learning the course material. 3. l 2 3 4 5 4. Your competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 S 7. The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5 8. The course organisation. 8. 1 2 3 4 5 9. The instructor's encouragement of students to express opinions. 9. I 2 3 4 5 10. The instructor's receptiveness to new ideas and others' viewpoints. 10. 1 2 3 4 5 11. The instructor's general stimulation of class discussion. 11. 1 2 3 4 5 12. The instructor's clarification of the relationship between course material and everyday life.---- 12. l 2 3 4 5 13. The instructor's help in improving my problem solving ability. 13. 1 2 3 4 5 14. The instructor's grading procedure. 14. 1 2 3 4 5 15. The appropriateness of the case text. 15. l 2 3 4 5 16. The appropriateness of cases. 16. l 2 3 4 5 17. The appropriateness of group discussion for further understanding of caurse concepts.-----—-—-- 17. 1 2 3 4 5 111E COURSE'S CWTRIBUI'ICN TO YOUR: 18. General knowledge in the field. 18. 1 2 3 4 5 19. 
Understanding of concepts and principles in the field. 19. 1 2 3 4 5 20. Ability to apply principles to new situations. 20. l 2 3 4 5 21. Ability to communicate clearly on this subject. 21. l 2 3 4 5 22. Ability to solve real problems in the field. 22. 1 2 3 4 5 23. Ability to organise ideas. 23. 1 2 3 4 5 24. Ability to apply particular case ideas to general situations. 24. l 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 25. Has this course required in your degree prugx-u? 25. Yes No 26. What is your sex? ' 26. n r 27. What is your overall GPA? (1) 1.9 or less, (2) 2.0—2.2, (3) 2.3—2.7. (4) 2.8-3.3. (5) 3.4-4.0---- 27. l 2 3 4 5 28. what is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior— 28. 1 2 3 4 5 29. In comparing this rating form with other rating forms you have responded to at HSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 29. l 2 3 4 5 78 Percentage (n-42) Percentage (n-33) Item. Section 3 Section 5 1 44 39 2 69 51 3 88 91 4 88 94 5 83 84 6 57 42 7 39 30 8 86 69 9 91 66 10 81 73 11 83 76 12 67 54 13 83 88 14 60 39 15 76 76 16 81 9O 17 63 64 18 86 84 19 54 60 20 88 91 21 83 82 22 92 91 23 85 79 24 88 82 25 96 100 26 74 40 27 78 78 Items were initially included on the final evaluation instrument if the 75 percent rule was met for both sections. This allowed the inclusion of items numbered 3-5, 11, 13, 15, 16, 18, 20-25, 27. However, a further perusal of the data indicated a high percentage of students in section 3 opted for inclusion of items 8-10. The percentage for each of these items was averaged with the percentages from section 5. The average percent met the criterion necessary for inclusion in the final instrument. Therefore, all three items were also included in the final evaluation instrument for MTA 317. Items 15 and 16 from the pretest instrument (Table 3.6) were re- worded to correct for ambiguity. The final evaluation instrument for MTA 317 is presented in Figure 3.4. 79 FIGURE 3.4 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 317 80 STUDENT INSTRUCTIONAL RATING FORM MTA 317 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or course. For each item, respond by circling the number in the key that corresponds 3. Average: typical instructor or cOurse. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. NH to the closest description of your instructor or your course. If you items. KEY l. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ---------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. Your competence in this area due to this course. 4. 1 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. 1 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. 1 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. l 2 3 4 5 8. The course organization. 8. l 2 3 4 5 9. The instructor's willingness to spend extra time with you. 9. 1 2 3 4 5 10. The instructor's availability during office hours. 10. l 2 3 4 5 11. The instructor's clarification of the relationship between the course material and the real world.- 11. 1 2 3 4 5 TRIS COURSE CONTRIBUTED T0: 12. Improving my problem solving abilities. 
12. 1 2 3 4 5 13. An understanding of concepts and principles in the field. 13. l 2 3 4 5 14. My ability to communicate clearly on the subject. 14. 1 2 3 4 5 15. My ability to solve real problems in the field. 15. l 2 3 4 5 l6. Preparing me for the material covered on the tests. 16. l 2 3 4 5 FOR TUIS COURSE: l7. Conduciveness of classroom atmosphere to learning. 17. 1 2 3 4 5 18. Appropriateness of the required text. 18. l 2 3 4 5 l9. Beneficialness of written homework assignments. 19. l 2 3 4 5 20. Beneficialness of supplementary texts. 20. 1 2 3 4 5 21. Appropriateness of testing format. 21. l 2 3 4 5 22. Beneficialness of homework answers and calculations on reserve. 22. l 2 3 4 5 FOR THE RECITATION SECTION: 23. Clarification of course material. 23. l 2 3 4 5 24. Appropriateness of per cent of grade allotted to recitation. 24. 1 2 3 4 5 25. Usefulness of quizzes for exam preparation. 25. l 2 3 4 5 26. Ability of recitation instructor to answer questions. 26. l 2 3 4 5 27. Adequacy in covering written homework assignments. 27. l 2 3 4 5 28. Please circle the number corresponding to your recitation section. 28. l 62 7 3 84 95 STUDENT BACKGROUND: Select the most appropriate alternative. 29. Has this course required in your degree program? 29. Yes No 30. what is your sex? 30. M F 31. what is your overall GPA? (l) 1.9 or less, (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0--- 31. l 2 3 4 5 32. "hot is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior* 32. l 2 3 4 5 33. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferiorr 33. 1 2 3 4 5 81 Class §pecific Instrument for MTA 341 The percentages from.Table 3.15 collapsed for options 1 and 2 for each item follow: Item Percentage 1 41 2 47 3 51 4 77 5 87 6 58 7 86 8 85 9 82 10 89 11 84 12 89 13 96 14 88 15 89 16 84 17 52 18 74 19 73 20 68 21 57 22 82 23 88 24 93 25 79 Using the 75 percent criterion, items numbered 4, 5, 7-16, 22-25, were included in the final evaluation instrument. Item 18 was also in- cluded because the percentage associated with it was so close to the cut off point. The final evaluation instrument for MIA 341 is presented in Figure 3.5. Comparison Instrument The comparison instrument used in this study is the SIRS, Level II. A photographic reproduction is presented in Figure 3.6. 82 FIGURE 3.5 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 341 83 STUDENT INSTRUCTICHAL RATING FORM MTA 341 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. NH to the closest description of your instructor or your course. If yOu items. [BY 1. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material.--—---—-—----- 2. l 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. Your competence in this area due to this course. 4. 1 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. 1 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. 
The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5
8. The course organization. 8. 1 2 3 4 5

THE INSTRUCTOR'S:
9. Use of handouts. 9. 1 2 3 4 5
10. Relation of course material to everyday experiences. 10. 1 2 3 4 5
11. Encouragement of students to express opinions. 11. 1 2 3 4 5
12. Receptiveness to new ideas and other's viewpoints. 12. 1 2 3 4 5
13. General stimulation of class discussion. 13. 1 2 3 4 5
14. Use of real life examples. 14. 1 2 3 4 5
15. Emphasis of important points. 15. 1 2 3 4 5
16. Willingness to spend extra time with you. 16. 1 2 3 4 5
17. Availability during office hours. 17. 1 2 3 4 5

THE COURSE'S CONTRIBUTION TO YOUR:
18. Obtainment of general knowledge in the field. 18. 1 2 3 4 5
19. Understanding of concepts and principles in the field. 19. 1 2 3 4 5
20. Ability to communicate clearly on the subject. 20. 1 2 3 4 5

FOR THIS COURSE:
21. Appropriateness of texts. 21. 1 2 3 4 5
22. Appropriateness of exam format. 22. 1 2 3 4 5
23. Appropriateness of material covered on the exam. 23. 1 2 3 4 5
24. Organization of lecture material. 24. 1 2 3 4 5
25. Beneficialness of homework assignments. 25. 1 2 3 4 5

STUDENT BACKGROUND: Select the most appropriate alternative.
26. Was this course required in your degree program? 26. Yes No
27. What is your sex? 27. M F
28. What is your overall GPA? (1) 1.9 or less, (2) 2.0-2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0 28. 1 2 3 4 5
29. What is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior 29. 1 2 3 4 5
30. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 30. 1 2 3 4 5

84

FIGURE 3.6
STUDENT INSTRUCTIONAL RATING SYSTEM FORM
FORM B

85

[Photographic reproduction of the Michigan State University Student Instructional Rating System Form, Form B. The form asks students to rate 21 course and instructor items on a five-point key (S = Superior, AA = Above Average, AV = Average, BA = Below Average, I = Inferior), followed by student background items 22-25 and spaces for optional items; the reproduction is not legible in this copy.]

86

Five instructional rating forms differing primarily with regard to specific items useful for evaluating different instructional techniques were developed. Each instrument was given to one-half of the particular class that it was developed for. The other one-half of the class received the usual SIRS Level II form. The forms were administered to randomly equivalent halves in each of the classes. Each instructor was given a packet containing the two forms arranged alternately (assuming a random start) so that each form would automatically be distributed to random halves of the class. Each student, therefore, received one form.

Directions were given to the instructors to administer the forms just as they had administered the instructional rating form in the past. The instructors were told to ask a student to return the forms to the Marketing and Transportation Department. The answer sheets were collected, coded, and the data punched onto cards. At this point, the rating forms were turned over to the Dean's Office of the College of Business.

Generalizability of Results

Since the instructors were volunteers, the generalizability of this study to other instructors is limited. However, the main concern of this study is not the generalizability to other instructors but the comparability of two different types of student rating forms and the usefulness of the procedure described in this study to develop class-specific forms.

87

The nonrandom selection of instructors does not affect the comparison of the two types of rating forms because the different forms were administered to randomly equivalent groups.

HYPOTHESES

I. Ho: For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS.

   H1: For each class, the responses to the core items in the class specific instrument do not come from the same distribution as the responses to corresponding items in the SIRS.

II. Ho: For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS.

    H1: For each class, the item variance of the tailored items in the class specific instrument is less than the item variance of the items in the SIRS.

III. Ho: The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS.

     H1:
The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS. 88 IV. R : The proportion of the students that are satisfied with the class specific instrument is equal to .50. H : The proportion of students that are satisfied with the class specific instrument is greater than .50. V. Ho: There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained from the SIRS. ANALYSIS The hypotheses of no difference in responses to the core items in the class specific instrument and the corresponding items in the SIRS can be calculated by the Chi Square Statistic. A.number of Chi Square two sample Tests of Independence can be performed to determine if the distribution of responses to a core item on a class specific instrument is from the same population distribution as the responses to the corresponding item.on the SIRS. Since there are five specific class instruments with eight core items, a total of forty Chi Square Statistics will be calculated. If the hypothesis of no difference is accepted, it will be assumed that students are reacting the same to both evaluation instruments. Because the first hypothesis specifies the class as the unit of interest, the alpha level ‘will only be inflated at most by a multiple of eight. This multiple of eight comes from.the fact that eight general questions exist on each class specific form. Therefore, if a tabled value for alpha is .01, then the upper limit on alpha would be .08. 89 The second hypotheses of no difference between the item variance of the tailored items in the class specific instrument and the item variance of the items in the SIRS will be tested by the Mhnn4Whitney Statistic. This is a nonparametric test that may be used to test whether two indepen- dent groups have been drawn from the same pOpulation (Siegel, 1957). A separate HannéWhitney test statistic will be calculated for each class comparing the item variability of the class specific items to the item variability of the SIRS. For an example of the design format, refer to Figure 3.7. If the Hypothesis of no difference is rejected, a perusal of the two distributions would give information concerning where the differences actually occurred. To test the third hypothesis of no differences in between class vari- ability it will be necessary to calculate a between class variability for each of the twenty items on the SIRS, and a between class variability for each class specific item common to more than one class specific instrument. The Mann-Whitney U statistic will then be used to compare the two sets of between class variances. The fourth hypothesis concerning the satisfaction item will be examined by a hypothesis test of one proportion. The normal approximation to the binomial is the correct statistical test to use in this instance. The pro- portion of students who viewed the class specific instrument as satisfactory will be comprised of those students who responded to options 1, 2, and 3 on the satisfaction item. The proportion of students who responded to options 4 and 5 on the satisfaction item will be considered dissatisfied with the class specific instrument. *The subscripts designate the item number. 
90

FIGURE 3.7
DESIGN FOR MANN-WHITNEY U TEST

MTA 313

Class Specific Instrument (n = 16):  σ²₉, σ²₁₀, σ²₁₁, . . . , σ²₂₄

SIRS Level II (n = 20):  σ²₁, σ²₂, σ²₃, . . . , σ²₂₀

91

The last hypothesis of no difference in rater reliabilities will be tested by use of the F statistic. Rater reliabilities will be computed for each instrument within a class. In all of the instances, both an estimate of the reliability of an individual rating and an estimate of the reliability of the average rating will be calculated. These computations will use an analysis of variance technique (Ebel, 1937) to arrive at the components necessary for calculation.

SUMMARY

Five instructional rating forms differing primarily with regard to class specific items were developed and administered to random halves of five classes in the Marketing and Transportation Department. The other random half of each class was administered the general SIRS, Level II. The study was designed to test the effect of class specific items and general items on item variability. The items that have a small within class variability and a larger between class variability will be considered the better items.

The hypothesis of no difference concerning the responses to the SIRS items and the corresponding class specific items will be tested by the Chi Square Statistic. The two hypotheses concerning item variabilities will be tested by the Mann-Whitney U Statistic. The hypothesis concerning the student satisfaction item on the class specific instrument will be tested by the use of the normal approximation to the binomial.

92

The hypothesis of no difference in rater reliabilities will be tested by the use of an F Statistic.

CHAPTER IV

RESULTS

INTRODUCTION

The present research was designed to build an instructional rating scale composed of general items useful for evaluating all classes, and specific items tailored for individual classes. The class specific instrument was compared to a generally accepted rating scale. Because the generally accepted instrument was well established in its creation, items were drawn from it to be used in the general (core) part of the class specific instrument.

The first hypothesis of this study compares the distribution of responses made by students to like items on each instrument by class. It is hypothesized that students respond in the same manner to an item regardless of what instructional rating form the item appears on. It was also hypothesized that the class specific instrument would have less item variability on a particular item in a given class than the general instrument. It would also be expected that items used on more than one specific instrument would have a larger between class variability than items on the general instrument.

93 94

Another hypothesis of the study was to compare indexes of rater reliabilities for each instrument administered to each class. It would be expected that the class specific form would have higher rater reliabilities than the general instrument.

The last hypothesis concerned a satisfaction item incorporated into the class specific instrument. An index of satisfaction is computed for this satisfaction item for each class. A hypothesis test of one proportion is calculated to determine if at least fifty percent of the students are satisfied with the class specific instrument.
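To make the test of the first hypothesis concrete before the results are presented, the sketch below (Python with scipy, a modern stand-in for the original hand and card-based computation) runs one Chi Square test of independence of the kind described in Chapter III. The response counts are hypothetical, not the study data, and the collapsing rule shown is a simplified version of the contiguous-cell collapsing described in the text.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical response counts (options 1-5) for one core item;
# row 0 = class specific form, row 1 = SIRS.  Not the study data.
specific = np.array([8, 18, 13, 3, 1])
sirs = np.array([4, 24, 17, 2, 0])

def collapse_sparse(table, min_expected=5.0):
    """Collapse contiguous response options (here, the two right-most
    columns) until every expected frequency reaches min_expected.
    A simplified version of the collapsing rule in the text."""
    table = table.astype(float)
    while table.shape[1] > 2:
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
        if expected.min() >= min_expected:
            break
        table = np.column_stack([table[:, :-2], table[:, -2] + table[:, -1]])
    return table

observed = collapse_sparse(np.vstack([specific, sirs]))
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi square = {chi2:.3f}, d.f. = {dof}, p = {p:.3f}")

# Eight core items are compared per class, so a tabled alpha of .01
# carries an upper bound of 8 x .01 = .08 on the per-class alpha.
print("significant at tabled alpha = .01:", p < .01)
```

The same calculation is repeated for each of the eight core items in each class, giving the forty comparisons reported below.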
To conduct the study, five instructional rating forms differing pri- marily with regard to class specific items were developed and administered to random halves of five classes in the marketing and Transportation Department at Michigan State University. The remaining half of each class was administered the general SIRS, Level II. The hypothesis of no difference in the pattern of responses to the core items in the class specific instrument and the corresponding items on the SIRS can be tested with the Chi Square Statistic. The second hypothesis of no difference between the item.variance of the tailored items on the class specific instrument and the item variance of the items on the SIRS will be tested by the Mananhitney Statistic. The third hypothesis concerning the between class variability of the SIRS and class specific instrument will also be tested by the‘MannéWhitney statistic. The student satisfaction item on the class specific instrument will be tested by use of a one proportion hypothesis test. 95 The hypothesis of no difference in rater reliabilities will be tested by an F test. RESULTS CONCERNING LIKE ITEMS ON DIFFERING INSTRUMENTS The test of Hypothesis I was carried out by the calculation of forty Chi Square Tests.1 Hypothesis 1 was stated, Ho: For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS. H : For each class, the responses to the core items in the class specific instrument do not come from.the same distribution as the responses to corresponding items in the SIRS. Chi Square is the appropriate statistic for determining whether two (or more) distributions are essentially identical. For each class, a comparison is made between the responses for each core item on the class specific instrument with the corresponding item on the SIRS. Therefore, a total of forty comparisons are made. Because the class is the focus of the hypothesis, the a level will be inflated at most by a multiple of eight. Therefore, if a tabled value of .01 is used, the most the actual a can possibly be is .08. Each of the forty Chi Square tables form a five by two matrix. A sample matrix for MTA 311, Section 1 is presented in Figure 4.1. However, due to the fact that a basic assumption for the Chi Square Statistic is that 802 of the cells require an expected frequency of at least five 1It should be noted at this time that MTA 341, Section 2 was omitted from the analysis due to data collection difficulties. 96 CHI SQUARE MATRIX FOR MTA 311 SECTION 1 SPECIFIC INSTRUMENT - ITEM 1 SIRS - ITEM 2 Response Option 1 2 3 Specific Instrument 8 18 13 Item 1 SIRS Item 2 2 24 17 FIGURE 4.1 97 many matrixes had to be collapsed. When collapsing was necessary, the apprOpriate contiguous cells were collapsed. Because the tabled Chi Square Statistic is related to the number of cells, varying Chi Square tabled values arose. At the a - .01 (.08 upper limit) level, the Chi Square tabled values are listed in Table 4.1. To reject the null hypothesis, a Chi Square calculated value must be greater than the Chi Square tabled value. The calculated Chi Square values are presented in.Tab1e 4.2. A perusal of Tables 4.1 and 4.2 shows that only one of the forty calculated Chi Square values produced significance at the a - .01 (.08 upper limit) level. RESULTS CONCERNING ITEM VARIABILITY The test of hypothesis II was carried out by the calculation of five Mananhitney U Test Statistics. 
Hypothesis II was stated, Ho: For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS. 1: For each class, the item variance of the tailored items in the class Specific instrument is less than the item variance of the items in the SIRS. It was necessary to make 5 comparisons in this situation. A comparison between the variances of the tailored items and the twenty general items on the SIRS was made for each class. The Mann4Whitney U Test is the most appropriate test in this instance because it is capable of testing whether two independent samples were drawn from the same population. The test will 98 TABLE 4.1 CHI SQUARE TABLED VALUES a-.01 (upper limit .08) Itgg Specific 311 311 Instrument SIRS Sec. 1 Sec. 313 317 341 l 2 6.63 13.28 6.63 11.34 9.21 2 4 6.63 13.28 6.63 13.28 11.34 3 5 11.34 13.28 6.63 13.28 13.28 4 8 11.34 13.28 6.63 13.28 6.63 S 13 11.34 13.28 6.63 11.34 9.21 6 14 13.28 13.28 6.63 11.34 6.63 7 17 13.28 13.28 6.63 13.28 6.63 8 18 11.34 13.28 6.63 13.28 6.63 99 TABLE 4.2 CHI SQUARE CALCULATED VALUES jgggg Specific 311 311 Instrument SIRS Sec. Sec. 313 317 341 1 2 .286 5.537 1.529 2.132 .600 2 4 .579 3.125 .176 11.036 7.132 3 5 1.639 5.770 .004 1.647 1.094 4 8 .469 1.713 2.184 7.747 11.302 5 13 10.572 .696 2.742 7.670 .792 6 14 2.055 6.217 2.509 .730 .110 7 17 9.298 3.492 2.341 3.369 .019 8 18 3.536 4.372 .494 3.980 .379 100 give information about whether the distribution of variances for the two evaluation instruments are the same. Because there were twenty variances in the SIRS form and a range of fourteen (MTA 317) to twenty-two (MTA 311, Section 2) variances in the specific form the normal approximation to the MannéWhitney U was utilized. Table 4.3 lists the decisions made at an a - .05 level of significance for each of the five classes. The results were not of a conclusive nature. In three of the classes (MTA 311, Section 2, MTA 313, MTA 317) the null hypothesis was accepted. This acceptance decision can be interpreted as the distribution of vari- ances of the class specific and SIRS instruments being the same. Two instances occurred in.which the null hypothesis was rejected; in.MTA 311, Section 1 it could be inferred that the class specific distribution of variances was larger than the SIRS distribution of variances. The second rejection decision involved MTA 341, the inferences are reversed with the class specific distribution of variances being smaller than the SIRS dis- tribution of variances. Because of the inconsistent results, it was interesting to delve further into the data and calculate the average variance for each instru- ment in each class. The average variances were computed for the first twenty items on the SIRS and the tailored items on the class specific instruments. These variances are listed in Table 4.4. It is interesting to note that in only one case was the class specific average variance greater than the SIRS average variance. In the four other classes, the average variance of the class specific form was equal to or smaller than the average variance of the general instrument. 
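The variance comparison summarized in Tables 4.3 and 4.4 on the following pages can be sketched in a few lines. The two lists of item variances below are invented for illustration, and scipy's Mann-Whitney routine, with its normal-approximation option, stands in for the hand calculation described in Chapter III.

```python
# Sketch: compare the distribution of item variances on a class specific
# form with the distribution of the twenty SIRS item variances.
# The variances below are illustrative, not the study data.
from scipy.stats import mannwhitneyu

specific_vars = [0.42, 0.55, 0.61, 0.48, 0.70, 0.39, 0.52, 0.66,
                 0.45, 0.58, 0.49, 0.63, 0.51, 0.44, 0.57, 0.60]   # 16 tailored items
sirs_vars     = [0.47, 0.53, 0.71, 0.62, 0.58, 0.49, 0.66, 0.54,
                 0.60, 0.52, 0.73, 0.56, 0.68, 0.50, 0.59, 0.64,
                 0.55, 0.61, 0.67, 0.57]                            # 20 SIRS items

# H1: the class specific variances tend to be smaller than the SIRS variances.
# method="asymptotic" uses the normal approximation to U, as in the text.
u_stat, p_value = mannwhitneyu(specific_vars, sirs_vars,
                               alternative="less", method="asymptotic")
print(f"U = {u_stat:.1f}, one-sided p = {p_value:.3f}")
print("reject Ho at alpha = .05:", p_value < .05)
```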
101

TABLE 4.3

COMPARISON OF VARIANCE DISTRIBUTIONS
SPECIFIC VS GENERAL INSTRUMENT
MANN-WHITNEY U
α = .05

CLASS                   DECISION      DIRECTION OF DIFFERENCE IF DECISION = REJECT
MTA 311, Section 1      Reject Ho     Specific > General
MTA 311, Section 2      Accept Ho
MTA 313                 Accept Ho
MTA 317                 Accept Ho
MTA 341                 Reject Ho     General > Specific

102

TABLE 4.4

AVERAGE VARIANCES FOR CLASS SPECIFIC AND GENERAL INSTRUMENTS

                        AVERAGE VARIANCE              NUMBER OF ITEMS
CLASS                   Class Specific   General      Class Specific   General
MTA 311, Sec. 1         .78              .49          17               20
MTA 311, Sec. 2         .67              .67          22               20
MTA 313                 .49              .55          16               20
MTA 317                 .71              .73          14               20
MTA 341                 .64              .69          17               20

103

The test of hypothesis III was carried out by the calculation of a Mann-Whitney U Test Statistic. Hypothesis III was stated,

Ho: The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS.

H1: The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS.

In order to make the necessary calculations, the following formula was utilized to obtain an index of between class variability:

    index for item 1 = [ Σ (x̄_i1 - x̄_1)² ] / (n - 1)

where: x̄_i1 = mean on item 1 for a particular class i
       x̄_1  = mean of all the class means on item 1
       n    = # of classes using item 1 (the sum is taken over these n classes)

This index was easy to tabulate for the general SIRS instrument. The SIRS instrument remains the same for all classes. Therefore, a between class variability index can be computed for all twenty general items in the SIRS. These indexes can be viewed in Table 4.5.

104

TABLE 4.5

GENERAL SIRS FORM
INDEX OF BETWEEN CLASS VARIABILITY

Item    Between Class Variability
 1      .101
 2      .031
 3      .297
 4      .014
 5      .079
 6      .122
 7      .243
 8      .061
 9      .096
10      .116
11      .096
12      .207
13      .042
14      .047
15      .082
16      .024
17      .118
18      .087
19      .107
20      .028

Average Variance = .099

105

To tabulate a between class variability index for the class specific instruments, it was necessary to isolate specific items used on more than one form. Two items are common to three class specific forms, three items are common to six class specific forms, four items are common to four class
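A minimal sketch of this one-proportion test follows. It uses the MTA 311, Section 1 counts from Table 4.7 (shown on a following page) and computes the normal approximation directly; the helper function and the α = .05 cutoff mirror the description above, but the code is only an illustration of the calculation, not the original analysis.

```python
from math import sqrt
from scipy.stats import norm

# MTA 311, Section 1 responses to the satisfaction item (Table 4.7):
# options (1, 2, 3, 4, 5) received (2, 15, 19, 1, 1) responses.
counts = {1: 2, 2: 15, 3: 19, 4: 1, 5: 1}
n = sum(counts.values())  # 38 students

def one_proportion_z(satisfied, n, pi0=0.5, alpha=0.05):
    """One-sided normal-approximation test of Ho: pi = pi0 vs H1: pi > pi0."""
    p_hat = satisfied / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return p_hat, z, z > norm.ppf(1 - alpha)

# "Satisfied" defined first as options 1-2, then as options 1-3.
for label, options in [("options 1-2", (1, 2)), ("options 1-3", (1, 2, 3))]:
    satisfied = sum(counts[o] for o in options)
    p_hat, z, reject = one_proportion_z(satisfied, n)
    print(f"{label}: p_hat = {p_hat:.3f}, z = {z:.2f}, reject Ho: {reject}")
# The two decisions agree with the Accept/Reject pattern reported for this
# class in Table 4.8.
```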
It is possible to use the normal approximation to the binomial in all classes because both, n V 3.5 n (1-1) 3.5 where: n 8 # of students in each class responding to the particular instrument 1 - proportion specified in the hypothesis (1 - .5) Item Description 106 TABLE 4.6 CLASS SPECIFIC INSTRUMENT INDEX.OF BETWEEN CLASS VARIABILITY # of instruments item included on THE INSTRUCTOR: 0‘ U|&U)NI-‘ O O Spends extra time with you Is available during office hours Relates course material to everyday life Encourages students to express opinions Is receptive to new ideas and other's viewpoints Generally stimulates class discussion THE COURSE'S CONTRIBUTION TO YOUR: 7. 8. 9. 10. 11. 12. Understanding of day to day working of a field representative Obtainment of general knowledge in the field Understanding of concepts and principles in the field Improving problem solving abilities Ability to communicate clearly on the subject Ability to solve real problems in the field FOR.THIS COURSE: l3. 14. 15. 16. 17. 18. Appropriateness of text Appropriateness of instructional material Appropriateness of homework Appropriateness of exam format Organization of lecture material Appropriateness of exam content WU UU‘UL‘ #N bU‘NUI uwbewm Between Class variability_ .148 .145 .158 .117 .196 .226 .186 .047 .057 .090 .237 .068 .072 .008 .065 .071 .059 .022 Average variance - .110 107 TABLE 4.7 FREQUENCY OF RESPONSES TO THE SATISFACTION ITEM IN THE CLASS SPECIFIC INSTRUMENT Item: In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior (2) Above Average (3) Average (4) Below Average (5) Inferior Response Option Class (1) (2) (3) (4) (5) MM 311, Sec. 1 2 15 19 1 1 MTA 311, Sec. 2 2 14 4 o o m 313 o 5 7 o 0 MIA 317 . 23 46 16 2 1 MIA 341 1 6 11 1 o 108 The tests were calculated at the a - .05 level of significance. The tests were first computed on each class collapsing Option one and two. In this test, the question becomes one of whether at least fifty percent of the students responded to the class specific instrument as being superior or above average. Table 4.8 shows that only two out of the three classes re- jected the null hypothesis under these strict conditions. However, when options one, two and three were collapsed the null hypothesis was rejected in all five cases (Table 4.8). The last decision can be interpreted as at least fifty percent of the students in each class felt the class specific instrument was at least average. RESULTS CONCERNING RATER RELIABILITIES The test of hypothesis V was carried out by the use of an F test. Hypothesis V was stated, Ho: There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained by the SIRS. In order to calculate the F statistic, it was first necessary to get estimates of rater reliabilities. The coefficient used to calculate the rater reliabilities was the intraclass rater reliability coefficient. written in analysis of variance terms, it was possible to use an SPSS (Nie, et. al., 1975) routine to find the necessary components to generate the reliability estimates by hand. The necessary mean squares were ob- tained from the SPSS routine, "Reliability". 
109

TABLE 4.8

STATISTICAL DECISION CONCERNING THE Ho
IN THE SATISFACTION QUESTION

1 TAILED TEST, α = .05        Ho: π = .50        H1: π > .50

                                        MTA 311   MTA 311
                                        Sec. 1    Sec. 2    MTA 313   MTA 317   MTA 341

Responses to Option 1 + Option 2        Accept    Reject    Accept    Reject    Accept

Responses to Option 1 + Option 2
  + Option 3                            Reject    Reject    Reject    Reject    Reject

110

A one way analysis of variance table (Table 4.9) was tabulated for each instrument for each class using only complete sets of ratings. The reliability of average ratings was calculated by hand,

    r_nn = (MS_items - MS_error) / MS_items

where: MS = mean square

These reliabilities are presented in Table 4.10.

An estimate of the precision of these reliability estimates according to a method suggested by Jackson and Ferguson (1941) was also used to build confidence intervals around the reliability of an individual rating. The formula for an individual rating follows:

    r_11 = (MS_items - MS_error) / [MS_items + (k - 1) MS_error]

where: MS = mean square
       k  = # of students

The reliability of an individual rating for each situation is presented in Table 4.11. A confidence interval is built around the reliability of an individual rating by use of the following formula:

    [(F_s / F_e) - 1] / [(F_s / F_e) - 1 + k]  ≤  r_11  ≤  [(F_s · F) - 1] / [(F_s · F) - 1 + k]

where: F_s = MS_items / MS_error
       F_e = tabled F with d.f. for items and d.f. for error
       F   = tabled F with d.f. for error and d.f. for items
       k   = # of students

111

TABLE 4.9

ANALYSIS OF RATINGS - COMPLETE SETS

                          STUDENTS
                 1    2    3    . . .    n (# in a particular class)
ITEMS**    1
           2
           3
           .
           .
           m*

Mean Squares:  For items    For students    For error    For total

*m = 20 for the SIRS; m varies for the class specific instrument
**Only tailored items were included for the class specific instrument

112

TABLE 4.10

INTRACLASS RELIABILITY COEFFICIENT
AVERAGE RATINGS

                              INSTRUMENT
CLASS                    Class Specific    General SIRS
MTA 311, Section 1       .308              .889
MTA 311, Section 2       .669              .749
MTA 313                  .684              .847
MTA 317                  .912              .900
MTA 341                  .750              .751

113

TABLE 4.11

INTRACLASS RELIABILITY COEFFICIENT
INDIVIDUAL RATER

                              INSTRUMENT
CLASS                    Class Specific    General SIRS
MTA 311, Section 1       .013              .182
MTA 311, Section 2       .106              .17
MTA 313                  .153              .284
MTA 317                  .111              .095
MTA 341                  .150              .159

114

The corresponding confidence intervals calculated at the 95% level are presented in Table 4.12.

One well accepted method of testing the hypothesis concerning rater reliabilities is by a comparison of the confidence intervals around the estimates presented in Table 4.12. In the instances where the confidence intervals overlap, the null hypothesis is accepted and there are no differences in the reliability estimates. However, it was brought to the attention of the author that a more powerful technique was available due to the fact that an equal number of students in each class filled out the SIRS and class specific form.1 The more powerful technique makes use of the F Statistic. A calculated F value is compared to tabled F values; if the calculated value falls between the two tabled F values, the null hypothesis is accepted. Since the F statistic is a ratio, two values are necessary. Each value comes from F_s calculated on page 110. Therefore,

    F calculated = F_s (for the SIRS) / F_s (for the class specific instrument)

The degrees of freedom for the tabled F values are n1 - 1 and n2 - 1, where n1 and n2 are the number of items in the SIRS and class specific instrument respectively. The upper tabled F value is read directly from the table with n1 - 1 and n2 - 1 degrees of freedom. The lower tabled F value is the reciprocal of the tabled value with n2 - 1 and n1 - 1 degrees of freedom.

1Thanks must go to Dr.
Dennis Gilliland of the Probability and Statistics Department at Michigan State University for the time given me with regards to this technique. 115 TABLE 4.12 CONFIDENCE INTERVALS AROUND RELIABILITY ESTIMATES OF INDIVIDUAL RATER (95% Confidence) INSTRUMENT CLASS Class SpeCific General SIRS MTA 311, Section 1 .04 to 0.0 .28 to .09 MTA 311, Section 2 .18 to .04 .05 to .22 MTA 313 .26 to .06 .41 to .14 MIA 317 .17 to .06 .16 to .05 MIA 341 .24 to .07 .27 to .06 116 Table 4.13 presents the calculated and upper and lower tabled F values. Referring to Table 4.13, hypothesis V is accepted in four instances and rejected in one instance. Acceptance of the null hypothesis refers to there being no difference in the consistency that students respond to the items in MTA 311 Section 2, MTA 313, MTA 317, MTA '341. In the case of MTA 311 Section 1, the calculated P value is larger than the upper tabled F value, informing the reader that the SIRS had a larger rater reliability than the class specific instrument for MTA 311 Section 1. OTHER INTERESTING RESULTS In a perusal of the class means for each set of items another point of interest surfaced. The grand mean of all the item means for each class is presented in Table 4.14, using a letter representation for each class. EaCh grand mean for the SIRS instrument consisted of twenty items. How- ever, the number of item.means used in the class specific instrument varied, and only the tailored items were used to calculate the grand mean. If the instructors were to be rank ordered according to the grand mean of each evaluation instrument, the rank orders would not remain constant. The rank orders of the grand mean for the SIRS instrument are: Rank Class u:a~u:nah- >~c1catnlw The rank orders for the class specific form are: 117 TABLE 4.13 F TEST 0 - .05 2 Tailed CLASS Fcalculated d'f° Ftabled (lower) Ftabled (upper) MTA 311, Sec. 1 6.215 19, 24 .408 2.33* MTA 311, Sec. 2 1.319 19, 29 .418 2.21 MA 313 2.065 19, 23 .408 2.39 MTA 317 0.883 19, 21 .398 2.42 MTA 341 1.004 19, 24 .408 2.33 *Significant at o - .05 118 TABLE 4.14 TABLE OF GRAND MEANS CLASS A B C D E General SIRS 2.79 2.43 2.57 2.76 2.45 Class Specific 2-71 2~43 2.34 2.60 2.59 119 Rank Class 1 C 2 B 3 E 4 D 5 A SUMMARY OF RESULTS OF STUDY The hypothesis of no difference between the core items in the class specific instrument and the corresponding items in the SIRS was accepted as expected. This supported the proposal that students respond in the same manner to items regardless of the instrument the items are embedded in. The hypothesis of no difference in the item variance of tailored items as compared to general items was neither totally accepted or rejected. There were three cases of acceptance and two cases of rejection. The two cases of rejection gave conflicting results. However, a perusal of a table of average variances (Table 4.4) for the differing instruments did support the research hypothesis to a certain extent. Four of the five average variances for the class specific instrument were the same or smaller than the average variance of the SIRS. It had been predicted that the better instrument would have a smaller item variance for a particular class. The hypothesis of no difference concerning the between class varia- bility of tailored items versus general items was tested by a.ManneWhitney U Statistic. The average between class variability of the common tailored items - .110 and the average between class variability of the general SIRS 120 items - .099. 
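The index behind these two averages is simple enough to restate in code. The short sketch below recomputes it for two illustrative items from hypothetical class means; the means are invented, and the sample-variance divisor (n - 1) is an assumption about the exact form of the index.

```python
# Sketch: index of between class variability for a single item, computed
# from the class means on that item.  The means below are hypothetical,
# and the (n - 1) divisor is an assumption.

def between_class_variability(class_means):
    """Dispersion of the class means about their own mean."""
    n = len(class_means)
    grand_mean = sum(class_means) / n
    return sum((m - grand_mean) ** 2 for m in class_means) / (n - 1)

# A tailored item shared by four class specific forms:
tailored_item_means = [2.41, 2.86, 2.19, 2.63]
# A SIRS item, which appears in all five classes:
sirs_item_means = [2.52, 2.60, 2.47, 2.58, 2.55]

print(f"tailored item index: {between_class_variability(tailored_item_means):.3f}")
print(f"SIRS item index:     {between_class_variability(sirs_item_means):.3f}")
```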
These results were in agreement with the contention that a more discriminating evaluation instrument would have a larger between class variability. The hypothesis concerning the proportion of students that were satis- fied was rejected in all classes if the criteria for satisfaction included the average response option (#3). It appeared that students felt the class specific instrument was at least as good as any other evaluation instrument they had responded to at Michigan State University. The hypothesis of no differences in rater reliabilities between the SIRS and specific form was only rejected in one out of five instances. The acceptance of this hypothesis in four out of five instances indicates that the differences in the reliability coefficients were not large enough to rule out the possibility of their being due to chance. Although the class specific reliability coefficients were generally as good as the SIRS reliability coefficients, no trends were found in this study to support the hope that class specific instruments yielded larger relia- bility coefficients than the general SIRS. In Chapter V, the results of the study are discussed in the light of possible explanations. Suggestions are made for future research. CHAPTER V SUMMARY AND CONCLUSIONS SUMMARY With the advent of teacher accountability, student ratings of profes- sors have become a greater concern in recent years. It has become neces- sary for administrators to have normative data for making unbiased decisions regarding the teaching staff. However, student evaluation instruments are often developed and piloted on a very specific population as a sample of convenience. The instrument is then often used on a university wide basis. Because of this, the instrument must remain very general in nature. The purpose of this study is to build an instructional rating scale that would contain items not only general in nature, but items specific to the class of interest. These items would not only be useful in evalu— ating the instructor, but also much more helpful for self diagnosis and instructor improvement. The items on this scale would discriminate between good and poor instruction, and have unambiguous questions on which raters could be in agreement for each instructor. In terms of item variability, the better of two evaluation instruments would have less variability on a 121 122 particular item in a given class. It would also be expected that variability exists on a particular item between classes. This between class variability could only be computed for items that appear on more than one specific evaluation instrument. In terms of the above mentioned between class vari- ability, the better of two evaluation instruments would have a larger between class variability. In order to make comparisons, it was necessary to have both a general instrument and a class specific instrument. One of the forms was the standard Level II, Form B student rating form given at Michigan State Uni- versity. The SIRS is a general form developed on a general population. The comparison instrument was developed for specific classes within a specific department. The comparison instrument contained eight general core items used on every instructor's student evaluation form, 10-20 items specific to the individual class situations, and a satisfaction item. The eight core items were selected from the SIRS instrument. 
The purpose of the instrument satisfaction question was to obtain an index comparing students' perceptions of the class specific instrument with other rating scales the students had filled out at Michigan State University. To conduct the study, five undergraduate classes were chosen from the Marketing and Transportation Department at Michigan State University. These classes were chosen because of their diverse nature. The courses varied on such dimensions as class size, lecture versus discussion,'quali- tative versus quantitative, and inclusion or exclusion of recitation sections. This diversity was necessary in developing specific class items. It was hypothesized that these five original instruments would have less variability on a particular itemrwithin a class and have a larger 123 between class variability on a particular item than the SIRS. In order to test the above hypotheses, it was necessary to have equi- valent student groups responding to both the SIRS and the class specific instrument in each class. This was accomplished by alternating the SIRS instrument with the class specific instrument within classes. Thus, each instrument was given to one-half of the particular class it was developed for. The other one-half of the class received the usual SIRS form. Assuming a random start, the forms were administered to randomly equivalent halves in each of the classes. A.hypothesis was formed to test for equi- valent groups. The hypothesis stated that the distribution of responses to the core items in the class specific instrument is the same as the distribution of responses of corresponding items in the SIRS. It was also hypothesized that an index of rater reliability would be larger for the class specific form than the rater reliabilities of the SIRS instrument. This index is concerned with consistencies, i.e., to what extent do students give the same information about an instructor. A separate reliability estimate is obtained for each instructor in each class. Therefore, consistency in this case refers to how consistently the students in a particular class evaluate this instructor. The last hypothesis concerns the satisfaction item incorporated into the class specific instrument. It was proposed that the proportion of students satisfied with the class specific instrument would be greater than .50. The following statistical techniques were used to test the above hypotheses: 124 1. The hypothesis of no difference in response to the core items in the class specific instrument and the corresponding items on the SIRS was tested with the Chi Square Statistic. 2. The hypothesis of no difference between the item variance of the tailored items on the class specific instrument and the item variance of the items on the SIRS was tested by the MannéWhitney U Statistic. 3. The hypothesis of no difference in between class variability of the SIRS and class specific instruments was tested by the Mann? Whitney U Statistic. Tables compare the between class variability of items on the class specific form‘with items on the SIRS. An average index was calculated for each form. 4. The hypothesis regarding student satisfaction was tested by the normal approximation to the binomial. 5. The hypothesis of no difference in rater reliabilities was tested by use of an F statistic. CONCLUSIONS 1. The distribution of responses to the core items in the class specific instrument was the same as the distribution of responses of cor- responding items in the SIRS. 2. 
There were no concrete statistical conclusions concerning the item.variance of the tailored items compared to the item variance of the However, in four out of five classes, the average item variance 125 of the class specific instrument was equal to or less than the average item variance of the SIRS. The smaller the variance on a particular item, the larger the amount of agreement among students with regard to a particular item. 3. The average between class variability for tailored items on.the class specific instrument was larger than the average between class vari- ability for the general items on the SIRS. This lends support to the idea that students can better discriminate between instructors if items are specific to a class. 4. At least fifty percent of the students felt the class specific instrument was as good if not better than any other student rating form they had come in contact with at Michigan State University. 5. On the whole, there did not appear to be any difference between the rater reliabilities on the specific instrument compared to those on the SIRS. There was only one class where differences did occur between the rater reliabilities. 6. A result in addition to the results from.the list of hypotheses con- cerns the average item mean on each instrument for a particular instructor. If instructors were to be ranked by their average rating on a student rating form, it is interesting to note that their ranks would alter with the instrument being used. .Although the fourth and fifth ranks remained constant among forms, the ranks of one, two and three were altered con- siderably. The instructor who ranked first on the class specific instrur ment ranked only third on the SIRS. The instructor who ranked third on the class specific instrument ranked second on the SIRS, and the instructor that ranked second on the class Specific instrument ranked first on the SIRS. 126 DISCUSSION Because of the design of this study, statistical significance was diffi— cult to determine. Therefore, the author was left with only a few plausible statistical techniques. The nonparametric techniques are not as powerful as their parametric counterparts, but the data would not allow any further analyses. Although there was no startling statistical significance, the trend of the data supported the major hypotheses of this study. Firstly, the acceptance of hypothesis I informed the researcher that any differences occurring in the pattern of responses to an item on the SIRS and the item's counterpart on the class specific instrument is due to chance alone. It was possible to proceed with the study with the satis- faction of knowing that equivalent groups were responding to the SIRS and the class specific instrument. The acceptance of the hypothesis also implies that no differences in student response occurs if the general item is embedded in a general or class specific instrument. The student is therefore responding to an item independent of the type of rating form. The above information made it possible to proceed with hypotheses II and III. The major purpose of this research dealt with the variability of the items. It had been assumed in Chapter I that the better of two evaluation instruments would have less variability on a particular item within a given class and greater variability on a particular item.between classes. Focusing on hypothesis two, which deals with the item.variability on both the class specific instrument and general SIRS, it is interesting to 127 note the average variances. 
Table 4.4 displayed the average variance for each form. .A perusal of this table shows that the average variance hardly varies between the type of instrument in four of the five classes. Howb ever, in one class, MTA 311 Section 1, a large difference in average variance surfaces. This is the only class specific instrument that, relative to the above mentioned assumption, did not appear to be as good an instrument as the SIRS in terms of average variability. MTA 311 Section 1 will be referred to again at the end of this discussion section. The third hypothesis concerning between class variability was not found to be statistically significant, as was predicted. However, the trend of the data were in the anticipated direction. The average between class variability of the SIRS was .099, while that of the class specific form was .110. It had been assumed that a good evaluation instrument should be able to discriminate between good and poor instruction. This assumption.msndates a larger between class variability. The larger average between class variability of the class specific instrument lends evidence to support the premise that the class specific instrument is a more sensi- tive measure of instruction. Returning to Table 4.6, it is interesting to note the items having the best discriminating power. Class specific items that have the larger between class variances are often prefaced.with the words "the instructor". It appears that students are better able to differ- entiate between classes on items pertaining directly to the instructor. Items prefaced with "The course's contribution to your" and "For this course" have relatively small between class variability. These results have two implications. One possible implication is that there is in fact very little 123 difference between classes with regard to teaching tools, for example, exam format, textbodks, and audio-visual equipment. Or that there is no differences between classes in the areas of what the student learns. However, another possibility is that students are unable to judge the teaching tools used by the professor or the amount of knowledge the student actually obtained. The rejection of hypothesis IV gave results supporting the student's satisfaction with the class specific instrument. Unfortunately, there was no counterpart of the satisfaction item.on the SIRS form. It seems fair to admit the possibility that the students responding to the SIRS were just as satisfied as those responding to the class specific instrument. Secondly, the classes were aware of the fact that research was being conducted with regard to student rating forms and their classes. This awareness could have favorably predisposed the students toward the new instrument. The hypothesis concerning rater reliabilities did not support the pro- posal that the class specific instrument would have higher rater reliabilities. In all cases except one, the differences in rater reliabilities could be con? tributed to chance alone. The only statistically significant difference occurred in MTA 311 Section 1. For this particular class, the reliability coefficients follow: SIRS Class Specific reliability of an individual rating .18* .01 reliability of an average rating (coefficient a) .89 .31 *These coefficients are calculated with equal n's 129 Obviously, there is much less consistency in the reliability of an average rating of .31 as opposed to .89. 
A coefficient as high as .89 conveys the information that a large proportion of the class is responding the same to the items on the class specific and the SIRS forms. A positive 1.0 would represent the fact that students were totally consistent. At this time, it is again noted that MTA 311 Section 1 is also the class in'which the class specific instrument was not as sensitive as the SIRS with regard to average variability. This is understandable in terms of rater reliability. A large variance within a class could be equated with inconsistent answers, which means lower rater reliabilities. The question still remains as to "why the rater reliabilities were not higher than those of the SIRS?" In answer to this question, it should first be kept in mind that in four of the five cases there was no statis- tical difference, i.e., the class specific reliabilities were as good as those of the SIRS. It is the opinion of the author that continued work on the specific class instrument would raise the index of rater reliabilities. The specific instrument was being compared to an instrument that has under- gone a large amount of research and is presently in a highly perfected state. The class specific form, although pretested, is still in a very early stage of development compared to the SIRS. The final result that proved of interest was the rankings of instructors according to the mean item response. Unfortunately, a sample as small as five cannot give statistically significant results, but a trend did appear. The rankings do not remain the same for instructors across instruments. Referring again tofrable 4.13, an instructor's average rating for each 130 instrument can again be observed. Instructor A and B obtained the same average rating regardless of the instrument utilized. Instructors D and E give not only different results but the direction changes. That is to say, instructor D received higher ratings on the SIRS than the class specific form, instructor E received lower ratings on the SIRS than the class specific form. The instructor having the largest differences in average ratings between the two forms was instructor C. Instructor C ranked third (average rating - 2.57) on the SIRS and first (average rating - 2.34) on the class specific instrument. In all fairness to the instruc- tor, it is necessary to maintain anonymity, however, some particulars about this course may shed some light on the difference. The course taught by instructor C is what is referred to in the School of Business as a "case" course. Students are given a description of a problem in the business world, and they are to read the problem and come up with a solution. The problem is discussed in class, the students are told how the company solved the problem, and the probable best solution. The difference between a case course and most courses in a university, especially at the undergraduate level, is that the particular case is not what the student is supposed to learn, but rather the applied concept. The learning that goes on in this class is to be able to transfer this one problem to new problems. Instead of knowledge and recall being tested, application and synthesis are what is important. The only plausible explanation for the difference in rankings for this course is that the class specific instrument topped the above areas while the SIRS could not. 131 This inconsistency among rankings should cause much concern to the administrator who is trying to make promotion and tenure decisions. 
Is it possible that an instructor could be tops on the class specific instrument (rank #1) and mediocre on the SIRS (rank #3)? The difference in being tops or mediocre may have a marked effect on pay raises. Which piece of infor- mation should the administrator be using in his decision? .Although the idea of using the general information is intuitively appealing, the fact remains that the class specific instrument contains items that students and instructors feel are important in evaluating their course. It would seem foolish if the administrator did not put as much emphasis on the class specific results as those from the general form. In summary, although the results were not as statistically significant as desired, the trends were supportive of the major hypotheses. The class specific form'built for MTA 311 Section I seemed to have more psychometric problems than the other four class specific forms. In retrospect, the author can find nothing in the development of this particular form to account for this problemt The only feasible suggestion to make is develop- ment of a new class specific form for MTA 311 Section 1, starting with the early pre-test stages and utilizing the information now available. Beyond the cut and dry statistical evidence is the obvious fact that the forms built for this study contain more information than the general SIRS. Nothing has been lost in these new forms but much new information has been acquired. Items referring to textbodks, exams, the usefulness of learned information in the real world, the ability to apply facts to other problems and many more. The general items on which comparisons can 132 be made between all classes are still available plus items that rate instruc- tor or course characteristics specific to an individual class. These class specific items are not only useful for an administrator in evaluating an instructor for tenure or promotion, but also prove helpful to an instructor for self improvement. The fact that general items are high inference in nature makes them of little use in course improvement. FURTHER RESEARCH In reviewing the conclusions and the discussion presented in this chapter, two areas of further research become apparent. The first area concerns the rankings of instructors according to the mean item response. A larger sample is needed to find out if the difference in rankings is anomalous to the present study only or generalizable to larger samples. If these results were present in future research, it would be necessary to make some administrative decisions concerning the handling of these dis- similar ranks. The other area of research concerns the domain of extrinsic variables. The results have not been consistent with reference to this subject. Re- search concerning such variables as sex, college year, class size, and whether the course is required or elective have come up with inconsistent results concerning their effect on student ratings. However, it would be interesting to attempt a study using both a general and class specific instrument. The first question of this research would be "do extrinsic variables effect the student ratings of instructors using the general form?". If this question was answered affirmatively, for any of the 133 extrinsic variables the next question would be "do these extrinsic variables effect the student ratings of these same instructors when class specific student rating forms are used?". 
The author of this research prOposes that any effects of the extrinsic variables would be dissipated by the specifi- cities of items of a class specific nature. The fact that the items are of a low to medium inference - might reduce the possible biases related to extrinsic variables. BIBLIOGRAPHY 134 BIBLIOGRAPHY Baril, G. L.; Skaggs, C. T. "Selecting Items for a College Course Evaluation Form," College Student Journal, Vol. 10, Summer, 1976, pp. 183-187. Blum, M. L. "An Investigation of the Relation Existing Between Students' Grades and their Ratings of the Instructor's Ability to Teach," The Journal of Educational Psychology, Vol. 27, 1936, pp. 217-221. Bradenburg, D. 0.; Derry, 8.; Hengstler, D. D. "Validation of an Item Classification Scheme for a Student Rating Item Catalog," Paper Presented at NCME, 1978. Breed, F. S. "Factors Contributing to Success in College Teaching," Journal of Educational Research, Vol. 16, pp. 247-253. Canaday, S. D.; Mendelson, M. A.; Hardin, J. H. "The Effects of Timing on the validity of Student Ratings," Paper Presented at NCME, 1978. Centra, J. A. "Student Ratings of Instruction and Their Relationship to Student Learning," American Educational Research Journal, vol. 14, Centra, J. A.; Linn, R. L. "Student Points of View in Ratings of College Instruction," Education and Psychological Measurement, Vol. 36, 1976, Clark, K. E.; Keller, R. J. "Student Ratings of College Teaching," 1a,; University Looks at its Program, eds. R. E. Echert and R. J. Keller, Minneapolis, Minnesota: The university of Minnesota Press, 1954. Cohen, S. A.; Berger, W. G. "Dimensions of Students' Ratings of College Instructors Underlying subsequent Achievement on Course Examinations," Proceedings of the 178th Annual Convention of the American Psycholo- gical Association, 1970, Vol. 5, pp. 605-606. Cohen, J.; Humphreys, L. G. Memorandum to faculty. University of Illinois, Department of Psychology, 1960. (Mimeographed) Costin, F. "A Graduate Course in the Teaching of Psychology: Description and Evaluation," Journal of Teacher Education, 1968, vol. 19, pp. 425-432. Costin, F. "Intercorrelations Between Students' and Course Chairmen's Ratings of Instructors," University of Illinois, Division of General Studies, 1966. (Mimeographed) 135 Costin, F.; Greenough W. T.; Menges, R. J. "Student Ratings of College Teaching: Reliability, Validity, and Usefulness," Journal of Educational Research, vol. 41, No. 5, 1971, pp. 511-535. Cronbach, L. J.; Gleser, G. C.; Nanda, H.: Rajaratnam, N. The Dependa- bility of Behavioral Measurements: Theory of Generalizability for Scores and Profiles, New York: John Wiley and Sons, Inc., 1972. Cunningham, W. J. "The Impact of Student-Teacher Pairings on Teacher Effectiveness," American Educational Research Journal, vol. 12, No. 2, Spring 1975, pp. 169-189. Cushman, H. R.; Frederick, K. T. "The Cornell Diagnostic Observation and Reporting System for Student Description of College Teaching," NACTA Journal, March 1976. Danielsen, A. L.; White, R. A. "Some Evidence on the Variables Associated with Student Evaluations of Teachers." Downie, N. M. "Student Evaluation of Faculty," Journal of Higher Education, vol. 23, 1952, pp. 495-496. Doyle, K. 0.; Whitely, S. E. "Student Ratings as Criteria for Effective Teaching," American Educational Research Journal, Vol. 11, No. 3, Summer 1974, pp. 259-274. Ebel, R. L. Essentials of Educational Measurement, New Jersey: Prentice- Hall, Inc., 1972. Ebel, R. L. "Estimation of the Reliability of Ratings," Psychometrika, Fry, P. 