'C \ I , 1"," l h .’. ~ I ' 2N. ‘ ”\ul . | ‘.|‘E\°éiéj" -‘K:A:A 0 .'.HI... ‘P‘HII ~- .‘ _ 1:.I be ' ' ' ‘ .7! .'. . I" 15' -~“ .. " W‘*"~\" CK. -- ‘ lg"; A”. CV‘ ‘ \ VII - " ‘ ' h '4. ' 93W . - ":cfl 'UI'I fiI' It”! Fr? 3% fie n’}:“ u“ {Ln ‘ ‘53:: uygmfi‘f .a ‘ ' ..I a. Ema - 5% ' ‘n 53‘: 15 ”3 r ' «c \ - “A..- ) ._ . . :~ 3:? :1- M- 5‘ A 1E1 , ,fix ..‘ ”Iv“ a??? . .. .: . "0. Auw’axu'm 7%: ‘1. 15“ 'I ‘I'!'- I v, . 1h; 3:.jimr' ”Edit-1: , I . 1%“_." :7“ _ I ‘. ‘I‘ L 9H3“ “ PIVE‘.” 3 v‘fi 33:: {{SS"; w. . . .3; ~13? 311.1% *2de ' _ .. -. ~ ‘ ‘-.L' . :L‘v-v 1-1.. .L' _ ‘ ‘ ', ‘ 1. ’ \ ‘n .'.I‘. ..~ w 'I‘ ' .- 'v - ' 10‘ I .1: air-.’. -’ .j.'. . o o’ A IE I". . - I‘llf‘k” 9‘ .‘..‘g: . ‘I ‘ I .'. .'.. “my." : . 1‘ 1",»; - I 'l . nu ‘_ .<;-;I.‘ 1.10:1”; I C. I‘d}? 'I' r-.-A I ~ I q‘ ‘ I I‘m 9"1 83:3 '.‘N. '; ._~9~;‘.¢‘ 1 S :III 99" I ". H I on“ .'. . j . I M I I I . I ._ 'I' '3 ' ‘ ' I - a - . I .. -~ «MIMI;- -....1-\.v“ . . I, w I II 'I - t." ‘:_“- .A" 4&3“- l I \- 1“ .-.I . . L .981 ."A‘u'r 4'?) ‘14". III)!"- “knit “Lima. Ir? " V" ‘fh'fiJ w" I {abi- I .. .; I45 ,I, TH :81. LIBRARY Michigan State University will; lllll LII! ll Ill] ll "@11th ll This is to certify that the thesis entitled A COMPARISON-OF TWO STUDENT INSTRUCTIONAL RATING FORMS presented by PAMELA WEAVER WILSON has been accepted towards fulfillment of the requirements for SERVICES AND EDUCATIONAL PSYCHOLOGY flag/4f. EM Major professor Datej/W z) /?78 0-7 639 OVERDUE FINES ARE 25¢ PER DAY _ PER ITEM Return to book drop to remove this checkout from your record. © 1978 PAMELA ANN WEAVER WILSON ALL RI GHTS RESERVED A COMPARISON OF TWO STUDENT INSTRUCTIONAL RATING FORMS by Pamela Weaver Wilson A DISSERTATION Submitted to Rflchigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Counseling, Personnel Services, and Educational Psychology 1978 ABSTRACT A COMPARISON OF TWO STUDENT INSTRUCTIONAL RATING FORMS By Pamela Weaver Wilson With the advent of teacher accountability, student ratings have become a greater concern in recent years. It has become necessary for administra- tors to have normative data for making unbiased decisions regarding teach- ing staff. However, the question of whether these normative items give enough information for evaluation purposes still remains. The purpose of this study was to build an instructional rating scale that would contain items not only general in nature, but items specific to a class of interest. These items would be useful for instructor evaluation as well as for instructor self diagnosis and self improvement. This class specific instrument was compared to a standard general instrument in use at the university. To conduct the study, five undergraduate classes were chosen from a specific department at Michigan State University. The classes were chosen because of their diverse nature. This diversity was necessary in develop- ing specific class items. Data was collected on the class specific and general instrument during the last day of classes, Spring term 1978. The major hypothesis of the study concerned item variability. It was expected that the five original instruments would have less variability on a particular item within a class and have a larger between class Pamela Weaver Wilson variability on a particular item than the general instrument. It was also hypothesized that an index of rater reliability would be larger for the class specific form than the rater reliabilities of the general instrument. 
In order to test the hypotheses of item variabilities, the MannéWhitney U Statistic was calculated. The hypotheses concerning the rater reliabili- ties was tested by use of an F statistic. Although many of the tests were not statistically conclusive, the results indicated that the class specific instrument was a viable alterna- tive for use in student rating forms. In four out of five of the classes, the average item.variance of the class specific form was equal to or less than the average item variance of the general instrument. The average between class variability for Specific items on the class specific instru- ment was larger than the average between class variability for the items on the general instrument. These results were in the anticipated direction. On the whole, there did not appear to be any difference between the rater reliability on the class specific instrument compared to those on the general instrument. In conclusion, it is imperative to mention that the class specific instrument was very exploratory in nature, while the comparison instrument was in a highly developed state. This lends much credibility to my point of view, the results of this thesis favorably support the use of student rating forms containing both class specific and general items. DEDICATION In memory of my grandmother, Elsie Putnam Warr who taught me the value of an education 11 ACKNOWLEDGMENTS There are many people who have contributed to this thesis. First, I want to extend my deepest appreciation to my family. My husband, Terry has given me both emotional support and technical advice. Without his many editorial comments, the final thesis would have lost mmch of its readability. My son, B. J. has been very patient and understanding about a project he.bare1y comprehends. I would also like to especially thank my dissertation committee. Dr. Robert Ebel, the Chairman of my Guidance Committee provided kindness, emotional support, technical advice, and personal presence whenever needed. Dr. LeRoy Olson contributed greatly to the design of my instru- ment with his ideas and personal experiences. Dr. Kenneth Arnold contri- buted his statistical expertise which helped with the data analyses. Dr. Edward Smith provided ideas about the data analyses that opened new avenues of exploration. I would also like to thank Dr. Dennis Gilliland for his thorough review of all mathematical calculations and statistical expertise. It is also necessary to extend thanks to the Marketing and Transpor- tation Department for both secretarial and faculty support. Special thanks must go to Dr. Leo Erickson, Dr. Frank Mbssman, Mr. John.Henke, Mr. Mark Bennion, and Mr. Robert Krapfel for letting me use their classroom for data collection purposes. Lastly, it is necessary to mention.Mrs. Diane Scribner who contributed her technical expertise in the typing of both the preliminary and final draft of this thesis. iii Chapter I. II. III. IV. V. THE PROBLEM O O O O O O O I I 0 Introduction . . . . . . . . Considerations in Instrument TABLE OF CONTENTS Current Practices in Student Rating of at Ten Universities . Impetus for the Study Experiences Purpose . . Summary REVIEW OF THE LITERATURE Introduction . . . . . Reliability Experience Outcomes Hypotheses . . . . . . Validity . . . . Comparative Data . . . . Differences in Item Types Format . . . . . . . . . . Summary PROCEDURES AND DESIGN . . . . Introduction . . smple O O O O O O O I Instrument Development Design . . . Hypotheses . Analysis . . Summary . . RESULTS . . . . 
Introduction Overview . . . . . . . Results Concerning Results Concerning Results Concerning Results Concerning Other Interesting Results . . . . . . Summary of Results of Study . . . . . SUMMARY AND CONCLUSIONS Summary Conclusions Further Research BIBLIOGRAPHY . . . Discussion . . . Like Items on Differing Item variability . Student Satisfaction Rater Reliabilities Construction Instruction Instruments 105 116 119 121 121 124 126 132 134 Table 1.1 1.2 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 3.15 4.1 4.2 LIST OF TABLES Uses Made of Students' Evaluations of Instruction Frequency of Responses to the Student Rating Opinion Questionnaire . . . . . . . . . . . smple Item I I I I I I I I I I I I I I I I I Final Pretest Items for the General Instrument Final Class Specific Pretest Items, MTA 311 Section Final Class Specific Pretest Items, MTA 311 Section Final Class Specific Pretest Items, MTA 313 . . Final Class Specific Pretest Items, MTA 317 . . Final Class Specific Pretest Items, MTA 341 . . General Instrument, Pretest Statistics . . . . General Instrument, Frequency Distributions for question 1 I I I I I I I I I I I I I I I I I Class Specific Instrument, MTA 311, Section 1, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 311, Section 2, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 313, Frequency Distribution for Question 1 . . . . . . . . Class Specific Instrument, MTA 317, Section 3, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 317, Section 5, Frequency Distribution for Question 1 . . . Class Specific Instrument, MTA 341, Frequency Distribution for Question 1 . . . . . . . . Chi Square Tabled values . . . . . . . . . . . Chi Square Calculated Values . . . . . . . . . 12 49 51 53 54 55 56 57 6O 61 62 63 64 65 66 67 98 99 Table 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 Comparison of Variance Distributions, Specific vs. General Instrument, Mann-Whitney U . . . . . . Average variances for Class Specific and General Instruments . . . . . . . . . . . . . . . General SIRS Form, Index of Between Class Variability Class Specific Instrument, Index of Between C1888 variability I I I I I I I I I I I I I I I I Frequency of Responses to the Satisfaction Item in the Class Specific Instrument . . . . . . . . . Statistical Decision Concerning the Ho: In the Satisfaction Question . . . . . . . . . . . . Analysis of variance - Complete Sets . . . . . . . . Intraclass Reliability Coefficients for Average Ratings . . . . . . . . . . . . . . . . . Intraclass Reliability Coefficient for An Indifldual kter I I I I I I I I I I I I I I I I I_ Confidence Intervals Around Reliability Estimates of an Individual Rater . . . . . . . . . F Test I I I II I I I I I I I I I I I I I I I I I I I Table of Grand Haans . . . . . . . . . . . . . . . . vi Page 101 102 104 106 107 109 111 112 113 115 117 118 LIST OF FIGURES Figure Page 1.1 Instructor Opinion Items . . . . . . . . . . . . . . . . . 
11 3.1 Final Student Instructional Rating Instrument, m 311 section 1 I I I I I I I I I I I I I I I I I I I 71 3.2 Final Student Instructional Rating Instrument, MTA 311 sec tion 2 I I I I I I I I I I I I I I I I I I I 74 3.3 Final Student Instructional Rating Instrument, m 313 I I I I I I I I I I I I I I I I I I I I I I I I 77 3.4 Final Student Instructional Rating Instrument, m 317 I I I I I I I I I I I I I I I I I I I I I I I I 80 3.5 Final Student Instructional Rating Instrument, m 341 I I I I I I I I I I I I I I I I I I I I I I I I 83 3.6 Student Instructional Rating System Form, Form B . . . . . 85 3.7 Design for MannéWhitney U Test, MTA 313 . . . . . . . . . 90 4.1 Chi Square Matrix for MTA 311 Section 1, Specific Instrument - Item 1, SIRS - Item 2 . . . . . . . . . . 96 vii CHAPTER I THE PROBLEM INTRODUCTION With greater emphasis being placed on teacher accountability, student ratings of professors have become a greater concern in recent years. It has become necessary for administrators to have normative data to aid them in making unbiased decisions regarding the teaching staff. However, student evaluation instruments are often developed and piloted on a very specific population (e.g., education, business, etc.) as a sample of convenience. Once the instrument has been refined and administered, the administrator may follow one of several actions along a continuum in regard to the results. Because of the need for some type of normative data, at one end of the continuum the administrator would perceive the results as absolute truth. Or his suspicions about the data may lead him to reject the results because he feels they are invalid. The question becomes one of which decision is correct. Are there inherent biases in the development sample? Is one instrument valid across colleges within the university? Or at an even more basic level, is one instrument valid across classes within a department? The primary purpose of the present study is to gain insight into the latter question posed. CONSIDERATIONS IN INSTRUMENT CONSTRUCTION It has been noted (Baril & Skaggs, 1976) that two major issues should be addressed prior to data collection for a teacher evaluation instrument. The first issue that arises addresses the question of whether different items or forms are necessary for different instructors, courses, departments or colleges within a university. This question is prompted by the underlying differences between courses in content, instructor emphasis, student char- acteristics and general academic discipline. The second issue which has been addressed on many occasions concerns the different uses of the data from evaluation instruments. Gillmore (1972) lists three uses under the descriptive titles of normative, diagnostic, and informative. Others (Baril & Skaggs, 1976; Wotruba & Wright, 1975) separate this trichotomy into user oriented terms; administrator information, instruc- tor information and student information. The normative purpose is made use of by the administrators who are responsible for counseling faculty members and for evaluating them with respect to retention, tenure and promotion. Normative refers to evaluations being used in a comparative mode. The re- sults for different instructors are compared so that administrators can place instructors in a hierarchy with respect to classroom performance. Teacher accountability as well as merit raises and tenure decisions have made it necessary to institute some type of reliable and valid process for administrative decisions. 
It is this usage of student evaluations that has created the most negative feelings about student evaluations. The diagnostic purpose is for instructors to gain feedback on their own teaching abilities thus facilitating self improvement. Diagnostics has generally been thought of as the primary use and is potentially the lowest risk situation, especially if administered on an optional basis. The informative purpose is served when the students, for whatever reasons, seek information which will help them select instructors and courses. Dissemination is sometimes accomplished by a group of students via campus publication of evaluation results. The student government may take part in such a procedure. With reference to the above issues, authors have suggested that differ- ent evaluative items may be appropriate for different purposes. However, it has been noted by these same authors that few, if any, published reports have taken these differences into account (Baril & Skaggs, 1975). CURRENT PRACTICES IN STUDENT RATING OF INSTRUCTION AT TEN UNIVERSITIES In an attempt to ascertain the mood of universities in relation to the above issues a review of student instructional rating systems currently in use at ten universities were reviewed. The universities included.were the University of Illinois at Urbana-Champaign, the University of Iowa, Purdue University, university of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, University of Indiana, and University of Wisconsin. To obtain the necessary updated information, personal telephone interviews were made with those universities involved.1 The points of interest were whether different items or forms were used across instructors, departments and universities; and whether different forms were instituted for different users, i.e., administrator, instructor and student. Also noted was whether student evaluation instru- ments were optional or required. Purpose of the Instructional RatinggForm A perusal of Table 1.1 shows the marked differences in uses of instruc- tional rating forms. All ten of the universities selected made use of ratings on a diagnostic basis to improve instruction. In all cases except Michigan State and Ohio State University, the diagnostic use of student rating forms was made on an optional basis. Four (Michigan State University, University of Minnesota, Ohio State University, University of Indiana) of the ten universities had a university wide policy in effect that required the use of Student Rating Forms in both tenure and promotion decisions. The university of Wisconsin has a state wide system policy that requires the twenty-six state supported educational institutions (14 four year and 12 two year) to use student evaluations in tenure and promotion considerations. Although the other five universities had no university or system policy requiring student evaluations in the normative mode, several departments within the universities had their own requirements. For example, even though there is an absence of any top policy at Purdue University many 1Special thanks needs to go to Latricia Turner, University of'Michigan; Peter Fry, Nbrthwestern University; Kenneth Doyle, University of Minnesota; James Deerie, Purdue University; Dale Bradenburg, University of Illinois; Mary Rouse, University of Wisconsin; Ms. Johnson, University of Indiana; LeRoy Olson, Michigan State University; Larry Jones, Ohio State University; and Rena weets, University of Iowa. 
USES MADE OF STUDENTS' EVALUATIONS OF INSTRUCTION Normative Diagnostic Informative University of Illinois 0 O O Urbana-Champaign University of Iowa 0 O 0 Purdue University 0 O NU University of Michigan 0 O 0 Michigan State University R R 0 university of Minnesota R O 0 Northwestern University 0 O 0 Ohio State University R R NU University of Indiana R O 0 University of Wisconsin R 0 NU O . Optional R = Required NU = Not in use at this time departments have created their own policies. The Psychology Department has made the use of student rating forms mandatory, while the Pharmacy De- partment has mandated their use by every instructor at least once a year. Other departments, such as the Management Department strongly encourages their faculty to make use of student rating forms. The University of Illinois has a system labeled by many as voluntary coersion. Although there is no university requirement concerning student evaluations of teaching, there is a university policy requiring some evi- dence of teaching performance. This evidence is to be contained in an instructor's personal file for tenure and promotion decisions. The universities differ widely with regard to the informative mode. At one time or another all the universities have had some action by differing student bodies to obtain information at this level. Michigan State Univer- sity appears to be the only educational institution in this sample that has declared that the optional use of this type of instrument is a function that may be taken on by the Student Government. However, an optional form de- vised by the Student Government Association in 1976 proved to be so time consuming for the student participants that no further attempt at such a large scale procedure has been administered again. Northwestern University and the Universities of Iowa, Michigan and Illinois have special options on universitydwide evaluation forms that allows the instructor to release information for student dissemination. If allowed, partial information is then released to the student body. While the university of Indiana and Minnesota have no continuing release of information, student groups have, from time to time, collected and distributed student rating information to the student body. Again, this has been on a voluntary basis for the instructors. During the period 1969-72, Purdue University made an attempt at student rating release by student organizations but found it to be too much work and has since been dropped. Ohio State University has taken the stand that student ratings are personal documents and no release of the information has been made to the student body. With reference again to Table 1.1, there are a total of 30 cells. It is interesting to note that in only seven instances (23%) are ratings re- quired, in 20 instances (67%) are ratings optional, and in only three instances (102) ratings do not appear to be in use at this time. Number of Instructional Forms in Use Michigan State University was the only university to have three separ- ate forms for the three different purposes (normative, diagnostic, informa- tive). The other nine universities used one form to accumulate data for all three purposes. At Michigan State university an individual form developed by the Office of Evaluation Services is administered to about 15,000 of the 44,000 students each term. The rest of the faculty use a departmental or self made instru- ment. At Northwestern University, two general forms exist; one for lecture classes and one for small class situations. 
The University of Wisconsin uses a myriad of student rating forms that have been developed for depart- mental use or for individual classes. The other seven universities surveyed use some type of cafeteria system. This type of system involves the use of an evaluation item.bank. This item bank allows the instructor to choose items that are tailor made for the course he is teaching. The form also consists of a few common items that would appear on all student rating forms. However, it should be noted that the above descriptions embody the general mode of student ratings of instruction at the sampled colleges. Within each college there are various forms developed and used by individual departments. Again, it appears that the universities reviewed are not in agreement on the use of instructional rating forms. There is no agreement on the question of whether one general instrument may be used across colleges within a university. Seven of the ten universities are gravitating toward a cafeteria style rating form. This lends evidence to support the opinion that a given set of items are incapable of being used to compare all in? structors across the university. Philosophically, the question of whether one form should be used for all three situations, i.e., diagnostic, normative, informative, becomes a question of whether rating forms are justified in being used for these three purposes at all. Everyone involved appears to feel very comfortable with the diagnostic purpose, is learning to live with the normative purpose but are undecided about the informative purpose. To release to students any information contained on the student rating forms, it is necessary to receive permission from the instructor. This attitude prevails over all ten of the universities under discussion. This leads one to believe that the position of the universities is based on the premise that the informa- tion is personal information belonging to the instructor. None of the universities have yet taken the stand that the information belongs to the student. As long as this is true, the limited results of the informative purpose can only give a biased view of faculty performance to interested students. IMPETUS FOR THE STUDY The interest in pursuing the present study developed from previous personal research and experience. A description of these experiences are presented below. Following the description of these experiences is a dis- cussion of how these experiences molded the author's thinking in terms of this dissertation topic. EXPERIENCES Personal Research 1. In a paper presented at the American Educational Research Associa- tion (Wilson & Wilson, 1978) an attempt is made to compare the factor stability across colleges of the Student Instructional Rating Forms (SIRS) at Michigan State University. Three separate samples were selected from the Education, Business and Engineering colleges. The study consisted of performing both orthogonal and oblique factor analyses on the three separate colleges and the combined sample of colleges. Three separate units of analyses were considered; the individual student response, the class mean for each item, and the individual score minus the class mean of each item. No matter what type of analysis or unit of analysis was attempted, one aspect that remained consistent was the factor loadings. Factors remained stable across colleges but the percent of variability accounted for by the factors was altered. 10 2. 
An attempt was made to ascertain the mood of the Marketing and Transportation Department at Michigan State University concerning student ratings of instruction. In order to meet this objective, a questionnaire was distributed to all professors and teaching assis- tants in the department. A list of the items presented in this questionnaire is located in Figure 1.1. Respondents were directed to check as many options as applicable. The response rate was approximately 80%. Eleven faculty members and twelve teaching assistants responded to the questionnaire. There was no difference in the way the faculty and teaching assistants responded to the items. Table 1.2 displays the frequency of response to each item in the questionnaire. Personal Experience 1. Personal experience has also played a motivating role in the desire to research the area of using more department, college, or possibly even class oriented evaluation instruments. The admini- stration of some type of rating form is mandatory at Michigan State University. The use or misuse of this requirement becomes evident in an example of how it is fulfilled.2 Each faculty member is re- quired to evaluate each class every term in the School of Business. Although a specific form is not stipulated, most faculty members give the Level Two, Student Instructional Rating Forms (SIRS) 2The information contained in this paragraph was obtained from a personal interview with Associate Dean, Gardner Jones, School of Business, Michigan State University. 11 Your present position is a. Administrator b. Faculty c. Graduate Assistant My response to student evaluations of instruction is to a. toss them out b. read the comments c. leaf through the items d. have the items computer analyzed I find the student rating forms a. useful for instructional improvement b. useful for self evaluation c. useful for personnel and tenure decisions d. a.waste of time An ideal student rating form should have a. items that instructors could be compared on b. items selected for specific classes Do you feel that all of the items on the Student Instructional Rating Form now in use by the Marketing and Transportation Department are appropriate for your classes? a. most are appropriate b. some are appropriate c. none are appropriate Any comments specific to your type of class that may be incorporated in a student rating form would be appreciated. INSTRUCTOR OPINION ITEMS FIGURE 1.1 12 TABLE 1.2 FREQUENCY OF RESPONSES TO THE STUDENT RATING OPINION QUESTIONNAIRE Frequencies Item 1: Present Position Options: a. Administrator 1a. 0 b. Faculty lb. 11 c. Graduate Assistant 1c. 12 Item 2: Response to Student Evaluation Options: a. toss them out 2a. 1 b. read the comments 2b. 22 c. leaf through the items 2c. 6 d. have the items computer analyzed 2d. 10 Item 3: Student Ratings Forms are Options: a. useful for instructional improvement 3a. 17 b. useful for self evaluation 3b. 21 c. useful for personnel and tenure decisions 3c. d. a waste of time 3d. Item 4: An Ideal Student Rating Form Options: a. items that instructors could be compared on 4a. 10 b. items selected for specific classes 4b. 15 Item 5: Appropriateness of Items Options: a. most are appropriate 5a. 7 b. some are appropriate 5b. 15 c. none are appropriate 5c. 1 13 developed by the Office of Evaluation Services. The forms are collected by a student and sent to the Dean's Office. The Associate Dean reads all comments made by the students in an area provided on back of the form. 
If the evaluations are of an average nature, a one line summary comment is made by the Associate Dean concerning the instructor for that particular class. This comment is put in the instructor's per- sonal folder, the comment and forms are sent to the department chair- man. At this point, the department chairman reads the one line summary and student comments. The forms are then returned to the instructor. The instructor decides whether to make any further use of the infor- mation. Often, no analysis is made of the instrument itself. Further actions are taken in cases where student comments lend themselves to being either extremely positive or negative. If the comments are extremely positive, the Associate Dean composes a letter of recommendation to the instructor. If the comments are extremely negative, the rating forms are forwarded to the Dean, who may call in the Instructor or the Department Chairman, or both, for consultation. 2. The second point from personal experience is based on the factor composition of the SIRS. Previous factor analysis (Office of Evalu- ation Services, 1971) has indicated the SIRS has five predominant factors: 1) student interest, 2) course demands, 3) student-instructor interaction, 4) course organization, and 5) instructor involvement. Since the SIRS is a general form, it is used for several different classroom types, for example, large lecture, small discussion, quantitative, non quantitative. 14 EXPERIENCE OUTCOMES Because of both personal research and experience, the author believes there are several implications for using different rating forms for either personnel evaluation, or teacher self evaluation. Personal Research 1. In reference to the study by Wilson and Wilson (1978), the unstable percent of variability accounted for by the different factors across colleges lends credence to the idea that students place more importance on some factors than others. It may therefore, be unwise to compare teachers' evaluations without some sort of differential.weighting scheme. Each teacher would be well advised to compare his own per- ception of factor importance with that of the students. 2. With reference to faculty opinion, a glimpse of the results of the questionnaire distributed to the Marketing and Transportation Department may be reviewed in Table 1.2. The results of item.two show that a large portion of the instructors read the comments on the back, less than half of the sample leaf through the items or have the items computer analyzed. Responses to item four show that 652 of the instructors feel that student rating forms should include items specific to the particular class, 43% feel items should be selected on which instructors could be compared. Responses to item five show that 64% of the instructors feel that some of the items on the Level II, Student Instructional Rating Form are appropriate for their classes. 15 It is the opinion of the author that the results of this faculty questionnaire give support to the need for more specific rating forms. Possibly instructors are not, on the whole, leafing through the items or having them computer analyzed because they are too busy with other tasks. Or possibly the instructors do not feel the information obtained from this process is worth the minimal amount of time invested in the project. 
The fact that a large per- centage of instructors feel that an evaluation instrument needs items specific to individual classes gives evidence to support the opinion that more class specific instruments are necessary to evalu- ate individual classes. Personal Experience l. The process used by the School of Business to evaluate teacher performance further supports the need for more specific student rating forms. Since the administrator only reads the comments, it is the contention of the author that the administrator is dealing with incomplete information. How the instructor approaches his class in terms of written comments could have an effect on how the student responds. If the instructor asks the student specific questions, then it is likely that the student will only find time to address himself to these specific questions. If the instructor asks for possible course improvements he may set the stage for many student gripes. If the instructor tells the students he wants to make sure he repeats good points he may open the door for reinforcing commen t8 0 16 It is proposed that the process of reading comments only has been devised because the faculty and administration do not feel comfortable with any further analysis. 2. The last point concerns the factor composition of the Level II, Student Instructional Rating Forms. From a basic sense of fair play it does not appear appropriate to compare instructors on all five factors. For example, it seems that major injustice would be done if the instructor of a large lecture course had his evaluations of factor 3, student-instructor interaction compared to those of an instructor of a small class. In summary, ample evidence is available to justify further research in the area of student ratings of instruction. Evidence also supports the need for rating instruments that are of a tailor made nature for individual classes. PURPOSE The purpose of this study is to build an instructional rating scale that would discriminate between good and poor instruction, and have unam- biguous questions on which raters could be in agreement for each instructor. In terms of item variability, the better of two evaluation instruments would have less variability on a particular item in a given class. It would also be expected that variability exists on a particular item between classes. This latter variability could only be computed for items that appear on more than one specific evaluation instrument. 17 An index of rater reliability could also be computed for a specific evaluation instrument in each class. If a class specific instrument and SIRS form was administered to five classes, there would be a total of ten rater reliability indexes. This indeX‘would be concerned with inconsis- tencies, to what extent do the students give the same information about the instructor. It would be expected that the class specific formwwould have higher rater reliabilities than the SIRS. Such a scale would have to be developed from.items selected for both appropriate content for classes of interest and their psychometric charac- teristics. It would also be necessary to obtain some type of satisfaction index on the new instrument in order to assess the students feelings con- cerning the new instrument. A comparison will be made between a generally accepted instrument that has been used by several departments in a large university and an instrument created for specific classes within a department. 
This newly created in- strument will have a set of five to ten core items used for all classes. These core items will be drawn from the Level II, Student Instructional Rating Forms (SIRS) now in use at Michigan State University. These items are of a high inference, general nature. The generality of these core items will allow them to be appropriate for any classroom situation. The new instrument will also consist of ten to twenty items specifically de- signed to meet the needs of the individual class. These class specific items may be common to some of the specific class evaluation instruments. The last item on the specific class instrument will be a general satis- faction item. II. III. IV. 18 HYPOTHESES For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS. For each class, the responses to the core items in the class specific instrument do not come from the same distribution as the responses to corresponding items in the SIRS. For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS. For each class, the item variance of the tailored items in the class specific instrument is less than the item variance of the items in the SIRS. The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS. The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS. The proportion of the students that are satisfied with the class specific instrument is equal to .50. The proportion of students that are satisfied with the class specific instrument is greater than .50. 19 V. H : There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained from the SIRS. SUMMARY In summary, it appears that no major consensus has been made among uni- versities on the exact use of student rating forms. Most universities, if not explicitly, are at least implicitly using rating forms for tenure and promotional decisions. The major problem lies in the lack of control exr hibited so far in how the rating forms are to be used in these decisions. The questionnaires distributed to the Marketing and Transportation Department at Michigan State University seem to indicate that instructors would prefer evaluation instruments tailor made for their classes. Possibly it is a distrust of a general instrument that leads to relaxed guidelines involving tenure and promotion decisions. Whatever the underlying reason for the lack of a standard university policy, or a standard type of evaluation instrument used across universities, one point stands alone. This point is that the issue of student ratings of instruction needs more consistency. OVERVIEW In Chapter II, the literature on the validity and reliability of student evaluations of instruction will be reviewed. Also the literature available 20 on differences in student evaluations across colleges, departments and classes within a university will be reviewed. The procedures and design will be discussed in Chapter III. 
The main thrust of Chapter III will revolve around the steps taken in building an evaluation form that con- tains both general high inference items as well as items tailored to individual classroom situations. The results from the computer analysis will be presented and analyzed in Chapter IV. Conclusions, a discussion of the results and any considerations for further research will be pre- sented in Chapter v. CHAPTER II REVIEW OF THE LITERATURE INTRODUCTION Chapter I stated the problem proposed in this study. This problem was operationalized in terms of a set of hypotheses. The essence of these hypotheses is that two different item types are valuable in a student rating form. One item type is of a general nature that would be utilized in evaluating any classroom situation. The second more specific item type would be designed for particular classroom settings, for example, large lecture versus discussion situations. Part of this chapter will review the literature associated with varying item types. Secondly, the item format is a crucial part of any student rating form. Because the comparison instrument will be the SIRS, Level II, the format of the proposed instrument will be comparable to that of the SIRS. It will therefore be necessary to review the literature associated with the format of the present SIRS. Thirdly, in reviewing the literature, it becomes obvious that two prevalent concerns involving student rating forms are their reliability and validity. It would be unwise to construct an instrument without reviewing the available literature in these areas. 21 22 One final area, that has to do with both reliability and validity of evaluation forms that researchers have not addressed to a great degree, is the differences between departments, colleges and classes within a univer- sity. In order to support the need for class specific evaluation items, it is necessary to take a look at this literature. For completeness, this chapter will therefore review the literature in the following areas: 1) the reliability of student ratings, 2) the validity of student ratings, 3) comparative data, 4) differences in item types, 5) the present format of the SIRS, Level II. RELIABILITY With relation to the test and measurement theory, Ebel (1972) defines the term "reliability" to be the consistency with which a set of test scores measure whatever they do measure; which is the extent to which an instrument consistently measures a construct. The reliability of a student rating in- strument would then refer to the ability of students to make unbiased judg- ments of a teacher's performance. A reliability coefficient for a set of ratings from a particular group of students is the correlation coefficient between that set of ratings and another set of ratings, on an equivalent rating form obtained independently from.the same group of students. At least four methods have been used in testing theory for obtaining reliability estimates, namely; test-retest equivalent forms, split halves, Kuder- Richardson, and rater reliability techniques. 23 Stability Over Time Research in the student rating field has been done using all of the above specified methods of obtaining reliability estimates. It is commonly accepted that student evaluations achieve reliable results (Costin, Green- ough 8 Manges, 1971). As early as 1954, Guthrie (1954) found correlations of .87 and .89 between students' ranking of the quality of their teachers from.one year to the next. 
Lovell and Bauer (1955) found the correlations between ratings made two weeks apart to be .89, while later Costin (1968) found correlations ranging from .70 to .87 between midesemester and end-of- semester ratings. Recently, in a study using medical students, no differences in student ratings were found when students filled out evaluation forms both before and after final examinations (Canaday, Mendelson & Hardin, 1978). Internal Consistency There have been many research articles expounding the need for internal consistency of an instrument as well as the consistency or stability across time. The early 1950's reported correlations ranging from .77 to .94 when the ratings of students in a given class were randomly paired (Guthrie, 1954; Maslow & Zimmerman, 1956). Lovell and Haner (1955) found that the mean odd- item ratings on a forced-choice instrument correlated .79 with the mean even- item ratings. Internal consistency correlations for the Illinois Course Evaluation Questionnaire (Spencer, 1968; Spencer & Aleamoni, 1970) averaged .93 for 16 different courses. Some forms include an item asking the student to give a global rating of the course. Correlations have ranged from .69 to .93 when this global 24 item.bas been correlated with the remaining specific items (Harvey 8 Barker, 1970). However, the concept of halo effect becomes intertwined with that of internal consistency when considering studies involving the use of rating scales. Halo effect is defined as the tendency to be influenced in making a specific judgment by a general impression of the individual being judged. Unfortunately, while the existence of this tendency is generally recognized, its measurement is extremely complex and involves the correlation, for each instructor, of each trait with each of the other traits. Stalnaker and Remmers (1928) recognized this problem as early as 1928 and calculated such a set of intercorrelations. The average intercorrelation was .45, which indicates no large presence of halo effect. Obviously, the problem with internal consistency and halo effect re- volves around a causality factor. Could a halo effect be causing internal consistency? Is internal consistency a desirable characteristic in student rating forms? The answer to the first question could often times be yes. The latter question is more complex. If an instrument is unitrait in nature, then internal consistency would be very desirable. However, if the instru- ment is of a multitrait nature, then internal consistency, as has tradition- ally been defined in testing theory, is not a desirable characteristic. Analysis of variance Techniques Recently, work has been done using an analysis of variance technique when estimating reliability coefficients (Kane, Gillmore 8 Crooks, 1976; Gillmore, Kane 8 Naccarato, 1978). The basis for this technique was 25 explicated by Ebel (1951) in "Estimation of the Reliability of Ratings." The mean square estimates are used from the analysis of variance (ANOVA) table in a formula to obtain the reliability of ratings. There are several advantages to this technique: 1) It permits the investigator to decide if he wants to include the between rater variance as part of the error variance; 2) It is possible to use incomplete ratings; 3) It is possible to use unequal sets of ratings. 
Kane, Gillmore and Crooks (1976) use the ideas of ANOVA and generali- zability theory to present what they consider a more comprehensive view of the dependability of student evaluations of instruction (Cronbach, Gleser, Nanda 8 Rajaratnam, 1972). The use of generalizability theory leads to three different estimates that partition different error variances and.have three express purposes: generalizing over both students and items; generalizing over students only; generalizing over items only. Each of these three coefficients is a legitimate estimate of the dependability of student course evaluations. The coefficient which should be used depends upon the purpose of the study and the desired universe of generalization. Generally, when evaluating teaching effectiveness, the most appropriate coefficient to use in analyzing course evaluation questionnaire data is the error variance associated with students and items, namely, generalizing over students and items. However, in a 1978 article, Gillmore, Kane and Naccarato (1978) found that generalizing over students and items but not over courses or teachers yields highly dependable results. Generalizing over courses as well as items and students, with the teacher as the unit of analysis, yields 26 moderately dependable results. Generalizing over teachers, items, and students with the course as the unit of analysis yields very low dependa- bility. There are many implications here for tenure and curricular decision makers. VALIDITY The broad issue of validity includes all factors which may contribute to or detract from the usefulness of student opinions about instructional effectiveness obtained through ratings. The issue of the validity of student rating forms is much less precise than the reliability issue. Generally defined, validity is the extent to which an instrument measures the construct it purports to measure. An instrument must be reliable if it is to be valid. However, reliability is not a sufficient condition for validity. Obtaining evidence to support the validity of student rating forms is difficult and the results are usually tenuous at best. Validity can be broken down into many facets of interest (Ebel, 1972). Mehrens and Lehmann (1973) delimit only three kinds of validity: 1) content validity-related to how adequately the content of the instrument samples the domain about which inferences are to be made, 2) criterion-related validity - pertains to the empirical technique of studying the relationship between the responses made to the instrument and some independent external criteria, 3) construct validity - the degree to which the instrument responses can be accounted for by certain explanatory constructs in psychological theory. The above definitions are very encompassing, but actual measurement of any type is difficult. There is more and more demand for quantitative measure- ment of validity by students, faculty and administration. 27 Correlational Studies Several studies attempting to quantify the validity of measurements utilize correlations between the qualities an instrument purports to measure, and certain explanatory constructs that at least partially explain performance on an instrument. Correlations between many sets of variables have been found useful. Following are six areas that correlational results have fallen into: I. II. supervisors' or Colleagues' Ratings and Student Ratingg The results have not been consistent over studies with regard to this area. 
Costin (1966) found a significant correlation of .49 between students' and supervisors' ratings. Both Guthrie (1949, 1954) and Maslow and Zimmerman (1956) found correlations of +.30 to +.63 between students' ratings and colleagues' evaluations of the same teachers. Breed (1927) at the University of Chicago reported a high correlation between fifty-six faculty members and a hundred students on the qualities of good teaching. Tolor (1973) found student ratings correlate moderately well with ratings by colleagues. Tang and Feld- husen (1974) and Starrack (1934) found student ratings correlate moderately well with expert evaluators. On the other hand, Webb and Nolan (1955) found no significant correlation between student ratings and supervisors' judgments of teaching performance in a military school. Student Ratings and TeachingLExperience Again, several studies have shown results varying from significant negative correlations to significant positive correlations with regard III. 28 to student ratings and teaching experience. Rayder (1968) found that younger faculty, with less rank and fewer years teaching experience received higher student ratings than older faculty, with more rank and more years teaching experience. Both Heilman, Armen- trout (1936) and Guthrie (1954) failed to find any difference due to the experience of the instructor. In contrast, Clark and Keller (1954), Guthrie (1949, 1954) and Walker (1969) found a significant positive relationship. Student Ratings and Student Achievement Several studies have attempted to ascertain the presence or absence of a correlation between student achievement and student ratings. Rayder (1968) and Blum (1936) found no correlation between grade point average and student ratings. However, in a review article (Costin, Greenough 8 Manges; 1971), a substantial number of investigations found significant positive relationships, although typically weak, between students' grades and their ratings of instructors or courses (Cohen 8 Burger, 1970; Lathrop 8 Richmond, 1967; Lathrop, 1968). Riley (1950) found that students of low academic standing rate their instructors more rigorously than those of a relatively high academic average. Aleamoni (1972) in a review of studies on the Illinois Course Evaluation Questionnaire found a positive correlation between grades and course evaluations. More recent research has found results similar to the previous research in this area. It was found that three factors, student accomplishment, presentation IV. 29 clarity and organization in planning, correlated highly with the final exam score (Fry, Leonard, Beatty, 1975). Marsh (1975) found a positive correlation between the average evaluation and grade point average in a large multi-section course. This course had several sections using different instructors, but all sections were tested with a common examination. Centra (1977) found a significant positive correlation between examination performance and many variables on a student instructional rating form. Another direct significant relationship was found by Canaday, Mendelson and Hardin (1978) between the course ratings of medical students and their achievement in that course. Student Ratings and Instructor Self—Ratings Webb and Nolan (1955) found a significant positive correlation between student ratings and the instructors' self-ratings. Student Ratings and Gains in Knowledge This area probably has intrinsic appeal to those who use mastery learning techniques and test by use of objectives. 
However, because of the difficulty in carrying out such a project, little research has been done. One of the few studies found that gains in information and practical "job sample" performance were positively and signifi- cantly correlated with their overall ratings of the course (Morsh, Burgess 8 Smith, 1956). Student Needs and Teacher Orientation Researdh in this area addressed the issue of viewing the act of student rating as an instance of person perception in which the 30 needs of students were held to affect their perception of teachers (Tatenbaum, 1975). It was found that specified student needs were significantly related to ratings of specified teacher orientations congruent with those needs. VII. Student Ratingsfiand Academic Dsgree of the Instructor Studies dealing with the academic degree of an instructor are some- what harder to find. Riley (1950) reported that those instructors who possessed the doctorate were rated higher than those instructors who did not possess the doctorate. A study by Downie (1952) also gave higher ratings to those instructors who held the doctorate degree. Extrinsic Variables Many colleges routinely obtain information on variables extrinsic to the instructional process, such as class size, level of course, student's major, year in college, sex of student, and so forth. Again, the results have not been consistent with respect to any one of these variables. The review article by Costin, Greenough, and Manges (1971) gives many references for the inconsistent results: I. Sex of Student A number of studies found no significant differences in the overall ratings of teachers made by male and female students, or in the ratings received by male and female teachers (Downie, 1952; Heilman 8 Armentrout, 1936; Rayder, 1968). On the other hand, there are studies that report a slight tendency for female students to be more II. III. 31 critical of their male instructors than male students. Also, it was noted that female students rated their female instructors significantly higher than their male instructors (Walker, 1969; McKeachie, Lin, 8 Mann, 1971). Required Courses Riley (1950) found differences between departments with regard to whether a course was required or an elective. He found that in the arts, whether a course is taken as required or as an elective makes little difference in student evaluation. However, in the sciences there is a tendency for students taking required courses to rate their instructor higher than in the case of those taking elective courses. In contrast, in the social sciences, higher ratings are more often given by those taking the course as an elective. Cohen and Humphreys (1960) found that students required to take psychology courses tended to rate them lower than students who elected to take them. Gage (1961) found teachers of required courses received significantly lower student ratings than did teachers of elective courses. In contrast, Heilman and Armen— trout (1936) reported no differences between the ratings of students in required courses and those in elective courses. College Year Heilman and Armentrout (1936) reported no significant relation- ship between college year and ratings assigned teachers. In contrast, Villano (1978) and Downie (1952) found advanced level 32 courses tended to be rated higher than low and mid level courses. IV. Class Size It has often been suggested that instructors of large classes receive lower evaluations than teachers of small classes. This opinion is supported by Villano (1977) and Downie (1952). 
Downie (1952) found that instructional procedures were rated more harshly in large classes. Villano (1977) found small classes (thirty students or less) fared significantly better than medium and large sized classes with respect to student ratings. In contrast, Soloman (1966) and Heilman and Armen— trout (1936) did not find class size to have any effect on student ratings. In the past few years, several of these extrinsic variables have been analyzed using multivariate designs. Studies using multiple regression have included from three to six extrinsic variables in the list of depen- dent variables (Danielsen 8 White, 1976; Rose, 1975; Wood 8 DeLorme, 1976). Rose (1975) found that class level did contribute significantly to the multiple R,in stepwise multiple regression. While Danielsen and White (1976) felt the extrinsic variables make little systematic contribution to the overall rating of the instructor. In any event, the results lend support to the premise that it may be deceiving to postulate a simple relationship between teaching effectiveness and extrinsic or intrinsic variables. Pohlman (1975) used canonical correlation in a manner similar to the use of multiple regression in the above three studies. A relationship 33 'was found to exist between the number of outside-of—class study hours required of students and the student ratings. The percentage of students taking the course as an elective was also found to be positively related to student ratings. Student-Teacher Pairings To confuse the validity issue even further, it has been suggested that an aptitude treatment interaction may exist. Cunningham (1975) found that certain types of teachers interact better with certain types of students. Entertainment A frequent argument against the validity of student ratings is that students may judge an instruction on the basis of how entertaining the instructor was. Here again, research has come up with differing conclu- sions with respect to this issue (Costin, Greenough 8 Manges, 1971). Recent research by Williams 8 Ware (1977) has used an actor to deliver to equivalent groups of students two lectures. The lectures vary in sub- stantive teaching points covered and the expressiveness of the presenta- tion. There were three levels of substantive teaching points; high, medium, and law. There were two levels of expressiveness of presentation, high and low. The results of this study suggest that student ratings of highly expressive instructors are always higher than those of the low expressive instructor, regardless of content coverage. Meier and Feld- husen (1978) replicated part of the above Williams and Ware research using two levels of expressiveness of presentation (high and low), and two levels 34 of substantive teaching points (high and medium). It was found that in- structor expressiveness had a major impact on student ratings of global satisfaction. Student ratings on the global satisfaction scale were much higher for the high expressive instructor than the low expressive instructor. The manipulation of lecture content (high vs. medium) did not significantly influence most of the items on the student rating scales. Classroom Seating Patterns 0 Classroom seating patterns are a relatively underexplored topic of student behavior. Gur, Gur and Marshalek (1975) found seating preference of college students apparently related to cerebral dominance (left or right hemisphere) and to handedness. 
They also noted that students would change from one side of the room to the other depending on the difficulty of the subject. Beyond this type of study, there has been little but conjecture as to how seating patterns relate to student ratings. However, Owen (1978) recently investigated seating patterns with regard to student evaluations. He found that seating patterns had a slight influence on student ratings of instructor effectiveness. It should be mentioned that Owen feared the results were idiosyncratic to the composition of the particular class of students under study.

Use of Discriminant Analysis

In an attempt to consider as many aspects of validity as possible, it is fitting to include a study utilizing discriminant analysis. Marsh (1977) used data from graduating seniors who nominated instructors as "most outstanding" or "least outstanding" in conjunction with student evaluations from the following year. The validity of student evaluations of instruction was upheld in this study: the statistically significant difference between "most outstanding" and "least outstanding" instructors was reflected in both data sources.

COMPARATIVE DATA

One area that has to do with both the reliability and validity of evaluation forms, and that researchers have not addressed to a great degree, is the differences between departments, colleges and classes within a university. More than likely, the lack of published research on the topic of differences between colleges is somewhat politically motivated. After all, who is going to state that the professors from one college are inferior to those of another college in terms of teaching skills?

Between Colleges Within A University

Centra and Linn (1976), in an attempt to investigate student points of view in their ratings of specific courses and instructors, utilized a sample consisting of a natural science, a social science and a humanities class. A discriminant analysis was run on the three classes separately. Each item on the student evaluation instrument was correlated with the four discriminant functions for the three separate classes. The correlations varied across the three classes. These varying correlations give evidence to the premise that students' perceptions vary across these three classes, which were from different colleges.

An alternative technique of comparing colleges would be to run individual factor analyses on colleges within a university and compare what items comprise predominant factors and the percent of variance accounted for by different factors. A study using the student as the unit of analysis compared the factors of three colleges: business, engineering, and education (Wilson & Wilson, 1977). The results obtained showed a consistency between colleges in the factors. However, the percent of variance accounted for by each of these factors differed considerably between colleges.

In the past, the individual student response to each item has been used as the unit of analysis in the factor analysis of student ratings of instruction. However, recent research has questioned this as the correct unit of analysis (Doyle & Whitely, 1974; Linn, Centra & Tucker, 1975; Whitely & Doyle, 1976). It was noted that there are three possibilities contending for the unit of analysis: the individual student response for each item, the classroom mean on each item, and the within classroom deviations from the classroom mean on each item.
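To make the three contending units of analysis concrete, the following short sketch shows how each could be computed from raw ratings on a single item. It is illustrative only: the class labels, array shapes, and rating values are hypothetical and are not drawn from any of the studies cited.

import numpy as np

# Hypothetical ratings on one item, one array per class (1 = Superior ... 5 = Inferior).
ratings_by_class = {
    "class_a": np.array([1, 2, 2, 3, 2]),
    "class_b": np.array([3, 3, 4, 2, 3, 3]),
}

# Unit 1: the individual student response for the item (the raw values above).
individual_responses = ratings_by_class

# Unit 2: the classroom mean on the item.
class_means = {c: r.mean() for c, r in ratings_by_class.items()}

# Unit 3: the within-classroom deviations from the classroom mean on the item.
within_class_deviations = {c: r - r.mean() for c, r in ratings_by_class.items()}

for c in ratings_by_class:
    print(c, class_means[c], within_class_deviations[c])

Whichever of the three is chosen determines what a factor analysis of the ratings actually describes, which is precisely the question raised by the studies cited above.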
It was because of the question of the appropriate unit of analysis that further research by Wilson and Wilson (1978) attempted to run separate factor analyses on student rating forms from the colleges of business, engineering, and education. Each factor analysis was run separately using the differing units of analysis: 1) the individual student response for each item, 2) the classroom mean for each item, 3) the within classroom deviation from the classroom mean on each item. The results were similar to those found earlier by Wilson and Wilson (1977). The factors, no matter what unit of analysis was used, were consistent across colleges, but the percent of variance attributed to each factor varied.

Between Classes Within A University

This issue has not been attacked directly as yet. There are many validity studies, listed under validity in this chapter, that have looked at extrinsic variables and their effect on student evaluations of instruction (Costin, Greenough & Manges, 1971; Danielsen & White, 1976; Pohlmann, 1975; Rose, 1975; Wood & DeLorme, 1976). However, these are an attempt to look at individual differences rather than class differences. The author has been unable to locate any studies that compare instruments designed specifically for particular types of classes (i.e., large classes vs. small classes) with generally available instruments.

DIFFERENCES IN ITEM TYPES

The items appearing on rating forms were once characterized as broad, requiring much inference on the part of the observer and reader. Ratings on teacher warmth, overall effectiveness, clarity, or enthusiasm require high inference. However, with the advent of item banking systems in student ratings, items are no longer only high inference in nature. Items like "the teacher uses students' ideas," "teacher criticizes," and "teacher listens carefully to student" do not require a large inference on the part of the observer or rater.

Categorizing Items for Student Rating Forms

There have been two attempts to categorize different item types that would be useful in evaluating instruction. Below is a description of these differing systems:

1. As early as 1970, Rosenshine (1970, 1973) discussed high-inference and low-inference measures as a system that could be useful in evaluating instruction. He broke item types into these two categories following a convention previously used by Gage (1969). The convention uses "inference" to refer to the process intervening between the objective data seen or heard and the judgment concerning a higher order construct of cognitive or social interaction. High-inference measures are those which require considerable inferring from what is seen or heard in the classroom to the labelling of the behavior, such as ratings of the teacher on such scales as "partial-fair" and "dull-stimulating." Low-inference measures are those which require the observer to classify teaching behaviors according to relatively objective categories. Examples of these behaviors are very quantifiable, such as words per minute or movements per minute. Items somewhere between high-inference and low-inference would be labelled moderate-inference and refer to such items as "teacher listens carefully" and "teacher criticizes." Ratings on high-inference variables generally correlate highly with global items on student rating forms, probably because such measures allow a rater to consider more evidence before making a decision (Rosenshine, 1973).
The results of low-inference measures would be easier to use in teacher training programs because variables can be translated into specific behaviors. 39 2. Smock and Crooks (1973) utilized a system similar to the one described above. However, Smock and Crooks (1973) felt that the evaluative data being collected can and should vary according to the intended function of the evaluation and the people doing the evaluating. They identified three major types, or levels, of evaluation. The first (Level I) is general, summative evaluation, which will be concise and allow broad, general comparisons to be made across departments, but will give little or no specific information to guide instructional improvement. The second (Level II) is evaluation aimed at identifying success or failure in general areas or attributes of instruction. The third (Level III) is detailed course specific evaluation aimed at providing diagnostic information about instructional problems. Smocks and Crook (1973) specify that it should not be expected that all evaluative information would be available to all audiences. For example, some types of data would be available only to the faculty members involved, and information sent to administrators may be in summary form. validation of Varyingiltem Types In the past few years, several attempts have been made to identify the components of high-inference ratings of instructors by correlating a high- inference rating of teacher effectiveness with ratings obtained on items reflecting more specific instructor attributes. Pohlmann (1975) attempted to correlate twenty low-inference items with a gldbal item of teacher effectiveness. Cushman and Frederick (1976) in a 40 similar fashion found positive correlations existing when they identified twenty-eight specific teaching behaviors and correlated them with seven general evaluation items. Olson (1978) used another similar approach to correlate specific behavior type items with general high-inference items. All of the above studies found many existing high correlations. It was, therefore, felt that it is possible to develop sets of behavior-specific items for student instructional rating forms that are more useful in a diagnostic sense. It should be noted that the behavior specific items used in the above studies represent Rosenshine's moderate-inference items. Recent research at the University of Illinois at Urbana-Champaign has attempted to validate an item classification system.(Bradenburg, Derry, Hengstler; 1978). A prOposed classification scheme was developed for the Instructor and Course Evaluation System (ICES), a cafeteria-like system of student ratings. This system involves classifying items by content and specificity. Therefore, items vary from very general in nature to very specific in nature. Both general and specific items are available under each content area. The results of the above study confirmed the conclusions made by the previous studies. Correlations do exist between general and specific items on student rating forms. FORMAT Many formats have been attempted in developing student rating forms. A previous dissertation in this area has an extensive review of the litera- ture (Showers, 1973). Five conclusions obtained from the Showers' litera- ture review follow: 41 l. The optimal number of options for each question is five to seven. 2. The presence of a neutral point increases the ambiguity of the scale. 3. 
Reduction in leniency bias due to reversing the direction of the scale within a questionnaire may increase the errors in rating. 4. Numeric, sentence, or paragraph cue lengths may reduce leniency bias, if the cues are not too long, but cue length has no apparent effect on the rater reliability of untrained groups of raters. 5. Leniency bias may be reduced by the presence of more favorable than unfavorable response options. Her dissertation compared three different response cue formats in an attempt to discover which one was least susceptible to response bias. The three response cue formats considered were the Likert, evaluative and descriptive. The results showed that the evaluative format items in instructional rating scales were less prone to leniency bias and had rater reliabilities comparable to Likert and descriptive formats, making them the best choice of the three formats on an existing rating scale. SUMMARY The issues of reliability and validity of student rating forms have been examined from many angles. The reliability of such instruments has been shown to be both stable over time and internally consistent. However, the validity issue is a hodgepodge of conflicting information. The corre- lational studies are an excellent example of this phenomenon. In an attempt to tease out the consistent results from the inconsistent results, three separate lists follow. 42 Inconsistent Results 1. Even though many studies found a positive correlation between supervisors' or colleagues' ratings and student ratings, there is still some research to suggest no relationship exists. 2. A.weak positive correlation between student ratings and student achievement was found in some studies, while in other studies no correlation existed. 3. With reference to whether the course was required or not brought inconsistent results. At times, no relationship exists between this variable and student ratings. In some studies, courses are rated higher if they are electives, in other studies this relationship is reversed. 4. Some research leads one to believe advanced classes get higher ratings, while other research studies do not find this relationship to exist. 5. Class size has sometimes been found to effect student ratings and other times not. 6. The research on the sex of the student and student ratings has been highly inconsistent in nature. Consistent Results 1. Research appears to uphold the point of view that instructors that hold the doctorate degree fare better on student ratings than those instructors who have not earned the doctorate. 43 Inconclusive Results The following results, although not inconsistent, had not been researched enough to be considered consistent in nature. 1. The positive relationship between student ratings and instructor self ratings. 2. The relationship between student ratings and gains in student knowledge. 3. The relationship between student needs and teacher orientation. 4. The entertainment issue. 5. The relationship between seating patterns and student evaluations of instruction. Again, little research has been done with regard to the differences between departments, colleges and classes within a university. The avail— able literature supported the premise that students' perceptions do vary across colleges. However, no literature has been located comparing different classes within one department. 
Research in the area of varying item types strongly suggests that specific items (moderate-inference or Level II) correlate highly with general items found on the traditional high-inference type rating form. Because much inference is necessary on the part of the observer and the reader for high inference items, items of moderate inference could be more beneficial to the instructor in the diagnostic sense. The results and literature review of a previous dissertation (Showers, 1973) delineated what response cue formats are appropriate to minimize leniency bias. CHAPTER III PROCEDURES AND DESIGN INTRODUCTION The purpose of the present study is to compare two types of student evaluation of instruction instruments. One of the forms is the standard Level II, Form B student rating form given to approximately 15,000 students per quarter at Michigan State University. The Level II is a general form developed on a general population. The comparison instrument will be one developed for a specific department. The comparison instrument will con- tain five to ten core items that will be used on every instructor's student evaluation form, and 10-20 items specific to individual class situations. For example, questions directed at large classroom instruction, demographics, and instrument satisfaction. The purpose of the instrument satisfaction question is to obtain an index comparing students' perceptions of the two instruments. In order to make comparisons between the instruments, it is necessary to compute: (1) the average item variance for each instrument within a particular class, and (2) the average item variance between classes. The reason for these computations is based on the hypotheses that a good rating 44 45 instrument will have a small variability within one class on a particular item, e.g., students perceive the instructor's performance on a particular item similarly. There should be variability between classes on a particu- lar item, e.g., students can differentiate among instructors' performance on a particular item. The better evaluation instrument would have a smaller average item variance within classes. The better instrument would have larger item variability between classes. It will also be desirable to make direct comparisons between the core and class specific items in the proposed instrument. To make this compari- son, more variance calculations are needed. First, the average item variance of both the core items and class specific items for each indivi- dual class, and second, the average item variance of both the core items and the class specific items among classes. It is expected that the average item variances will differ with re- spect to item type on the proposed instrument. The class specific items will have a smaller average item variability than the core items within a specific class. However, the class specific items will have a larger average item variability than the core items between classes. SAMPLE The sample is taken from the Marketing and Transportation Department at Michigan State University. Generally, the School of Business appears to exhibit some skepticism toward student evaluation forms. This skepti- cism is operationalized by their haphazard use of student rating forms to 46 meet minimal university requirements. However, the Business School is also aware of the need for feedback on course improvement as well as information for tenure and promotion decisions. 
Therefore, it is a prime opportunity to approach them for some individual attention with regard to evaluating instruction. Four separate undergraduate classes from the Marketing and Transporta- tion Department are included in the study. A description of the four classes are: (1) Personal Selling (MTA 311): Two sections of this course were included in the study. Each section was taught by a different instructor utilizing varying techniques of instruction. Section 1 consisted of 96 Juniors and Seniors. The teaching format was mainly lecture with four days at the end of the term set aside for group project presentations. Section 2 consisted of 80 Juniors and Seniors. The course format included approximately 60% lecture and 40% conference leadership technique. The con- ference leadership technique involved the use of student groups having total reaponsibility for certain class presentations. (2) Sales Management (MTA 313): One section of this course was included in the study. This course is a case oriented class with only 39 Juniors and Seniors. The format of the sales management class consisted of both lecture and discussion. The main focus of this type of course is concerned with applying particular case solutions to more general real world situations. (3) anntitative Business Research Methods (MTA 317): This is a large quantitative lecture class consisting of 405 Juniors and 47 Seniors. This class meets twice a week in main lecture and twice a week in smaller recitation sections. These recitation sections spend time reviewing homework and clarifying points from the main lecture. (4) Transportation Planning and Policies (MTA 341): Two sections of this class are included in the present study. Both classes consist of Juniors and Seniors and utilize the same basic format. The format is basically one of lecture with some discussion. Section 1 contains 57 students while section 2 contains 28 stu- dents. Even though it was necessary for the instructors to volunteer their services, it should be noted that the selection was made so that the classes were diverse. A perusal of the course names and format show that the courses vary on several dimensions. These dimensions include class size, lecture versus discussion, qualitative versus quantitative, and inclusion or exclu- sion of recitation sections. This diversity was necessary in developing specific class items. .It should also be noted that the varying teaching techniques used in MTA 311 will necessitate separate forms for each section. Therefore, there will be five specific class instruments. INSTRUMENT DEVELOPMENT In order to develop five Specific instruments containing both common core and class specific items, it is necessary to develop a strategy for selecting items. The present literature deals with the development of general evaluation instruments. Therefore, the available developmental 48 strategies had to be re—evaluated and modified for selection of class specific items. A technique explained by Wotruba and Wright (1975) cone sists of four steps and is indicative of the literature. These steps are as follows: (1) Development of an item pool, (2) Screening of the item pool, (3) Surveying the item's importance for inclusion, (4) Choosing the items for the final instrument. These four basic steps will be used in the pre- sent study. 
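Before turning to Step I, it may help to make concrete the within-class and between-class item variance comparisons outlined in the introduction to this chapter. The sketch below is purely illustrative: the study's analyses were carried out in SPSS, and the class names, array shapes, and rating values here are hypothetical rather than taken from the study's data.

import numpy as np

# Hypothetical data: for each class, rows = students, columns = items,
# values = 1 (Superior) ... 5 (Inferior).
ratings = {
    "class_a": np.array([[1, 2, 2], [2, 2, 3], [1, 3, 2], [2, 2, 2]]),
    "class_b": np.array([[3, 4, 3], [4, 4, 5], [3, 3, 4]]),
}

# (1) Within-class: the variance of each item within a class, averaged over items.
avg_within = {c: r.var(axis=0, ddof=1).mean() for c, r in ratings.items()}

# (2) Between-class: for each item, the variance of the class means across classes,
#     then averaged over items.
class_item_means = np.vstack([r.mean(axis=0) for r in ratings.values()])
avg_between = class_item_means.var(axis=0, ddof=1).mean()

print(avg_within)   # smaller values: students within a class rate an item similarly
print(avg_between)  # larger values: items differentiate among instructors or classes

Under the hypotheses stated in the introduction to this chapter, the better instrument would show the smaller within-class values and the larger between-class value.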
STEP I - DEVELOPMENT OF AN ITEM POOL Initial Interview An initial interview was devised to ascertain the types of items that the instructor and students of a particular class felt should be included in an instrument specifically designed for that particular class. Each in- structor and six to ten student volunteers were interviewed from each class. The respondents were given copies of the items in use on the present Level II form. These items are reproduced in Table 3.1. The items were clustered by previously defined factors (Office of Evaluation Services, 1973). The interviewer was directed to respond to each general factor. The respondent was asked two questions in reference to each factor: Question 1: "Is this factor relevant in evaluating instruction in this specific class?" Question 2: "If an answer of yes is given to question 1, is this factor relevant in evaluating the instruction of all courses taken at Michigan State University?" After responding to both questions for each factor, the student and in? structor volunteers were asked to list any other items that would be helpful in evaluating this class in particular. II. III. IV. 49 TABLE 3.1 SAMPLE ITEMS Instructor Involvement (l) The instructor was enthusiastic when presenting course material. (2) The instructor seemed to be interested in teaching. (3) The instructor's use of examples or personal experiences helped to get points across in class. (4) The instructor seemed to be concerned with whether the students learned the material. Student Interest (1) You were interested in learning the course material. (2) You were generally attentive in class. (3) You felt that this course challenged you intellectually. (4) You have become more competent in this area due to this course. Student Instructor Interaction (l) The instructor encouraged students to express opinions. (2) The instructor appeared receptive to new ideas and others' viewpoints. (3) The student had an opportunity to ask questions. ~ (4) The instructor generally stimulated class discussion. Course Demands (1) The instructor attempted to cover too much material. “ (2) The instructor generally presented the material too rapidly. x (3) The homework assignments were too time consuming relative to ‘ their contribution to your understanding of the course material. (4) You generally found the coverage of topics in the assigned readings too difficult. A Course Organization (1) The instructor appeared to relate the course concepts in a systematic manner. (2) The course was well organized. (3) The instructor's class presentations made for easy note taking. (4) The direction of the course was adequately outlined. > X! 50 STEP II - SCREENING THE ITEM POOL From the initial interview, two outcomes unfolded. First, in terms of a general instrument it was found that factor III, Student Instructor Inter- action, was felt to be irrelevant by more than 25% of the sample. While the other four factors were considered appropriate for all classes by at least 90% of the sample. Secondly, from the class specific items the students had been asked to list, the makings of a class specific item.pool started to materialize. PRETEST - ITEM POOL SELECTION The final pretest instrument was administered to all the students pre- sent in MTA 313, MTA 341 and Section 2 of MTA 311. On the day of admini- stration only two recitation sections of MTA 317 were included and part of Section 2, MTA 311. The instrument was divided into two parts. 
The first part was a general part that was administered to every class (refer to Table 3.2). The general item list included all the items from Factor 1, 2, 4 and 5. This list was considered to be conclusive due to the previous re- search work in developing the Level II instrument (Office of Evaluation Services). Two questions were posed for each item: Question 1. If you were to construct a general student course appraisal sheet, would you include this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not 51 TABLE 3.2 FINAL PRETEST ITEMS FOR THE GENERAL INSTRUMENT (l) The instructor was enthusiastic when presenting material. (2) The instructor seemed to be interested in teaching. (3) The instructor's use of examples or personal experiences helped to get points across in class. (4) The instructor seemed to be concerned with whether the students learned the material. (5) You were interested in learning the course material. (6) You were generally attentive in class. (7) You felt that this course challenged you intellectually. (8) You have become more competent in this area due to this course. (9) The instructor attempted to cover too much material. (10) The instructor generally presented the material too rapidly. (11) The homework assignments were too time consuming relative to their contribution to your understanding of the course material. (12) You generally found the coverage of topics in the assigned reading too difficult. (13) The instructor appeared to relate the course concepts in a systematic manner. (14) The course was well organized. (15) The instructor's class presentation made for easy note taking. (16) The direction of the course was adequately outlined. 52 Question 2. Evaluate this course using this item: (1) Superior: exceptionally good instructor or course. (2) Above Aversge: better than the typical instructor or course. (3) Average: typical instructor or course. (4) Below Aversge: ‘worse than the typical instructor or course. (5) Inferior: exceptionally poor instructor or course. Question 1 was used to estimate how important the respondents felt the itemuwas, while Question 2 was developed to ascertain the within class variability in rating each item. The second part was a class specific part consisting of 25 to 30 items designed for individual classes. There were separate class specific parts for MTA 317, MTA 313, MTA.341 and each section of MTA 311. The items included for use on these class specific forms were chosen from a pool of items. This pool contained six separate categories: Grading and Exams, Instructional Assignments and Material, Student Outcomes, Recitation Sections, Instructional Environment, Instructor Characteristics and Style. These items were selected from several sources in conjunction with the initial interviews. These sources included: "SIRS Technical Bulletin" (Office of Evaluation Services, 1969), "Behavior Specific Items for Student Evaluation of Instruc- tion" (Olson, 1978) and "ICES Item Catalog" (university of Illinois, 1977). The class specific items for each class are presented in Table 3.3, 3.4, 3.5, 3.6 and 3.7. The students were asked to respond to each of the following four ques- tions for each class specific item. The questions follow: Question 1: If you were to construct a student course appraisal sheet for this course, would you include this item? 
(1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not 53 TABLE 3.3 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 311 SECTION 1 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) Use of humor Use of overhead Use of handouts General appearance Willingness to spend extra time with you Availability during office hours Relationship of course material to everyday life Maintenance of an informal classroom Integration of reading material THE COURSE'S CONTRIBUTION TO YOUR: (10) (11) (12) (13) (14) Understanding of the day to day workings of a field representative Obtainment of a general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Ability to solve real problems in the field FOR THIS COURSE: (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) Classroom atmosphere Interaction of your project group Appropriateness of text Appropriateness of instructional materials Use of time for assignment completion Appropriateness of emphasis placed on group project Beneficialness of homework assignments Appropriateness of exam format Appropriateness of case study Appropriateness of amount of time given to group projects Group projects were applicable to real life situations Organization of lecture material The group members shared the work equally 54 TABLE 3.4 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 311 SECTION 2 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) Encouragement to students to express opinions Receptiveness to new ideas and others' viewpoints Stimulation of class discussion Use of humor Use of overhead or chalkboard Use of handouts General appearance Willingness to spend extra time with you Relationship of course material to everyday life Maintenance of an informal classroom Integration of reading material THE COURSE'S CONTRIBUTION TO YOUR: (12) (13) (14) (15) (16) Understanding of the day to day workings of a field representative Obtainment of general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Ability to solve real problems in the field FOR THIS COURSE: (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) (28) (29) Appropriateness of the text Classroom atmosphere Appropriateness of instructional materials Beneficialness of homework assignments Appropriateness of exam format Appropriateness of case study Organization of lecture material ApprOpriateness of conference leadership technique Interaction among group members in conference leadership group Beneficialness of group discussion Appropriateness of amount of time spent on conference leadership technique Appropriateness of proportion of final grade accounted for by conference leadership technique Appropriateness of tests 55 TABLE 3.5 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 313 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Encouragement of students to express opinions Receptiveness to new ideas and others' viewpoints General stimulation of class discussion Use of humor Clarification of the relationship between course material and everyday life Maintenance of an informal classroom Help in improving my problem solving ability Effectiveness in preparing students for exams Grading procedure Use of visual aids FOR THIS COURSE: (11) (12) (13) (14) (15) (16) The case text The inter-relationship of the two texts Appropriateness of cases Readings on reserve The appropriateness of group discussion for further understanding of course concepts No 
particular group of students monopolized discussion THE COURSE'S CONTRIBUTION TO YOUR: (17) (18) (19) (20) (21) (22) (23) (24) General knowledge in the field Understanding of concepts and principles in the field Ability to apply principles to new situations Ability to communicate clearly on this subject Ability to solve real problems in the field Ability to communicate in class Ability to organize ideas Ability to apply particular case ideas to general situations 56 TABLE 3.6 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 317 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) Use of humor Use of overhead Willingness to spend extra time with you Availability during office hours Clarification of the relationship between the course material and the real world Maintenance of an informal classroom 'Maintenance of a formal classroom THIS COURSE CONTRIBUTED TO: (8) (9) (10) (11) (12) (13) (14) Improving my problem solving abilities An understanding of concepts and principles in the field My ability to communicate clearly on the subject My ability to solve real problems in the field Increasing my interest in the subject matter Preparing me for the material covered on the tests Developing a more favorable attitude toward the subject matter FOR.THIS COURSE: (15) (16) (17) (18) (19) (20) (21) FOR THE (22) (23) (24) (25) (26) (27) The atmosphere was conducive to learning The required text Readings on reserve Beneficialness of written homework assignments Beneficialness of supplementary texts Appropriateness of testing format Beneficialness of homework answers and calculations on reserve RECITATION SECTION: Clarification of course material Appropriateness of 202 of grade allotted to recitation Usefulness of quizzes for exam preparation Ability of recitation instructor to answer questions Maintenance of an informal classroom Adequacy in covering written homework assignments 57 TABLE 3.7 FINAL CLASS SPECIFIC PRETEST ITEMS MTA 341 THE INSTRUCTOR'S: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (ll) (12) (13) Use of humor Use of blackboard Use of overhead Use of handouts Relation of course material to everyday experiences Maintenance of an informal classroom Encouragement of students to express opinions Receptiveness of new ideas and others' viewpoints General stimulation of class discussion Use of real life examples Emphasis of important points Willingness to spend extra time with you Availability during office hours THE COURSE'S CONTRIBUTION TO YOUR: (14) (15) (16) (17) Obtainment of general knowledge in the field Understanding of concepts and principles in the field Ability to communicate clearly on the subject Future goals FOR THIS COURSE: (18) (19) (20) (21) (22) (23) (24) (25) Appropriateness of two texts Appropriateness of material on reserve Integration of audiovisual and course material Classroom atmosphere Appropriateness of exam format Appropriateness of material covered on the exam Organization of lecture material Beneficialness of homework assignments 58 Question 2: Would you want to qualify your response to this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not Question 3 Do you believe you have enough information to evaluate those aspects of the course referred to by this item? (1) Definitely (2) Probably (3) Uncertain (4) Probably not (5) Definitely not Question 4 Evaluate this course using this item. (1) Superior: exceptionally good instructor or course. (2) Above Average: better than the typical instructor or course. (3) Average: typical instructor or course. 
(4) Below Average: worse than the typical instructor or course. (5) Inferior: exceptionally poor instructor or course.

Question 1 was asked to obtain information on the perceived importance of each item. Questions 2 and 3 were used to determine if it was necessary to reword the item before it could be used on a final evaluation instrument. Question 4 was used to ascertain the within class variability of each item. This variability would be used in item selection only if Question 1 retained too many items for the final instrument.

STEP III - SURVEY OF ITEMS' IMPORTANCE

To survey each item's importance, the data was coded and transferred onto IBM cards for further computer analysis. Analyses were made on all of the data combined for the general items. The sample was then decomposed into five separate groups for analysis of the class specific items. A general frequencies program was run utilizing SPSS (Nie, Hull, Jenkins, Steinbrenner & Bent, 1975). Frequencies, means and standard deviations were calculated for each item.

General Instrument

Means and standard deviations were calculated for both questions, item inclusion and course evaluation, for each of the sixteen items in the general form. Refer to Table 3.8 for this information. In order to get a better picture of the pattern of responses, frequencies were also considered for Question 1, item inclusion (refer to Table 3.9). When referring to the means for Question 1 (whether the item should be included), a low value represents high desirability while a high value represents low desirability. It was decided to include four to eight of these items in the final instrument.

Class Specific Instrument

A frequency distribution was tabulated for the responses to Question 1 (item inclusion) for each class specific form. These distributions are presented in Tables 3.10, 3.11, 3.12, 3.13, 3.14 and 3.15. The responses to Questions 2, 3 and 4 would only be used in an auxiliary manner to help improve item wording and make finer discriminations, if necessary.

STEP IV - CHOOSING THE ITEMS FOR THE FINAL INSTRUMENT

General Instrument

It can be seen by referring to Table 3.1 that items 1 through 4, 5 through 8, 9 through 12 and 13 through 16 on the final pretest represent four separate factors.

TABLE 3.8
GENERAL INSTRUMENT
Pre-test Statistics (n = 230)

        Question 1 (Inclusion of Item)    Question 2 (Course Evaluation)
Item    Mean      Std. Deviation          Mean      Std. Deviation
1       1.626     .723                    1.987     .733
2       1.483     .698                    1.996     .723
3       1.613     .843                    2.026     1.013
4       1.454     .716                    2.071     .836
5       1.913     .998                    2.415     1.042
6       2.113     1.009                   2.351     .980
7       1.991     1.000                   2.553     1.015
8       1.657     .920                    2.371     1.038
9       1.774     1.041                   2.991     .776
10      1.792     1.035                   2.950     .858
11      1.939     1.188                   2.911     .842
12      2.178     1.207                   2.937     .908
13      1.557     .744                    2.247     .946
14      1.407     .648                    2.259     .909
15      1.750     1.008                   2.344     1.063
16      1.763     .913                    2.213     .891

TABLE 3.9
GENERAL INSTRUMENT
Frequency Distributions for Question 1, Item Inclusion (n = 230)
[Counts for response options 4 and 5 are not legible in the available copy.]

        Response Option
Item    1      2      3
1       112    99     12
2       140    75     9
3       129    74     15
4       149    61     15
5       95     84     33
6       67     104    29
7       84     90     32
8       131    65     17
9       125    59     23
10      118    64     21
11      117    54     23
12      83     76     32
13      132    73     20
14      149    66     8
15      122    66     19
16      107    86     20

TABLE 3.10
CLASS SPECIFIC INSTRUMENT, MTA 311 SECTION 1
Frequency Distribution for Question 1, Item Inclusion (n = 24)
[The frequency counts for this table are not legible in the available copy.]
TABLE 3.11
CLASS SPECIFIC INSTRUMENT, MTA 311 SECTION 2
Frequency Distribution for Question 1, Item Inclusion (n = 54)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.12
CLASS SPECIFIC INSTRUMENT, MTA 313
Frequency Distribution for Question 1, Item Inclusion (n = 29)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.13
CLASS SPECIFIC INSTRUMENT, MTA 317 SECTION 3
Frequency Distribution for Question 1, Item Inclusion (n = 42)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.14
CLASS SPECIFIC INSTRUMENT, MTA 317 SECTION 5
Frequency Distribution for Question 1, Item Inclusion (n = 33)
[The frequency counts for this table are not legible in the available copy.]

TABLE 3.15
CLASS SPECIFIC INSTRUMENT, MTA 341
Frequency Distribution for Question 1, Item Inclusion (n = 44)
[The frequency counts for this table are not legible in the available copy.]

All sixteen of these general items appear desirable, with the mean desirability coefficient ranging from 1.407 to 2.178. The mean desirability coefficient can simply be defined as the average response to Question 1 for each individual item. Because of this high rate of desirability, it was decided to include two items from each factor. It was relatively easy to choose items 2, 4, 8, 9, 10, 13 and 14 because of the low means of these items in comparison to the other items in their factor. However, it was difficult to choose the second item from factor II. The means and standard deviations were very close for items 5, 6 and 7 in this particular factor. The distribution of responses was considered at this point. Item 5 was chosen because of the higher percentage of respondents choosing option 1 on Question 1 (refer to Table 3.9).

Class Specific Instrument

It was decided to use a general consensus approach in choosing specific class items. Items were included in this section of the instrument if 75% of the sample chose option 1 or option 2 in response to Question 1. This can be interpreted as meaning that 75% of the sample would definitely or probably include the item on a student evaluation instrument for that particular class.
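The consensus rule just described can be summarized in a few lines of code. The sketch below is illustrative only: the function name and the example counts are hypothetical and do not reproduce the study's pre-test frequencies, and only the 75 percent threshold is taken from the text.

def retained_items(freqs, threshold=0.75):
    """freqs: {item_number: [n_opt1, n_opt2, n_opt3, n_opt4, n_opt5]} for Question 1."""
    keep = []
    for item, counts in freqs.items():
        total = sum(counts)
        # Retain the item if options 1 ("Definitely") and 2 ("Probably")
        # together account for at least the threshold proportion of responses.
        if total and (counts[0] + counts[1]) / total >= threshold:
            keep.append(item)
    return keep

# Hypothetical pre-test counts for three items.
example = {1: [14, 6, 2, 1, 1], 2: [9, 5, 6, 3, 1], 3: [20, 3, 1, 0, 0]}
print(retained_items(example))  # -> [1, 3]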
Class Specific Instrument for MTA 311, Section 1 The percentages were collapsed for options 1 and 2 for each item from Table 3.10. Below is a listing of each item.nunber with the percent asso- ciated with the collapse of option 1 and 2: 69 Item Percent 1 46 2 50 3 61 4 38 5 92 6 87 7 84 8 33 9 71 10 75 11 92 12 79 13 79 14 80 15 54 16 62 17 92 18 83 19 61 20 61 21 74 22 87 23 66 24 75 25 75 26 88 27 83 Items numbered 5-7, 10-14, 17, 18, 21, 22, 24-27 were included in the final class specific instrument. Even though item 21 did not meet the percentage requirement, it was felt it should be included. No other items had a per- centage high enough to be worthy of inclusion. The results of Question 2 did not give any reason to reword the items chosen from the pre-test. The items were therefore worded in the final evaluation instrument basically the same as the pre-test instrument. The final class specific instrument for MTA 311, Section 1 can be viewed in Figure 3.1. Class Specific Instrument for MTA 311, Section 2 Again, the percentages were collapsed for options 1 and 2 for each item.from.Tab1e 3.11. Below is a listing: 70 FIGURE 3.1 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 311 SECTION 1 71 STUDENT INSTRUCTIONAL RATING FORM MTA 311 SBC’I‘Iw l 1. Superior: exceptionally good instructor or c0urse. 2. Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. to the closest description of your instructor or your course. If you items. KEY l. The instructor seemed to be interested in teaching. 1. 1 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ----------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. l 2 3 4 5 4. Your competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. l 2 3 4 5 8. The course organization. 8. 1 2 3 4 5 9. The instructor's willingness to spend extra time with you. 9. l 2 3 4 5 10. The instructor's availability during office hours. 10. l 2 3 4 5 11. The instructor's relationship of caurse material to everyday life. 11. l 2 3 4 5 THE COURSE'S C(NTRIBUTIM TO YOUR: 12. Understanding of the day to day workings of a field representative. 12. l 2 3 4 5 l3. Obtainment of a general knowledge in the field. 13. l 2 3 4 5 14. Understanding of concepts and principles in the field. 14. l 2 3 4 5 15. Ability to communicate clearly on the subject. 15. l 2 3 4 5 l6. Ability to solve real problems in the field. 16. l 2 3 4 5 FOR THIS COURSE: l7. Appropriateness of the text. 17. l 2 3 4 S 18. Appropriateness of instructional materials. 18. 1 2 3 4 5 l9. Beneficialness of homework assignments. 19. 1 2 3 4 5 20. Appropriateness of the exam format. 20. 1 2 3 A 5 21. Appropriateness of the amount of time given to group projects. 21. 1 2 3 4 5 22. Organization of lecture material. 22. 1 2 3 A 5 23. Appropriateness of material covered on the exams. 23. l 2 3 4 5 24. Applicability of group projects to real life situations. 24. l 2 3 4 5 25. Sharing of group work by group members. 25. 
1 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 26. Has this course required in your degree program? 26. Yes No 27. What is your sex? 27. M F 28. What is your overall GPA? (1) 1.9 or less, (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0—--- 28. l 2 3 4 5 29. what is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior 29. l 2 3 4 5 30. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior. (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 30. l 2 3 4 5 72 Item Percentage 1 82 2 90 3 81 4 71 5 69 6 62 7 42 8 79 9 81 10 64 11 82 12 85 13 91 14 93 15 83 16 81 17 95 18 61 19 82 20 84 21 80 22 73 23 91 24 79 25 82 26 84 27 96 28 91 29 - 95 The 75 percent rule was strictly adhered to and items numbered 1-3, 8, 9, 11-17, 19-21, 23-29 were included in the final evaluation instru- ment. The final evaluation instrument for MTA 311, Section 2 may be referred to in Figure 3.2. Class Specific Instrument for MTA 313 The percentages from Table 3.12, collapsed for options 1 and 2, for each item follow: 73 FIGURE 3.2 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 311 SECTION 2 74 STUDENT INSTRUCTIONAL RATING FORM MTA 311 SECTION 2 1. Superior: exceptionally good instructor or course. 2. Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. to the closest description of your instructor or your course. If you items. KEY 1. The instructor seemed to be interested in teaching. I. 1 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ---------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. YOur competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5 8. The course organization. 8. l 2 3 4 5 THE INSTRUCTOR'S: 9. Encouragement to students to express opinions. 9. l 2 3 4 5 10. Receptiveness to new ideas and other's viewpoints. 10. 1 2 3 4 5 ll. Stimulation of class discussion. 11. 1 2 3 4 5 12. Willingness to spend extra time with you. 12. l 2 3 4 5 13. Relationship of course material to everyday life. 13. 1 2 3 4 5 14. Integration of reading material. 14. 1 2 3 4 5 THE COURSE'S CNTRIBUTION TO YOUR: 15. Understanding of the day to day workings of a field representative. 15. l 2 3 4 5 l6. Obtainment of general knowledge in the field. 16. l 2 3 4 5 17. Understanding of concepts and principles in the field. 17. l 2 3 4 5 18. Ability to communicate clearly on the subject. 18. 1 2 3 4 5 l9. Ability to solve real problems in the field. 19. l 2 3 4 5 FOR THIS COURSE: 20. Appropriateness of the text. 20. l 2 3 4 5 21. Appropriateness of instructional materials. 21. l 2 3 4 5 22. Beneficialness of homework assignments. 22. 1 2 3 b 5 23. Appropriateness of exam format. 23. l 2 3 4 5 24. Organization of lecture material. 24. l 2 3 4 5 25. Appropriateness of conference leadership technique. 25. l 2 3 4 5 26. Interaction among group members in conference leadership group. 26. l 2 3 4 5 27. 
Beneficialness of group discussion. 27. l 2 3 4 5 28. Appropriateness of amount of time spent on conference leadership technique. 28. 1 2 3 4 S 29. Appropriateness of proportion of final grade accounted for by conference leadership technique.----- 29. l 2 3 4 S 30. Appropriateness of material covered on exams. 30. 1 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 31. Has this course required in your degree program? 31. Yes No 32. What is your sex? 32. M 33. What is your overall GPA? (l) 1.9 or less. (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3. (5) 3.4-4.0--- 33. l 2 3 4 34. Hhat is your class level? (1) Freshman. (2) Sophomore, (3) Junior, (4) Senior* 34. l 2 35. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average. (4) Below Average, (5) Inferior 35. l 2 3 4 5 75 Item. Percentage (n=44) 1 93 2 90 3 90 4 66 5 79 6 52 7 83 8 72 9 87 10 45 11 9O 12 72 13 83 14 43 15 83 16 34 17 90 18 97 19 87 20 76 21 83 22 62 23 86 24 83 Again, the 75 percent rule was adhered to and items numbered 1-3, 5, 7, 9, 11, 13, 15, 17-21, 23, 24 were included in the final evaluation instrument for MTA 313 (refer to Figure 3.3). Class Specific Instrument for MTA 317 The MTA 317 pre-test was given to two of the nine recitation sections. The responses to the pre-test can be referred to in Tables 3.13 and 3.14. The percentages from these two sections were collapsed separately for options 1 and 2. The results follow: 76 FIGURE 3.3 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 313 77 STUDENT INSTRUCTIONAL RATING FORM MTA 313 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or caurse. For each item, respond by circling the number in the key that corresponds 3. Average: typical instructor or course. . Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items. please omit the 5. Inferior: exceptionally poor instructor or course. NH &~ to the closest description of yOur instructor or your course. If you items. ‘EY 1. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material.-------------- 2. l 2 3 4 5 3. Your interest in learning the course material. 3. l 2 3 4 5 4. Your competence in this area due to this course. 4. l 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. l 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 S 7. The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5 8. The course organisation. 8. 1 2 3 4 5 9. The instructor's encouragement of students to express opinions. 9. I 2 3 4 5 10. The instructor's receptiveness to new ideas and others' viewpoints. 10. 1 2 3 4 5 11. The instructor's general stimulation of class discussion. 11. 1 2 3 4 5 12. The instructor's clarification of the relationship between course material and everyday life.---- 12. l 2 3 4 5 13. The instructor's help in improving my problem solving ability. 13. 1 2 3 4 5 14. The instructor's grading procedure. 14. 1 2 3 4 5 15. The appropriateness of the case text. 15. l 2 3 4 5 16. The appropriateness of cases. 16. l 2 3 4 5 17. The appropriateness of group discussion for further understanding of caurse concepts.-----—-—-- 17. 1 2 3 4 5 111E COURSE'S CWTRIBUI'ICN TO YOUR: 18. General knowledge in the field. 18. 1 2 3 4 5 19. 
Understanding of concepts and principles in the field. 19. 1 2 3 4 5 20. Ability to apply principles to new situations. 20. l 2 3 4 5 21. Ability to communicate clearly on this subject. 21. l 2 3 4 5 22. Ability to solve real problems in the field. 22. 1 2 3 4 5 23. Ability to organise ideas. 23. 1 2 3 4 5 24. Ability to apply particular case ideas to general situations. 24. l 2 3 4 5 STUDENT BACKGROUND: Select the most appropriate alternative. 25. Has this course required in your degree prugx-u? 25. Yes No 26. What is your sex? ' 26. n r 27. What is your overall GPA? (1) 1.9 or less, (2) 2.0—2.2, (3) 2.3—2.7. (4) 2.8-3.3. (5) 3.4-4.0---- 27. l 2 3 4 5 28. what is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior— 28. 1 2 3 4 5 29. In comparing this rating form with other rating forms you have responded to at HSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 29. l 2 3 4 5 78 Percentage (n-42) Percentage (n-33) Item. Section 3 Section 5 1 44 39 2 69 51 3 88 91 4 88 94 5 83 84 6 57 42 7 39 30 8 86 69 9 91 66 10 81 73 11 83 76 12 67 54 13 83 88 14 60 39 15 76 76 16 81 9O 17 63 64 18 86 84 19 54 60 20 88 91 21 83 82 22 92 91 23 85 79 24 88 82 25 96 100 26 74 40 27 78 78 Items were initially included on the final evaluation instrument if the 75 percent rule was met for both sections. This allowed the inclusion of items numbered 3-5, 11, 13, 15, 16, 18, 20-25, 27. However, a further perusal of the data indicated a high percentage of students in section 3 opted for inclusion of items 8-10. The percentage for each of these items was averaged with the percentages from section 5. The average percent met the criterion necessary for inclusion in the final instrument. Therefore, all three items were also included in the final evaluation instrument for MTA 317. Items 15 and 16 from the pretest instrument (Table 3.6) were re- worded to correct for ambiguity. The final evaluation instrument for MTA 317 is presented in Figure 3.4. 79 FIGURE 3.4 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 317 80 STUDENT INSTRUCTIONAL RATING FORM MTA 317 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or course. For each item, respond by circling the number in the key that corresponds 3. Average: typical instructor or cOurse. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. NH to the closest description of your instructor or your course. If you items. KEY l. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material. ---------- 2. 1 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. Your competence in this area due to this course. 4. 1 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. 1 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. 1 2 3 4 5 7. The manner used by the instructor in relating course concepts. 7. l 2 3 4 5 8. The course organization. 8. l 2 3 4 5 9. The instructor's willingness to spend extra time with you. 9. 1 2 3 4 5 10. The instructor's availability during office hours. 10. l 2 3 4 5 11. The instructor's clarification of the relationship between the course material and the real world.- 11. 1 2 3 4 5 TRIS COURSE CONTRIBUTED T0: 12. Improving my problem solving abilities. 
12. 1 2 3 4 5 13. An understanding of concepts and principles in the field. 13. l 2 3 4 5 14. My ability to communicate clearly on the subject. 14. 1 2 3 4 5 15. My ability to solve real problems in the field. 15. l 2 3 4 5 l6. Preparing me for the material covered on the tests. 16. l 2 3 4 5 FOR TUIS COURSE: l7. Conduciveness of classroom atmosphere to learning. 17. 1 2 3 4 5 18. Appropriateness of the required text. 18. l 2 3 4 5 l9. Beneficialness of written homework assignments. 19. l 2 3 4 5 20. Beneficialness of supplementary texts. 20. 1 2 3 4 5 21. Appropriateness of testing format. 21. l 2 3 4 5 22. Beneficialness of homework answers and calculations on reserve. 22. l 2 3 4 5 FOR THE RECITATION SECTION: 23. Clarification of course material. 23. l 2 3 4 5 24. Appropriateness of per cent of grade allotted to recitation. 24. 1 2 3 4 5 25. Usefulness of quizzes for exam preparation. 25. l 2 3 4 5 26. Ability of recitation instructor to answer questions. 26. l 2 3 4 5 27. Adequacy in covering written homework assignments. 27. l 2 3 4 5 28. Please circle the number corresponding to your recitation section. 28. l 62 7 3 84 95 STUDENT BACKGROUND: Select the most appropriate alternative. 29. Has this course required in your degree program? 29. Yes No 30. what is your sex? 30. M F 31. what is your overall GPA? (l) 1.9 or less, (2) 2.0—2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0--- 31. l 2 3 4 5 32. "hot is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior* 32. l 2 3 4 5 33. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferiorr 33. 1 2 3 4 5 81 Class §pecific Instrument for MTA 341 The percentages from.Table 3.15 collapsed for options 1 and 2 for each item follow: Item Percentage 1 41 2 47 3 51 4 77 5 87 6 58 7 86 8 85 9 82 10 89 11 84 12 89 13 96 14 88 15 89 16 84 17 52 18 74 19 73 20 68 21 57 22 82 23 88 24 93 25 79 Using the 75 percent criterion, items numbered 4, 5, 7-16, 22-25, were included in the final evaluation instrument. Item 18 was also in- cluded because the percentage associated with it was so close to the cut off point. The final evaluation instrument for MIA 341 is presented in Figure 3.5. Comparison Instrument The comparison instrument used in this study is the SIRS, Level II. A photographic reproduction is presented in Figure 3.6. 82 FIGURE 3.5 FINAL STUDENT INSTRUCTIONAL RATING INSTRUMENT MTA 341 83 STUDENT INSTRUCTICHAL RATING FORM MTA 341 . Superior: exceptionally good instructor or course. . Above Average: better than the typical instructor or course. For each item. respond by circling the number in the key that corresponds 3. Average: typical instructor or course. 4. Below Average: worse than the typical instructor or course. lack the necessary information to respond to any items, please omit the 5. Inferior: exceptionally poor instructor or course. NH to the closest description of your instructor or your course. If yOu items. [BY 1. The instructor seemed to be interested in teaching. 1. l 2 3 4 5 2. The instructor seemed to be concerned with whether the students learned the material.--—---—-—----- 2. l 2 3 4 5 3. Your interest in learning the course material. 3. 1 2 3 4 5 4. Your competence in this area due to this course. 4. 1 2 3 4 5 5. The instructor's attempted coverage of the course material. 5. 1 2 3 4 5 6. The speed used by the instructor in presenting course material. 6. l 2 3 4 5 7. 
The manner used by the instructor in relating course concepts. 7. 1 2 3 4 5
8. The course organization. 8. 1 2 3 4 5

THE INSTRUCTOR'S:
9. Use of handouts. 9. 1 2 3 4 5
10. Relation of course material to everyday experiences. 10. 1 2 3 4 5
11. Encouragement of students to express opinions. 11. 1 2 3 4 5
12. Receptiveness to new ideas and other's viewpoints. 12. 1 2 3 4 5
13. General stimulation of class discussion. 13. 1 2 3 4 5
14. Use of real life examples. 14. 1 2 3 4 5
15. Emphasis of important points. 15. 1 2 3 4 5
16. Willingness to spend extra time with you. 16. 1 2 3 4 5
17. Availability during office hours. 17. 1 2 3 4 5

THE COURSE'S CONTRIBUTION TO YOUR:
18. Obtainment of general knowledge in the field. 18. 1 2 3 4 5
19. Understanding of concepts and principles in the field. 19. 1 2 3 4 5
20. Ability to communicate clearly on the subject. 20. 1 2 3 4 5

FOR THIS COURSE:
21. Appropriateness of texts. 21. 1 2 3 4 5
22. Appropriateness of exam format. 22. 1 2 3 4 5
23. Appropriateness of material covered on the exam. 23. 1 2 3 4 5
24. Organization of lecture material. 24. 1 2 3 4 5
25. Beneficialness of homework assignments. 25. 1 2 3 4 5

STUDENT BACKGROUND: Select the most appropriate alternative.
26. Was this course required in your degree program? 26. Yes No
27. What is your sex? 27. M F
28. What is your overall GPA? (1) 1.9 or less, (2) 2.0-2.2, (3) 2.3-2.7, (4) 2.8-3.3, (5) 3.4-4.0 28. 1 2 3 4 5
29. What is your class level? (1) Freshman, (2) Sophomore, (3) Junior, (4) Senior 29. 1 2 3 4 5
30. In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior, (2) Above Average, (3) Average, (4) Below Average, (5) Inferior 30. 1 2 3 4 5

84

FIGURE 3.6
STUDENT INSTRUCTIONAL RATING SYSTEM FORM
FORM B

85

[Photographic reproduction of the Michigan State University Student Instructional Rating System Form, Form B. The form asks students to rate 21 course and instructor items on a five-point key (S = Superior, AA = Above Average, AV = Average, BA = Below Average, I = Inferior), followed by student background items 22-25 and spaces for optional items; the reproduction is not legible in this copy.]

86

Five instructional rating forms differing primarily with regard to specific items useful for evaluating different instructional techniques were developed. Each instrument was given to one-half of the particular class that it was developed for. The other one-half of the class received the usual SIRS Level II form. The forms were administered to randomly equivalent halves in each of the classes. Each instructor was given a packet containing the two forms arranged alternately (assuming a random start) so that each form would automatically be distributed to random halves of the class. Each student, therefore, received one form.

Directions were given to the instructors to administer the forms just as they had administered the instructional rating form in the past. The instructors were told to ask a student to return the forms to the Marketing and Transportation Department. The answer sheets were collected, coded, and the data punched onto cards. At this point, the rating forms were turned over to the Dean's Office of the College of Business.

Generalizability of Results

Since the instructors were volunteers, the generalizability of this study to other instructors is limited. However, the main concern of this study is not the generalizability to other instructors but the comparability of two different types of student rating forms and the usefulness of the procedure described in this study to develop class-specific forms.

87

The nonrandom selection of instructors does not affect the comparison of the two types of rating forms because the different forms were administered to randomly equivalent groups.

HYPOTHESES

I. Ho: For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS.

   H1: For each class, the responses to the core items in the class specific instrument do not come from the same distribution as the responses to corresponding items in the SIRS.

II. Ho: For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS.

    H1: For each class, the item variance of the tailored items in the class specific instrument is less than the item variance of the items in the SIRS.

III. Ho: The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS.

     H1:
The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS. 88 IV. R : The proportion of the students that are satisfied with the class specific instrument is equal to .50. H : The proportion of students that are satisfied with the class specific instrument is greater than .50. V. Ho: There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained from the SIRS. ANALYSIS The hypotheses of no difference in responses to the core items in the class specific instrument and the corresponding items in the SIRS can be calculated by the Chi Square Statistic. A.number of Chi Square two sample Tests of Independence can be performed to determine if the distribution of responses to a core item on a class specific instrument is from the same population distribution as the responses to the corresponding item.on the SIRS. Since there are five specific class instruments with eight core items, a total of forty Chi Square Statistics will be calculated. If the hypothesis of no difference is accepted, it will be assumed that students are reacting the same to both evaluation instruments. Because the first hypothesis specifies the class as the unit of interest, the alpha level ‘will only be inflated at most by a multiple of eight. This multiple of eight comes from.the fact that eight general questions exist on each class specific form. Therefore, if a tabled value for alpha is .01, then the upper limit on alpha would be .08. 89 The second hypotheses of no difference between the item variance of the tailored items in the class specific instrument and the item variance of the items in the SIRS will be tested by the Mhnn4Whitney Statistic. This is a nonparametric test that may be used to test whether two indepen- dent groups have been drawn from the same pOpulation (Siegel, 1957). A separate HannéWhitney test statistic will be calculated for each class comparing the item variability of the class specific items to the item variability of the SIRS. For an example of the design format, refer to Figure 3.7. If the Hypothesis of no difference is rejected, a perusal of the two distributions would give information concerning where the differences actually occurred. To test the third hypothesis of no differences in between class vari- ability it will be necessary to calculate a between class variability for each of the twenty items on the SIRS, and a between class variability for each class specific item common to more than one class specific instrument. The Mann-Whitney U statistic will then be used to compare the two sets of between class variances. The fourth hypothesis concerning the satisfaction item will be examined by a hypothesis test of one proportion. The normal approximation to the binomial is the correct statistical test to use in this instance. The pro- portion of students who viewed the class specific instrument as satisfactory will be comprised of those students who responded to options 1, 2, and 3 on the satisfaction item. The proportion of students who responded to options 4 and 5 on the satisfaction item will be considered dissatisfied with the class specific instrument. *The subscripts designate the item number. 
90

FIGURE 3.7
DESIGN FOR MANN-WHITNEY U TEST

MTA 313

Class Specific Instrument (n = 16):  σ²₉, σ²₁₀, σ²₁₁, . . . , σ²₂₄

SIRS Level II (n = 20):  σ²₁, σ²₂, σ²₃, . . . , σ²₂₀

91

The last hypothesis of no difference in rater reliabilities will be tested by use of the F statistic. Rater reliabilities will be computed for each instrument within a class. In all of the instances, both an estimate of the reliability of an individual rating and an estimate of the reliability of the average rating will be calculated. These computations will use an analysis of variance technique (Ebel, 1937) to arrive at the components necessary for calculation.

SUMMARY

Five instructional rating forms differing primarily with regard to class specific items were developed and administered to random halves of five classes in the Marketing and Transportation Department. The other random half of each class was administered the general SIRS, Level II. The study was designed to test the effect of class specific items and general items on item variability. The items that have a small within class variability and a larger between class variability will be considered the better items.

The hypothesis of no difference concerning the responses to the SIRS items and the corresponding class specific items will be tested by the Chi Square Statistic. The two hypotheses concerning item variabilities will be tested by the Mann-Whitney U Statistic. The hypothesis concerning the student satisfaction item on the class specific instrument will be tested by the use of the normal approximation to the binomial.

92

The hypothesis of no difference in rater reliabilities will be tested by the use of an F Statistic.

CHAPTER IV

RESULTS

INTRODUCTION

The present research was designed to build an instructional rating scale composed of general items useful for evaluating all classes, and specific items tailored for individual classes. The class specific instrument was compared to a generally accepted rating scale. Because the generally accepted instrument was well established in its creation, items were drawn from it to be used in the general (core) part of the class specific instrument.

The first hypothesis of this study compares the distribution of responses made by students to like items on each instrument by class. It is hypothesized that students respond in the same manner to an item regardless of what instructional rating form the item appears on. It was also hypothesized that the class specific instrument would have less item variability on a particular item in a given class than the general instrument. It would also be expected that items used on more than one specific instrument would have a larger between class variability than items on the general instrument.

93 94

Another hypothesis of the study was to compare indexes of rater reliabilities for each instrument administered to each class. It would be expected that the class specific form would have higher rater reliabilities than the general instrument.

The last hypothesis concerned a satisfaction item incorporated into the class specific instrument. An index of satisfaction is computed for this satisfaction item for each class. A hypothesis test of one proportion is calculated to determine if at least fifty percent of the students are satisfied with the class specific instrument.
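To make the test of the first hypothesis concrete before the results are presented, the sketch below (Python with scipy, a modern stand-in for the original hand and card-based computation) runs one Chi Square test of independence of the kind described in Chapter III. The response counts are hypothetical, not the study data, and the collapsing rule shown is a simplified version of the contiguous-cell collapsing described in the text.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical response counts (options 1-5) for one core item;
# row 0 = class specific form, row 1 = SIRS.  Not the study data.
specific = np.array([8, 18, 13, 3, 1])
sirs = np.array([4, 24, 17, 2, 0])

def collapse_sparse(table, min_expected=5.0):
    """Collapse contiguous response options (here, the two right-most
    columns) until every expected frequency reaches min_expected.
    A simplified version of the collapsing rule in the text."""
    table = table.astype(float)
    while table.shape[1] > 2:
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
        if expected.min() >= min_expected:
            break
        table = np.column_stack([table[:, :-2], table[:, -2] + table[:, -1]])
    return table

observed = collapse_sparse(np.vstack([specific, sirs]))
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi square = {chi2:.3f}, d.f. = {dof}, p = {p:.3f}")

# Eight core items are compared per class, so a tabled alpha of .01
# carries an upper bound of 8 x .01 = .08 on the per-class alpha.
print("significant at tabled alpha = .01:", p < .01)
```

The same calculation is repeated for each of the eight core items in each class, giving the forty comparisons reported below.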
To conduct the study, five instructional rating forms differing pri- marily with regard to class specific items were developed and administered to random halves of five classes in the marketing and Transportation Department at Michigan State University. The remaining half of each class was administered the general SIRS, Level II. The hypothesis of no difference in the pattern of responses to the core items in the class specific instrument and the corresponding items on the SIRS can be tested with the Chi Square Statistic. The second hypothesis of no difference between the item.variance of the tailored items on the class specific instrument and the item variance of the items on the SIRS will be tested by the Mananhitney Statistic. The third hypothesis concerning the between class variability of the SIRS and class specific instrument will also be tested by the‘MannéWhitney statistic. The student satisfaction item on the class specific instrument will be tested by use of a one proportion hypothesis test. 95 The hypothesis of no difference in rater reliabilities will be tested by an F test. RESULTS CONCERNING LIKE ITEMS ON DIFFERING INSTRUMENTS The test of Hypothesis I was carried out by the calculation of forty Chi Square Tests.1 Hypothesis 1 was stated, Ho: For each class, the responses to the core items in the class specific instrument come from the same distribution as the responses to corresponding items in the SIRS. H : For each class, the responses to the core items in the class specific instrument do not come from.the same distribution as the responses to corresponding items in the SIRS. Chi Square is the appropriate statistic for determining whether two (or more) distributions are essentially identical. For each class, a comparison is made between the responses for each core item on the class specific instrument with the corresponding item on the SIRS. Therefore, a total of forty comparisons are made. Because the class is the focus of the hypothesis, the a level will be inflated at most by a multiple of eight. Therefore, if a tabled value of .01 is used, the most the actual a can possibly be is .08. Each of the forty Chi Square tables form a five by two matrix. A sample matrix for MTA 311, Section 1 is presented in Figure 4.1. However, due to the fact that a basic assumption for the Chi Square Statistic is that 802 of the cells require an expected frequency of at least five 1It should be noted at this time that MTA 341, Section 2 was omitted from the analysis due to data collection difficulties. 96 CHI SQUARE MATRIX FOR MTA 311 SECTION 1 SPECIFIC INSTRUMENT - ITEM 1 SIRS - ITEM 2 Response Option 1 2 3 Specific Instrument 8 18 13 Item 1 SIRS Item 2 2 24 17 FIGURE 4.1 97 many matrixes had to be collapsed. When collapsing was necessary, the apprOpriate contiguous cells were collapsed. Because the tabled Chi Square Statistic is related to the number of cells, varying Chi Square tabled values arose. At the a - .01 (.08 upper limit) level, the Chi Square tabled values are listed in Table 4.1. To reject the null hypothesis, a Chi Square calculated value must be greater than the Chi Square tabled value. The calculated Chi Square values are presented in.Tab1e 4.2. A perusal of Tables 4.1 and 4.2 shows that only one of the forty calculated Chi Square values produced significance at the a - .01 (.08 upper limit) level. RESULTS CONCERNING ITEM VARIABILITY The test of hypothesis II was carried out by the calculation of five Mananhitney U Test Statistics. 
Hypothesis II was stated, Ho: For each class, the item variance of the tailored items in the class specific instrument is the same as the item variance of the items in the SIRS. 1: For each class, the item variance of the tailored items in the class Specific instrument is less than the item variance of the items in the SIRS. It was necessary to make 5 comparisons in this situation. A comparison between the variances of the tailored items and the twenty general items on the SIRS was made for each class. The Mann4Whitney U Test is the most appropriate test in this instance because it is capable of testing whether two independent samples were drawn from the same population. The test will 98 TABLE 4.1 CHI SQUARE TABLED VALUES a-.01 (upper limit .08) Itgg Specific 311 311 Instrument SIRS Sec. 1 Sec. 313 317 341 l 2 6.63 13.28 6.63 11.34 9.21 2 4 6.63 13.28 6.63 13.28 11.34 3 5 11.34 13.28 6.63 13.28 13.28 4 8 11.34 13.28 6.63 13.28 6.63 S 13 11.34 13.28 6.63 11.34 9.21 6 14 13.28 13.28 6.63 11.34 6.63 7 17 13.28 13.28 6.63 13.28 6.63 8 18 11.34 13.28 6.63 13.28 6.63 99 TABLE 4.2 CHI SQUARE CALCULATED VALUES jgggg Specific 311 311 Instrument SIRS Sec. Sec. 313 317 341 1 2 .286 5.537 1.529 2.132 .600 2 4 .579 3.125 .176 11.036 7.132 3 5 1.639 5.770 .004 1.647 1.094 4 8 .469 1.713 2.184 7.747 11.302 5 13 10.572 .696 2.742 7.670 .792 6 14 2.055 6.217 2.509 .730 .110 7 17 9.298 3.492 2.341 3.369 .019 8 18 3.536 4.372 .494 3.980 .379 100 give information about whether the distribution of variances for the two evaluation instruments are the same. Because there were twenty variances in the SIRS form and a range of fourteen (MTA 317) to twenty-two (MTA 311, Section 2) variances in the specific form the normal approximation to the MannéWhitney U was utilized. Table 4.3 lists the decisions made at an a - .05 level of significance for each of the five classes. The results were not of a conclusive nature. In three of the classes (MTA 311, Section 2, MTA 313, MTA 317) the null hypothesis was accepted. This acceptance decision can be interpreted as the distribution of vari- ances of the class specific and SIRS instruments being the same. Two instances occurred in.which the null hypothesis was rejected; in.MTA 311, Section 1 it could be inferred that the class specific distribution of variances was larger than the SIRS distribution of variances. The second rejection decision involved MTA 341, the inferences are reversed with the class specific distribution of variances being smaller than the SIRS dis- tribution of variances. Because of the inconsistent results, it was interesting to delve further into the data and calculate the average variance for each instru- ment in each class. The average variances were computed for the first twenty items on the SIRS and the tailored items on the class specific instruments. These variances are listed in Table 4.4. It is interesting to note that in only one case was the class specific average variance greater than the SIRS average variance. In the four other classes, the average variance of the class specific form was equal to or smaller than the average variance of the general instrument. 
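The variance comparison summarized in Tables 4.3 and 4.4 on the following pages can be sketched in a few lines. The two lists of item variances below are invented for illustration, and scipy's Mann-Whitney routine, with its normal-approximation option, stands in for the hand calculation described in Chapter III.

```python
# Sketch: compare the distribution of item variances on a class specific
# form with the distribution of the twenty SIRS item variances.
# The variances below are illustrative, not the study data.
from scipy.stats import mannwhitneyu

specific_vars = [0.42, 0.55, 0.61, 0.48, 0.70, 0.39, 0.52, 0.66,
                 0.45, 0.58, 0.49, 0.63, 0.51, 0.44, 0.57, 0.60]   # 16 tailored items
sirs_vars     = [0.47, 0.53, 0.71, 0.62, 0.58, 0.49, 0.66, 0.54,
                 0.60, 0.52, 0.73, 0.56, 0.68, 0.50, 0.59, 0.64,
                 0.55, 0.61, 0.67, 0.57]                            # 20 SIRS items

# H1: the class specific variances tend to be smaller than the SIRS variances.
# method="asymptotic" uses the normal approximation to U, as in the text.
u_stat, p_value = mannwhitneyu(specific_vars, sirs_vars,
                               alternative="less", method="asymptotic")
print(f"U = {u_stat:.1f}, one-sided p = {p_value:.3f}")
print("reject Ho at alpha = .05:", p_value < .05)
```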
101

TABLE 4.3

COMPARISON OF VARIANCE DISTRIBUTIONS
SPECIFIC VS GENERAL INSTRUMENT
MANN-WHITNEY U
α = .05

CLASS                   DECISION      DIRECTION OF DIFFERENCE IF DECISION = REJECT
MTA 311, Section 1      Reject Ho     Specific > General
MTA 311, Section 2      Accept Ho
MTA 313                 Accept Ho
MTA 317                 Accept Ho
MTA 341                 Reject Ho     General > Specific

102

TABLE 4.4

AVERAGE VARIANCES FOR CLASS SPECIFIC AND GENERAL INSTRUMENTS

                        AVERAGE VARIANCE              NUMBER OF ITEMS
CLASS                   Class Specific   General      Class Specific   General
MTA 311, Sec. 1         .78              .49          17               20
MTA 311, Sec. 2         .67              .67          22               20
MTA 313                 .49              .55          16               20
MTA 317                 .71              .73          14               20
MTA 341                 .64              .69          17               20

103

The test of hypothesis III was carried out by the calculation of a Mann-Whitney U Test Statistic. Hypothesis III was stated,

Ho: The between class variability of tailored items shared by two or more of the class specific instruments is the same as the between class variability of items on the SIRS.

H1: The between class variability of tailored items shared by two or more of the class specific instruments is greater than the between class variability of items on the SIRS.

In order to make the necessary calculations, the following formula was utilized to obtain an index of between class variability:

    index for item 1 = [ Σ (x̄_i1 - x̄_1)² ] / (n - 1)

where: x̄_i1 = mean on item 1 for a particular class i
       x̄_1  = mean of all the class means on item 1
       n    = # of classes using item 1 (the sum is taken over these n classes)

This index was easy to tabulate for the general SIRS instrument. The SIRS instrument remains the same for all classes. Therefore, a between class variability index can be computed for all twenty general items in the SIRS. These indexes can be viewed in Table 4.5.

104

TABLE 4.5

GENERAL SIRS FORM
INDEX OF BETWEEN CLASS VARIABILITY

Item    Between Class Variability
 1      .101
 2      .031
 3      .297
 4      .014
 5      .079
 6      .122
 7      .243
 8      .061
 9      .096
10      .116
11      .096
12      .207
13      .042
14      .047
15      .082
16      .024
17      .118
18      .087
19      .107
20      .028

Average Variance = .099

105

To tabulate a between class variability index for the class specific instruments, it was necessary to isolate specific items used on more than one form. Two items are common to three class specific forms, three items are common to six class specific forms, four items are common to four class
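A minimal sketch of this one-proportion test follows. It uses the MTA 311, Section 1 counts from Table 4.7 (shown on a following page) and computes the normal approximation directly; the helper function and the α = .05 cutoff mirror the description above, but the code is only an illustration of the calculation, not the original analysis.

```python
from math import sqrt
from scipy.stats import norm

# MTA 311, Section 1 responses to the satisfaction item (Table 4.7):
# options (1, 2, 3, 4, 5) received (2, 15, 19, 1, 1) responses.
counts = {1: 2, 2: 15, 3: 19, 4: 1, 5: 1}
n = sum(counts.values())  # 38 students

def one_proportion_z(satisfied, n, pi0=0.5, alpha=0.05):
    """One-sided normal-approximation test of Ho: pi = pi0 vs H1: pi > pi0."""
    p_hat = satisfied / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return p_hat, z, z > norm.ppf(1 - alpha)

# "Satisfied" defined first as options 1-2, then as options 1-3.
for label, options in [("options 1-2", (1, 2)), ("options 1-3", (1, 2, 3))]:
    satisfied = sum(counts[o] for o in options)
    p_hat, z, reject = one_proportion_z(satisfied, n)
    print(f"{label}: p_hat = {p_hat:.3f}, z = {z:.2f}, reject Ho: {reject}")
# The two decisions agree with the Accept/Reject pattern reported for this
# class in Table 4.8.
```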
It is possible to use the normal approximation to the binomial in all classes because both, n V 3.5 n (1-1) 3.5 where: n 8 # of students in each class responding to the particular instrument 1 - proportion specified in the hypothesis (1 - .5) Item Description 106 TABLE 4.6 CLASS SPECIFIC INSTRUMENT INDEX.OF BETWEEN CLASS VARIABILITY # of instruments item included on THE INSTRUCTOR: 0‘ U|&U)NI-‘ O O Spends extra time with you Is available during office hours Relates course material to everyday life Encourages students to express opinions Is receptive to new ideas and other's viewpoints Generally stimulates class discussion THE COURSE'S CONTRIBUTION TO YOUR: 7. 8. 9. 10. 11. 12. Understanding of day to day working of a field representative Obtainment of general knowledge in the field Understanding of concepts and principles in the field Improving problem solving abilities Ability to communicate clearly on the subject Ability to solve real problems in the field FOR.THIS COURSE: l3. 14. 15. 16. 17. 18. Appropriateness of text Appropriateness of instructional material Appropriateness of homework Appropriateness of exam format Organization of lecture material Appropriateness of exam content WU UU‘UL‘ #N bU‘NUI uwbewm Between Class variability_ .148 .145 .158 .117 .196 .226 .186 .047 .057 .090 .237 .068 .072 .008 .065 .071 .059 .022 Average variance - .110 107 TABLE 4.7 FREQUENCY OF RESPONSES TO THE SATISFACTION ITEM IN THE CLASS SPECIFIC INSTRUMENT Item: In comparing this rating form with other rating forms you have responded to at MSU, do you find it: (1) Superior (2) Above Average (3) Average (4) Below Average (5) Inferior Response Option Class (1) (2) (3) (4) (5) MM 311, Sec. 1 2 15 19 1 1 MTA 311, Sec. 2 2 14 4 o o m 313 o 5 7 o 0 MIA 317 . 23 46 16 2 1 MIA 341 1 6 11 1 o 108 The tests were calculated at the a - .05 level of significance. The tests were first computed on each class collapsing Option one and two. In this test, the question becomes one of whether at least fifty percent of the students responded to the class specific instrument as being superior or above average. Table 4.8 shows that only two out of the three classes re- jected the null hypothesis under these strict conditions. However, when options one, two and three were collapsed the null hypothesis was rejected in all five cases (Table 4.8). The last decision can be interpreted as at least fifty percent of the students in each class felt the class specific instrument was at least average. RESULTS CONCERNING RATER RELIABILITIES The test of hypothesis V was carried out by the use of an F test. Hypothesis V was stated, Ho: There are no differences in rater reliabilities between the class specific instrument and SIRS. H1: The rater reliabilities of the class specific instrument are not the same as those obtained by the SIRS. In order to calculate the F statistic, it was first necessary to get estimates of rater reliabilities. The coefficient used to calculate the rater reliabilities was the intraclass rater reliability coefficient. written in analysis of variance terms, it was possible to use an SPSS (Nie, et. al., 1975) routine to find the necessary components to generate the reliability estimates by hand. The necessary mean squares were ob- tained from the SPSS routine, "Reliability". 
109

TABLE 4.8

STATISTICAL DECISION CONCERNING THE Ho
IN THE SATISFACTION QUESTION

1 TAILED TEST, α = .05        Ho: π = .50        H1: π > .50

                                        MTA 311   MTA 311
                                        Sec. 1    Sec. 2    MTA 313   MTA 317   MTA 341

Responses to Option 1 + Option 2        Accept    Reject    Accept    Reject    Accept

Responses to Option 1 + Option 2
  + Option 3                            Reject    Reject    Reject    Reject    Reject

110

A one way analysis of variance table (Table 4.9) was tabulated for each instrument for each class using only complete sets of ratings. The reliability of average ratings was calculated by hand,

    r_nn = (MS_items - MS_error) / MS_items

where: MS = mean square

These reliabilities are presented in Table 4.10.

An estimate of the precision of these reliability estimates according to a method suggested by Jackson and Ferguson (1941) was also used to build confidence intervals around the reliability of an individual rating. The formula for an individual rating follows:

    r_11 = (MS_items - MS_error) / [MS_items + (k - 1) MS_error]

where: MS = mean square
       k  = # of students

The reliability of an individual rating for each situation is presented in Table 4.11. A confidence interval is built around the reliability of an individual rating by use of the following formula:

    [(F_s / F_e) - 1] / [(F_s / F_e) - 1 + k]  ≤  r_11  ≤  [(F_s · F) - 1] / [(F_s · F) - 1 + k]

where: F_s = MS_items / MS_error
       F_e = tabled F with d.f. for items and d.f. for error
       F   = tabled F with d.f. for error and d.f. for items
       k   = # of students

111

TABLE 4.9

ANALYSIS OF RATINGS - COMPLETE SETS

                          STUDENTS
                 1    2    3    . . .    n (# in a particular class)
ITEMS**    1
           2
           3
           .
           .
           m*

Mean Squares:  For items    For students    For error    For total

*m = 20 for the SIRS; m varies for the class specific instrument
**Only tailored items were included for the class specific instrument

112

TABLE 4.10

INTRACLASS RELIABILITY COEFFICIENT
AVERAGE RATINGS

                              INSTRUMENT
CLASS                    Class Specific    General SIRS
MTA 311, Section 1       .308              .889
MTA 311, Section 2       .669              .749
MTA 313                  .684              .847
MTA 317                  .912              .900
MTA 341                  .750              .751

113

TABLE 4.11

INTRACLASS RELIABILITY COEFFICIENT
INDIVIDUAL RATER

                              INSTRUMENT
CLASS                    Class Specific    General SIRS
MTA 311, Section 1       .013              .182
MTA 311, Section 2       .106              .17
MTA 313                  .153              .284
MTA 317                  .111              .095
MTA 341                  .150              .159

114

The corresponding confidence intervals calculated at the 95% level are presented in Table 4.12.

One well accepted method of testing the hypothesis concerning rater reliabilities is by a comparison of the confidence intervals around the estimates presented in Table 4.12. In the instances where the confidence intervals overlap, the null hypothesis is accepted and there are no differences in the reliability estimates. However, it was brought to the attention of the author that a more powerful technique was available due to the fact that an equal number of students in each class filled out the SIRS and class specific form.1 The more powerful technique makes use of the F Statistic. A calculated F value is compared to tabled F values; if the calculated value falls between the two tabled F values, the null hypothesis is accepted. Since the F statistic is a ratio, two values are necessary. Each value comes from F_s calculated on page 110. Therefore,

    F calculated = F_s (for the SIRS) / F_s (for the class specific instrument)

The degrees of freedom for the tabled F values are n1 - 1 and n2 - 1, where n1 and n2 are the number of items in the SIRS and class specific instrument respectively. The upper tabled F value is read directly from the table with n1 - 1 and n2 - 1 degrees of freedom. The lower tabled F value is the reciprocal of the tabled value with n2 - 1 and n1 - 1 degrees of freedom.

1Thanks must go to Dr.
Dennis Gilliland of the Probability and Statistics Department at Michigan State University for the time given me with regards to this technique. 115 TABLE 4.12 CONFIDENCE INTERVALS AROUND RELIABILITY ESTIMATES OF INDIVIDUAL RATER (95% Confidence) INSTRUMENT CLASS Class SpeCific General SIRS MTA 311, Section 1 .04 to 0.0 .28 to .09 MTA 311, Section 2 .18 to .04 .05 to .22 MTA 313 .26 to .06 .41 to .14 MIA 317 .17 to .06 .16 to .05 MIA 341 .24 to .07 .27 to .06 116 Table 4.13 presents the calculated and upper and lower tabled F values. Referring to Table 4.13, hypothesis V is accepted in four instances and rejected in one instance. Acceptance of the null hypothesis refers to there being no difference in the consistency that students respond to the items in MTA 311 Section 2, MTA 313, MTA 317, MTA '341. In the case of MTA 311 Section 1, the calculated P value is larger than the upper tabled F value, informing the reader that the SIRS had a larger rater reliability than the class specific instrument for MTA 311 Section 1. OTHER INTERESTING RESULTS In a perusal of the class means for each set of items another point of interest surfaced. The grand mean of all the item means for each class is presented in Table 4.14, using a letter representation for each class. EaCh grand mean for the SIRS instrument consisted of twenty items. How- ever, the number of item.means used in the class specific instrument varied, and only the tailored items were used to calculate the grand mean. If the instructors were to be rank ordered according to the grand mean of each evaluation instrument, the rank orders would not remain constant. The rank orders of the grand mean for the SIRS instrument are: Rank Class u:a~u:nah- >~c1catnlw The rank orders for the class specific form are: 117 TABLE 4.13 F TEST 0 - .05 2 Tailed CLASS Fcalculated d'f° Ftabled (lower) Ftabled (upper) MTA 311, Sec. 1 6.215 19, 24 .408 2.33* MTA 311, Sec. 2 1.319 19, 29 .418 2.21 MA 313 2.065 19, 23 .408 2.39 MTA 317 0.883 19, 21 .398 2.42 MTA 341 1.004 19, 24 .408 2.33 *Significant at o - .05 118 TABLE 4.14 TABLE OF GRAND MEANS CLASS A B C D E General SIRS 2.79 2.43 2.57 2.76 2.45 Class Specific 2-71 2~43 2.34 2.60 2.59 119 Rank Class 1 C 2 B 3 E 4 D 5 A SUMMARY OF RESULTS OF STUDY The hypothesis of no difference between the core items in the class specific instrument and the corresponding items in the SIRS was accepted as expected. This supported the proposal that students respond in the same manner to items regardless of the instrument the items are embedded in. The hypothesis of no difference in the item variance of tailored items as compared to general items was neither totally accepted or rejected. There were three cases of acceptance and two cases of rejection. The two cases of rejection gave conflicting results. However, a perusal of a table of average variances (Table 4.4) for the differing instruments did support the research hypothesis to a certain extent. Four of the five average variances for the class specific instrument were the same or smaller than the average variance of the SIRS. It had been predicted that the better instrument would have a smaller item variance for a particular class. The hypothesis of no difference concerning the between class varia- bility of tailored items versus general items was tested by a.ManneWhitney U Statistic. The average between class variability of the common tailored items - .110 and the average between class variability of the general SIRS 120 items - .099. 
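The index behind these two averages is simple enough to restate in code. The short sketch below recomputes it for two illustrative items from hypothetical class means; the means are invented, and the sample-variance divisor (n - 1) is an assumption about the exact form of the index.

```python
# Sketch: index of between class variability for a single item, computed
# from the class means on that item.  The means below are hypothetical,
# and the (n - 1) divisor is an assumption.

def between_class_variability(class_means):
    """Dispersion of the class means about their own mean."""
    n = len(class_means)
    grand_mean = sum(class_means) / n
    return sum((m - grand_mean) ** 2 for m in class_means) / (n - 1)

# A tailored item shared by four class specific forms:
tailored_item_means = [2.41, 2.86, 2.19, 2.63]
# A SIRS item, which appears in all five classes:
sirs_item_means = [2.52, 2.60, 2.47, 2.58, 2.55]

print(f"tailored item index: {between_class_variability(tailored_item_means):.3f}")
print(f"SIRS item index:     {between_class_variability(sirs_item_means):.3f}")
```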
These results were in agreement with the contention that a more discriminating evaluation instrument would have a larger between class variability. The hypothesis concerning the proportion of students that were satis- fied was rejected in all classes if the criteria for satisfaction included the average response option (#3). It appeared that students felt the class specific instrument was at least as good as any other evaluation instrument they had responded to at Michigan State University. The hypothesis of no differences in rater reliabilities between the SIRS and specific form was only rejected in one out of five instances. The acceptance of this hypothesis in four out of five instances indicates that the differences in the reliability coefficients were not large enough to rule out the possibility of their being due to chance. Although the class specific reliability coefficients were generally as good as the SIRS reliability coefficients, no trends were found in this study to support the hope that class specific instruments yielded larger relia- bility coefficients than the general SIRS. In Chapter V, the results of the study are discussed in the light of possible explanations. Suggestions are made for future research. CHAPTER V SUMMARY AND CONCLUSIONS SUMMARY With the advent of teacher accountability, student ratings of profes- sors have become a greater concern in recent years. It has become neces- sary for administrators to have normative data for making unbiased decisions regarding the teaching staff. However, student evaluation instruments are often developed and piloted on a very specific population as a sample of convenience. The instrument is then often used on a university wide basis. Because of this, the instrument must remain very general in nature. The purpose of this study is to build an instructional rating scale that would contain items not only general in nature, but items specific to the class of interest. These items would not only be useful in evalu— ating the instructor, but also much more helpful for self diagnosis and instructor improvement. The items on this scale would discriminate between good and poor instruction, and have unambiguous questions on which raters could be in agreement for each instructor. In terms of item variability, the better of two evaluation instruments would have less variability on a 121 122 particular item in a given class. It would also be expected that variability exists on a particular item between classes. This between class variability could only be computed for items that appear on more than one specific evaluation instrument. In terms of the above mentioned between class vari- ability, the better of two evaluation instruments would have a larger between class variability. In order to make comparisons, it was necessary to have both a general instrument and a class specific instrument. One of the forms was the standard Level II, Form B student rating form given at Michigan State Uni- versity. The SIRS is a general form developed on a general population. The comparison instrument was developed for specific classes within a specific department. The comparison instrument contained eight general core items used on every instructor's student evaluation form, 10-20 items specific to the individual class situations, and a satisfaction item. The eight core items were selected from the SIRS instrument. 
The purpose of the instrument satisfaction question was to obtain an index comparing students' perceptions of the class specific instrument with other rating scales the students had filled out at Michigan State University. To conduct the study, five undergraduate classes were chosen from the Marketing and Transportation Department at Michigan State University. These classes were chosen because of their diverse nature. The courses varied on such dimensions as class size, lecture versus discussion,'quali- tative versus quantitative, and inclusion or exclusion of recitation sections. This diversity was necessary in developing specific class items. It was hypothesized that these five original instruments would have less variability on a particular itemrwithin a class and have a larger 123 between class variability on a particular item than the SIRS. In order to test the above hypotheses, it was necessary to have equi- valent student groups responding to both the SIRS and the class specific instrument in each class. This was accomplished by alternating the SIRS instrument with the class specific instrument within classes. Thus, each instrument was given to one-half of the particular class it was developed for. The other one-half of the class received the usual SIRS form. Assuming a random start, the forms were administered to randomly equivalent halves in each of the classes. A.hypothesis was formed to test for equi- valent groups. The hypothesis stated that the distribution of responses to the core items in the class specific instrument is the same as the distribution of responses of corresponding items in the SIRS. It was also hypothesized that an index of rater reliability would be larger for the class specific form than the rater reliabilities of the SIRS instrument. This index is concerned with consistencies, i.e., to what extent do students give the same information about an instructor. A separate reliability estimate is obtained for each instructor in each class. Therefore, consistency in this case refers to how consistently the students in a particular class evaluate this instructor. The last hypothesis concerns the satisfaction item incorporated into the class specific instrument. It was proposed that the proportion of students satisfied with the class specific instrument would be greater than .50. The following statistical techniques were used to test the above hypotheses: 124 1. The hypothesis of no difference in response to the core items in the class specific instrument and the corresponding items on the SIRS was tested with the Chi Square Statistic. 2. The hypothesis of no difference between the item variance of the tailored items on the class specific instrument and the item variance of the items on the SIRS was tested by the MannéWhitney U Statistic. 3. The hypothesis of no difference in between class variability of the SIRS and class specific instruments was tested by the Mann? Whitney U Statistic. Tables compare the between class variability of items on the class specific form‘with items on the SIRS. An average index was calculated for each form. 4. The hypothesis regarding student satisfaction was tested by the normal approximation to the binomial. 5. The hypothesis of no difference in rater reliabilities was tested by use of an F statistic. CONCLUSIONS 1. The distribution of responses to the core items in the class specific instrument was the same as the distribution of responses of cor- responding items in the SIRS. 2. 
There were no concrete statistical conclusions concerning the item.variance of the tailored items compared to the item variance of the However, in four out of five classes, the average item variance 125 of the class specific instrument was equal to or less than the average item variance of the SIRS. The smaller the variance on a particular item, the larger the amount of agreement among students with regard to a particular item. 3. The average between class variability for tailored items on.the class specific instrument was larger than the average between class vari- ability for the general items on the SIRS. This lends support to the idea that students can better discriminate between instructors if items are specific to a class. 4. At least fifty percent of the students felt the class specific instrument was as good if not better than any other student rating form they had come in contact with at Michigan State University. 5. On the whole, there did not appear to be any difference between the rater reliabilities on the specific instrument compared to those on the SIRS. There was only one class where differences did occur between the rater reliabilities. 6. A result in addition to the results from.the list of hypotheses con- cerns the average item mean on each instrument for a particular instructor. If instructors were to be ranked by their average rating on a student rating form, it is interesting to note that their ranks would alter with the instrument being used. .Although the fourth and fifth ranks remained constant among forms, the ranks of one, two and three were altered con- siderably. The instructor who ranked first on the class specific instrur ment ranked only third on the SIRS. The instructor who ranked third on the class specific instrument ranked second on the SIRS, and the instructor that ranked second on the class Specific instrument ranked first on the SIRS. 126 DISCUSSION Because of the design of this study, statistical significance was diffi— cult to determine. Therefore, the author was left with only a few plausible statistical techniques. The nonparametric techniques are not as powerful as their parametric counterparts, but the data would not allow any further analyses. Although there was no startling statistical significance, the trend of the data supported the major hypotheses of this study. Firstly, the acceptance of hypothesis I informed the researcher that any differences occurring in the pattern of responses to an item on the SIRS and the item's counterpart on the class specific instrument is due to chance alone. It was possible to proceed with the study with the satis- faction of knowing that equivalent groups were responding to the SIRS and the class specific instrument. The acceptance of the hypothesis also implies that no differences in student response occurs if the general item is embedded in a general or class specific instrument. The student is therefore responding to an item independent of the type of rating form. The above information made it possible to proceed with hypotheses II and III. The major purpose of this research dealt with the variability of the items. It had been assumed in Chapter I that the better of two evaluation instruments would have less variability on a particular item within a given class and greater variability on a particular item.between classes. Focusing on hypothesis two, which deals with the item.variability on both the class specific instrument and general SIRS, it is interesting to 127 note the average variances. 
Table 4.4 displayed the average variance for each form. .A perusal of this table shows that the average variance hardly varies between the type of instrument in four of the five classes. Howb ever, in one class, MTA 311 Section 1, a large difference in average variance surfaces. This is the only class specific instrument that, relative to the above mentioned assumption, did not appear to be as good an instrument as the SIRS in terms of average variability. MTA 311 Section 1 will be referred to again at the end of this discussion section. The third hypothesis concerning between class variability was not found to be statistically significant, as was predicted. However, the trend of the data were in the anticipated direction. The average between class variability of the SIRS was .099, while that of the class specific form was .110. It had been assumed that a good evaluation instrument should be able to discriminate between good and poor instruction. This assumption.msndates a larger between class variability. The larger average between class variability of the class specific instrument lends evidence to support the premise that the class specific instrument is a more sensi- tive measure of instruction. Returning to Table 4.6, it is interesting to note the items having the best discriminating power. Class specific items that have the larger between class variances are often prefaced.with the words "the instructor". It appears that students are better able to differ- entiate between classes on items pertaining directly to the instructor. Items prefaced with "The course's contribution to your" and "For this course" have relatively small between class variability. These results have two implications. One possible implication is that there is in fact very little 123 difference between classes with regard to teaching tools, for example, exam format, textbodks, and audio-visual equipment. Or that there is no differences between classes in the areas of what the student learns. However, another possibility is that students are unable to judge the teaching tools used by the professor or the amount of knowledge the student actually obtained. The rejection of hypothesis IV gave results supporting the student's satisfaction with the class specific instrument. Unfortunately, there was no counterpart of the satisfaction item.on the SIRS form. It seems fair to admit the possibility that the students responding to the SIRS were just as satisfied as those responding to the class specific instrument. Secondly, the classes were aware of the fact that research was being conducted with regard to student rating forms and their classes. This awareness could have favorably predisposed the students toward the new instrument. The hypothesis concerning rater reliabilities did not support the pro- posal that the class specific instrument would have higher rater reliabilities. In all cases except one, the differences in rater reliabilities could be con? tributed to chance alone. The only statistically significant difference occurred in MTA 311 Section 1. For this particular class, the reliability coefficients follow: SIRS Class Specific reliability of an individual rating .18* .01 reliability of an average rating (coefficient a) .89 .31 *These coefficients are calculated with equal n's 129 Obviously, there is much less consistency in the reliability of an average rating of .31 as opposed to .89. 
A coefficient as high as .89 conveys the information that a large proportion of the class is responding the same to the items on the class specific and the SIRS forms. A positive 1.0 would represent the fact that students were totally consistent. At this time, it is again noted that MTA 311 Section 1 is also the class in'which the class specific instrument was not as sensitive as the SIRS with regard to average variability. This is understandable in terms of rater reliability. A large variance within a class could be equated with inconsistent answers, which means lower rater reliabilities. The question still remains as to "why the rater reliabilities were not higher than those of the SIRS?" In answer to this question, it should first be kept in mind that in four of the five cases there was no statis- tical difference, i.e., the class specific reliabilities were as good as those of the SIRS. It is the opinion of the author that continued work on the specific class instrument would raise the index of rater reliabilities. The specific instrument was being compared to an instrument that has under- gone a large amount of research and is presently in a highly perfected state. The class specific form, although pretested, is still in a very early stage of development compared to the SIRS. The final result that proved of interest was the rankings of instructors according to the mean item response. Unfortunately, a sample as small as five cannot give statistically significant results, but a trend did appear. The rankings do not remain the same for instructors across instruments. Referring again tofrable 4.13, an instructor's average rating for each 130 instrument can again be observed. Instructor A and B obtained the same average rating regardless of the instrument utilized. Instructors D and E give not only different results but the direction changes. That is to say, instructor D received higher ratings on the SIRS than the class specific form, instructor E received lower ratings on the SIRS than the class specific form. The instructor having the largest differences in average ratings between the two forms was instructor C. Instructor C ranked third (average rating - 2.57) on the SIRS and first (average rating - 2.34) on the class specific instrument. In all fairness to the instruc- tor, it is necessary to maintain anonymity, however, some particulars about this course may shed some light on the difference. The course taught by instructor C is what is referred to in the School of Business as a "case" course. Students are given a description of a problem in the business world, and they are to read the problem and come up with a solution. The problem is discussed in class, the students are told how the company solved the problem, and the probable best solution. The difference between a case course and most courses in a university, especially at the undergraduate level, is that the particular case is not what the student is supposed to learn, but rather the applied concept. The learning that goes on in this class is to be able to transfer this one problem to new problems. Instead of knowledge and recall being tested, application and synthesis are what is important. The only plausible explanation for the difference in rankings for this course is that the class specific instrument topped the above areas while the SIRS could not. 131 This inconsistency among rankings should cause much concern to the administrator who is trying to make promotion and tenure decisions. 
Is it possible that an instructor could be tops on the class specific instrument (rank #1) and mediocre on the SIRS (rank #3)? The difference in being tops or mediocre may have a marked effect on pay raises. Which piece of infor- mation should the administrator be using in his decision? .Although the idea of using the general information is intuitively appealing, the fact remains that the class specific instrument contains items that students and instructors feel are important in evaluating their course. It would seem foolish if the administrator did not put as much emphasis on the class specific results as those from the general form. In summary, although the results were not as statistically significant as desired, the trends were supportive of the major hypotheses. The class specific form'built for MTA 311 Section I seemed to have more psychometric problems than the other four class specific forms. In retrospect, the author can find nothing in the development of this particular form to account for this problemt The only feasible suggestion to make is develop- ment of a new class specific form for MTA 311 Section 1, starting with the early pre-test stages and utilizing the information now available. Beyond the cut and dry statistical evidence is the obvious fact that the forms built for this study contain more information than the general SIRS. Nothing has been lost in these new forms but much new information has been acquired. Items referring to textbodks, exams, the usefulness of learned information in the real world, the ability to apply facts to other problems and many more. The general items on which comparisons can 132 be made between all classes are still available plus items that rate instruc- tor or course characteristics specific to an individual class. These class specific items are not only useful for an administrator in evaluating an instructor for tenure or promotion, but also prove helpful to an instructor for self improvement. The fact that general items are high inference in nature makes them of little use in course improvement. FURTHER RESEARCH In reviewing the conclusions and the discussion presented in this chapter, two areas of further research become apparent. The first area concerns the rankings of instructors according to the mean item response. A larger sample is needed to find out if the difference in rankings is anomalous to the present study only or generalizable to larger samples. If these results were present in future research, it would be necessary to make some administrative decisions concerning the handling of these dis- similar ranks. The other area of research concerns the domain of extrinsic variables. The results have not been consistent with reference to this subject. Re- search concerning such variables as sex, college year, class size, and whether the course is required or elective have come up with inconsistent results concerning their effect on student ratings. However, it would be interesting to attempt a study using both a general and class specific instrument. The first question of this research would be "do extrinsic variables effect the student ratings of instructors using the general form?". If this question was answered affirmatively, for any of the 133 extrinsic variables the next question would be "do these extrinsic variables effect the student ratings of these same instructors when class specific student rating forms are used?". 
The author of this research prOposes that any effects of the extrinsic variables would be dissipated by the specifi- cities of items of a class specific nature. The fact that the items are of a low to medium inference - might reduce the possible biases related to extrinsic variables. BIBLIOGRAPHY 134 BIBLIOGRAPHY Baril, G. L.; Skaggs, C. T. "Selecting Items for a College Course Evaluation Form," College Student Journal, Vol. 10, Summer, 1976, pp. 183-187. Blum, M. L. "An Investigation of the Relation Existing Between Students' Grades and their Ratings of the Instructor's Ability to Teach," The Journal of Educational Psychology, Vol. 27, 1936, pp. 217-221. Bradenburg, D. 0.; Derry, 8.; Hengstler, D. D. "Validation of an Item Classification Scheme for a Student Rating Item Catalog," Paper Presented at NCME, 1978. Breed, F. S. "Factors Contributing to Success in College Teaching," Journal of Educational Research, Vol. 16, pp. 247-253. Canaday, S. D.; Mendelson, M. A.; Hardin, J. H. "The Effects of Timing on the validity of Student Ratings," Paper Presented at NCME, 1978. Centra, J. A. "Student Ratings of Instruction and Their Relationship to Student Learning," American Educational Research Journal, vol. 14, Centra, J. A.; Linn, R. L. "Student Points of View in Ratings of College Instruction," Education and Psychological Measurement, Vol. 36, 1976, Clark, K. E.; Keller, R. J. "Student Ratings of College Teaching," 1a,; University Looks at its Program, eds. R. E. Echert and R. J. Keller, Minneapolis, Minnesota: The university of Minnesota Press, 1954. Cohen, S. A.; Berger, W. G. "Dimensions of Students' Ratings of College Instructors Underlying subsequent Achievement on Course Examinations," Proceedings of the 178th Annual Convention of the American Psycholo- gical Association, 1970, Vol. 5, pp. 605-606. Cohen, J.; Humphreys, L. G. Memorandum to faculty. University of Illinois, Department of Psychology, 1960. (Mimeographed) Costin, F. "A Graduate Course in the Teaching of Psychology: Description and Evaluation," Journal of Teacher Education, 1968, vol. 19, pp. 425-432. Costin, F. "Intercorrelations Between Students' and Course Chairmen's Ratings of Instructors," University of Illinois, Division of General Studies, 1966. (Mimeographed) 135 Costin, F.; Greenough W. T.; Menges, R. J. "Student Ratings of College Teaching: Reliability, Validity, and Usefulness," Journal of Educational Research, vol. 41, No. 5, 1971, pp. 511-535. Cronbach, L. J.; Gleser, G. C.; Nanda, H.: Rajaratnam, N. The Dependa- bility of Behavioral Measurements: Theory of Generalizability for Scores and Profiles, New York: John Wiley and Sons, Inc., 1972. Cunningham, W. J. "The Impact of Student-Teacher Pairings on Teacher Effectiveness," American Educational Research Journal, vol. 12, No. 2, Spring 1975, pp. 169-189. Cushman, H. R.; Frederick, K. T. "The Cornell Diagnostic Observation and Reporting System for Student Description of College Teaching," NACTA Journal, March 1976. Danielsen, A. L.; White, R. A. "Some Evidence on the Variables Associated with Student Evaluations of Teachers." Downie, N. M. "Student Evaluation of Faculty," Journal of Higher Education, vol. 23, 1952, pp. 495-496. Doyle, K. 0.; Whitely, S. E. "Student Ratings as Criteria for Effective Teaching," American Educational Research Journal, Vol. 11, No. 3, Summer 1974, pp. 259-274. Ebel, R. L. Essentials of Educational Measurement, New Jersey: Prentice- Hall, Inc., 1972. Ebel, R. L. "Estimation of the Reliability of Ratings," Psychometrika, Fry, P. 