SOME EFFECTS OF EMPHASIZING THE LEARNING FUNCTION OF CLASSROOM ACHIEVEMENT EXAMINATIONS

Thesis for the Degree of Education Specialist
MICHIGAN STATE UNIVERSITY
ABEL EKPO-UFOT
1969

ABSTRACT

SOME EFFECTS OF EMPHASIZING THE LEARNING FUNCTION OF CLASSROOM ACHIEVEMENT EXAMINATIONS

By Abel Ekpo-Ufot

How may achievement examinations be conducted so as to better define and attain the objectives of classroom instruction? That is the problem investigated in this study. It is suggested that it may be solved by emphasizing the learning function of examinations. The suggestion rests on the evidence in the literature that examinations are a learning device. The authors quoted as supporting this view include Jersild, Standlee, Fitch and Page.

This study was designed to test methods which might capitalize on that learning function. The methods consisted in requiring one experimental group to take examinations twice: in class and outside class. A second experimental group both repeated and evaluated their performance before they had feedback. A third group, the control, was permitted to keep the test scripts. The hypotheses were that each of the experimental groups would score higher than the control on the final examinations, and that the second experimental group would score significantly highest among the three. Furthermore, the attitudes of the experimental groups would be more favorable towards examinations and grading than those of the control.

About 1500 students formed the population for the study. They were enrolled in two courses in the College of Education, Michigan State University, during fall term 1968. In one course there were thirty-four classes, of which thirty-two were randomly selected to run experiment 1. All twelve classes in the other course were used in experiment 2. In both cases the classes were randomly allocated to treatment.

The study began with the development of a scale for measuring students' attitudes towards examinations and grading. Free-response opinions were obtained from a sample of students by the use of an open-ended questionnaire. Content analysis of the returns produced nuclei statements for the attitude items. Attitude is multidimensional. The key dimensions are "positive" and "negative," but these may not be on the same continuum since they are supported by different attributes of the psychological object. Such is the framework which guided the writing of attitude statements. These were tried out with a sample of 585 students representative of the university; the responses were factor analyzed and the final items selected on the basis of their loadings.

In the main study two examinations were administered within the term and students carried out instructions as specified for their treatment conditions. The final examinations were the criterion measures of achievement. The Friedman X²r test showed an overall treatment effect only in experiment 1, and a t-test revealed that the mean for the second experimental group was significantly larger than the one for the control. However, the absolute differences were small, though the trend was in the predicted direction. This is explained as due largely to the effect of the second treatment condition: it did stimulate effort, and the self-evaluation would aid understanding. The chief weakness of the study was poor control: all groups kept the within-term examination scripts. Also, the period of one term might have been too short for the treatment to work.
These might partly account for the haphazard results obtained on the attitude criterion measures. The main conclusion is based on the trend in the pre- dicted direction revealed in eXperiment 1. If students were required to repeat and evaluate their examination performance their achievement of course objectives would tend to rise higher than what it would be without such conditions. It would be worthwhile to investigate the hypothesis that this "self- evaluation" would make students attitudes more "positive" than "negative" towards examinations and grading. Any contri- butions of this study to knowledge are conditional on col— laborating evidence--evidence to support the usefulness of the attitude scale, and the model on which it is based, and Abel Ekpo-Ufot above all evidence to show that emphasis on the learning function of examinations will produce the type of effect weakly indicated in experiment 1 of the present study. SOME EFFECTS OF EMPHASIZING THE LEARNING FUNCTION OF CLASSROOM ACHIEVEMENT EXAMINATIONS BY Abel Ekpo-Ufot A THESIS Submitted to Michigan State University reporting research done as part of the Special requirements for the degree of EDUCATION SPECIALIST College of Education 1969 / 635936] 6/22/(99 COpyright by ABEL EKPO-UFOT 1969 This Thesis is Dedicated to: Ufot-Ekpe—-my late father, Amma—mmi-—my mother, Fred Akpan—-my uncle and stepfather, Udo-Eka-Ekpo, Udo—Aka ("Gabriel"), Ebenge and Iboro--my brothers, Idorienyin--my sister, to: Esit-Ima (Grace)—-my wife, and to: James S. Karslake-_my strategist. TABLE OF CONTENTS CHAPTER I INTRODUCTION . . . . . . . . . . . . . . . . . The Functions of Examinations . . . . . . Dissatisfaction with Testing. . . . . . . The Problem . . . . . . . . . . . . . . . Related Literature. . . . . . . . . . . . The Purposes and Hypotheses . . . . . . . II METHODOLOGY OF THE MAIN STUDY. . . . . . . . . The Treatment . . . . . . . . . . . . . . The Criterion Measures. . . . . . . . . . The POpulation, Sampling, and Allocation to Treatment . . . . . . . . . . . . The Design. . . . . . . . . . . . . . . . Procedure . . . . . . . . . . . . . 4 . . Analysis. . . . . .-. . . . . . . . . . . III RESULTS AND ANALYSIS . . . . . . . . . . . . . EXperiment 1. . . . . . . . .'. . . . . . Experiment 2. . . . . . . . . . . . . . . The Hypotheses on Attitudes . . . . . . . IV DISCUSSION . . . . . . . . . . . . . . . . . . Interpretation of Results . . . . . . . . Implications of the Study . . . . . i . . Suggestions for Further Study . . . . . . V SUMMARY. . . . . . . . . . . . . . . . . . . . Chapter by Chapter Review . . . . . . . .' Contributions to Knowledge. . . . . . . . Summary of Tentative Conclusions. . . . . Summary of Testable Hypotheses. . . . . . Summary of Conditional Contributions. . . Conclusion. . . . . . . . . . . . . . . . LIST OF REFERENCES. . . . . . . . . . . . . . . . . . iii Page 0103me P 16 18 19 20 26 27 27 32 55 4O 4O 47 51 55 55 64 66 67 68 69 7O TABLE OF CONTENTS — Continued Page APPENDIX A—-The Students' Attitudes Towards Exami- nation and Grading Scale Battery (SATEG)° 75 APPENDIX B-—Specific Instructions as Originally Designed for the Treatment Conditions . . 153 APPENDIX C--Mean Percent of Respondents Choosing Option on the Factor Sub-scales . . . . . 160 iv LIST OF TABLES TABLE 1. 2. Mean and Rank Scores on First Examination (Experiment 1). . . . . . . . . . . . . . . . . Mean and Rank Scores on Final Examination (Experiment 1). . . . . . . . . . . . . . . . . 
Covariance Analysis of the Final Examination Scores-Y (Experiment 1) . . . . . . . . . . . . Mean and Rank Scores on First Examination (EXperiment 2). . . . . . . . . . . . . . . . . Mean and Rank Scores on Final Examination (Experiment 2). . . . . . . . . . . . . . . . . Mean Group Scores on the Attitude Factor Sub- scales. . . . . . . . . . . . . . . . . . . . . Rank-Score Positions at the High End of the Factor SUb_ScaleS o o o o o o o o o o o o o o o Grouped Frequencies, Range and Median of Inter- Item Correlations . . . . . . . . . . . . . . Mean Percent of Respondents Choosing Option on the Factor Sub—scales . . . . . . . . . . . . . Page 28 5O 52 55 54 56 59 112 118 LIST OF FIGURES FIGURE Page 1. The Attitude Model. . . . . . . . . . . . . . . 80 2. Attitude Profile of the Try-Out Sample (N=575). 117 5. The Attitude Model with Specific Reference to (a) Examinations, and (b) Grading . . . . . . . 121 vi PREFACE One conducting this type of study must have a bias; so has the writer. He does not share the View that class— 'room achievement testing should be abolished at a formal institution of learning--be it a school or a university. He does not consider that these twin aspects of the cur— riculum are a necessary evil: it may not be necessary that they should be to the student a "traumatic experience". Rather he shares the view that they are "a natural part of the total learning process." But this bias may have intruded itself unwittingly into the tone of the presentation of this thesis; for this the writer sincerely apologizes to the reader. He really meant to present it as a scientific study uncolored by his biasesmwbut he may not have succeeded. In particular he has offered tentative conclusions based on trends revealed in one of the two experiments conducted. But he has not hidden the fact that the evidence is very weak, not only because the absolute differences in the so-called trend were very small despite the ”significance" of the t—test comparisons, but also because the results in the second experiment were not definitive. The reader should therefore take as vii hypotheses to be investigated all tentative conclusions made in this thesis. Many peOple have contributed to make this study possible. The twenty-two professors who undertook to administer the Attitude Scale deserve first mention; so also the students who served as "guinea pigs". The writer wishes to express his thanks to all these unnamed persons. Thanks are due to Dr. Andrew C. Porter, and his staff of research consultants in the College of Education. Their criticisms and comments on the design of this study were of great value. Two of the writer's teachers deserve special mention: one is Dr. Maryellen McSweeny, of the College of Education, and the other Dr. Charles F. wrigley, of the Psychology Department and Director of the Computer Institute for Social Science Research. As the reader will soon find this study was in a way a "try-out" of some research methods. These two professors gave the writer, among other students, a brilliant introduction to these methods. Besides they have criticized parts of the study that relate to their special- ties, and in some cases have actually helped in the inter- pretation of the data. Another professor who had criticized parts of this study is Dr. Willard G. Warrington, Director of the Evaluation Services. His searching questionings con- tributed much to the development of the Attitude Scale to be reported. viii The four members of the Program Committee occupy a unique position. 
The Chairman Dr. Robert L. Ebel, has indirectly inspired this study in that his philosophy is behind it. Dr. James S. Karslake, of the Psychology Depart- ment, urged that a research study be included in the writer's program for the Education Specialist degree. The other members are Drs. RObert C. Craig, Chairman of the Department of Educational Psychology and Dr. Paul L. Dressel, Director of Institutional Research. These four have each rendered constructive and valuable criticism on the thesis to be presented. The writer is grateful to them for their services. Without a scholarship grant by the home Government of Nigeria the writer might not have embarked upon graduate education. This Government has therefore contributed in- directly but significantly to this study. Thanks are also due to Drs. W. Sweetland, and D. Freeman, and their staffs of instructors and secretaries. The study might have been sabotaged without the cooperation of these people running the courses in which the experiments were carried out. Apart from Dr. Karslake the other persons to whom this thesis is dedicated are of the household in which the writer is a member. He is deeply grateful to them for the price they are paying--to wait. ix The last offer of thanks is to those who at one time or another have grappled with the problems of education. There is nothing reported here which is not owed to MAN. CHAPTER I INTRODUCTION The focus of this study is on classroom achievement examinations and the twin practice of grading. In this intro- ductory chapter some functions of examinations are stated. The fact that these may not always be realized leads to a statement of the problem to be investigated. This in turn is related to the evidence in the literature, in particular that which SUpports the view that examinations perform some learning functions. The chapter closes with a statement of the purposes and hypotheses of the study. The Function of Examinations An important objective of formal education is the acqui- sition of knowledge. Though, practices differ, classroom achievement examinations are widely used for assessing how far this objective has been attained. Examinations perform other important functions also. They motivate the student to learn. Admittedly, this function is differential; as Tyler (1966) observes they may stir up "feelings of incompetence in some students." However, it is likely that such unmotivated students are in a small minority. Furthermore, examinations provide a learning experience per se: in Stone's (1955) words they "represent practice sessions which aid the fixation of correct responses . . . and the elimination of error." In other words, the taking of examinations in effect promotes and guides learning. Moreover, in a society such as ours, it would appear one cannot escape evaluation. If the school exists to prepare youth to fit a need in society then it cannot altogether ignore some preliminary evaluation of the products it turns out to society. It may even be argued that such evaluation helps to remind the student of his future role,.and that society ex- pects him to be proficient in his fulfilling that role. If this argument be granted then, from the student's point of View, there are at least four functions of classroom examina- tions. 
The motivation for learning, the promotion, fostering and guiding of learning, the assessment of what ”amount" has been learned and the reminder that learning must be proficient if one is to fulfill his role adequately in society-~all these are of special importance to the student. Dissatisfaction With Testing Teachers tend to overemphasize the assessment function at the expense of other functions of classroom achievement testing. In such a situation, the attainment of the educa- tional objectives may be limited or thwarted. Evidence is not lacking that there is some dissatisfaction with testing in general and achievement testing in particular. Take for an example Banesh Hoffmann's book: The Tyranny of Testing (1962). The author is directing his attack against the "professional testers" and their reliance on multiple-choice tests and item statistics. But his view that "there is no satisfactory method of testing" applies to the classroom situation as well. "If sample questions made by the best test makers can give cause for concern," he asks, ”What of multiple-choice tests made by individual teachers for their own classroom use. . .?” The poor quality of test items, as Hoffmann says is cause for concern. But one is tempted to express the Opinion that, within the classroom, the "tyranny of testing" is most evident in the teacher's toonmuch-emphasis on the assessment function of examinations at the expense of the learning func- tions, and the dissatisfaction among students may partly be explained by this fact. The Problem Granted that examinations serve important functions to the learner, it would appear there is a strong case to retain the system as an aspect of the school curriculum. If one takes this position he is faced with a problem: how may achievement examinations be improved in use so as to better define and attain the objectives of classroom instruction? This appears to be an important practical problem in all education. It may be that the achievement examination is perceived as a necessary evil because of how it is carried out in practice: the questions posed may be unintelligible, or they may be ambiguous, or they may be highly speeded, or the student may be denied the opportunity of knowing what his performances are in the light of expected responses. Moreover, as hinted earlier, it may be the teacher has created an atmos— phere which overemphasizes the assessment function. This may be the case when he deprives the student of the opportunity to have back the examination papers because they must be kept secure for use with other sets of students. This practice added to other undesirable elements bias the student's attitude against examinations. The position taken here is that for those.who think the examination system may be retained, the problem of improvement in use may be partly solved by emphasizing the learning func— tion of classroom testing. This change of emphasis is in accord with the teachers professional role in the learning situation. Besides the new emphasis may hopefully change the student's perceptions of examinations, and the twin practice of grading. The evil aspects of the system will thus be minimized and conditions set for higher attainment of objectives-—higher than the attainment possible under condi— tions where the assessment function is emphasized at the expense of the learning function. 
If, for example, students take an examination in a class- room situation under the so—called "examination conditions" and in addition repeat the examination "at home," making use of all available resources, excluding the teacher and fellow students, and their performances on both occasions count for their grades, then they may perceive the learning function of examinations. In this case, a student would be "cheating" if he were to solicit help from the teacher or his fellow student, within the “examination period." Other genuine efforts to seek out the correct response would then be encour— aged and rewarded. If, in addition, the students are made to "grade" their own performance to the best of their knowledge, they would be learning still in carrying out such a requirement, and they may grow to perceive and appreciate the meaning of examina- tion grades. Classroom achievement examinations administered in this way may be described as improvement—in-use. From the point of View of both the teacher and the learner the modi- fied practice neither eliminates nor depreciates the assess- ment function; but it pushes to the fore the learning function, and this is likely to be richly rewarding. The specific problem of this study was to investigate these suppositions using students enrolled in two courses in the College of Education, Michigan State University. Further details about the students and the courses are given in Chapter 2. The main purpose at this stage is to introduce the problem both in general, and with specific reference to the particular conditions in which it was investigated. How may achievement examinations be improved in use when applied to the courses selected, so as to better define and attain the objectives of these courses? To solve this prob- lem examinations were administered twice each--in class, and outside class-—to groups of students. Members of one of the grOUps were also required to evaluate their performances. Was such a procedure any improvement-in-use of examinations? The answer to this question will be found in Chapters 5 and 4. Meanwhile it will be necessary to relate such a practice to similar ones in the literature. Related Literature The problem posed and the solution proposed stem not only from a practical situation but also from two types of previous research studies. One type deals directly with the learning function of examinations and the other on the effect of knowledge of results and encouraging comments on student's examination performances. A few of these will be quoted to illustrate the connection. »Examinations as a Learning Device In a study on “Examination as an Aid to Learning" Jersild (1929) sought to answer this question: to what extent does the examination enforce an active participating attitude of mind on the learner,.and does such activity yield higher re- turns in achievement when compared to the attainment resulting from ordinary conditions of study? He used two equivalent groups in each of a set of replicated experiments where the main treatment variable was what the author called "pre- examination." By this he exposed the "experimental" groups to an examination experience before using the same test items or constructing new ones to assess the groups' achievement of course objectives. Thus the eXperimental groups had examination "warm—up" during the pre-testing or "pre-examina- tion" period. The other treatment variable was the examina— tion-type; there were three types: true-false, multiple choice and essay. 
There were five replications of this study. In the first two the "pre-examination" was made up of true-false items; multiple-choice items were used in the third and fourth experiments, and the essay in the last one. Jersild's study is very relevant to the present one; three of the replications will therefore be described in some detail.

The first experiment, like the others, was carried out in a psychology class. There were two sections in the class, each made up of 57 students. The course objectives are stated in general terms to include an understanding of "classroom lectures" and "reading assignments." At the beginning of a semester one of the groups was randomly selected as the experimental group and given a "pre-examination" on materials to be covered in the next six weeks; the other class did not have this treatment. At the end of the first six weeks both classes were administered the same true-false examination used in pre-testing the experimental group. The classes exchanged roles in the second part of the semester, such that the one that formerly served as the control became the "experimental" group and was pre-tested on materials to be covered in the rest of the semester. In the end both groups were assessed on their achievement by the same true-false test used in pre-testing the experimental group.

The third experiment also involved two classes, each with 42 students. The topic to be learnt was "Reaction Time." The experimental group was selected randomly and counterbalanced as described above. The "pre-examination" in this case was made up of multiple-choice items, but the final achievement was tested by newly constructed true-false and recall items.

The procedure in the last experiment (N = 65 in each group) followed the lines already described. But here the "pre-examination" for the experimental groups was of the essay type, and the subject to be learnt was a biographical selection. A test of immediate recall was administered as a criterion measure.

The author summarizes the results of these experiments in the form of ratio scores, 100(Me)/Mc, where Me equals the mean score for the experimental group and Mc the mean score for the control. With the exception of the replications in which true-false items were used in the "pre-examination," the experimental group always scored higher than the control.
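This ratio score is simple to compute. The short sketch below illustrates it with made-up group means; the figures are placeholders, not values from Jersild's report.

```python
# Jersild's ratio score, 100 * (Me / Mc): values above 100 mean the
# pre-examined (experimental) group out-scored the control.
# The means below are invented for illustration only.
replications = {
    "true-false pre-exam": (31.2, 32.0),       # (Me, Mc), hypothetical
    "multiple-choice pre-exam": (44.5, 40.1),  # hypothetical
    "essay pre-exam": (27.8, 25.9),            # hypothetical
}

for name, (mean_exp, mean_ctrl) in replications.items():
    ratio = 100 * mean_exp / mean_ctrl
    verdict = "experimental higher" if ratio > 100 else "control higher"
    print(f"{name}: ratio score = {ratio:.1f} ({verdict})")
```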
The study may be criticized on the ground that it did not control for the memory factor. But as most of the results were in the predicted direction, one cannot reject completely the author's conclusion that the treatment group excelled the control in subsequent performance, and that the treatment stimulated "the industry of the learner." The present study will use and modify Jersild's method of repeating the same examination with the treatment group; but the repeat will be outside class, so that not only will the industry of the learner be challenged but he will also be able to use the examination directly as a study guide.

Standlee et al. (1960) investigated "quizzes" and their contribution to learning. The quizzes were made up of twenty true-false items and given at the end of each month of work; thirteen of them were administered during the experiment. There were three experimental conditions and a control. In condition 1 the quizzes were administered in written form, graded by the instructor, and the scripts were returned to the subjects. The author explained that the mere giving of quizzes would enforce the students' learning activity as well as provide a structuring of the course for the guidance of the students. The instructor's written grades provided extrinsic motivation; moreover, the students had knowledge of their performance item by item as the corrected scripts were returned to them.

The second experimental condition received the quizzes in written form too, but the members checked their own work, presumably from a key provided by the instructor. This group therefore experienced the same benefits as stated for those in condition 1, but without the extrinsic motivation from teacher-awarded grades. In the third condition the same quizzes were read out orally by the instructor, who also provided the correct answers. The only benefit enjoyed by this group was the enforced activity and course structuring. The control group enjoyed none of these benefits as it had no quizzes.

All groups had a preliminary pre-test comprising 100 multiple-choice items which had been tried out in the same course in a previous semester. The scores on this were used as covariates in the analysis of the results. The criterion measures comprised a 100-item multiple-choice examination given at mid-semester and a 150-item test given at the end. The mid-semester examination included 50 items from the pre-test, while the final included the other 50 items which were in the pre-test but not in the mid-semester examination. When the mid-semester scores were used as criterion, a significant difference was found at the .06 level as against the .05 hypothesized. A t-test comparison of the means of condition 1 and the control was also significant at the .05 level. The differences were not significant with the finals as criterion.

It appears that the author's criterion measures were not sensitive enough, since they contained from a third to a half of the items on the pre-test. Furthermore, a "multiple comparison" technique like Scheffe's (1959) could have been used. It must be remembered also that the quizzes were made of true-false items, which according to Jersild (op. cit.) are of "dubious value as a pedagogical instrument." These limitations may have eclipsed the effect of treatment.

The present study also uses examination as the main treatment variable, but all the defects listed above are avoided. Moreover, the idea of the subject grading his own work is adopted and given much weight and significance, in that the subject was given the opportunity to compare his self-evaluations with the evaluations from the teacher-experimenter.

In a similar study, Fitch et al. (1951) investigated the effect of "frequent testing as a motivating factor in large lecture classes." The authors found that frequent testing resulted in "superior achievement," but they remarked that the "instructional function (is) best served when divorced from the regular process of achievement evaluation." The present study specifically challenges that remark: the subjects were told at the beginning that all examinations administered would count towards their final evaluations.

Teacher's Comments

The studies hitherto mentioned were conducted in a college setting. Page (1958) couched his in a secondary school setting.
He had 74 teachers of different subjects from different schools involved in an experiment in which the treatment consisted only of "teacher's comments" on objective examination answer scripts. The subjects for the experiment were drawn from the 7th through the 12th grades, and twelve schools were represented. The treatment variable was at three levels--"no comment," "free comment" and "specified comment"--and subjects were assigned to treatment at random. The experiment basically involved administering the treatment on the answer scripts of a first test and then using the performance on a second as criterion. Since a factorial design was used, the experimenter was able to investigate interactions between treatment and schools, classes and school year. The results were analyzed by the Friedman rank test and the effect of treatment was highly significant. The author concludes: "When the average secondary teacher takes the time and trouble to write comments . . . these apparently have a measurable and potent effect upon student effort . . . or whatever it is which causes learning to improve."

It would be interesting and valuable to know whether similar conclusions can be reached if the study were conducted in a college setting. The present study incorporated "comments" in one of the conditions. But perhaps its greatest connection with Page's work will be in the use of different courses, and of similar test statistics in result analysis.

The Purposes and Hypotheses

The related studies reviewed provide evidence that classroom achievement examinations perform some learning functions besides their measuring function. The present study was not concerned with establishing additional evidence for this learning function; this was, and is, assumed. Rather it was concerned with manipulating the examination variable in order to realize and increase its value as a learning device. As indicated earlier, there is some dissatisfaction with examinations. Such a state of affairs would appear to result from the way examinations are operated, and not from any intrinsic attribute of examinations. One may even suspect that those who speak of the "tyranny of testing" would not gainsay its potentiality to stimulate and promote learning in a classroom situation.

Purposes

It was suggested earlier also that attainment of course objectives may be increased, and that student attitudes may be made more favorable, by emphasizing the learning function of examinations. The hypotheses of the study relate to the purposes just stated:

1) The experimental groups which repeated earlier examinations would score higher than the control group on subsequent examinations in the same course.

2) The experimental group whose members both repeated and evaluated their performances in earlier examinations would score higher than all other groups in subsequent examinations in the same course.

3) The attitudes of the experimental groups, as measured by a specially developed scale, would be more favorable towards examinations and grading than the attitudes of the control group.

The first two were tested at the .05 level of significance, but the results on the last hypothesis were used for rank-ordering the treatment groups on the criterion measure.

CHAPTER II

METHODOLOGY OF THE MAIN STUDY

By now it should have been obvious to the reader that the "treatment variable" in this study was the method of examination. It was a "variable" not in the quantitative but in the qualitative sense. But it is necessary to explain how it was supposed to operate.
The Treatment There were three conditions as follows: 1) an examination was taken under normal conditions and members of the group were required to repeat their performance in non-examination conditions; 2) an examination was taken under normal conditions and members of the group were required both to repeat and to evaluate their performances on the two occa— sions; 5) an examination was taken under normal conditions and members of the grOUp were required neither to repeat nor to evaluate their performances; but they were permitted to keep the examination scripts as their prOperty. These three treatments may be referred to as T1, T3 and T3, respectively. The requirements stated above are clear; but the second one for T3 may easily be confused with the so- called "level of aspiration" type of experiment. In the latter, the subject "estimates" his score, for example; by and large 16 17 such estimates are guesses, depending as they do on past experiences of success or failure. It must be emphasized that the self-evaluations envisaged here should not be guessed estimates; if they are guesses depending on what "self-concept" the subject holds, then they are not the treatment implied in this study. If the self-evaluations were not to be guessed esti- mates, what should they be? They were and should be scores and grades which the subject awards himself--solidly based on knowledge—~present knowledge, which he gains by expending effort to use all possible resources, excluding the teacher and fellow students, to search out for the correct responses for the test items. In a University setting the requirement to carry out such a search is not beyond the student. Even in a High School setting with fairly adequate library and other learning facilities the student can cope with this requirement. Is T1 different from T2? Both require effort to search out the correct answers. It is, however, claimed that the additional requirement imposed for members Of the second group to judge their work induces them to pay more attention than do the members of the other group; should this be so they would also learn more. By the same argument members of T3 might not learn as much as those Of the other two groups. It should be added that all the three treatment condij tions emphasize the learning function of examinations. 18 Clearly, the "practice effect" is double for T;, and T3, and all three groups have the Opportunity to use the test items as a study guide. The Criterion Measures The final examinations at the end of term reflect the course objectives; scores on these were used to test the hypotheses on students achievement. The second set of criterion measures were scores on the "Students'Attitudes Towards Examinations and Grading" scale batteryf-a scale which may be called for short "SATEG" scale battery. This instrument was Specially develOped for this study. A brief account of the Operations involved is rela- vant here. Free reSponse statements of Opinions on examinations and grading were first obtained from a small sample of stu- dents through an Open-ended questionnaire. These responses were content analyzed in a search for "significant“ statements which focus on clearly specified attributes of examinations and grading. The selected significant statements formed nuclei for the initial sixty-five attitude statements con— structed. These were rated and Q-sorted by ten judges. 
The forty and eight statements which survived that exercise were administered to a representative sample of students, and the responses were factor analyzed. Finally thirty-two items were selected largely on the basis of their high loadings on 19 the various "factors" revealed. There was therefore suf— ficient evidence both in the operations outlined and in the reliabilities of the factor sub-scales——enough evidence, that is, to show that the scale is fairly valid for the purpose for which it was designed. Full details of the Operations will be found in Appendix A. The Population, Sampling, and Allocation to Treatment The population used in this study was made up of stu- dents enrolled in two courses during the Fall Term, 1968. The courses are (1) ED 200: Individual and School and (2) ED 450: School and Society. Both are Offered in the College of Education, Michigan State University. In ED 200 there were 54 classes, each made up of at least 50 students. Sixteen instructors were in charge of 52 c1asses-—one each in the morning and one each in the after— noon. Two other instructors were in charge of the other two classes, one for each. In one of these another experiment was in progress, and to control for possible contamination from this source the class was not sampled; the other class was also withdrawn since its instructor had one class and not two as the others. .Thus sixteen instructors class-groups were left for sampling. Fifteen of these were randomly selected, randomly formed into three equal groups and the groups then randomly assigned to the experimental conditions. The selection,'formation of groups and allocation to treatment was done separately, and based on a table of random numbers. 20 In ED 450 there were twelve classes of at least fourteen students each. .These were under seven instructors, five of whom taught two classes each. The other two had one class each. All classes here were involved in the study. These twelve classes were randomly formed into three groups and the latter were then randomly allocated to the three treatment conditions. The pOpulation thus defined is rather limited and conclu— sions will largely be confined to it. But it may be argued that it represents typical education students as these two courses.are required Of all education majors. To the extent that these students are typical of education majors in particu- lar and college students in general, the conclusions may be extended. However, no attempt at such wide generalization will be made from this study—-as yet. The Design The main elements of the design have been described, but it is necessary to add that the study was conducted as two separate experiments. The one in ED 200 will be referred to as experiment 1, and the one in ED 450 as experiment 2. The resulting sub-designs are illustrated in tabular forms below: 21 SUMMARY OF SUB-DESIGN FOR EXPERIMENT 1 T1 T2 T3 C1 C2 C3 C4 C5 C11 C12 C13 C14 C15 C21 C22 C23 C24 C25 C6 C7 C8 C9 C10 C16 C17 C18 C19 C20 C26 C27 C28 C29 C30 SUMMARY OF SUB-DESIGN FOR EXPERIMENT 2 Ti T2 T3 Ci C2 C5 C6 C9 C10 C3 C4 C7 C8 C11 C12 KEY: Treatment Class (nested within Treatment) Procedure This study was a practical classroom eXperiment. It is, therefore apprOpriate to describe first how the courses used are normally organized, and then the execution of the experi- ment and how it was woven into the existing structure. There is always a large enrollment in the two courses. In the period Of study, the totals were 1129 and 185 for ED 200 and ED 450 respectively. 
The lectures are given by a team of professors including the Course Coordinators who are also responsible for all arrangements relating to the courses. The students are divided into "discussion" groups under the leadership of graduate assistants, as instructors. These 22 groups constituted the "classes" which were the experimental units in the present study. The administrative operations were conducted at three levels--(1) arrangements with the Course Coordinators, (2) contact with the instructors and (5) students' activities. Consultation with the Course Coordinators preceded and continued throughout and beyond the study period. They were informed of the nature and purpose of the study through dis- cussion, and the prOposal was made available to them. They in turn supplied the writer information on the number of dis- cussion groups and their instructors. The latter formed the basis for the definition of classes, formation of groups and allocation to treatment. .All these were done randomly as described earlier. Furthermore, the Coordinators were told in discussion and in writing the type of scores that would be used as criterion measures. Such information would be kept in their records which would be made accessible to the writer when he needed them. Contact with the instructors was to be kept at a minimum. There were reasons for this. First, the writer did not wish to bias any of them for or against the treatment; secondly he would have preferred an atmosphere in which no fuss about the study existed, and in which as far as possible the subjects remained naive; thirdly, it was desired to see how far the procedure for carrying out this study could be understood from written instructions only. More will be said on these points 25 in chapters IV and V . Meanwhile, it will suffice to say that the instructors were expected to follow written instructions but that in actual practice the writer dealt with problems individually as they arose. These were very few in the T; condition. Most of the problems arose in connection with T2 making it necessary to eliminate certain aspects of it. Originally the members of this group were expected to graph their scores and grades. Such a graph was called a "progress chart" and was to be submitted to the instructor for "comments.'i Furthermore the instructor was to allow at least ten percent of his assigned grade to the activities involved in this study. These aspects of the treatment were eliminated because they involved both the student and the instructor in too much work. Appendix.B presents all that was originally designed for both treatment conditions and includes the supplementary in- structions in full. Here it is only apprOpriate to present the instructions as they actually applied. These were given orally by the instructor, and woven into his planned activi— ties for his class. Instructions for T; Condition 1) "You will be expected to repeat each of the two within- term examinations at home. You may take up to four days before submitting this second attempt for scoring.”' 2) "You will be free to make use of all resources, ex- cluding instructors and fellow students. Your aim should be to come out with all answers correct, work— ing independently." 24 Instructions for T2_Condition 1) 2) 5) 4) "You will be eXpected to repeat each of the two within-term examinations at home. You will be free to make use of all resources, excluding instructors and fellow students. Your aim should be to come out with all answers correct, working independently." 
"You will be eXpected to score and grade your two performances. Score, using your best judgments on what you feel are the correct answers. Evaluate your scores by assigning grades to yourself (0 - 4.5) using some criteria you feel to be objective." "You may take up to four days before submitting your second performance for machine scoring." "Later when you receive the feedback, check your scoring and self-evaluation and discuss the discrep- ancies with your instructor, until you are satisfied.” During the study it became necessary to issue SUpple- mentary instructions for this group. They were likewise addressed to the instructor. Here again the full instructions will be found in Appendix B (c). The relevant portions actually adopted were as follow: 1) 5) 4) "Ask your students to: a) write their names on their test booklets-~to help them recover their copies, b) mark their in-class performance on both the test booklet and the answer sheets provided; the answer sheets will be handed in but they will keep (or pick Up later) their test booklets to score and grade the markings at home-~as described below." "Give to every student a spare answer sheet and a pencil for the repeat performance described below.‘I "Emphasize that every student is to rework the test making use of all possible resources, excluding fellow students and instructors. To prevent any embarrassment over wide discrepancies this exercise must be done first and with care.“ "When and only when the student has established enough confidence in his/her answers on the second perform- ance (without any consideration of the first), then 25 and only then should he/she proceed to score and grade this second performance. Emphasize that guess- ing in any form will result in wide discrepancies.” 5) "With the scoring and grading of his/her repeat per- formance as the "Key" the student then turns over to his marked test booklet to score and grade that performance also. Treatment Condition T3 This was the "control" group; the members were allowed to keep their test booklets, but no other requirements were expressed. The third level of operation may be described under stu~ dents' activities. These consisted of their following instructions as these were communicated to them through their instructors. Members of both groups T1 and Ta repeated their performances in the examinations and re-submitted their work for machine scoring. But in some classes, and particularly in experiment 2 there were misinterpretations of the self- evaluation requirement at the beginning of the experiment. As mentioned earlier it became necessary to issue supplemen-y tary instructions; after that there were no more problems. Members of the control group (T3) were not required to do anything other than take back the examination scripts which they kept as their properties. Finally, it is also relevant to note that all students were given written "keys” to the two within-term examinations. These were however delayed for about four days until members of grOUps T1 and T2 had turned in their second performances and self evaluations. Instructors also discussed the tests in class. 26 Analysis The results of this study were analyzed by nonparametric methods. In particular the Friedman X? was used to test the overall effect of treatment with respect to the hypotheses on achievement of course objectives. This was followed by a t-test comparison of group mean scores. 
The groups were ranked on the attitude criterion according to their mean scores and the percent of high scorers on each sub-scale. Full details of this analysis and the outcome are presented in Chapter III.

CHAPTER III

RESULTS AND ANALYSIS

The use to which the results of the first mid-term examinations were put is given in this chapter. This is followed by the outcome of the study with respect to the hypotheses investigated. The analysis is made for each of the two experiments separately.

Experiment 1

There were ten classes under each treatment condition in experiment 1. Their mean scores on the first mid-term examination are shown in Table 1. It will be noticed as one reads down the columns under each treatment condition that the scores are arranged in descending order of magnitude. Thus class 1 occupies the top rank position within the T1 group; class 11 occupies the top rank position within group T2; similarly class 21 is top in the T3 group. As a further illustration, class 9 is ninth in T1; class 19 and class 29 are also ninth in groups T2 and T3 respectively. This arrangement makes it possible to match the classes according to their rank positions in their respective groups. It turned out, as the table shows, that the matched classes would have very nearly identical scores if these were reduced to two significant figures.

TABLE 1. MEAN AND RANK SCORES ON FIRST EXAMINATION (EXPERIMENT 1). (Printed in landscape in the original; its entries are not legible in this copy. The values cited in the surrounding text are the recoverable figures.)

The columns headed "Row Rank" reflect the absolute differences in the scores of the members of the matched triples. Classes 1, 11 and 21, for example, have scores of 26.52, 26.11 and 25.84 respectively; their rank scores within the triple are therefore 1, 2 and 3. Scores for classes 7, 17 and 27 are 24.56, 25.16 and 24.97, and the corresponding rank scores are 3, 1 and 2. The Friedman test (Siegel, 1956) was applied to test the significance of the differences in the sums of ranks shown in the "Row Rank" columns. This was not significant (X²r = 4.2; X²r.05(2) = 5.99). Evidently the differences among the groups were not statistically significant at the start of the experiment.

The Hypotheses on Achievement of Course Objectives

The first hypothesis was that the mean score for group T1 would exceed the one for group T3 in the achievement of course objectives as measured by the course-end examinations. The second hypothesis maintained that the mean for T2 would exceed each of the means for T1 and T3. The final examination results presented in Table 2 were used in testing these hypotheses. Table 2 shows that the classes in each matched triple were ranked on the basis of their mean scores, as illustrated earlier.

TABLE 2. MEAN AND RANK SCORES ON FINAL EXAMINATION (EXPERIMENT 1). (Also printed in landscape; its entries are not legible in this copy.)
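The ranking-and-testing procedure just described can be sketched in code. The fragment below is a minimal illustration under assumed data, not a re-analysis: the class means are placeholders, and it simply shows matched triples of class means being tested with the Friedman statistic and followed up with a t-test comparing two groups of class means.

```python
# Sketch of the analysis described in the text: one class mean per
# treatment in each matched triple, a Friedman rank test across the
# triples, then a follow-up t-test on two groups of class means.
# All means below are placeholders, not values from Tables 1 and 2.
from scipy import stats

# Matched by rank position within each treatment group (10 triples).
t1 = [34.9, 34.6, 34.4, 33.9, 33.5, 33.1, 32.8, 32.4, 32.0, 31.6]
t2 = [36.2, 35.4, 35.3, 34.9, 33.9, 33.6, 33.3, 33.1, 32.9, 32.7]
t3 = [34.7, 34.5, 34.3, 33.8, 33.2, 33.0, 32.7, 32.2, 31.8, 31.1]

# Friedman test: treatments are the conditions, matched triples the blocks.
chi_r, p_overall = stats.friedmanchisquare(t1, t2, t3)
print(f"Friedman X2_r = {chi_r:.2f}, p = {p_overall:.3f}")

# Follow-up comparison of two treatment groups (here T2 vs. T3); halving
# the two-sided p gives a one-tailed test when the difference is in the
# predicted direction, matching the directional hypotheses in the text.
t_stat, p_two_sided = stats.ttest_ind(t2, t3)
print(f"t = {t_stat:.2f}, one-tailed p = {p_two_sided / 2:.3f}")
```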
The Friedman test was applied to test the overall treatment effect. The difference was significant at the .05 level (X²r = 8.6; X²r.05(2) = 5.99). This means that the risk involved in rejecting a contrary ("null") hypothesis that the treatment produced no effect has a probability of about five percent; in other words, the probability is high that the null hypothesis is false. If so, the alternative that there was a treatment effect may be accepted. A t-test comparison was then made between the pair of means for T1 and T3. The difference was not significant (t = 0.995; t.05(18) = 1.734). The meaning in this case is that the treatment effects on these two groups, if any, were not significantly different.

The second hypothesis was in two parts. Part 1 involves comparison of the means for T2 and T1; these, as the table shows, are almost identical. The other part involves the means for T2 and T3. A t-test showed that the mean for T2 was significantly larger than the one for T3 at the .05 level, as hypothesized (t = 2.55; t.05(18) = 1.734). The chances are therefore small--about five percent--that the null hypothesis of equality of means for the two groups is true. The alternative experimental hypothesis was therefore accepted: the mean for T2 was significantly larger than the one for T3.

The nonparametric test revealed there was an overall significant treatment effect. Would a parametric test lead to such a conclusion? To provide an answer to this question the criterion scores were re-analyzed by the analysis of covariance method. The means for the first examination were used as covariates. As mentioned earlier they were not significantly different, but the F-value of 2.08 suggested there might be one or two very large scores, so that it would be advantageous to remove the variance associated with initial test scores. Table 3 summarizes the results of this analysis.

It is evident that the gain from the covariance analysis is only slight. Without it the F-value is 2.45, significant at 20% (F.20(2,27) = 1.71); with it F is 2.76, significant at 10% (F.10(2,26) = 2.52). In neither case is the difference significant at the five per cent level, as was hypothesized.

TABLE 3. COVARIANCE ANALYSIS OF THE FINAL EXAMINATION SCORES-Y (EXPERIMENT 1)

Source     SSX       SSXY      SSY       SSY'      df    MSY'     F
Between     0.645     1.915     9.595     5.901     2    2.951    2.76*
Within     18.056    21.252    52.860    27.818    26    1.069
Total      18.679    23.167    62.455    33.719    28

*Not significant; F.05(2,26) = 3.37.
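The covariance analysis summarized in Table 3 can be sketched with a standard linear-model routine. The fragment below is illustrative only: the per-class records (first-examination mean x as covariate, final-examination mean y as outcome, and a treatment label) are invented, not the study's data, and the statsmodels formula interface is just one of several ways to fit such a model.

```python
# Illustrative analysis of covariance: final-exam class means (y) regressed
# on treatment, adjusting for first-exam class means (x).  The records are
# placeholders, not the data behind Table 3.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "treatment": ["T1"] * 4 + ["T2"] * 4 + ["T3"] * 4,
    "x": [25.1, 24.8, 24.2, 23.9, 25.3, 24.9, 24.4, 23.8, 25.0, 24.7, 24.3, 23.7],
    "y": [33.0, 32.6, 32.1, 31.5, 34.1, 33.8, 33.2, 32.4, 32.8, 32.2, 31.9, 31.2],
})

# Covariate entered first, so the treatment effect is tested after the
# variance associated with the initial scores has been removed.
model = smf.ols("y ~ x + C(treatment)", data=data).fit()
print(anova_lm(model, typ=1))  # sequential (Type I) sums of squares
```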
A similar covariance analysis of the means for T2 and T3 also revealed no "significant" difference, but the F value was only about 0.03 less than the one required for significance. The table below displays the relevant data.

COVARIANCE ANALYSIS OF THE FINAL EXAMINATION SCORES FOR T2 vs. T3 IN EXPERIMENT 1

Source     SSX       SSXY      SSY        SSY'       df    MSY'      F
Between     0.4565    1.8476    9.5775     5.5567     1    5.5567    4.4175
Within      6.6420    8.0159   50.9776    21.5085    17    1.2554
Total       7.0985    9.8615   40.5549    26.8452    18

F.05(1,17) = 4.45; F.10(1,17) = 3.03

Experiment 2

Table 4 shows the mean class scores on the first examination for the classes and groups in experiment 2.

TABLE 4. MEAN AND RANK SCORES ON FIRST EXAMINATION (EXPERIMENT 2). (Printed in landscape in the original; its entries are not legible in this copy.)

The ordering of the classes within each condition, and their consequent matching and ranking within matched triples across treatment conditions, were done exactly as described for experiment 1. The Friedman test was also applied to test the significance of the sums of ranks in the "Row Rank" columns. The groups were not statistically different (X²r = 4.50; X²r.05(2) = 5.99).

As in experiment 1 the hypotheses were that

(i) the mean for T1 is greater than the mean for T3,
(ii) the mean for T2 is greater than the mean for T1,
(iii) the mean for T2 is greater than the mean for T3.

Table 5 below presents the mean class scores on the final examination, the data for testing these hypotheses. But it is clearly evident that there is no need to apply any statistical tests: the group means are almost identical, and the figures in the "Row Rank" columns show a pattern contrary to the hypotheses.

TABLE 5. MEAN AND RANK SCORES ON FINAL EXAMINATION (EXPERIMENT 2)

          T1                          T2                          T3
Class  Mean Score  Row Rank    Class  Mean Score  Row Rank    Class  Mean Score  Row Rank
  1      46.24        2          5      45.79        3          9      46.65        1
  2      44.95        3          6      45.50        1         10      44.94        2
  3      47.45        1          7      45.71        2         11      44.50        3
  4      44.74        2          8      45.77        3         12      45.67        1

Group Mean  45.84                       45.19                        45.44

The pattern shown in the above figures is contrary to the hypotheses. The differences between treatment conditions were, however, not significant (X²r.05(2) = 5.99). The risk of rejecting the null hypothesis in this case would be as high as 95 per cent (Siegel, Table N).

The Hypotheses on Attitudes

The third major hypothesis of the study was that the attitudes of the experimental groups would be more favorable towards examinations and grading than the attitudes of the control group. In view of the breakdown of the scales described in Appendix A, and in view of the position taken there on the nature of attitude, this hypothesis will be subdivided and examined in parts, with reference to the attitude "factors." The sub-hypotheses are:

1) that each of the groups T1 and T2 would score higher on the Learning Function factors (EP and GP) than group T3;
2) that each of the groups T1 and T2 would score higher on the Motivating Function factors (EP and GP) than group T3;
3) that each of the groups T1 and T2 would score lower on the Dysfunction factors (EN and GN) than group T3;
4) that each of the groups T1 and T2 would score lower on the Pressure-Anxiety factors (EN and GN) than group T3.

The first two of these sub-hypotheses re-echo the parent hypothesis, and also specify the crucial attitude "anchors." The other two say the same things indirectly, since "lower" placement on the "negative" dimension is a "more favorable" position, relatively. Table 6 presents the mean scores on the factor scales.
The measure is the same in both experi- ments, hence the results are reported under each treatment condition, with the groups in the two experiments combined. TABLE 6 MEAN GROUP SCORES ON THE ATTITUDE FACTOR SUB-SCALES My “ Scale Factor T1 (N=265) T2 (c=219) T3 (N=245) Examination Satisfac- tion 1.62 1.61 1.62 EP Learning Function 2.19 2.55 2.25 Motivating Function 2.11 2.57 2.15 :Examination-type 2.92 2.79 2.97 EN Dysfunction 5.04 2.84 2.96 PressureeAnxiety 5.66 5.55 5.57 Hate 5.06 2.80 5.05 Learning Function 2.27 2.40 2.50 GP Motivating Function 2.69 2.88 2.66 Measuring Function 2.62 2.76 2.57 Dysfunction 5.24 5.51y 5.25 Pressure—Anxiety 5.71 5.74 5.75 GN Hate 2.71 2.59 2.67 Non—learning 5.27 5.00 5.25 Non-measuring 5.56 5.41 5.41 The absolute scores shown on the above table are SO close that they may not be significantly different; but ranking across treatment conditions (1 for the highest) produces the following pattern of rank scores for the crucial factors specified in the sub—hypotheses: 57 T1 T2 T3 EP Learning Function 5 1 2 Motivating Function 5 1 2 GP Learning Function 5 1 2 Motivating Function 2 1 5 EN Dysfunction 1 5 2 Pressure—Anxiety 1 5 2 GN Dysfunction 2 1 5 Pressure-Anxiety 5 2 1 When T1 and T3 are compared the trend Shows T1 scoring lower on the EP and GP factors and higher on the EN factors. This is contrary to eXpectation. On the other hand when T2 and T3 are compared T2 scored higher on the EP and GP factors and lower on the EN factors. This fact tends to support the hypothesis. The pattern for the GM factors is not consistent. The resulusabove consider the means of the groups. .The extreme scores throw further light on the relative positions of these groups on the attitude factors. The percents of the group choosing each point on the Likert Scale are given in Appendix C. An extract from that Table gives the following picture. On the learning function factor the percents of respondents choosing point 4 and 5 were 17, 15 and 14 for T1, T2 and T3, respectively. It would be expected that more students in T1 than in T3 should be "high" on this factor. The trend is in line with this expectation. On the other hand comparison between T2 and T3 shows a contrary trend. 58 The trend is consistently in line with expectation when the groups are compared on the motivating factor. The cor- responding percentages are 18.5, 16.5, and 14.5 for T3, T3 and T3 respectively. The dysfunction factor responses revealed the same pat— tern as the learning function factor. T1 was lower than T3 as would be expected; but T2 was higher than T3—-against expectation. The reSpective figures are 27, 54 and 50. On the PressureeAnxiety factor the trend falls in line with the expectation. The values are 51, 51 and 55 respectively. Table 7 converts the percentages given here into rank scores, and thus makes it easy to comprehend the relative positions at the "high" extreme end of the scale factors. On the Grading Scales the trend was consistently in the Opposite direction as illustrated by the following percentage figures: T1 T3 T3 Learning function 17 18 18 Motivating function 51 50 55 Dysfunction 48 44 51 Pressure-Anxiety 65 67 65 These percentages are also converted to rank scores in Table 7 on the following page. 59 m H N mumecmrwnsmmem m m H coAuocsmmsn uzw MHmmnuOmmn may Ow MHMHHSOU H m N SOHuucsm mcHum>Huoz maamnmcmm tam ucmuMHmsooaH nuOQ mum muouomm 20 can mo m£u How msumuumm was m.H m.H m coHuUSSm mSHSHmmA “me EHmwnuom%£ may nuH3 USTpMHmSOO MH cumuwmm One H m.N m.N. 
[A rotated table from Appendix A, recording the reference object, direction, and judged value of each statement (with the median values for the remaining statements given in Sub-appendix (c)), is illegible in the source scan.]

Fifty-five items were to be selected from the original sixty-five. To meet this requirement five further items were selected, each with a value of five. There was no absolute need to calculate and use indices of dispersion, since the aim was not to produce a scale purely on the Thurstone model.

It is necessary to explain the purpose of statement "values" at these early stages in the development of the scale battery. The values were not to be used in the Thurstone style: this must be emphasized. They were to be used as aids to developing a homogeneous scale. Suppose, for example, that a statement has been judged to be of "negative" direction; if it is further assigned the value of one, then it represents a statement that is tending towards a positive direction; if on the other hand it is assigned the value of eleven, it can safely be assumed to represent an extremely negative statement. Ideally only items with a value of eleven would be selected since, as stated earlier, there was reason to prefer high extreme statements. To attain this ideal is not impossible; all it involves is increasing the original pool to at least 500 items, carefully written with the same goal in mind. Actually the median value of the selected items turned out to be seven, and there were two items with a value of two and another two with a value of eleven each. This is admittedly a poor approximation to the ideal, but it was accepted as fairly satisfactory in the present circumstances.

Another comment is in place. Each statement was assigned a relative value within its own group. The direction of the attitude implied in the statement was not taken into account in assigning values. Thus, these values are not to be confused with the five-point Likert scale used in the final scale battery, as will be shown presently. In fact the exercise thus described is an elaborated example of stimulus scaling--the attitude statements are scaled; the Likert technique, on the other hand, scales persons. The two methods were therefore combined in the present study.

PRELIMINARY TRY-OUT

The preliminary try-out was necessary to check on the suitability of the format in which the battery is to be presented, on the clarity of the instructions, and again on the quality of the items. Moreover it would provide an opportunity to test the scoring procedure before a full-scale try-out was launched. This last need emerged from discussions with Warrington (1968). In view of the purposes just stated, the "sample"--if it could be called one at all--was confined to three advanced graduate students¹ invited to respond to the items in their role as University students.
Later they were expected to pro- vide and did provide written comments as they felt necessary. lThe writer is grateful to the advanced graduate students named below for the role they played in this part of the study: Jack Hruska, W. Russel Harris, Glenn L. Sterner. 95 As mentioned in the last section fifty-five items were selected. Respondents were expected to Show the degree of their agreement/disagreement with the statements by assign— ing values using a five-point Likert scale. The points were defined as follow:1 ,1. Np agreement whatsoever 2.-Disagreement most of the time; agreement at few occasions. 5. Opinion hovers between agreement and disagreement equally. 4. Agreement most of the time; disagreement at few occasions. 5. Complete agreement. There were four groups of items: Examination-Positive, Examination—Negative, Grading-Positive and Grading—Negative. From now on these will be referred to as EP, EN, GP and GN respectively. They are the four scales Which constitute the scale battery. To simplify notations further they will also be referred to as Scales 1, 2, 5, and 4 respectively. No systematic order was employed in arranging the items in each scale; but the scales were chosen alternately, and no more than six items in the same scale were presented successively. .The results from this investigation were as shown in tabular form on the following page. J'These definitions may be cumbrous; but the aim is to avoid the stereotype and thus hopefully minimixe response sets. _ 3'11 94 ATTITUDE SCORES PRELIMINARY TRY-OUT Scales Possible score EP EN GP GN Possible score range 11-55 12-60 14-70 16-80 Cutting score* 55 59 42 48 Respondents S; 17 52 55 57 Se 57 25 46 55 S3 14 46 17 59 *These scores are determined from the Likert point value of 5 as defined above. ReSpondents with scores above the cell entries here can generally be classified as being "high" on the attitude measured by the scale. :The values vary with the number of items in each scale; no final selection of items was made, as yet. RANK-DIFFERENCE CORRELATION COEFFICIENT* EP EN GP GN EP —1.00 +1.00 -1.00 EN -1 . 00 +1 . 00 GP -1.00 . GN *The high values are certainly an artifact of the sample size; does this also apply to the direction? 95 The preceding pattern of scores and the direction of the coefficients would be expected from the theoretical model; the absolute values were of no significance. This part of the exercise was therefore very valuable in that it also led to the improvement in the diction of some of the items and in the format of the instructions--all based on the comments from the respondents and other consultants. Of the fifty—five items used, forty-eight were retained—- twelve each for the four scales EP, EN, GP and GN. THE MAIN TRY-OUT Two considerations determined the characteristics of thegpample drawn for the main try-out phase. The first was the immediate pOpulation for which the Scale is designed. The Scale is directly applicable to a pOpulation of college and university students. It is assumed that the students of Michigan State University form such a typical pOpulation. The sample was drawn in such a way that the main departments of the University are represented. However, it was not random; judgment was exercised to make the selection include “juniors" and graduate students as shown in Sub—appendix (e). The second consideration was the intention to factor analyze the returns--in an effort to test the validity of the theoretical model conceived as the basis for the scale battery. 
Accordingly, the size of the sample was planned at 600 at least. As Sub-appendix (e) shows, the actual returns 96 were 585 (incidentally twelve data cards were destroyed in process so that the final output involved 575 Observations). .The questionnaire was administered by the instructors1 responsible for the classes selected. Subjects responded to all items on a five-Option IBM answer sheet. About fif- teen minutes were sufficient to respond to all items. The scoring was done by the Office of Evaluation Services. THE FACTOR EXPERIMENT Both the theoretical basis for the battery and Eh; hypothe§§§_that may be deduced from the model may sound a little radical. It is therefore necessary to put them through a somewhat rigorous test as may be provided by factor analysis. In the first place the view is expressed that like and dis- like attitudinal feelings are not necessarily on a linear continuum. Accordingly it was hypothesized that EP and EN scales represent two distinguishable "factors" and not one bipolar factor. Similarly GP and GN scales also represent separate factors. The model also depicts attributes of the psychological Object as the anchors for attitudinal feelings. It would follow therefore that where a number of attitude statements focus on a well defined attribute of the object lSpace forbids the listing of the twenty-and-two profes- sors who were not only willing to permit the use of their classes but also agreed to administer the questionnaire to their students in an effort to help keep "the experimenter out of the scene." The writer is deeply grateful to these professors and their students for their COOperation. 97 factor analysis would bring out a "factor" symbolizing such attribute. In the present battery develOpment it was pos- sible to focus a number of statements on the functions of the objects of interest. The content analysis exercise pro- vided for this catetory. .The second hypothesis was there- fore that a "functional factor" would emerge from the analysis. As mentioned above one of the richest content categories on which the attitude statements were based was the one in which emotion was expressed on diverse aspects of the objects. It was therefore not possible to formulate a well defined hypothesis in this area. At best it was hypothesized that-a general attitudinal factor would also emerge. The six types of factors discussed were clearly antici— pated. But perhaps there might be another factor or factors engulfed in the general factor. ~With such reasoning the raw data was submitted for analysis in the hOpe that there would emerge "at least five factors". The Rotation Technigues The analytic procedures were repeated three times. In the first and second, half the Observations were used-- randomly divided; the third repeat involved all the Observa- tions. The Kiel-Wrigley criterion (MSU CISSR, 1967) was used in the rotation Of factors for the two half samples, but the full sample data was rotated to ten factors. 98 Both the Quartimax and the Varimax methods of rotation were applied. Extracts from the final outputs are given in Sub-appendix (f). Only the loadings with value 0.40 or greater are shown on that table. The lower values may not be significant. The sample was split so that the factor pat- terns may be compared. Such comparison would throw light on the stability of the factors. The full data analysis resulted in six Quartimax factors each of which has loadings on at least three variables. 
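The split-sample stability check just described can be sketched as follows. This is not the thesis's original program (which used the MSU CISSR routines and the Kiel-Wrigley criterion); it is a rough modern equivalent, assuming the item responses are available as a respondents-by-items array and that a recent scikit-learn release (which accepts a "varimax" rotation option, the rotation named in the next paragraph) is installed. Tucker's congruence coefficient is used here as one simple way to compare factor patterns; the thesis compared them by inspection.

```python
# Sketch of the split-sample stability check: factor-analyze two random halves
# and compare the rotated loading patterns.  `responses` is assumed to be an
# (n_respondents, n_variables) array; the random data below is a placeholder.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(575, 52)).astype(float)  # placeholder data

def varimax_loadings(data, n_factors=10):
    """Extract factor loadings with a varimax rotation (scikit-learn >= 0.24)."""
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax", random_state=0)
    fa.fit(data)
    return fa.components_.T          # variables x factors

def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

order = rng.permutation(len(responses))
first, second = responses[order[:287]], responses[order[287:]]

L1, L2 = varimax_loadings(first), varimax_loadings(second)
# Compare each factor of the first half with its best match in the second half.
for j in range(L1.shape[1]):
    best = max(abs(congruence(L1[:, j], L2[:, k])) for k in range(L2.shape[1]))
    print(f"factor {j + 1}: best congruence with second half = {best:.2f}")
```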
Three of these factors each account for at least five percent of the common variance. The other four factors may not be significant. The corresponding distribution for the Varimax factors is as follows: nine factors--with at least three variables, five factors, each accounting for at least five percent of the common variance and only one factor that may not be significant. Following Wrigley's (1968) suggestion the Varimax factors are adOpted as the more appropriate in the present case. In fact there are also evidences in the literature (e.g., Vernon 1959, Kerlinger and Kaya, 1959) to justify this preference. But it is worth observing that both techniques Of rotation produce more factors than were hypothe— sized. If the traditional model applied in this case there would have been at most three factors. Furthermore, the patterns across the three samples though not in perfect agree- ment are sufficiently similar, and tend to show the factors in the third analysis are stable. .A full comparison of the Vari- max factors across the three samples and the four Scales is 99 The Naming of the Varimax Factors in the Full Data Analysis Factor 1: Var. Quest. NO. NO. 1 EP, 6 EP3 27 EP24 28 EPgs 29 EP33 56 EP33 58 EP35 5 GP 16 GP13 25 GPgO 24 G931 25 Gng 26 GP23 59 GP33 40 GP37 21 EN13 51 GN33 (16.08% of Common Variance) "(General) Learning Function" Attitude Statement Loading Sum of scores on 12 items comprising 0.7262 Exam Positive Scale. Of all teaching devices, examinations 0.4008 provide the most useful feedback. Examinations provide the most satis- 0.5654 factory means for assessing learning. Examinations are an indispensable 0.7194 feature of the University curriculum. Without examinations, academic stand- 0.7911 ards fall. The discipline Of examinations is vital 0.6628 to learning. Abolition of examinations will in the 0.6548 long run lead to chaos in graduate education. Sum of scores on 12 items comprising 0.7071 Grading Positive Scale Grades provide a necessary incentive to 0.5269 hard work. The grading system should be an inte- 0.5212 gral part of the curriculum in higher education. For the student, grades are a desirable 0.5055 aid to self-evaluation. Abolition of grading would jeopardize 0.7572 learning at the University level. Grading is a necessity if standards 0.7726 have to be maintained in University education. I would campaign vigorously against 0.5912 any attempt to abolish grading at the University level. Without grading the motivational func— 0.5559 tion of examinations would be impaired. Examinations should be abolished at the —0.4158 University level. Grading should be abolished at the -0.5152 University level. 100 given in Sub—appendix (g). The conclusion from that table is that the stability of the factors is not in doubt. Seventeen variables have "significant" loadings on this factor; of these there are seven each belonging to the orig- inal EP and GP scales, and one each to the EN and GN scales. 0n the positive side the theme is that both examinations and grading are relevant in the curriculum; the negative side is also clear: these aSpects of the curriculum are not rele- vant and "should be abolished". This factor shows up as bipolar, but very few negative items load on it and these negative loadings may reflect the particular wordings in variables 21 and 51. Perhaps a bipolar attitudinal factor may be an artifact of the language used in the statement. This will therefore be called the General Learning Function Factor. 
Future revisions will discard variables 21 and 51 and all such types. Factor 2: (5.45% of CommOn Variance) "Examination Type" Var. Quest. Attitude Statement/Description Loading NO. No. . 22 EN Sum of scores on 12 items comprising 0.5425 Examination Negative Scale. 18 EN15 Objective examinations are nothing 0.7024 more than a guessing game. 44 EN41 Examinations are nothing more than 0.6649 trickery. Apart from the EN scale only two other variables load significantly on this factor. One of them suggests this may 101 be an "Examination-Type" Factor. Further studies may in- vestigate whether there is any such factor. It is worth noting that no items on grading load significantly on this factor. It is therefore peculiar to examinations, and pro- vides another evidence that negative attitude may be on a distinct attribute of the psychological object. Factor 5: (7.06% of Common Variance) "Pressure-Anxiety" Var. Quest. Attitude Statement/Description Loading NO. NO. 2 EN Sum of scores on 12 items comprising 0.4576 Examination Negative Scale. 19 ENie Examinations provide the student a 0.5790 frustrating experience. 20 EN17 I resent the pressure which examina- 0.7110 tions bring on me. 45 EN4O Examinations generate too much anxiety 0.7808 50 GN37 Grades induce too much worry. 0.7459 Here again the only items that load significantly on this factor belong to the negative EN and GN scales. All the items provide "pressure" or "worry" or "anxiety" stimuli. This will therefore be called the "pressure-anxiety" factor. Examinations and grading go together, once again suggesting some common frame of mind, or reflecting the fact that the attitude dimensions and the supporting attributes are the same for both objects. 102 Factor 4: (7.69% of Common Variance) "Grade-Measure" Var. Quest. Attitude Statement/Description Loading No. NO. 5 GP Sum of scores on 12 items comprising 0.6159 Grading-Positive Scale 14 GP11 Grades are very effective for indicat- 0.6595 ing students achievements of the course _ objectives. 15 GP;3 Grades are a good estimate of the quality 0.6262 of learning that has taken place. 17 GP14 Given the word "meaningful" as indicat- 0.5815 g ing your Opinion of grading, rate it according to the strength of this Opinion. 25 GPgO The grading system should be an integral 0.4292 part of the curriculum in higher education. 24 6P3; For the student, grades are a desirable 0.4175 aid to self-evaluation. 41 GP37 The finer the grading system, the better 0.5125 it reflects the students' competence level. 42 GP33 Given the word "relevant" as indicating 0.5499 your Opinion of grades, rate it to show the strength of this Opinion. 55 GN3O Grades are no indication of what the 0.4467 student has learned in a course. With the exceptions of variables 25 and 24 (which also load high on factor 1) these items focus on the effectiveness of grading as a measuring instrument. That variable 55 loads with an Opposite sign may be just an artifact of its wording ("no indication") and not necessarily that the factor is bi- polar. 105 This shall be called the Grade—Measure Factor. It is hard to explain why a similar item on examinations does not load high on this factor. Are the perceptions Of these ob— jects as measuring tools on different dimensions? Factor 5: (6.54% of Common Variance) "Hate" Var. Quest. Attitude Statement/Description Loading No. No. 4 GN Sum of scores of 12 items comprising 0.5705 Grading—Negative Scale. 
54 GN31 Given the word "evil" as reflecting 0.4555 your Opinion Of grading, rate it to show the strength of this Opinion. 49 GN43 I have nothing for grades but pure 0.6459 hate. 50 GN47 Whoever put more grades into the 0.7578 scale should be hanged. 51 GN43 It is grossly unfair to award a gradu— 0.5969 ate student a "D" or an equivalent grade. 47 EN44 Given the phrase ("a farce" as indi- 0.4594 cating your Opinion of examinations rate it to show the strength of this Opinion. 48 EN43 In my experience as a university 0.4550 student, examinations are the instruc- tors' make—shift without any real value. Here as in Factor 1 the attitudinal disposition is the same for examinations and grading. That this is a distinct factor is further evidence that a negative attitudinal dis— position may exist on a separate dimension. 104 This is named the Hate Factor; it is somewhat general in that the determinants of the "Hate" are not specified. Factor 6:(5.42 of Common Variance) Var. Quest. Attitude Statement Loading No. No. 11 EP3 Examinations make me feel happy and 0.5787 confident. 55 EP33 Examinations Should be given more empha- 0.4249 sis in the University curriculum. 57 EP34 Examinations make study exciting. 0.6515 This may be a general satisfaction factor--in Opposition to the PressureeAnxiety factor. Perhaps if similar items were included on grading they would also load on this factor. Factor 7: (5.95 of Common Variance) Var. Quest. Attitude Statement Loading NO. No. 46 EN34 The examination system is entirely 0.5209 lacking in precision. 47 EN44 Given the phrase "a farce" as indicat- 0.4547 ing your Opinion of examinations, rate it to Show the strength of this Opinion. 51 GN43 It is grossly unfair to award a graduate 0.5551 student a “D" or an equivalent grade. It is difficult to explain why these items should com- prise a separate factor. The last two also load significantly 105 on the "Hate" factor. It may not be a stable factor. Further investigations may reveal the nature of this factor, if at all it exists on a separate dimension. Meanwhile it will be ignored. Factor 8: (4.91% Of Common Variance) "Motivating Function Var. Quest. Attitude Statement Loading NO. NO. 5 EP3 Examinations are the best means for 0.5770 motivating students to learn. 7 EP4 I Examinations enforce my desire to 0.5960 learn. 8 EP5 Given the word "favorable" as refer- 0.5551 ring to your feeling about examinations, rate it to indicate the degree of this feeling. 16 GP13 Grades provide a necessary incentive to 015984 hard work. The central thought in the first three items is that examinations are perceived to motivate learning. The loading of the last item on grading is below the criterion value of .40; however it is so close as to justify its inclusion here. This shall be called the "Motivating-Function" factor. The statements which load on factor 9 (see the follow— ing page) seem to say that the psychological objects are worthless, or that they perform some undesirable function. This will therefore be called the Dysfunction Factor in Opposition to the relevant Function Factors 1 and 4. 106 Factor 9: (4.52 of Common Variance) Var. Quest. Attitude Statement/Description Loading NO. No. 12 EN3 There is very little of instructional 0.6558 value in the content of examinations. 15 EN10 Examinations are redundant in the edu- 0.6806 cational process at the University level. 10 GN7 Grading encourages students to cheat 0.4559 in examinations. It is worth Observing that these variables do not load significantly On the first factor. 
There their loadings are 0.0806, 0.2254 and -0.0022 respectively. In other words the evidence is not very strong that either Factor 1 or Factor 9 is bipolar.

Generally the hypotheses were confirmed. Most of the "positive" statements came out under separate and identifiable factors, and so did the negative statements. Furthermore, their identities have references or anchors in the attributes of the attitude objects. These attributes are reflected in the factor names suggested. However, only limited success was achieved in separating the examination factors from the grading factors. Perhaps there is a natural linkage between them. It may also be that attitude factors are similar and parallel, as shown in Figure 5, page 121.

SELECTION OF ITEMS AND PRESENTATION OF THE BATTERY

The table on the following page shows the scheme used in making a selection of eight items for each of the four scales. The numbers appended refer to the items with the highest loadings on the respective factors. The table serves to emphasize the aims of the present battery. If attitude statements are anchored on well-defined attributes of the psychological object, separate "factors" will emerge to symbolize these attributes. Furthermore, the general nature of the attribute determines the direction of attitude, that is, whether it is "positive" or "negative"--for or against. It may be added that this table also provides a scheme for writing new items. Ideally only unidimensional factors would serve in this scheme--to agree with the theoretical model--but factors 1 and 4 fail to meet this ideal.

The battery in its final form is reproduced on pages 109 and 110. Where groups of items belong to one factor, they are arranged in descending order of the magnitude of their loadings, which were given earlier.

[The scheme for item selection (page 108), a table listing the retained item numbers for each scale under the factors to which they belong, is printed rotated in the source scan and could not be recovered.]
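Since the selection scheme itself is not legible in the scan, the sketch below illustrates the kind of rule the text describes: for each retained factor, keep the items with the highest absolute loadings. The loadings frame shown is a hypothetical fragment, not the thesis's actual factor output, and the item labels are reused only for flavor.

```python
# Sketch of selecting items by their highest factor loadings, as described for
# the eight-items-per-scale scheme.  The loadings below are hypothetical.
import pandas as pd

loadings = pd.DataFrame(
    {"Factor1_LearningFunction":  [0.40, 0.57, 0.72, 0.05],
     "Factor8_MotivatingFunction": [0.10, 0.02, 0.08, 0.60]},
    index=["EP3", "EP24", "EP25", "EP4"],
)

def select_items(loadings, factor, k=4, floor=0.40):
    """Return up to k items whose |loading| on `factor` is at least `floor`."""
    col = loadings[factor].abs()
    return col[col >= floor].sort_values(ascending=False).head(k).index.tolist()

for factor in loadings.columns:
    print(factor, "->", select_items(loadings, factor))
```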
.mSHuHUxm mpSum TESS MSOHumSHmem..m SOHuommmemmlfimxm mm .mSHSHmmH mSHmmmmnm How mSmmE NHOHOSHMHumm umOE mnu OUH>OHQ MSOHumSHmem .m .SOHumospm mumnomnm SH momno Ou UmmH SSH mSOH Onu SH HHH3 MSOHumSHmem mo SOHuHHon< .H .mSHSHmmH Op HSHH> MH MSOHHMSHmeO mo OSHHQHOch SSE .m .ESHSUHHHSU wuHmum>HSD may no musummm THAMESTQMHUSH Sm mum mSoHumSHmem .N .HHmm OHSO3 mpnmpcmum OHfimomum .mSOHHMSHfimxw uSOSqu..H SOHHOSSMImSHSwaq EOHH Houomm mHmom 110 .mSoHpmSHmeo SH umwSO Ou muSmpSum mmmmHSOOSO mSHpme .m SOHSUSSHmmn .SOHSHQO MHSu Ho Sumamuum mSu 30Sm 0» SH mums .mSH IUmHm Ho SOHSHmo snow mSHuomHHwH mm =HH>m= UHO3 SSS Sm>HO .s .wpmnm uSmHm>HSUm Sm HO gm: m uSwUSum mumspmum m OHNBM Ou HHSHSS memOHm EH SH .m mumm 20 .mumS Tuna SSS mmpmum HOH mSHSSOS T>MS H .m .pmemS TS OHSOSm mHmom SSH ouSH mmnmum mHoE use HO>TOS3 .H .mmnSOO S SH SoHuOSsm SOSHmmH MSS uSmUSum SSS SSSB mo SOHumOHUSH OS mum mmnmnw .m ImSHHSmmmzlSoz .mHHOB SUSS oou TUSOSH mwpmsw .N mumeSmrmHSmmmHm SOHSOSSm .Hm>mH wuHmum>HSD TS» um OmSmHHOSm OS UHSOSm mSHpme .H mSHSHmmHISOz .Suo3 UHSS Ou T>HuSTOSH hummmmomS m TUH>OHm mwpmsw .m SOHSOSShImSHpm>Huoz .Hm>mH OUSOSTQEOU .muSmtsum TS“ muowHHmn uH HmuuTS mSu .Ewummm mSHanm TS» HOSHH TSB .s .SOHSHmO mHSu Ho Sumcmnpm mSu Ou mSHUnouom SH mums .mSHUmum Ho SOHSHmO “Dom mSHumOHpSH mm =HSHmSHSmmE= UHO3 SSS SO>HO .m .womHm smxmu mmS HMSS mSHSHmmH Ho muHHmSU wSu Ho mumEHumm 000m m mum mmpmno .m SOHSOSSMImSHHSmmmz .mm>HuomnSO mmusoo HO muSoEm>mHSUm . .muSmpsum mSHumUHUSH How O>Hpowmmm msm> mum mmpmuw .H mm .Hm>wH muHmHO>HSD mSu um mSvanm SmHHonm ou umEmuum mSm umSHmmm mHmsouomH> smHmmfimO pHsoz H,.m .Hm>mH wuHmHm>HSD mSu um mSHSHNTH TNHnHmmomn OHSO3 mSHpmum Ho SoHuHHOSS .N SOHSOSzmImSHSHSOH. .SOHHMUDUT SUHSH0>HSD SH UwchuSHmE 09 OH w>MS mUHMUSMUM NH thmmmowG m mH mGHUwHO 111 A few comments are necessary. In administering the bat- tery the items would be thrown into some random order. Future revision will aim at ten items for each scale, at least three and at most four factors under each scale, and two or four items within each factor. The increase in the total number Of items will hOpefully lead to increase in validity, while the use of even number of items under each factor will make it convenient to compute split—half reliability coefficients. TEST STATISTICS In the present case where there were five alternative weighted responses the product moment correlation of item scores with the total scores in their appropriate scales may be used in determining items which belong to the Scale. But such coefficients are inflated since the item scores are also included in the scale scores. Even so these coefficients are diSplayed in Sub-appendix (h) together with the standard deviations for each item—variable, and also the inter—item correlations. The latter may safely be interpreted as indices prbelonging. To facilitate their comprehension Table 8 summarizes the relevant data. It is worth noting that all coefficients are positive. Furthermore GP is the most homo- geneous as its inter-item coefficients are all above .20. By the same standard GN is the poorest scale, and needs much revision. 
To facilitate their comprehension, Table 8 summarizes the relevant data. It is worth noting that all the coefficients are positive. Furthermore, GP is the most homogeneous scale, as its inter-item coefficients are all above .20. By the same standard GN is the poorest scale, and needs much revision.

TABLE 8
GROUPED FREQUENCIES, RANGE AND MEDIAN OF INTER-ITEM CORRELATIONS

Categories       EP (f)     EN (f)     GP (f)     GN (f)
.6000-.6999         1          1          1          -
.5000-.5999         4          6          7          1
.4000-.4999        16         12         29         12
.3000-.3999        24         22         17         15
.2000-.2999        15         20         12         20
.1000-.1999         6          5          -         15
Below .1000         -          -          -          5
Total (f)          66         66         66         66
Range          .159-.621  .129-.609  .228-.641  .004-.545
Median           .354        .320       .410       .277

Intercorrelation Among the Scales

Logically the total scores for the "positive" and "negative" scales should reveal an inverse relationship between them. But this may not be perfect, since the "dimensions" are not necessarily on the same linear continuum. In fact the inverse relationship may be conceived to be an intrinsic property of the "negative" and "positive" dimension vectors. The absolute sizes of the coefficients, as presented below, also show an interesting pattern: the positive scales (EP-GP) and the negative scales (EN-GN) correlate more highly within their like pairs than they do within unlike pairs (EP-EN or EP-GN; similarly GP-GN or GP-EN). This may be interpreted as another piece of evidence against bipolarity of the attitude factors. The correlation between "positive" and "negative" scales is negative; if the scales were on the same linear continuum--if they represented opposite ends of a bipolar factor--then the absolute value of the correlation coefficient would be as close to unity as possible. The evidence of this study does not seem to support such a position. In the sample the correlations were as follows:

          EP       EN       GP       GN
EP      1.00
EN      -.589    1.00
GP       .796    -.562    1.00
GN      -.550     .800    -.624    1.00

The directions of these coefficients agree with those illustrated on page 94.

Reliability of the Scales

An estimate of the reliabilities of the scales was computed by the Kuder-Richardson method. In the present case, where responses are weighted, the appropriate formula according to Magnusson (1966) is

    r_tt = [n / (n - 1)] * [(s_t^2 - SUM s_i^2) / s_t^2]

where r_tt is the reliability coefficient (K-R20), n is the number of items in the scale, s_t^2 is the variance of the test, and SUM s_i^2 is the sum of the item variances. The reliabilities shown below were based on this formula. The relevant data for the calculations will be found in Sub-appendix (h).

      EP       EN       GP       GN
    .798     .791     .812     .746
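A direct computation of the formula above, under the usual reading in which n is the number of items in the scale, can be sketched as follows; the responses shown are hypothetical, so the printed value will not reproduce the coefficients reported in the text.

```python
# Kuder-Richardson-style reliability (coefficient-alpha form) for weighted
# responses, following the formula quoted from Magnusson (1966):
#     r_tt = n/(n-1) * (s_t^2 - sum of item variances) / s_t^2
# `items` is a hypothetical (n_respondents, n_items) array of 1-5 responses.
import numpy as np

def reliability(items):
    n_items = items.shape[1]                        # n in the formula
    total_var = items.sum(axis=1).var(ddof=1)       # s_t^2
    item_var_sum = items.var(axis=0, ddof=1).sum()  # sum of s_i^2
    return (n_items / (n_items - 1)) * (total_var - item_var_sum) / total_var

rng = np.random.default_rng(2)
items = rng.integers(1, 6, size=(575, 12)).astype(float)  # hypothetical scale
print(f"r_tt = {reliability(items):.3f}")
# With internally consistent real responses the value would be positive and
# of the order of the .75-.81 reported above; random data gives a value near 0.
```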
INTERPRETATION OF THE SCORES

Ostensibly four scales make up this battery. However, factor analysis has brought out sub-scales which are fairly easy to interpret. From the general instructions to the questionnaire, a value of 3 is to be assigned to a statement if "opinion hovers between agreement and disagreement equally." It therefore follows that a mean score less than 3, or a mean score higher than 3, will be interpreted to indicate that the group or the individual is "low" or "high" on the particular dimension of attitude. The mean total score for a group of items may also be interpreted accordingly. Thus if there are four items in the sub-scale, a mean total score of 12 would form the dividing line between the "lows" and the "highs" on the dimension reflected by that sub-scale.

The scheme for interpretation outlined implies a built-in meaning for the scores, and not a meaning to be determined with reference to any group. It seems logical that the meaning of scores should be similar to the Likert values as here defined. The only assumptions are that the subject understands the instructions and that he responds to the items honestly. These may be somewhat limited by "response-set" tendencies. The extent of such tendencies was not determined, but the percentages of respondents choosing each option, shown in Table 9, would lead one to say that the effect of such sets may not have been very serious. The choices are fairly spread out, except that respondents tend to avoid the high extreme value.

The above observations will now be illustrated for the try-out sample. There are three factors in the EP scale. In the first--the learning-function factor--there are five items; the mean total on these for the 575 observations is 12.8081. This places the group on the "low" end of this sub-scale with respect to their perception of examinations as a learning device. The mean item response on this and the other factors may be set out as follows:

Factor                         Mean Item Response(a)   Range of Inter-item Correlations(b)
Learning Function                    2.562
Examination-Satisfaction             1.887                      .159-.621
Motivating Function                  2.584

(a) The means for all items are given in the sub-appendix.
(b) These may be taken as estimates of the reliabilities of the factor scales.

These results also read "low," or "very low," as on the Examination-Satisfaction factor. The break-down of the other scales is as follows:

Scale   Factor                             Mean Item Response   Range of Inter-item Correlations
EN      Examination-Type                         2.615
        Pressure-Anxiety                         3.528                  .129-.609
        Hate                                     2.551
        Dysfunction                              2.705
GP      Learning Function                        2.686
        Measuring Function                       2.640                  .228-.641
        Motivating Function                      3.077
GN      Pressure-Anxiety                         3.415
        Hate                                     2.512
        Dysfunction                              3.059                  .004-.545
        Non-learning Function (bipolar)          2.670
        Non-measuring Function                   3.158

The meaning that may be read into the above results is that the group tends to be "high" on the following factors: Pressure-Anxiety, Grade-Motivating Function, Grade-Dysfunction and Grade-Non-Measuring Function. On the other factors it is "low." The point needs emphasis. The scores for an individual (or group) on the scales in this battery should be broken down into "factor" scores, and then interpreted in terms of "low" or "high" on the respective factors. The aim is to present a profile mapping of the individual on the defined attitude factors. Such a profile is presented in Figure 2 on the following page. The Pressure-Anxiety factors are prominent in both the Examination and the Grading scales, while the learning function factors are "low."

[Figure 2 (page 117) is a bar profile of the factor sub-scales running from "Very Low" to "Very High"; the drawing itself is not legible in the scan. Its key reads: EP = Exam.-Positive; EN = Exam.-Negative; GP = Grade-Positive; GN = Grade-Negative; LF = Learning Function; ES = Exam. Satisfaction; MF = Motivating Function; M = Measuring (Function); ET = Exam. Type; PA = Pressure-Anxiety; H = Hate; DY = Dysfunction; NL = Non-learning Function; NM = Non-measuring Function.]

Figure 2. Attitude profile of the try-out sample (N = 575), Students' Attitudes Towards Examination and Grading Scale Battery (SATEG SB).

A frequency count was made of the respondents choosing each option, and converted into percentages. Table 9 shows these mean percentages under each factor sub-scale.

TABLE 9
MEAN PERCENT OF RESPONDENTS CHOOSING OPTION IN THE FACTOR SUB-SCALES

                                   Likert-Point Values
Scale and Factor               1     2     3     4     5
EP  Learning Function         19    32    26    18     5
    Exam. Satisfaction        45    27    14     7     2
    Motivating Function       14    35    26    26     5
EN  Exam. Type                14    39    22    19     5
    Pressure-Anxiety           6    20    26    29    18
    Hate                      24    34    21    12     4
    Dysfunction               10    36    30    18     5
GP  Learning Function         21    24    25    21     7
    Measuring Function        15    32    27    21     5
    Motivating Function       10    21    28    34     7
GN  Non-Learning Function     20    25    29    17     9
    Pressure-Anxiety           5    19    22    34    19
    Non-Measuring              7    28    24    25    16
    Hate                      29    27    22    11     6
    Dysfunction               12    25    25    26    14
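The collapsing rule applied in the next paragraph (options 4 and 5 read as "high," 1 and 2 as "low") amounts to the following; the two rows shown are transcribed from Table 9 and carry the usual OCR uncertainty.

```python
# Collapse the Likert option percentages of Table 9 into "low" (1+2) and
# "high" (4+5) groups, as done in the discussion that follows.
table9 = {
    # factor: percentages choosing options 1..5 (transcribed from the scan)
    "EP Exam. Satisfaction": [45, 27, 14, 7, 2],
    "EN Pressure-Anxiety":   [6, 20, 26, 29, 18],
}

for factor, pct in table9.items():
    low, high = pct[0] + pct[1], pct[3] + pct[4]
    print(f"{factor}: low = {low}%, high = {high}%")
# -> EP Exam. Satisfaction: low = 72%, high = 9%
# -> EN Pressure-Anxiety:   low = 26%, high = 47%
```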
The picture shown may be easily comprehended if options 4 and 5 are combined and summarily described as "high." (Similarly, 1 and 2 may be combined and described as "low.") On this basis the following statements may be made of this sample:

1) 22 percent are high on the learning function factor in the EP scale;
2) 29 percent are high on the motivating function factor.

In contrast, only 9 percent are highly satisfied with examinations. The EN scale throws some light on this contrast. Here 47 percent are high on the Pressure-Anxiety factor and 25 percent on the Dysfunction factor.

A similar analysis may be made of the Grading scales. For the GP scale the percentages in the high group are: learning function, 28; measuring function, 24; and motivating function, 41. Thus the group perceives grades more as a motivating than as a measuring or learning device. The figures for the Pressure-Anxiety and Dysfunction factors are 53 and 40 respectively. This would mean that more than half the sample perceive grades as generating pressure and anxiety, and about a half also feel grades perform no useful function.

IMPLICATIONS AND CONCLUSION

The manner of describing attitude toward a psychological object as being "high" or "low" along specified attribute "factors," and the dimensions they support, has some diagnostic value. At least it is a step beyond a global conception of attitude. Moreover, it makes it comparatively easy to "control" attitude. Suppose, for example, that this scale is valid and that with its aid the attitude of a group on the learning-function aspect of examinations is diagnosed to be "low"; an area is thus clearly specified for "treatment," should one desire to influence attitude on this positive dimension. In other words, control of attitude towards a psychological object becomes feasible if the anchors of the attitude are identified. It is reasonable to think that attitude change may be effected through some manipulation of the attributes of the psychological objects.

A cursory look at the pattern of the figures in Table 9 may lead one to suppose an inverse relationship between the learning and motivating function factors on the one hand and the pressure-anxiety and dysfunction factors on the other. This suggests that the attitudes may be "changed" to be more "positive" if effort is concentrated on developing the learning and motivating function attributes of both examinations and grading. A general hypothesis may therefore be set out as follows: the more students perceive examinations and grading as promoting learning, the less they will feel the pressure and anxiety which these twin aspects of the curriculum also generate; hence the more positive will their attitudes be towards these objects, and consequently the higher the amount of learning that will take place. This general hypothesis may be broken up and tested, among others, in a program of construct validation of this scale battery.

The results of this study provide evidence which tends to agree with the theoretical model. Figure 5 reproduces the model with specific reference to the present study.

[Figure 5, page 121, presents the model as two panels, (a) for Examinations and (b) for Grading; the drawing itself is not legible in the scan.]

Figure 5.
The attitude model with specific reference to Examinations (a) and Grading (b). (The reader is now familiar with the abbreviations used; the words they stand for are displayed in the Key to Figure 2; the general model is presented in Figure 1.) In the figure ABCD still represents the attitude pre- dispositional base, which remains the same for all attitudes of an individual. In fact both parts (a) and (b) would be shown on the same diagram; they are separated here to aid clear presentation. It should be noted that the growth points are now defined with reference to the attitude ob- jects (E: Examinations: G: Grading). Furthermore the posi- tive dimemsions (EP and GP) are parallel; so also the nega- tive dimensions (EN and GN). The reader is reminded of the high and positive correlation between the scales in brackets, and of their loadings on the various factors discussed earlier. The last observation would lead one to suggest that positive attitudes, irreSpective of the attitude objects 122 would correlate highly and positively with one another; similarly negative attitudes would correlate highly and positively. The attribute vectors shown in Figure 5 represent the factors revealed in the factor analysis. The figure shows that the upward growth of attitude along each dimension- positive or negative is supported by the number and strength (reflected by length) of the attributes. The model and the evidence provided by this study would lead one to doubt that attitude is bipolar. A linear continuum model for attitude may not be apprOpriate. An instrument like this can serve two purposes. It may be used for an attitude survey and in studies of relations between attitude and other variables. Furthermore it may be used to plan "treatment" measures to bring about attitude change. The traditional attitude measures do not seem to suggest this diagnostic and treatment use. In the writer's mind if social scientists survey attitude and always report it in the global form they are unwittingly perpetuating the attitude; and this may not always be desirable. If on the other hand their reports make evident the anchoring factors, someone's attention will be easily arrested to examine the basis of the attitude. It must be added however that the model needs further supporting evidence to be worth considering. It is there- fore suggested that the battery be used as a research 125 instrument--to investigate how stable the factors are across different student pOpulations. Other workers may of course wish to test the model and the approach using different attitude objects. SUB-APPENDIX (a) THE OPEN-ENDED QUESTIONNAIRE Course NO. and Title: Course Instructor: Student's Name & NO. (Optional) Date: W STUDENTS' OPINIONS AND ATTITUDES ON THE EXAMINATION-GRADING CONTROVERSY Introduction and Instructions Q' The debate—-"to examine or not to examine, to grade or not to grade"-~is a very crucial one in college and univer- sity education today. TO be democratic and also to help create a healthy_gtm9§phe£§ for carrying out our educational objectives it would be desirable for students to take part not only on this debate but in the formulation of policies on this issue. OA survey is therefore being conducted to tap students' opinions and attitudes. Your response to the following questions will be of great importance in future de- cisions on examination and grading practices in this University. Consider it therefore a grand Opportunity now Offered you to influence policies in these areas. 
It is up to 193 in particu- lar to utilize such a rare Opportunity to express your views for your good and for theggood of future generations of stu- dents. TO underline the importance of this survey to you in particular, you are to take this questionnaire home; respond to it independently and candidly and then return it to your instructor the following day. Feel free to use the blank pages Of this questionnaire to write as much as you like on any of the questions. Thank you for your COOperation. 124 DO not write on this margin 125 The Questions Do not write 1. on this margin How important are exami- nations in the instruc- tional process? Defend your opinion. How important is "grading" (involving the use of A, B-— or 0,1) in the instructional process? Defend your Opinion. Some say examinations and grading are a necessary evil while others believe they are an important aspect of the instruc- tional process. How do you feel about these aspects Of the cur- riculum? Defend your answer. DO not write on this margin 126 4. What reactions have ygg_ had to the examinations you have taken in your college and university experience? 5. In your college and uni- versity eXperience, what reactions have you had over your grades in parti— cular and over the grading system in general? 6. DO you have suggestions for change that should be made in the examination practice at the college level? Defend your suggestions. Do not write on this margin Do not write on this margin 127 7.-DO you have suggestions DO not write for change that should on be made in the grading practice at the college level? Defend your sug- gestions. 8. Which examination type do you prefer more-—the essay 9£_the Objective? State reasons for your preference. 9. Which of the following item types do you most prefer-- True-False, Multiple Choice or Completion Type? State reasons for your preference. this margin .128 DO not write 10. Which of the following on this margin item types do you least prefer-—True-False, Multiple Choice or Com- pletion Type? State reasons for your prefer- ence. DO not write on this margin 11. Would you favor a more or a less emphasis on exami- nations at the university level? .State your reasons for your answer. .12. Would you favor a more or a less emphasis on grading at the University level? State your reasons for your answer. DO not write on this margin 15. 14. 129 It has been suggested that students should be on involved directly and actively in the decisions determining their grades. Would you support this suggestion? State reasons for your Opinion. If you can, suggest and defend concrete ways in which students might be directly and actively in- volved in the determination of their grades. 15. Would you, or would you not, support a student motion urg- ing the completegabolition of examinations and grading at the college level? State reasons for the position you take. DO not write this margin ‘I_.I (i) (ii) SUB-APPENDIX (b) SCHEME FOR THE CONTENT ANALYSIS Coding and Categorization Coding Description of Item Symbol Positive1 direction of attitude toward examination + ex Negative2 direction Of attitude against examination - ex Positive direction of attitude toward grading + gr Negative direction Of attitude against grading - gr Content Categories: 1. Statement of function (e.g., feedback; stifles learning) A 2. Statement of preferences--either in direct answer to “which . . . prefer?" or implied in statement B 5. Statement expressing or implying emotion (e.g., very important, less emphasis) C 4. 
Statement Offering suggestions directly (e.g., term paper) D Question number--Use Roman numerals I, II...XV Respondent: assigned Arabic numerals to be written after the course number, and separated by a colon:-- ED200:4 General Procedural Steps in the Analysis 1. Read through the response to each question. 2. Re—read, and underline significant words, etc., which may be put into one of the content categories, and append the appropriate code symbol. 5. Judge direction of attitude as either positive or negative and append the code (+ ex, for example) be- side the content code Of every underlined word, etc. 4. Transfer the coding symbols to the right margin (use the left margin for writing comments, if any). 5. On the outline summary blank provided prepare the "Summary of Analysis" table (as shown below) and trans- fer the results Of the analysis. 6. All work is to be done on pencil. J'Examples of words, etc.: "feedback"; "very important" zExamples: "stifles learning"; "less emphasis" 150 ” g.” .1... 151 (iii) Specific Hints on the Analysis of Each_Question Item No.* Hints 1. Perhaps this Q. will prove the best stimulus eli- citing responses illustrating "statement of function"--e.g., 1. motivates learning 2. assesses performance 5. reveals weaknesses in learning 4. guides learning. 2. Perhaps best stimulus eliciting 1) incentive to study, work hard; 2) reward. Category A or C may abound, but others not excluded. This comment also applies for number 1 and other items. 5. On the surface this Q seems.a repetition of number 1 and 2 but a new stimulus is subtly introduced in "necessary evil." If respondents agree with this stimulus then the direction of their attitude tends to be negative. Look out for attitudinal and emo- tional overtones. 4. Some reactions will reflect positive attitudes, others negative. Rate (judge) each key word, etc., .appropriately. Perhaps "statement on efficiency/ inefficiency" will be elicited--(Category C). 5. Same remarks as in number 4. 6. The attitude object is written examination.at The following therefore reflect negative attitude (-)Ex 1. Oral exams 2. Term papers 5. Reports of projects, etc. On the other hand the following are positive 1. More emphasis on essay exams 2. More emphasis on Objective exams 5. More quizzes, etc. (Open-book, take-home) *Involving a series of test items--Objective or essay, taken in class or at home, closed-look or Open-book. 7. The attitude Object is the grading system involving at least three levels-~whether letters or numerals, and "GPA." Therefore suggestion Of 1. Pass--Fail 2. Pass--No credit, etc. show negative attitude. Positive attitude is reflected by 1. a finer system 2. a broader system 5. a narrower, etc. * See Sub-appendix (a). Item NO. 8. 10. 11. 12. 15. .14. 15. 152 Hints -Main response here is in category 8 expressed "statement of preference"; direction is positive. 
[The hints for items 8 through 15 could not be recovered from the scan.]

SUB-APPENDIX (g)
A COMPARISON OF THE VARIMAX FACTORS ACROSS THE THREE SAMPLES AND THE FOUR SCALES

[This sub-appendix--preliminary notes, a table comparing the Varimax factor patterns obtained from the two half-samples and the full sample for the EP, EN, GP and GN scales, and a concluding comment that the patterns, though not in perfect agreement, are similar enough for the factors to be regarded as fairly stable--is printed rotated in the source scan and could not be recovered in detail.]

SUB-APPENDIX (h)
SCALE-ITEM AND INTER-ITEM CORRELATIONS

[The item standard deviations, scale-item correlations and inter-item correlations for the four scales (pages 147-149) are printed rotated in the source scan and could not be recovered.]
[Further rotated pages of appendix output--additional correlation and factor-loading tables--are likewise illegible in the scan.]

SUB-APPENDIX (i)
SUMMARY OF MEAN SCORES AND STANDARD DEVIATIONS

Variable    Mean      S.D.        Variable    Mean      S.D.
    1      29.5141    8.5172         27       2.5864    1.0477
    2      33.2496    8.5111         28       2.5812    1.1528
    3      33.4295    9.0265         29       2.6500    1.1886
    4      34.0995    8.5214         30       3.4154    1.1780
    5       2.6405    1.0815         31       2.6702    1.2298
    6       2.8098    1.1149         32       3.8554    1.0555
    7       2.5525    1.2298         33       3.1579    1.1949
    8       2.7469    1.0620         34       2.4295    1.1951
    9       2.7245    1.1290         35       1.7260    0.8165
   10       3.0595    1.2491         36       2.6667    1.1095
   11       2.2129    1.0147         37       1.8866    1.0040
   12       2.7016    1.0748         38       2.5458    1.1219
   13       2.7086    1.0554         39       2.1798    1.1751
   14       2.6405    1.0551         40       3.0855    1.2271
   15       2.5665    1.0511         41       2.7855    1.2284
   16       3.0768    1.1152         42       2.8604    1.0625
   17       2.7784    1.0051         43       3.5794    1.1759
   18       2.6126    1.1165         44       2.1850    1.0050
   19       3.2216    1.0756         45       2.4154    1.1296
   20       3.1815    1.2550         46       2.9564    1.1440
   21       2.5759    1.2578         47       2.5462    1.2079
   22       2.4904    1.1265         48       2.5166    1.0594
   23       2.7958    1.1171         49       2.0858    1.1055
   24       3.0175    1.1020         50       2.1518    1.5484
   25       2.8569    1.2008         51       2.5812    1.5802
   26       3.0227    1.1924         52       3.5864    1.5296

[The Varimax rotation analysis of the main try-out data (pages 151-152), a table of rotated factor loadings for all the variables, is printed rotated in the source scan and could not be recovered.]

APPENDIX B
SPECIFIC INSTRUCTIONS AS ORIGINALLY DESIGNED FOR THE TREATMENT CONDITIONS

(a) SPECIFIC INSTRUCTIONS FOR CLASS 1 (T1)

(These instructions are to be given in class, and woven into the instructor's design of the "class activity." They are to be given orally.)

1. You will be expected to repeat each of the two within-term examinations at home. You may take up to four days before submitting this second attempt for scoring.

2. You will be free to make use of all resources, excluding instructors and fellow students.
(b) SPECIFIC INSTRUCTIONS FOR CLASS 2 (T2)

(These instructions are to be given in class and woven into the instructor's design of the "class activity." They are to be given orally.)

1. You will be expected to repeat each of the two within-term examinations at home.

2. You will be free to make use of all resources, excluding instructors and fellow students. Your aim is to come out with all answers correct, working independently.

3. You will also be expected to score and grade your two performances. Score, using your best judgment of what you feel are the correct answers. Evaluate your scores by assigning grades to yourself (0 . . . 4.5), using criteria you feel to be objective.

4. You may take up to four days before submitting your second performance for machine scoring.

5. Later, when you receive the feedback, check your scoring and self-evaluation and discuss the discrepancies with your instructor until you are satisfied. Finally, prepare your Progress Chart and return it to your instructor for comments.

6. Part of your class activity score will be based on your performance in this exercise. Account will be taken both of the gains you make in the number of correct responses and, in particular, of the size of the mean discrepancies between your scorings and self-evaluations and those of the instructor.

7. (i) This part of the class activity is to count for 10% of the instructor's grade; in other words, it is worth 10 "points" out of a total of 100 "points" which make up the instructor's grade.

   (ii) Award 2 "points" to all subjects--for having carried out the exercise.

   (iii) Award the remaining 8 points according to the mean discrepancy score, as illustrated in the following table:

   TABLE OF POINTS TO BE AWARDED

   Mean Discrepancy Score        Points to be Awarded
         0 (zero)                         8
         1 - 2                            7
         3 - 4                            6
         5 - 6                            5
         7 - 8                            4
         9 - 10                           3
        11 - 12                           2
        13 - 16                           1
        Above 16*                         0 (zero)

   *16 (i.e., 20% of 80--the total maximum score) is the maximum discrepancy score that is to be rewarded.

   The instructor will be expected to comment on the practicality of this scheme after it has been used.

8. The following is the Progress Chart to be introduced and explained to the student after the meeting to discuss discrepancies. The student will use one page of graph paper to prepare his chart as illustrated.

   PROGRESS CHART -- Aim: To Remove Discrepancies Between Evaluations

   [The illustrative chart is not legible in the available copy. It plots, for Test 1 and Test 2, the four evaluations in the key below as bars on a raw-score scale of 0-40, with a parallel grade scale of 0-4.5.]

   Key:  a = Self-evaluation, in-class performance
         b = Self-evaluation, repeat
         c = Instructor's evaluation, in-class performance
         d = Instructor's evaluation, repeat

   DETERMINATION OF MEAN DISCREPANCY SCORE

   Item               Test 1    Test 2    Total
   (a) minus (c)         4         4         8
   (b) minus (d)         4         0         4
                              Mean (N = 4 pairs):  12/4 = 3*

   *Absolute values are used throughout.
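The discrepancy scoring described in items 6-8 can be made concrete with a short computational sketch. The Python below mirrors the worked example above (mean discrepancy 12/4 = 3) and the Table of Points; the function names, the data layout, and the illustrative raw scores are assumptions made for the sketch, not part of the original instructions.

    def mean_discrepancy(evals):
        """Mean absolute discrepancy between the student's and the
        instructor's evaluations, over all tests and both performances.

        `evals` holds one entry per test, keyed as in the Progress Chart:
          a = self-evaluation, in-class    c = instructor, in-class
          b = self-evaluation, repeat      d = instructor, repeat
        """
        diffs = []
        for test in evals:
            diffs.append(abs(test["a"] - test["c"]))   # in-class discrepancy
            diffs.append(abs(test["b"] - test["d"]))   # repeat discrepancy
        return sum(diffs) / len(diffs)


    def t2_points(mean_disc):
        """Points (out of 10) for the T2 exercise: 2 for taking part,
        plus 0-8 from the Table of Points by mean discrepancy score."""
        bands = [(0, 8), (2, 7), (4, 6), (6, 5), (8, 4), (10, 3), (12, 2), (16, 1)]
        award = 0                      # "Above 16" earns no band points
        for upper, pts in bands:
            if mean_disc <= upper:
                award = pts
                break
        return 2 + award


    # Worked example from the text: discrepancies 4, 4 (Test 1) and 4, 0 (Test 2)
    # give a mean of 12/4 = 3; the raw scores themselves are illustrative only.
    example = [{"a": 36, "b": 32, "c": 32, "d": 28},   # Test 1: |a-c|=4, |b-d|=4
               {"a": 30, "b": 28, "c": 26, "d": 28}]   # Test 2: |a-c|=4, |b-d|=0
    m = mean_discrepancy(example)
    print(m, t2_points(m))   # 3.0 8  (mean discrepancy 3 -> 6 band points + 2)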
(c) SUPPLEMENTARY INSTRUCTIONS TO CLASS 2 (T2)

In administering your "treatment" the steps listed below should be followed closely:

1) Ask your students to (a) write their names on their test booklets--to help them recover their copies, and (b) mark their in-class performance on both the test booklet and the answer sheets provided. The answer sheets will be handed in, but they will keep (or pick up later) their test booklets to score and grade the markings at home as described below.

2) Give every student a spare answer sheet and a pencil for the repeat performance described below.

3) Emphasize that every student is to rework the test, making use of all possible resources excluding fellow students and instructors. To prevent any embarrassment over wide discrepancies, this exercise must be done first and with care.

4) When and only when the student has established enough confidence in his/her answers on the second performance (without any consideration of the first), then and only then should he/she proceed to score and grade this second, repeat performance. Emphasize that guessing in any form will result in wide "discrepancies."

5) With the scoring and grading of his/her repeat performance as the "key," the student then turns to his/her marked test booklet to score and grade that performance also.

6) The student retains in his/her records his/her estimated score and grade. Then, on a piece of paper carrying his/her name, the following information is to be provided--ready to be handed in together with the repeat performance:

      Name of Student: ______________________

      Test: Mid-term ____   Test 2 ____

                            In-class      Repeat
      Estimated score       ________      ________
      Estimated grade       ________      ________

   This information will be used to check the accuracy of the graph.

7) In the following discussion class period the instructor collects the students' self-evaluations and the repeat performance. Both must be collected before test results are made known, within the times prescribed by the Course Coordinator.

8) When all the machine scores are returned to the student, the student prepares the graph (two copies of each) and returns them to the instructor.

9) The instructor then adds appropriate comments--the same on both graphs--one of which he/she keeps and the other is returned to the student.

10) The instructor emphasizes that the graph is a Progress Chart--to give the student a visual image of his/her genuine progress. The graph also discourages guessing, as it has been shown that guessing is the chief factor in wide "discrepancies."
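Since the illustrative Progress Chart itself did not reproduce legibly, the following matplotlib sketch shows one way such a chart could be drawn: grouped bars for the four evaluations in the key (a-d) for each test, on the 0-40 raw-score scale with the 0-4.5 grade scale as a secondary axis. The sample scores and all names are illustrative assumptions, not data from the study.

    import matplotlib.pyplot as plt
    import numpy as np

    # Illustrative raw scores (out of 40), keyed as in the chart:
    # a = self-eval in-class, b = self-eval repeat,
    # c = instructor in-class, d = instructor repeat.
    scores = {
        "a": [26, 30],
        "b": [34, 38],
        "c": [24, 29],
        "d": [33, 38],
    }

    tests = ["Test 1", "Test 2"]
    x = np.arange(len(tests))
    width = 0.2

    fig, ax = plt.subplots()
    for i, (key, vals) in enumerate(scores.items()):
        ax.bar(x + (i - 1.5) * width, vals, width, label=key)

    ax.set_xticks(x)
    ax.set_xticklabels(tests)
    ax.set_ylabel("Raw score (0-40)")
    ax.set_ylim(0, 40)
    ax.set_title("Progress Chart: removing discrepancies between evaluations")
    ax.legend(title="Evaluation")

    # Secondary axis with the 0-4.5 grade scale (raw score 40 corresponds to 4.5).
    grade_ax = ax.secondary_yaxis("right", functions=(lambda s: s * 4.5 / 40,
                                                      lambda g: g * 40 / 4.5))
    grade_ax.set_ylabel("Grade (0-4.5)")

    plt.show()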
APPENDIX C

MEAN PERCENT OF RESPONDENTS CHOOSING OPTION ON THE FACTOR SUB-SCALES

[The table of mean percentages of respondents choosing each option, broken down by factor sub-scale, is not legible in the available copy and is not reproduced here.]
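The figures this appendix reports are option-frequency summaries by factor sub-scale. Below is a minimal sketch in Python of how such percentages would be computed; the response coding (options 0-4) and the item-to-sub-scale assignment are hypothetical placeholders, not the study's actual composition.

    import numpy as np

    # Hypothetical data: 585 respondents x 52 items, options coded 0-4.
    responses = np.random.randint(0, 5, size=(585, 52))

    # Hypothetical assignment of item indices to factor sub-scales.
    sub_scales = {
        "Sub-scale 1": [4, 5, 6, 7],
        "Sub-scale 2": [8, 9, 10, 11],
    }

    n_respondents = responses.shape[0]
    for name, items in sub_scales.items():
        # Percent of respondents choosing each option, averaged over the
        # items of the sub-scale ("mean percent choosing option").
        pct = np.zeros(5)
        for item in items:
            counts = np.bincount(responses[:, item], minlength=5)
            pct += 100.0 * counts / n_respondents
        pct /= len(items)
        print(name, np.round(pct, 1))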