SOME EFF£CTS OF EMPHASHZENG THE
LEARNENQ FUNCTEON 0F CLASSROOM
ACHiEVEMENT EXAMINATlONS

Thesis for the Degree of D. A. G;
MICHIGAN STATE UNIVERSITY
ABEL EKPO~UFOT
1969

wL=

14L LIBRAR/Y It ‘

Michigan StatO
L1 University

  
    

‘NE-D‘C

ABSTRACT

SOME EFFECTS OF EMPHASIZING THE LEARNING FUNCTION
OF CLASSROOM ACHIEVEMENT EXAMINATIONS

BY

Abel Ekpo—Ufot

How may achievement examinations be conducted so as to
better define and attain the objectives of classroom instruc-
tion? That is the problem investigated in this study. It
is suggested it may be solved by emphasizing the learning
function of examinations.

The suggestion rests on the literature evidence that
examinations are a learning device. The authors quoted as
supporting this view include Jersild, Standlee, Fitch and
Page. This study was designed to test methods which might
capitalize on such learning function.

The methods consisted in requiring one experimental group
to take examinations twice-—in class and outside class.

A second experimental group both repeated and evaluated their
performance before they had feedback. A third group, the
control, was permitted to keep the test scripts. The hypothe-
ses were that each of the experimental groups would score

higher than the control on the final examinations, and the

Abel Ekpo-Ufot

second experimental would score significantly highest among
the three. Furthermore the attitudes of the experimental
groups would be more favorable towards examinations and grad—
ing than those of the control.

About 1500 students formed the population for the study.
They were enrolled in two courses in the College of Education,
Michigan State University during fall term 1968. In one
course there were thirty-four classes, of whom thirty-two
were randomly selected to run experiment 1. All the twelve
classes in the other course were used in experiment 2. In
both cases the classes were randomly allocated to treatment.

The study began with the develOpment of a scale for
measuring students' attitudes towards examinations and grading.
Free response Opinions were obtained from a sample of students
by the use of an open-ended questionnaire. Content analysis
of the returns produced nuclei statements for the attitude
items.

Attitude is multidimensional. The key dimensions are
"positive" and "negative," but these may not be on the same
continuum since they are supported by different attributes
of the psychological object. Such is the framework which
guided the writing of attitude statements. These were tried
out with a sample of 585 students representative of the uni-
versity; the responses were factor analyzed and the final
items selected on the basis of their loadings.

In the main study two examinations were administered

within the term and students carried out instructions as

Abel Ekpo-Ufot

Specified for their treatment conditions. The final examina—
tions were the criterion measures of achievement. The Friedman
xi test showed an overall treatment effect only in experiment
1, and a t-test revealed that the mean for the second eXperi—
mental group was significantly larger than the one for the
control. However,the absolute differences were small; but the
trend was in the predicted direction. This is explained as
due largely to the effect of the second treatment condition:
it did stimulate effort and the self—evaluation would aid
understanding.

The chief weakness of the study was poor control: all
groups kept the within-term examination scripts. Also, the
period of one term might have been too short for the treatment
to work. These might partly account for the haphazard results
obtained on the attitude criterion measures.

The main conclusion is based on the trend in the pre-
dicted direction revealed in eXperiment 1. If students were
required to repeat and evaluate their examination performance
their achievement of course objectives would tend to rise
higher than what it would be without such conditions. It would
be worthwhile to investigate the hypothesis that this "self-
evaluation" would make students attitudes more "positive"
than "negative" towards examinations and grading. Any contri-
butions of this study to knowledge are conditional on col—
laborating evidence--evidence to support the usefulness of

the attitude scale, and the model on which it is based, and

Abel Ekpo-Ufot

above all evidence to show that emphasis on the learning
function of examinations will produce the type of effect

weakly indicated in experiment 1 of the present study.

SOME EFFECTS OF EMPHASIZING THE LEARNING FUNCTION

OF CLASSROOM ACHIEVEMENT EXAMINATIONS

BY

Abel Ekpo-Ufot

A THESIS

Submitted to
Michigan State University
reporting research done as part
of the Special requirements
for the degree of

EDUCATION SPECIALIST

College of Education

1969

/

635936]

6/22/(99

COpyright by
ABEL EKPO-UFOT
1969

This Thesis is Dedicated
to: Ufot-Ekpe—-my late father,
Amma—mmi-—my mother,
Fred Akpan—-my uncle and stepfather,
Udo-Eka-Ekpo, Udo—Aka ("Gabriel"),
Ebenge and Iboro--my brothers,
Idorienyin--my sister,
to: Esit-Ima (Grace)—-my wife,

and to: James S. Karslake-_my strategist.

TABLE OF CONTENTS

CHAPTER

I INTRODUCTION . . . . . . . . . . . . . . . . .

The Functions of Examinations . . . . . .
Dissatisfaction with Testing. . . . . . .
The Problem . . . . . . . . . . . . . . .
Related Literature. . . . . . . . . . . .
The Purposes and Hypotheses . . . . . . .

II METHODOLOGY OF THE MAIN STUDY. . . . . . . . .

The Treatment . . . . . . . . . . . . . .
The Criterion Measures. . . . . . . . . .
The POpulation, Sampling, and Allocation

to Treatment . . . . . . . . . . . .

The Design. . . . . . . . . . . . . . . .

Procedure . . . . . . . . . . . . . 4 . .
Analysis. . . . . .-. . . . . . . . . . .

III RESULTS AND ANALYSIS . . . . . . . . . . . . .

EXperiment 1. . . . . . . . .'. . . . . .
Experiment 2. . . . . . . . . . . . . . .
The Hypotheses on Attitudes . . . . . . .

IV DISCUSSION . . . . . . . . . . . . . . . . . .

Interpretation of Results . . . . . . . .
Implications of the Study . . . . . i . .
Suggestions for Further Study . . . . . .

V SUMMARY. . . . . . . . . . . . . . . . . . . .

Chapter by Chapter Review . . . . . . . .'

Contributions to Knowledge. . . . . . . .
Summary of Tentative Conclusions. . . . .
Summary of Testable Hypotheses. . . . . .
Summary of Conditional Contributions. . .
Conclusion. . . . . . . . . . . . . . . .

LIST OF REFERENCES. . . . . . . . . . . . . . . . . .

iii

Page

0103me P

16
18

19
20

26
27

27
32
55

4O

4O
47
51

55

55
64
66
67
68
69

7O

TABLE OF CONTENTS — Continued Page

APPENDIX A—-The Students' Attitudes Towards Exami-
nation and Grading Scale Battery (SATEG)° 75

APPENDIX B-—Specific Instructions as Originally
Designed for the Treatment Conditions . . 153

APPENDIX C--Mean Percent of Respondents Choosing
Option on the Factor Sub-scales . . . . . 160

iv

LIST OF TABLES

TABLE

1.

2.

Mean and Rank Scores on First Examination
(Experiment 1). . . . . . . . . . . . . . . . .

Mean and Rank Scores on Final Examination
(Experiment 1). . . . . . . . . . . . . . . . .

Covariance Analysis of the Final Examination
Scores-Y (Experiment 1) . . . . . . . . . . . .

Mean and Rank Scores on First Examination
(EXperiment 2). . . . . . . . . . . . . . . . .

Mean and Rank Scores on Final Examination
(Experiment 2). . . . . . . . . . . . . . . . .

Mean Group Scores on the Attitude Factor Sub-
scales. . . . . . . . . . . . . . . . . . . . .

Rank-Score Positions at the High End of the
Factor SUb_ScaleS o o o o o o o o o o o o o o o

Grouped Frequencies, Range and Median of Inter-
Item Correlations . . . . . . . . . . . . . .

Mean Percent of Respondents Choosing Option on
the Factor Sub—scales . . . . . . . . . . . . .

Page

28

5O

52

55

54

56

59

112

118

LIST OF FIGURES

FIGURE Page
1. The Attitude Model. . . . . . . . . . . . . . . 80
2. Attitude Profile of the Try-Out Sample (N=575). 117

5.

The Attitude Model with Specific Reference to
(a) Examinations, and (b) Grading . . . . . . . 121

vi

PREFACE

One conducting this type of study must have a bias;
so has the writer. He does not share the View that class—
'room achievement testing should be abolished at a formal
institution of learning--be it a school or a university.
He does not consider that these twin aspects of the cur—
riculum are a necessary evil: it may not be necessary that
they should be to the student a "traumatic experience".
Rather he shares the view that they are "a natural part of
the total learning process."

But this bias may have intruded itself unwittingly
into the tone of the presentation of this thesis; for this
the writer sincerely apologizes to the reader. He really
meant to present it as a scientific study uncolored by his
biasesmwbut he may not have succeeded. In particular he has
offered tentative conclusions based on trends revealed in
one of the two experiments conducted. But he has not hidden
the fact that the evidence is very weak, not only because
the absolute differences in the so-called trend were very
small despite the ”significance" of the t—test comparisons,
but also because the results in the second experiment were

not definitive. The reader should therefore take as

vii

hypotheses to be investigated all tentative conclusions
made in this thesis.

Many peOple have contributed to make this study possible.
The twenty-two professors who undertook to administer the
Attitude Scale deserve first mention; so also the students
who served as "guinea pigs". The writer wishes to express
his thanks to all these unnamed persons.

Thanks are due to Dr. Andrew C. Porter, and his staff of
research consultants in the College of Education. Their
criticisms and comments on the design of this study were of
great value. Two of the writer's teachers deserve special
mention: one is Dr. Maryellen McSweeny, of the College of
Education, and the other Dr. Charles F. wrigley, of the
Psychology Department and Director of the Computer Institute
for Social Science Research. As the reader will soon find
this study was in a way a "try-out" of some research methods.
These two professors gave the writer, among other students,

a brilliant introduction to these methods. Besides they have
criticized parts of the study that relate to their special-
ties, and in some cases have actually helped in the inter-
pretation of the data. Another professor who had criticized
parts of this study is Dr. Willard G. Warrington, Director

of the Evaluation Services. His searching questionings con-
tributed much to the development of the Attitude Scale to

be reported.

viii

The four members of the Program Committee occupy a
unique position. The Chairman Dr. Robert L. Ebel, has
indirectly inspired this study in that his philosophy is
behind it. Dr. James S. Karslake, of the Psychology Depart-
ment, urged that a research study be included in the
writer's program for the Education Specialist degree. The
other members are Drs. RObert C. Craig, Chairman of the
Department of Educational Psychology and Dr. Paul L. Dressel,
Director of Institutional Research. These four have each
rendered constructive and valuable criticism on the thesis
to be presented. The writer is grateful to them for their
services.

Without a scholarship grant by the home Government of
Nigeria the writer might not have embarked upon graduate
education. This Government has therefore contributed in-
directly but significantly to this study.

Thanks are also due to Drs. W. Sweetland, and D. Freeman,
and their staffs of instructors and secretaries. The study
might have been sabotaged without the cooperation of these
people running the courses in which the experiments were
carried out.

Apart from Dr. Karslake the other persons to whom this
thesis is dedicated are of the household in which the writer
is a member. He is deeply grateful to them for the price

they are paying--to wait.

ix

The last offer of thanks is to those who at one time
or another have grappled with the problems of education.

There is nothing reported here which is not owed to MAN.

CHAPTER I

INTRODUCTION

The focus of this study is on classroom achievement
examinations and the twin practice of grading. In this intro-
ductory chapter some functions of examinations are stated.

The fact that these may not always be realized leads to a
statement of the problem to be investigated. This in turn is
related to the evidence in the literature, in particular

that which SUpports the view that examinations perform some
learning functions. The chapter closes with a statement of

the purposes and hypotheses of the study.

The Function of Examinations

An important objective of formal education is the acqui-
sition of knowledge. Though, practices differ, classroom
achievement examinations are widely used for assessing how
far this objective has been attained. Examinations perform
other important functions also. They motivate the student to
learn. Admittedly, this function is differential; as Tyler
(1966) observes they may stir up "feelings of incompetence

in some students." However, it is likely that such unmotivated

students are in a small minority. Furthermore, examinations
provide a learning experience per se: in Stone's (1955)

words they "represent practice sessions which aid the fixation
of correct responses . . . and the elimination of error." In
other words, the taking of examinations in effect promotes

and guides learning.

Moreover, in a society such as ours, it would appear one
cannot escape evaluation. If the school exists to prepare
youth to fit a need in society then it cannot altogether ignore
some preliminary evaluation of the products it turns out to
society. It may even be argued that such evaluation helps to
remind the student of his future role,.and that society ex-
pects him to be proficient in his fulfilling that role. If
this argument be granted then, from the student's point of
View, there are at least four functions of classroom examina-
tions. The motivation for learning, the promotion, fostering
and guiding of learning, the assessment of what ”amount" has
been learned and the reminder that learning must be proficient
if one is to fulfill his role adequately in society-~all these

are of special importance to the student.

Dissatisfaction With Testing

Teachers tend to overemphasize the assessment function
at the expense of other functions of classroom achievement
testing. In such a situation, the attainment of the educa-

tional objectives may be limited or thwarted. Evidence is not

lacking that there is some dissatisfaction with testing in
general and achievement testing in particular. Take for an
example Banesh Hoffmann's book: The Tyranny of Testing (1962).
The author is directing his attack against the "professional
testers" and their reliance on multiple-choice tests and
item statistics. But his view that "there is no satisfactory
method of testing" applies to the classroom situation as well.
"If sample questions made by the best test makers can give
cause for concern," he asks, ”What of multiple-choice tests
made by individual teachers for their own classroom use. . .?”
The poor quality of test items, as Hoffmann says is
cause for concern. But one is tempted to express the Opinion
that, within the classroom, the "tyranny of testing" is most
evident in the teacher's toonmuch-emphasis on the assessment
function of examinations at the expense of the learning func-
tions, and the dissatisfaction among students may partly be

explained by this fact.

The Problem

Granted that examinations serve important functions to
the learner, it would appear there is a strong case to retain
the system as an aspect of the school curriculum. If one
takes this position he is faced with a problem: how may
achievement examinations be improved in use so as to better
define and attain the objectives of classroom instruction?

This appears to be an important practical problem in all

education. It may be that the achievement examination is
perceived as a necessary evil because of how it is carried

out in practice: the questions posed may be unintelligible,

or they may be ambiguous, or they may be highly speeded, or
the student may be denied the opportunity of knowing what his
performances are in the light of expected responses. Moreover,
as hinted earlier, it may be the teacher has created an atmos—
phere which overemphasizes the assessment function. This may
be the case when he deprives the student of the opportunity

to have back the examination papers because they must be kept
secure for use with other sets of students. This practice
added to other undesirable elements bias the student's attitude
against examinations.

The position taken here is that for those.who think the
examination system may be retained, the problem of improvement
in use may be partly solved by emphasizing the learning func—
tion of classroom testing. This change of emphasis is in
accord with the teachers professional role in the learning
situation. Besides the new emphasis may hopefully change the
student's perceptions of examinations, and the twin practice
of grading. The evil aspects of the system will thus be
minimized and conditions set for higher attainment of
objectives-—higher than the attainment possible under condi—
tions where the assessment function is emphasized at the
expense of the learning function.

If, for example, students take an examination in a class-

room situation under the so—called "examination conditions"

and in addition repeat the examination "at home," making use
of all available resources, excluding the teacher and fellow
students, and their performances on both occasions count for
their grades, then they may perceive the learning function

of examinations. In this case, a student would be "cheating"
if he were to solicit help from the teacher or his fellow
student, within the “examination period." Other genuine
efforts to seek out the correct response would then be encour—
aged and rewarded.

If, in addition, the students are made to "grade" their
own performance to the best of their knowledge, they would be
learning still in carrying out such a requirement, and they
may grow to perceive and appreciate the meaning of examina-
tion grades. Classroom achievement examinations administered
in this way may be described as improvement—in-use. From the
point of View of both the teacher and the learner the modi-
fied practice neither eliminates nor depreciates the assess-
ment function; but it pushes to the fore the learning
function, and this is likely to be richly rewarding.

The specific problem of this study was to investigate
these suppositions using students enrolled in two courses in
the College of Education, Michigan State University. Further
details about the students and the courses are given in
Chapter 2. The main purpose at this stage is to introduce
the problem both in general, and with specific reference to

the particular conditions in which it was investigated.

How may achievement examinations be improved in use when
applied to the courses selected, so as to better define and
attain the objectives of these courses? To solve this prob-
lem examinations were administered twice each--in class, and
outside class-—to groups of students. Members of one of the
grOUps were also required to evaluate their performances.
Was such a procedure any improvement-in-use of examinations?
The answer to this question will be found in Chapters 5 and
4. Meanwhile it will be necessary to relate such a practice

to similar ones in the literature.

Related Literature

The problem posed and the solution proposed stem not only
from a practical situation but also from two types of previous
research studies. One type deals directly with the learning
function of examinations and the other on the effect of
knowledge of results and encouraging comments on student's
examination performances. A few of these will be quoted to

illustrate the connection.

»Examinations as a Learning Device

In a study on “Examination as an Aid to Learning" Jersild
(1929) sought to answer this question: to what extent does
the examination enforce an active participating attitude of
mind on the learner,.and does such activity yield higher re-

turns in achievement when compared to the attainment resulting

from ordinary conditions of study? He used two equivalent
groups in each of a set of replicated experiments where the
main treatment variable was what the author called "pre-
examination." By this he exposed the "experimental" groups
to an examination experience before using the same test items
or constructing new ones to assess the groups' achievement

of course objectives. Thus the eXperimental groups had
examination "warm—up" during the pre-testing or "pre-examina-
tion" period. The other treatment variable was the examina—
tion-type; there were three types: true-false, multiple
choice and essay.

There were five replications of this study. In the
first two the "pre-examination" was made up of true-false
items; multiple—choice items were used in the third and
fourth experiments and the essay in the last one. Jersild's
study is very relevant to the present one; three of the
replications will therefore be described in some detail.

The first experiment, like the others, was carried out
in a psychology class. There were two sections in the class,
each made up of 57 students. The course objectives are
stated in general terms to include an understanding of "class~
room lectures" and "reading assignments." At the beginning
of a semester one of the groups was randomly selected as the
experimental group and given a "pre-examination" on materials
to be covered in the next six weeks; the other class had not

this treatment. At the end of the first six weeks both

classes were administered the same true-false examination
used in pre-testing the experimental group.

The claSses exchanged roles in the second part of the
semester such that the one that served as the control,
formerly, became the “experimental" group and was pre—tested
on materials to be covered in the rest of the semester. In
the end both groups were assessed on their achievement by
the same true—false test used in pretesting the experimental
group.

The third experiment also involved two classes, each
with 42 students. The tOpic to be learnt was "Reaction Time."
The experimental group was selected randomly and counter-
balanced as described above. The "pre—examination" in this
case was made up of multiple—choice items. But the final
achievement was tested by newly constructed true-false and
recall items.

The procedure in the last experiment (N = 65 in each
group) followed the lines already described. But here the
"pre-examination" for the experimental groups was of the
essay-type, and the subject to be learnt was a biographical
selection. A test of immediate recall was administered as
a criterion measure.

The author summarizes the results of these experiments

100(Me)

in the form of ratio scores M
c

where Me equals the
mean score for the experimental group and MC the mean score

for the control. With the exception of replications in which

true-false items were used in the "pre-examination" the
experimental group always scored higher than the control.

The study may be criticized on the ground that it did
not control for the memory factor. But as most of the results
were in the predicted direction one cannot reject completely
the author's conclusion that the treatment group excelled
the control in subsequent performance, and that the treatment
stimulated "the industry of the learner."

The present study will use and modify Jersild's method
of repeating the same examination with the treatment group;
but the repeat will be outside class so that not only will the
industry of the learner be challenged but also will he be
able to use the examination directly as a study guide.

Standlee §£_§;, (1960) investigated "quizzes" and their
contribution to learning. The quizzes were made up of twenty
true-false items and given at the end of each month of work;
thirteen of them were administered during the experiment.
There were three eXperimental conditions and a control. In
condition 1 the quizzes were administered in the written form,
graded by the instructor and the scripts were returned to
the subjects. The author explained that the mere giving of
quizzes would enforce the students' learning activity as well
as provide a structuring of the course for the guidance of
the students. The instructor's written grades provided
extrinsic motivation; moreover the students had knowledge of
their performance item by item as the corrected scripts were

returned to them.

10

The second experimental condition received the quizzes

in written form too; but the members checked their own work,
presumably from key provided by the instructor. This group

therefore experienced the same benefits as stated for those

in condition 1, but without the extrinsic motivation from

teacher—awarded grades. In the third condition the same

quizzes were read out orally by the instructor who also pro—
vided the correct answers. The only benefit enjoyed by this
ggzroup was the enforced activity and course structuring. The

control group enjoyed none of these benefits as it had no

quizzes .
All groups had a preliminary pre—test comprising 100

111111tiple-choice items which had been tried out in the same
The scores on this were used

course in a previous semester.
The criterion

as covariates in the analysis of the results.

I1“feasures comprised of 100~item multiple-choice examination
given at mid—semester, and a 150-item test given at the end.
The mid-semester examination included 50 items from the pre-
test while the final included the other 50 items which were
j‘r1 ‘the pre—test, but not in the midwsemester examination.
VV1145313 the mid-semester scores were used as criterion, signifi-
cant difference was found at the .06 level as against the .05

h . . . .
ybethesnzed. A t-test comparison of the means of condition

1
and the control was also significant at the .05 level.

Th Q
Cb -
3‘ terion .

differences were not significant with the finals as

¥

 

11

It appears that the author's criterion measures were

not sensitive enough, since they contained from a third to a

Furthermore, a "multiple

.half of the items on the pre-test.

<:omparison" technique like Scheffe's (1959) could have been

ilsed. It must be remembered also that the quizzes were

Iruade of true-false items, which according to Jersild (229 cit.)
Eilfe of "dubious value as a pedagogical instrument." These

.lgindtations may have eclipsed the effect of treatment.

The present study also uses examination as the main treat~

:nneent variable. But all the defects listed above are avoided.

.bdcareover, the idea of the subject grading his own work is

«EICiOpted and given much weight and significance in that the
.sstllﬁect was given the Opportunity to compare his self-evalua—

t::i<3ns with the evaluations from the teacher—experimenter.

In a similar study Fitch §£__l, (1951) investigated the

effect of "frequent testing as a motivating factor in large

The authors found that frequent testing

1e Cture classes . "
But they remarked that

re Sulted in " superior achievement."

"instructional function (is) best served when divorced

tiflea
The

from the regular process of achievement evaluation.”

I31:.‘5353ent study specifically challenges Fitch's (§£_2;,) remark.

The

subjects were told at the beginning that all examinations
a - . . .
cill-"llinistered would count towards their final evaluations.

CPea
F“‘§§EJEEZher's Comments

The studies hitherto mentioned were conducted.in a col-

 

leg
‘533 setting. Page (1958) couched his in a secondary school

 

 

12
setting. He had 74 teachers of different subjects from difw
ferent schools involved in an eXperiment in which the treat-
:ment consisted only of “teacher's comments" on objective

examination answer scripts. The subjects for the experiment

were drawn from the 7th through the 12th grades and twelve

schools were represented. The treatment variable was at

three levels--"no comment," "free comment" and "specified

comment" and subjects were assigned to treatment at random.
The experiment basically involved administration Of the

treatment on the answer scripts of a first test and then us» ,

ing the performance on a second as criterion. Since a

factoral design was used, the experimenter was able to

investigate interactions between treatment and schools, and

Classes and school year. The results were analyzed by the

Friedman Rank-Test and the effect of treatment was highly

Significant. The author concludes: “When the average

Secondary teacher takes the time and trouble to write com—

ruerits . . . these apparently have a measurable and potent

e:Ezlfect upon student effort . . . or whatever it is which

ca-‘Lnses learning to improve.”

It would be interesting and valuable to know whether
Similar conclusions can be reached if the study were conducted

In a college setting. The present study incorporated

Q'Cbrnments" in one of the conditions. But perhaps its greatest

QQ
hhection with Page's work will be in the use of different

QQ
uxses, and similar test statistics in result analysis.

<3:f?

15
The Purposes and Hypotheses

The related studies reviewed provide evidence that class_

room achievement examinations perform some learning functions

besides their measuring function. The present study was not

concerned with establishing additional evidence for this

learning function; this was, and is assumed. Rather it was

(moncerned with manipulating the examination variable in order
tx> realize and increase its value as a learning device. As

iJidicated earlier there is some dissatisfaction with examina-

thons. Such a state Of affairs would appear to result from
tlie way examinations are operated, and not through any in-
tirinsic attribute of examinations. One may even suspect that

izkuose who speak of the "tyranny of testing“ would not gainsay

Jitzs potentiality to stimulate and promote learning in a class-

room situation .

.P_ urposes

It was suggested earlier also that attainment Of course

C313§jectives may be increased and that student attitude may

1>€2<zome favorable towards examination and grading if the learnm
igrmg; function is deliberately emphasized. The primary purpose

'this study was therefore to spell out and test a method by

which the learning function of examinations may be increased.

tPIIJj-Es method consisted of administering two examinations within

a.

Tliues:

‘

1:”Eérm, followed by a final examination at the end of the term.

results of all three examinations count for the course

 

14

ggrade earned by each student. The two within-term examina-

t:ions were manipulated as follows: the examinations were

iiirst taken by all students, that is taken under classroom

"eexamination conditions." Two randomly formed experimental

ggiroups then repeated the examinations in a home situation,

EiIld members of one of the grOUps were further required to

eexraluate and grade their performances. It was hoped that

EBIJCh "treatment" challenges the industry of the student and

eernphasizes to him the learning function of examinations and

1:}1e meaning and significance of grading. This modification

j.r1 examination procedure seeks to incorporate deliberately

t:}nose practices judged to have high value as a learning

dievvice; moreover, it makes the student experience the prob-

lem of awarding grades.

A secondary purpose of the study was to survey and

describe the attitude of the students involved towards exami-

Ilaitxions and grading. The need for a valid instrument to carry

<311t: such a survey defined another purpose: to deveIOp an

attitude scale battery.

W

The following were the hypotheses emanating from the

p1’133‘L>Oses just stated:

1) The experimental groups which repeated earlier exami-
nations would score higher than the control group on

subsequent examinations in the same course.

2) The experimental group whose members both repeated
and evaluated their performances in earlier examina-
tions would score higher than all other groups in
subsequent examinations in the same course.

5) The attitudes of the eXperimental groups as measured
by a specially developed scale would be more favor-
able towards examinations and grading than the atti—
tudes of the control group.

The first two were tested at the .05 level of signifi—

cance, but the results on the last hypothesis were used for

rank-ordering the treatment groups on the criterion measure.

CHAPTER II

METHODOLOGY OF THE MAIN STUDY

By now it should have been obvious to the reader that
the "treatment variable" in this study was the method Of
examination. It was the "variable" not in the quantitative,
but in the qualitative sense. But it is necessary to ex-

plain how it was supposed to Operate.

The Treatment

There were three conditions as follows:

1) an examination was taken under normal conditions and
members of the group were required to repeat their
performance in non-examination conditions;

2) an examination was taken under normal conditions and
members of the group were required both to repeat
and to evaluate their performances on the two occa—
sions;

5) an examination was taken under normal conditions and
members of the grOUp were required neither to repeat
nor to evaluate their performances; but they were
permitted to keep the examination scripts as their
prOperty.

These three treatments may be referred to as T1, T3 and

T3, respectively. The requirements stated above are clear;
but the second one for T3 may easily be confused with the so-
called "level of aspiration" type of experiment. In the latter,

the subject "estimates" his score, for example; by and large

16

17

such estimates are guesses, depending as they do on past
experiences of success or failure. It must be emphasized
that the self-evaluations envisaged here should not be
guessed estimates; if they are guesses depending on what
"self-concept" the subject holds, then they are not the
treatment implied in this study.

If the self-evaluations were not to be guessed esti-
mates, what should they be? They were and should be scores
and grades which the subject awards himself--solidly based
on knowledge—~present knowledge, which he gains by expending
effort to use all possible resources, excluding the teacher
and fellow students, to search out for the correct responses
for the test items. In a University setting the requirement
to carry out such a search is not beyond the student. Even
in a High School setting with fairly adequate library and
other learning facilities the student can cope with this
requirement.

Is T1 different from T2? Both require effort to search
out the correct answers. It is, however, claimed that the
additional requirement imposed for members Of the second
group to judge their work induces them to pay more attention
than do the members of the other group; should this be so
they would also learn more. By the same argument members of
T3 might not learn as much as those Of the other two groups.

It should be added that all the three treatment condij

tions emphasize the learning function of examinations.

18

Clearly, the "practice effect" is double for T;, and T3, and
all three groups have the Opportunity to use the test items

as a study guide.

The Criterion Measures

The final examinations at the end of term reflect the
course objectives; scores on these were used to test the
hypotheses on students achievement.

The second set of criterion measures were scores on the
"Students'Attitudes Towards Examinations and Grading" scale
batteryf-a scale which may be called for short "SATEG" scale
battery. This instrument was Specially develOped for this
study. A brief account of the Operations involved is rela-
vant here.

Free reSponse statements of Opinions on examinations
and grading were first obtained from a small sample of stu-
dents through an Open-ended questionnaire. These responses
were content analyzed in a search for "significant“ statements
which focus on clearly specified attributes of examinations
and grading. The selected significant statements formed
nuclei for the initial sixty-five attitude statements con—
structed. These were rated and Q-sorted by ten judges. The
forty and eight statements which survived that exercise were
administered to a representative sample of students, and the
responses were factor analyzed. Finally thirty-two items

were selected largely on the basis of their high loadings on

19

the various "factors" revealed. There was therefore suf—
ficient evidence both in the operations outlined and in the
reliabilities of the factor sub-scales——enough evidence, that
is, to show that the scale is fairly valid for the purpose
for which it was designed. Full details of the Operations

will be found in Appendix A.
The Population, Sampling, and Allocation to Treatment

The population used in this study was made up of stu-

dents enrolled in two courses during the Fall Term, 1968.
The courses are (1) ED 200: Individual and School and
(2) ED 450: School and Society. Both are Offered in the

College of Education, Michigan State University.

In ED 200 there were 54 classes, each made up of at
least 50 students. Sixteen instructors were in charge of 52
c1asses-—one each in the morning and one each in the after—
noon. Two other instructors were in charge of the other two
classes, one for each. In one of these another experiment
was in progress, and to control for possible contamination
from this source the class was not sampled; the other class
was also withdrawn since its instructor had one class and not
two as the others. .Thus sixteen instructors class-groups
were left for sampling. Fifteen of these were randomly
selected, randomly formed into three equal groups and the
groups then randomly assigned to the experimental conditions.
The selection,'formation of groups and allocation to treatment

was done separately, and based on a table of random numbers.

20

In ED 450 there were twelve classes of at least fourteen
students each. .These were under seven instructors, five of
whom taught two classes each. The other two had one class
each. All classes here were involved in the study. These
twelve classes were randomly formed into three groups and the
latter were then randomly allocated to the three treatment
conditions.

The pOpulation thus defined is rather limited and conclu—
sions will largely be confined to it. But it may be argued
that it represents typical education students as these two
courses.are required Of all education majors. To the extent
that these students are typical of education majors in particu-
lar and college students in general, the conclusions may be
extended. However, no attempt at such wide generalization

will be made from this study—-as yet.
The Design

The main elements of the design have been described, but
it is necessary to add that the study was conducted as two
separate experiments. The one in ED 200 will be referred to
as experiment 1, and the one in ED 450 as experiment 2. The

resulting sub-designs are illustrated in tabular forms below:

21

SUMMARY OF SUB-DESIGN FOR EXPERIMENT 1

 

T1 T2 T3

 

C1 C2 C3 C4 C5 C11 C12 C13 C14 C15 C21 C22 C23 C24 C25

C6 C7 C8 C9 C10 C16 C17 C18 C19 C20 C26 C27 C28 C29 C30

 

 

 

 

 

SUMMARY OF SUB-DESIGN FOR EXPERIMENT 2

 

 

 

 

 

 

Ti T2 T3
Ci C2 C5 C6 C9 C10
C3 C4 C7 C8 C11 C12
KEY: Treatment

Class (nested within Treatment)
Procedure

This study was a practical classroom eXperiment. It is,
therefore apprOpriate to describe first how the courses used
are normally organized, and then the execution of the experi-
ment and how it was woven into the existing structure.

There is always a large enrollment in the two courses.

In the period Of study, the totals were 1129 and 185 for

ED 200 and ED 450 respectively. The lectures are given by a
team of professors including the Course Coordinators who are
also responsible for all arrangements relating to the courses.
The students are divided into "discussion" groups under the

leadership of graduate assistants, as instructors. These

 

22

groups constituted the "classes" which were the experimental
units in the present study.

The administrative operations were conducted at three
levels--(1) arrangements with the Course Coordinators,

(2) contact with the instructors and (5) students' activities.

Consultation with the Course Coordinators preceded and
continued throughout and beyond the study period. They were
informed of the nature and purpose of the study through dis-
cussion, and the prOposal was made available to them. They
in turn supplied the writer information on the number of dis-
cussion groups and their instructors. The latter formed the
basis for the definition of classes, formation of groups and
allocation to treatment. .All these were done randomly as
described earlier. Furthermore, the Coordinators were told
in discussion and in writing the type of scores that would be
used as criterion measures. Such information would be kept
in their records which would be made accessible to the writer
when he needed them.

Contact with the instructors was to be kept at a minimum.
There were reasons for this. First, the writer did not wish
to bias any of them for or against the treatment; secondly he
would have preferred an atmosphere in which no fuss about the
study existed, and in which as far as possible the subjects
remained naive; thirdly, it was desired to see how far the
procedure for carrying out this study could be understood from

written instructions only. More will be said on these points

25

in chapters IV and V . Meanwhile, it will suffice to say that
the instructors were expected to follow written instructions
but that in actual practice the writer dealt with problems
individually as they arose. These were very few in the T;
condition. Most of the problems arose in connection with T2
making it necessary to eliminate certain aspects of it.
Originally the members of this group were expected to graph
their scores and grades. Such a graph was called a "progress
chart" and was to be submitted to the instructor for "comments.'i
Furthermore the instructor was to allow at least ten percent
of his assigned grade to the activities involved in this study.
These aspects of the treatment were eliminated because they
involved both the student and the instructor in too much work.
Appendix.B presents all that was originally designed for
both treatment conditions and includes the supplementary in-
structions in full. Here it is only apprOpriate to present
the instructions as they actually applied. These were given
orally by the instructor, and woven into his planned activi—

ties for his class.

Instructions for T; Condition

1) "You will be expected to repeat each of the two within-
term examinations at home. You may take up to four
days before submitting this second attempt for scoring.”'

2) "You will be free to make use of all resources, ex-
cluding instructors and fellow students. Your aim
should be to come out with all answers correct, work—
ing independently."

24

Instructions for T2_Condition

1)

2)

5)

4)

"You will be eXpected to repeat each of the two
within-term examinations at home. You will be free
to make use of all resources, excluding instructors
and fellow students. Your aim should be to come

out with all answers correct, working independently."

"You will be eXpected to score and grade your two
performances. Score, using your best judgments on
what you feel are the correct answers. Evaluate
your scores by assigning grades to yourself (0 - 4.5)
using some criteria you feel to be objective."

 

"You may take up to four days before submitting your
second performance for machine scoring."

"Later when you receive the feedback, check your
scoring and self-evaluation and discuss the discrep-
ancies with your instructor, until you are satisfied.”

During the study it became necessary to issue SUpple-

mentary instructions for this group. They were likewise

addressed to the instructor. Here again the full instructions

will be found in Appendix B (c). The relevant portions

actually adopted were as follow:

1)

5)

4)

"Ask your students to:

a) write their names on their test booklets-~to help
them recover their copies,

b) mark their in-class performance on both the test
booklet and the answer sheets provided; the answer
sheets will be handed in but they will keep (or
pick Up later) their test booklets to score and
grade the markings at home-~as described below."

"Give to every student a spare answer sheet and a
pencil for the repeat performance described below.‘I

"Emphasize that every student is to rework the test
making use of all possible resources, excluding
fellow students and instructors. To prevent any
embarrassment over wide discrepancies this exercise
must be done first and with care.“

 

"When and only when the student has established enough
confidence in his/her answers on the second perform-
ance (without any consideration of the first), then

25

and only then should he/she proceed to score and
grade this second performance. Emphasize that guess-
ing in any form will result in wide discrepancies.”
5) "With the scoring and grading of his/her repeat per-
formance as the "Key" the student then turns over to

his marked test booklet to score and grade that
performance also.

Treatment Condition T3

This was the "control" group; the members were allowed
to keep their test booklets, but no other requirements were
expressed.

The third level of operation may be described under stu~
dents' activities. These consisted of their following
instructions as these were communicated to them through their
instructors. Members of both groups T1 and Ta repeated their
performances in the examinations and re-submitted their work
for machine scoring. But in some classes, and particularly
in experiment 2 there were misinterpretations of the self-
evaluation requirement at the beginning of the experiment.

As mentioned earlier it became necessary to issue supplemen-y
tary instructions; after that there were no more problems.

Members of the control group (T3) were not required to
do anything other than take back the examination scripts which
they kept as their properties. Finally, it is also relevant
to note that all students were given written "keys” to the two
within-term examinations. These were however delayed for
about four days until members of grOUps T1 and T2 had turned
in their second performances and self evaluations. Instructors

also discussed the tests in class.

26
Analysis

The results of this study were analyzed by nonparametric
methods. In particular the Friedman X? was used to test the
overall effect of treatment with respect to the hypotheses
on achievement of course objectives. This was followed by a
t-test comparison of group mean scores. The grOUps were
ranked on the attitude criterion according to their mean scores
add percent of high scorers on each sub-scale. Full details

of this analysis and the outcome are presented in chapter III.

CHAPTER III

RESULTS AND ANALYSIS

The use to which the results of the first mid—term
examinations were put is given in this chapter. This is fol-
lowed by the outcome of the study with respect to the hypothe-
ses investigated. The analysis is made for each of the two

experiments separately.

Experiment 1

There were ten classes under each treatment condition in
experiment.1. Their mean scores on the first mid-term exami-
nation are shown in Table 1 on the following page.

It will be noticed as one reads down the columns under
each treatment condition that the scores are arranged in
descending order of magnitude. Thus Class 1 occupies top rank
position within the T1 group; class 11 occupies top rank posi-
tion within group T2; similarly class 21 is tOp in the T3
group. _As a further illustration class 9 is ninth in T1;
class 19 and class 29 are also ninth in the groups T2 and T3
respectively. This arrangement makes it possible to match
the classes according to their rank positions in their respec-

tive groups. It turned out as the table shows that the

27

28

 

 

 

 

 

 

 

 

 

 

 

 

 

 

.a scan map mm: pmmnmnn was
*
so.mm mm.mm mm.sm mnoum
cmmz
msonw
a mo.sm mm om m am.mm mm om m sm.mm mm oa
m Hm.¢m mm mm a mm.am mm ma m ma.sm em a
a sm.em mm mm m mm.em mm ma m mm.em mm m
m em.¢m an em a mﬁ.mm me be! m mm.sm an e
m ma.mm mm mm a ms.mm mm me n am.sm mm m
m ma.mm mm mm a om.mm mm ma m mo.mm mm m
m mm.mm mm am a m>.mm em «a m am.mm mm a
m >¢.mm mm mm m m~.mm mm we a am.mm mm m
m am.mm mm mm m mm.mm pm me a om.mm mm N
m em.mm mm Hm m «a.mm em as a mm.mm an a
xcmm mnoum z mmmao. xcmm muoom Z mmmau xcmm whoom z mmma
*BOM 2mm: *3Om smmz *3om cam:
0H. NH. HR.

 

 

 

 

Ad BZﬂSHmmmva ZOHBdZHEde BmMHm ZO mMMOUm MZ¢M QZ¢ Zﬂmz

H mﬂmﬂﬁ

 

 

29

matched classes would have very nearly identical scores if
these were reduced to two significant figures.

The columns headed “Row Rank" reflect the absolute difw
ferences in the scores of members of the matched triples.
Classes 1, 11, and 21, for example, have scores of 26.52,
26.11 and 25.84 respectively; their rank scores within the
triple are therefore 1, 2 and 5 respectively. Scores for
classes 7, 17 and 27 are 24.56, 25.16 and 24.97, and the
corresponding rank scores are 5, 1 and 2 respectively. The
Friedman test (Siegel, 1956) was applied to test the signifi-
cance of the difference of the sum of ranks shown in the
"Row Rank" column. .This was not significant (Xi = 4.2;
x:(.05)2 = 5.99). -Evidently the differences among the groups
were not statistically significant at the start of the

experiment.

The Hypotheses on Achievement of Course Objectives

The first hypothesis was that the mean score for group
T1 would exceed the one for group T3 in the achievement Of
course objectives as measured by the course end examinations.
The second hypothesis maintained that the mean for T2 would
exceed each of the means for T1 and T3. The final examina—
tion results presented in Table 2 on the following page were
uSed in testing these hypotheses.

Table 2 shows that the classes in each matched triple

‘V‘ire ranked on the basis of their mean scores, as illustrated

’ _

ea"airlier. The Friedman test was applied to test the overall

 

50

 

 

 

 

 

 

 

 

 

 

mm.mm mm.mm am.mm mnoum
Owe;
ucmEummHB
N mm.Hm on H mo.mm ON m mm.om 0H
m HH.Hm mN H Hm.mm mH N Hm.Hm m
m HH.Nm mm H mo.Hm NH N Hm.mm m
N mN.Hm sN H so.Nm sH m mm.om s
m om.Hm mN H mo.Hm mH N HH.mm m
H «H.Hm mN m mm.Nm mH N sm.mm m
m m©.Hm HN N mm.mm «H H om.mm H
m OH.Nm mN H mm.mm mH N >m.¢m m
m sm.Hm NN H mN.mm NH N mm.¢m N
m mH.Hm HN H NN.©m HH N mm.Hm H
Mcmm whoom mmMHU xcmm Onoom mmmHU Mcmm mnoom mmmao
30m cmmz 30m cmmz 30m cmmz
me we as

 

 

 

 

 

 

N mqmﬁa

AH BZHZHmmmNmV ZOHBdZHZde A¢2Hm ZO mmmovm MZ¢M QZ< 24m:

51

treatment effect. The difference was significant at the
.05 level (X? = 8.6; x:(005)2 = 5.99). This means that the
risk involved in rejecting a contrary ("null") hypothesis
that the treatment produced no effect has a probability of
about five percent: in other words the probability is high
that the null hypothesis is false. If so the alternative
that there was a treatment effect may be accepted.

A t—test comparison was then made between the pair of
means for T1 and T3. The difference was not significant
(t = 0.995; t.05(18)= 1.754). The meaning in this case is
that the treatment effects on these two groups, if any, were
not significantly different.

The second hypothesis was in two parts: part 1 involves
comparison of the means for T3 and T1; these as the table
shows are almost identical. The other part involves the
means for T2 and T3. A t-test showed that the mean for T2
was significantly larger than the one for T3 at the .05 level

as hypothesized (t = 2.55; = 1.754). The chances

t.05(18)
are therefore small--about five percent—~that the null
hypothesis of equality of means for the two groups is true.
The alternative experimental hypothesis was therefore accepted
that the mean for Ta was significantly larger than the one
for T3.

The nonparametric test revealed there was an overall
Significant treatment effect. Would a parametric test lead

tC) such conclusion? To provide answer to this question the

Crfiterion scores were re-analyzed by the analysis of

52

co-variance method. The means for the first examination

were used as co-variates. As mentioned earlier they were

not significantly different, but the F—value of 2.08 suggested
there might be one or two very large scores, so that it would
be advantageous to remove the variance associated with initial
test scores. Table 5 summarizes the results of this analysis.
It is evident that the gain in the co-variance analysis is
only slight. Without it the F-value is 2.45, and significant
at 20% (F — 1.71); with it F is 2.76 and significant

.20(2,27)‘

at 10% (F.10(2,26)= 2.52). In neither case is the difference

significant at the five per cent level, as was hypothesized.

TABLE 5

CO-VARIANCE-ANALYSIS OF THE FINAL EXAMINATION SCORES-Y
(EXPERIMENT 1)

 

 

Source SSX SSXY SSY ssy. df MS

 

Between 0.645 1.915 9.595 5.901 2 2.951 2.76*

{Within 18.056 21.252 52.860 27.818 26 1.069

 

Total 18.679 25.167 62.455 55.719 28

 

 

 

 

 

 

 

 

 

 

*Not significant; F 05(2 26): 5.57.

A similar covariance analysis of the means for T2 and T3 also
.revealed no "significant" difference; but the F value was

(3.02 less than the one required for significance. The

52a

tabular illustration below diSplays the relevant data.

COVARIANCE ANALYSIS OF THE FINAL EXAMINATION SCORES
FOR T2 vs. T3 IN EXPERIMENT 1

 

 

 

 

 

 

 

 

 

 

 

 

 

Experiment 2

Source SSx SSXY SSY SSY' df MSY' F
Between 0.4565 1.8476 9.5775 5.5567 1 5.5567 4.4175
rWithin 6.6420 8.0159 50.9776 21.5085 17 1.2554
Total 7.0985 9.8615 40.5549 26.8452 18

F.05(1,17)= 4°45’ F.10(1,17)= 5'03

Table 4 on the following page shows the mean class scores

on the first examination for the classes and grOUps in

 

55

 

GmmL

 

 

 

 

 

 

 

 

 

 

 

 

 

sm.mm mm.mm ms.mm
msouu

a mm.mm .m« we m HA.Sm «a m m A¢.mm me a

N mm.mm «a as m es.mm AA 5 H mm.mm «a m

m Aa.mm we OH H mm.mm me o A mm.mm SH m

m as.sm we a m «A.mm 4H m a Sm.sm 5H H
Mcwm mnoum z mmmHU xcmm mnoum z mmmHU Mcmm wuoom. 2 need
30“ CM 0: 30“ CM 0: 30m C“ m:

as we as

 

 

 

 

 

AN BZHSHMMQNMV ZOHBdZHZdNH BMmHm 20 mmMOUm MZ¢M QZ¢.Z¢HZ

ﬂ Hﬂmﬂﬁ

54

experiment 2. The ordering of the classes within each condi-

tion and their consequent matching and ranking within
matched triples across treatment conditions were done exactly

as described in experiment 1 earlier. The Friedman test was

also applied to test the significance of the sum Of ranks in

the "Row Rank" column. The groups were not statistically

different (X2 = 4.50; X2 =
r r.05(2)

As in experiment 1 the hypotheses were that

5.99).

(i) the mean for T1 is greater than the mean for T3
(ii) the mean for T2 is greater than the mean for T1
(iii) the mean for T2 is greater than the mean for T3.
Table 5 below presents the data for testing these hy-
potheses. But it is clearly evident that there is no need to
apply any statistical tests: the group means are almost
identical, and the figures in the "Row Rank" column show a
pattern contrary to the hypotheses.

The mean class scores on the final examination are shown

in Table 5 below.

TABLE 5

MEAN AND RANK SCORES ON FINAL EXAMINATION (EXPERIMENT 2)

 

 

 

 

 

 

 

 

T1 T2 T3
Mean Row Mean Row Mean Row
Class Score Rank Class Score Rank Class Score Rank
1 46.24 2 5 45.79 5 9 46.65 1
2 44.95 5 6 45.50 1 10 44.94 2
5 47.45 1 7 45.71 2 11 44.50 5
4 44.74 2 8 45.77 5 12 45.67 1
Group
ean 45.84 45.19 45.44

 

 

 

 

 

 

 

 

55

The pattern shown in the above figures is contrary to
the hypothesis. The differences between treatment conditions
were, however, not significant (x: 05(2)= 5.99). The risk of

rejecting the null hypothesis in this case would be as high

as 95 per cent (Siegel, Table N).

The Hypotheses on Attitudes

The third major hypothesis of the study was that the
attitudes of the experimental groups would be more favorable
towards examinations and grading than the attitudes of the
control group. In View of the breakdown of the scales
described in the Appendix A, and in view of the position taken
there of the nature of attitude this hypothesis will be sub-
divided and examined in parts and with reference to the
attitude "factors“. These sub-hypotheses are:

1) that each of the groups T1 and T2 would score higher
on the "learning function” factors (EP and GP) than
group T3.

2) that each of the groups T; and T2 would score hi her
on the "motivating function" factors (EP and GP)
than group T3.

5) that each of the groups T; and T2 would Score lower
on the "Dys function" factors (EN and GN) than
group T30

4) that each of the groups T1 and T2 would score lower
on the Pressure-Anxiety factors (EN and GN) than
group T3.

The first two of these sub-hypotheses remecho the parent

hypothesis,.and also Specify the crucial attitude "anchors."

The other two say the same things indirectly, since "lower”

Placement on the "negative" dimension is a ”more favorable“

56

position, relatively. Table 6 presents the mean scores on

the factor scales. The measure is the same in both experi-
ments, hence the results are reported under each treatment

condition, with the groups in the two experiments combined.

TABLE 6

MEAN GROUP SCORES ON THE ATTITUDE FACTOR SUB-SCALES

 

 

 

 

 

 

 

 

 

 

My “
Scale Factor T1 (N=265) T2 (c=219) T3 (N=245)
Examination Satisfac-
tion 1.62 1.61 1.62
EP Learning Function 2.19 2.55 2.25
Motivating Function 2.11 2.57 2.15
:Examination-type 2.92 2.79 2.97
EN Dysfunction 5.04 2.84 2.96
PressureeAnxiety 5.66 5.55 5.57
Hate 5.06 2.80 5.05
Learning Function 2.27 2.40 2.50
GP Motivating Function 2.69 2.88 2.66
Measuring Function 2.62 2.76 2.57
Dysfunction 5.24 5.51y 5.25
Pressure—Anxiety 5.71 5.74 5.75
GN Hate 2.71 2.59 2.67
Non—learning 5.27 5.00 5.25
Non-measuring 5.56 5.41 5.41

 

The absolute scores shown on the above table are SO close

that they may not be significantly different; but ranking

across treatment conditions (1 for the highest) produces the

following pattern of rank scores for the crucial factors

specified in the sub—hypotheses:

 

57

 

 

 

T1 T2 T3
EP Learning Function 5 1 2
Motivating Function 5 1 2
GP Learning Function 5 1 2
Motivating Function 2 1 5
EN Dysfunction 1 5 2
Pressure—Anxiety 1 5 2
GN Dysfunction 2 1 5
Pressure-Anxiety 5 2 1

 

 

 

 

 

 

When T1 and T3 are compared the trend Shows T1 scoring
lower on the EP and GP factors and higher on the EN factors.
This is contrary to eXpectation. On the other hand when T2
and T3 are compared T2 scored higher on the EP and GP factors
and lower on the EN factors. This fact tends to support the
hypothesis. The pattern for the GM factors is not consistent.

The resulusabove consider the means of the groups. .The
extreme scores throw further light on the relative positions
of these groups on the attitude factors. The percents of
the group choosing each point on the Likert Scale are given
in Appendix C. An extract from that Table gives the following
picture. On the learning function factor the percents of
respondents choosing point 4 and 5 were 17, 15 and 14 for T1,
T2 and T3, respectively. It would be expected that more
students in T1 than in T3 should be "high" on this factor.

The trend is in line with this expectation. On the other hand

comparison between T2 and T3 shows a contrary trend.

58

The trend is consistently in line with expectation when
the groups are compared on the motivating factor. The cor-
responding percentages are 18.5, 16.5, and 14.5 for T3, T3
and T3 respectively.

The dysfunction factor responses revealed the same pat—
tern as the learning function factor. T1 was lower than T3
as would be expected; but T2 was higher than T3—-against
expectation. The reSpective figures are 27, 54 and 50. On
the PressureeAnxiety factor the trend falls in line with the
expectation. The values are 51, 51 and 55 respectively.
Table 7 converts the percentages given here into rank scores,
and thus makes it easy to comprehend the relative positions
at the "high" extreme end of the scale factors.

On the Grading Scales the trend was consistently in the

Opposite direction as illustrated by the following percentage

figures:

T1 T3 T3
Learning function 17 18 18
Motivating function 51 50 55
Dysfunction 48 44 51
Pressure-Anxiety 65 67 65

These percentages are also converted to rank scores in Table

7 on the following page.

59

 

 

 

 

 

 

 

m H N mumecmrwnsmmem
m m H coAuocsmmsn
uzw
MHmmnuOmmn may Ow MHMHHSOU H m N SOHuucsm mcHum>Huoz
maamnmcmm tam ucmuMHmsooaH nuOQ mum
muouomm 20 can mo m£u How msumuumm was m.H m.H m coHuUSSm mSHSHmmA
“me
EHmwnuom%£
may nuH3 USTpMHmSOO MH cumuwmm One H m.N m.N. wumeSNrmnSmmmHm
mnmuucou MH NB "OSHA EH MH as N H m coHuUCSmme
”2m
mHmmnuomms
mzu anB uSTLMHmSOU mH cumuumm TAB m N H SOHUOSDN msHum>HuoE
humuucoo
MH NE MHmmnuomma nuHB TGHH SH EH H8 N m H SOHpocsm mGHSHmmA
"mi.
mHmwnuOmhm Tau Ou mSHumamm mucwEEOU OB we as mwamomlﬂsm Houom

 

 

 

mMA<UmIMOBU¢h MEB m0 QZN HZWMBNH mem mmB Ed mZOHBHmOm mMOUmIMZNM

h mﬂméﬁ

CHAPTER IV

DISCUSSION

Interpretation of the Results

This chapter is an overview of the results of the last
one with specific reference to the hypotheses of the study.
To what extent did these facts agree with the hypotheses,
and what were the limitations? The chapter also examines
some implications of the study, presents suggestions for
further research and draws some tentative conclusions based

on the present study.

The Achievement Hypotheses and the Statistical Tests

 

As reported in the last chapter the overall treatment
effect was tested by the use of the Friedman X§-—a nonpara-
metric method. The first showed that in eXperiment 1 there
was a significant difference among the treatment conditions
at the hypothesized level of five percent. But the second
did not confirm such a finding.

Moreover the absolute means for the treatment groups
were 55.24, 55.98 and 52.59 for T1, T3 and T3 respectively.
These are so close that the evidence for a treatment effect

was very weak indeed; but the trend was in the predicted

40

41

direction. It should be remembered that the "control" group
was not properly controlled as its members also had the
Opportunity to keep their examination test scripts. In
these circumstances some, at least, of the members might
have used the previous examinations as study guides.

Another Objection to a possible conclusion that there
was a treatment effect was the fact that the results Of experi—
ment 2 did not warrant such a conclusion. But the conditions
in the course used in this eXperiment was rather peculiar.
There was an open expression of "concern" by members of the
control group that the other groups were being placed on an
advantage by being required to repeat the examinations out-
side class. This concern "worried" the Course Coordinator.
Once again this was admittedly the problem of Control; it was
weak. But the existence of this concern Openly would lead
one to suppose that some members of the Control group were
making use of the previous examinations as study guide.

Furthermore in T2--the group whose members both repeated
and evaluated their own performances--the treatment was mis-
applied in the first examination. The instructions were
interpreted as if this was the so-called "level of aspiration"
experiment. As was emphasized earlier (see chapter 2) this

Ves not the case. The result of this misapplication was that
discrepancies between the students' self-evaluations and the
instructors' turned out to be so wide as to provide another

source of concern, this time for one of the experimental

42

groups. The effect on members interest and attitude towards
the whole exercise cannot be determined; it may be it is re—
flected in the low mean scores.

Granted that these observations were the facts Operat—
ing within experiment 2, its results may not be taken as a
reliable evidence against the existence of an overall treat-
ment effect. But the results of experiment 1 were in the
predicted direction. The nonparametric test was significant
at the hypothesized level and the covariance analysis showed
the risk of rejecting the null hypothesis not higher than
ten percent. Even so no definite conclusion could be made
on the overall effect of treatment if the two experiments
were considered together: their results were conflicting.

It is necessary to elaborate the statement just made
in the context of the two relevant hypotheses. As pointed
out in the previous chapter the mean achievement score for
T1 (the group that merely repeated the examinations) was not
significantly larger than the mean for T3 (the "control")
as was hypothesized. The point has already been made that
both groups might in effect have been experiencing the treat—
ment.

The second hypothesis stipulated that the mean for T2
would exceed each of the means for T1 and T3. The results
for experiment 1 and the t-test analysis Show that this
hypothesis was not supported when T2 and T1 were compared;

ibut between T2 and T3 the difference was significant at the

45

five percent level. The covariance analysis in this case
showed a risk considerably less than ten percent if the null
hypothesis were rejected. Undoubtedly T2 was the source of
the weak tendency towards the predicted direction evident in
experiment 1. The obvious question is: how was this treat-
ment supposed to work?

In the first place the treatment which members of T2
experienced would stimulate much effort if it would make the
students spend extra hours of work to find out the correct
answers. According to Jersild (9p, gi;,) examinations stimu-
late "the industry Of the learner." It may be argued that
in such a case it was the extra hours of work that led to
increased learning and increased achievement, and not the
repetition and self-evaluations as such. NO one would deny
that sheer effort as eXpressed in the extra hours of work
is one of the significant factors explaining these results.
But in the circumstances of this study work effort was a
secondary factor, and brought about by the treatment condi-
tion-—a specific requirement to exert the effort. It would
then follow that the primary factor was the imposed treat—
ment condition. Viewed this way, work effort is not a
contaminating factor but a necessary secondary factor serving
as the medium through which the primary factor operated.

The second element in the treatment was that the group
had Opportunity to use the test items as a study guide.

.Assuming that the examinations were made up of valid test

44

items, then they would be of immense value to guide the
learner to the types of skills considered essential. They
would also be of immense value in teaching the learner
"test-wiseness" with respect to the language of test items
and other test characteristics. An "examination set" to
quote Meyer (1956) is thus developed and this would influence
the learners methods of study.

A fourth means by which the treatment was supposed to
produce the effect was the double practice session which the
exercise brought about. Stone (1955) argues that the mere
taking Of an examination provides a "practice session" which
aids "the fixation of correct responses . . . and the elimi—
nation of error." If this be granted, then the treatment as
defined here would provide a reinforced practice session and
would tend to promote learning on that count, p§£_§e, Over—
learning may not be detrimental to learning.

Another element in the treatment was students' self~
evaluation. Here again if one accepts Stones peculiar defi-
nition of “reinforcement" as "the fixation of correct and
elimination of erroneous information," then self—evaluation
would be a sort of "reinforcement." Moreover, it would tend
to aid the development of critical ability which was further
sharpened when the student discussed his "discrepancies"
with the instructor. The discussion of the discrepancies and
their resolution would further lead the student to perceive
the meaning and prOper function of examinations and thus help

to motivate him.

45

These modes of Operation of the treatment may be re-
iterated for emphasis. It would stimulate extra work effort,
serve as study guide to direct the learner, be valuable as
an indirect way of teaching test-wiseness and develOping a
useful examination set. The additional practice session
would provide a necessary and not a superfluous reinforcement.
Add to these the advantages of self-evaluation which may in-
clude the develOpment of critical ability besides the rein—
forcement Opportunities it would provide. If these and other
claims existed they would produce the type of effect which
was evident, though very weak, in the T2 condition in experi-
ment 1.

The second tentative conclusion of this study therefore
re-echos parts of the second major hypothesis. It is this:
if a classroom achievement examination is administered under
the conditions defined for T2 of this study, if, that is,
the student is given the Opportunity to keep the test script,
to rework it outside class, and to evaluate his performances
there would be a tendency for his learning (and achievement)
to rise; it would be raised higher than what it would have
been if such conditions did not exist along with other de—
vices in the learning situation.

Admittedly this treatment requires expended effort, but
it is guided effort in the right direction. In fact some stu-
dents spend much more effort than necessary and end with very

little or no gain because they head in the wrong direction.

46

The Treatment Effect on Attitudes

The results of the attitude measures are anything but
definitive with respect to the hypothesis that the experi-
mental group subjects would show a more favorable attitude
towards examinations and grading than members of the control
group. As Tables 6 and 7 Show the positions of the groups
did not always agree with the hypothesis. .Moreover, the
results were in most cases not consistent when the basis for
comparison was the group mean score, or when it was the per-
centage of high scorers on the factor scales. NO conclu-
sions could be made from such haphazard results.

What were the limitations in this case? The anchors of
attitude for examination and grading are not located within
the confines of one particular course, but within the total
environment of the University setting. If so, it would re—
quire a longer period Of treatment within a wider range of
situations in the University for the treatment "to work,"
assuming that it was potentially effective. The point to be
emphasized is that no conclusions could be made on the
"effects" of the treatment on attitudes; nor could any be
made on the alternative position of "no-treatment effect."

It may be added that there is some evidence of construct
validity of the scales from the results as shown in Tables
6 and 7. In line with the position taken of the nature Of
attitudes towards examinations and grading it would be ex—

pected that if the student is "high" on the learning function

47

and motivating function factors then he should be low rela—
tively on the dysfunction and pressure-anxiety factors.

It is interesting to note that whether the basis Of compari-
son is the group mean scores or the percentage of ”high"
extreme scores, the pattern of rank scores consistently tend
to support this position. The only conclusion that may be
drawn from those results is that they did tap the attitudes

as defined in this study, despite the limitations of "response
sets." These are supposed tendencies of subjects to respond
to test items in some stereotyped manner; for example, a
subject may reveal a set to respond "true" in a true—false
test, another may Show a tendency to prefer the middle posi—
tion on a rating scale. The effects of such set tendencies
could not be determined in the present study; however, attempts
were made to minimize them in the peculiar way in which the

Likert points were defined.

Implications of the Study

This was supposed to be a classroom experiment. How
feasible may the "treatment" conditions described here be
applied in practice in a normal classroom situation? This
question may be broken into two parts—-from the point Of view
of both the student and the teacher. The answers essayed
will hOpefully clear some of the doubts which may exist con—
cerning the feasibility of the treatment. Moreover, the
problems discussed here will help to bring together what
further studies may be necessary to supply additional evi-

dence on the present issue.

48

Questions and Answers For the Student-Critic

It may be asked on behalf of the student: what other
values-—apart from the learning function--may accrue from
the treatment? The self—evaluation treatment may combat the

tendency to guess wildly; it may also emphasize objectivity

 

in the develOpment of the so—called "self—concept." An
incident that occurred during the study illustrates the lat-
ter point. A student broke down in tears because of the wide
discrepancy depicted by her graph of "self—evaluations" as
compared to the instructors evaluations. As mentioned
earlier this aspect of the treatment had to be deleted to
save the students (and instructors) from further embarrass-
ments. It is the writers View that such incidents could be
utilized to emphasize the need to be Objective, to be realis-
tic in making "self-evaluations." In other words, the treat-
ment potentially has a "mental hygiene" value, and this may
be exploited in an actual classroom situation.

Could the time involved in carrying out the "treatment"
not be more profitably used in extra reading and other assign-
ments to widen the students knowledge in the course? This
is a very important question, with an important omission:
the treatment is in effect an extra reading assignment. Any
student who has ever done a "take-home" examination knows
what extra reading he has to do on his own to produce an answer.
The "treatment" specifically asks the student "to make use of

all resources, excluding instructors and fellow students."

49

The incidental learning, to a student who experiences such a
"treatment" may be as profitable as, if not more than the
learning from extra reading assigned by the teacher, without
the treatment. This statement assumes that the examination

is made up Of good valid test items with reference to specific
course Objectives.

The critic may still ask: will the student be able to
cope with the amount of work involved if the "treatment" were
applied in all the courses he takes? There is no simple
answer to this question. It will depend upon the institution's
educational Objectives and policies for students' course loads;
it will also depend on the capacities of individual students.
Whatever the case it would not be too difficult for the stu-
dent tO adjust his course load, should the treatment be adOpted
as a general practice. Of course no such adoption is warranted

by the weak evidence provided by this study alone.

Questions and Answers For the Teacher-Critic

There are at least two problems of serious concern to
the teacher. He may wish to know what time involvement this
treatment would demand of him. Secondly, he may be uneasy
about having to lose good items in the process.

The second treatment condition even in its deleted form
required the instructors to discuss students "discrepancies"
individually, as they were brought up by each student. NO
estimates Of the actual times involved could be made as these

would vary widely according to the individual problems.

50

But it would certainly be "a lot of time"--to quote one
instructor, if the class is large. In eXperiment 1 each
instructor was in charge of at least sixty students. It
must be remembered that these instructors were themselves
students; one can then appreciate why §9m§_of them felt that
the treatment was too demanding of their time. The burden
may not be felt so much in a class about half that size, and
in charge of full-time teachers. In any case, the burden of
extra time involvement has to be weighed against the poten-
tial benefits of individualized instruction which this treat—
ment also fosters.

The last remark also applies to the teacher's concern
over loss Of items. The crux of this study is on methods
to increase the learning value of test items. Test items
have learning value only if they are good valid items, and
if they are, then there may not be a serious loss if the
"treatment" would help to bring about maximum learning, which
is the goal of instruction.

Another point is that course requirements are not static
in a dynamic educational system. Nor are successive groups
of students identical though they may be assumed to be from
the same pOpulation of students. In view of these considera-
tions a teacher may not use old test items without remodeling
them as it were.

A study conducted by Ebel (1968) and to be published soon
is relevant on this issue. The author was investigating the

relative effectiveness of "new" and "old" test items as

51

assessment tools in achievement testing. The "new" items
were newly constructed but the “old" ones were taken directly
from the authors supplementary text to his book (Ebel, 1965).
Students had previously been informed that some test items
would be drawn from the supplement. The results of this
study showed that the "new" and "old" items were quite compar-
able in discrimination index as well as difficulty values.
These values however tended to be lower for the "Old" items,
but the correlation between performance on both types of
items was positive and high-—.64 in one case and .51 in
another. The author concluded: "it is feasible to use some
of the "old" items to measure achievement." Should such re-
sults be collaborated by replicated studies the teacher need
not feel too concerned over loss of items. He can still keep
files of old test items and re-use "sgmgﬂ of them--remodeling

them in line with the dynamics of his subject.

Suggestions for Further Study

The conclusions of this study would still have been ten-
tative--even if the results had been significant at the .0001
level. One of the main Objectives of the discussion hitherto
is to present evidence to show that the "treatment" is worth
reconsideration. The practical problems involved are SO
slight that one cannot help suggesting and urging that this
study be repeated in many and varied situations. Of course,
in such replications the weaknesses of the present one must

be avoided.

52

Two of these weaknesses deserve special mention. The
first is on teacher-experimenter involvement and the second
on the "control" group. For reasons stated earlier contact
with instructors was forced tova minimum at the early part
of the study. If this is to be a co-operative study by
teachers, a study in which they are seeking ways to derive
maximum possible value from achievement testing in the class-
room, then they need to be told in advance the details and
purposes of the study. This will hOpefully get them involved.
The "treatment“ is believed to be such that it is experienced
individually by the student and there may not be much con-
tamination of results if teachers know about all the other
levels of the treatment, provided they do not require any
more Of their student subjects than their assigned treatment.
In a declared co—operative study the teachers will satisfy
this proviso.

The other main weakness was to be found in the control
group. It would be absolutely necessary to ensure a prOper
control group whose members are not exposed to any part of
the "treatment." The writer believes this was the main source
Of contamination in the present study. To get over this prob—
lem it is suggested the eXperimental "treatment" conditions
may be carried out in one term (or semester) and the control
condition in a subsequent one. The problems of group equi-
valence would not be insurmountable. In such a case the
course requirements, teachers and examinations used would

\

remain the same for both groups.

55

Furthermore, it is suggested that many Departments of
the University may be included in the study. In fact, at a
later stage different Universities may be incorporated, pro-
vided they use the examination technique and base their
grading decisions largely on it. It sounds an enormous ven-
ture. But if the present one may be considered as a pilot
study the experience is that it is quite feasible. Each
teacher within each department within each University will be
conducting his own study, if all concerned agree to take part
in the co-operative study. The reports of each experiment
would be submitted to a co-ordinator who will extract the
relevant data for a final analysis. The problem of different
criterion examinations would be easily handled by following
the lead of Page (1958) and converting raw scores to rank-
scores and using appropriate non-parametric tests as illus—
trated in this study. Such an encompassing study would take
one academic year to finish. It would be worth it.

The latter suggestion about running the study in a wider
and more varied setting underlies one other limitation of
the study. The setting was too narrow and the period of time
too short to allow the ”treatment" a fair chance to work on
students' attitudes. The extended design will meet this
limitation.

If this be taken as a pilot study the experience is that
it is largely feasible and that the implications from the

students' as well as the teachers' points of view can be

54

accommodated. If this is so, the study is worth repeating
so that additional evidence may be provided on the issues

involved.

CHAPTER V

SUMMARY

The purpose of this last chapter is not merely to sum-
marize what has been reported in earlier chapters; it is more
or less an attempt to tie the loose bits together. There is
a review chapter by chapter, and a summary of the tentative
conclusions reached, as well as the hypotheses suggested for
further investigation. The chapter closes by outlining the
ways in which the study may be considered a contribution to

knowledge.

Chapter by Chapter Review

Chapter 1 introduces the problem in question form: how
may achievement examinations be improved in use so as to
better define and attain the objectives of classroom instruc-
tion? This problem is the conclusion of an analysis. Examina—
tions perform various useful functions. -Nevertheless there
is some dissatisfaction, however slight, among the general
public and among educators and students in particular.
Banesh Hoffmann (22, gl;,) would attribute this largely to the

poor quality of multiple-choice test items. While not denying

55

56

Hoffmann's charges an Opinion is eXpressed that in the case

Of classroom achievement examinations the chief cause of
dissatisfaction, where it exists, may be the fact that some
teachers tend to over-emphasize the assessment function.

This is an Opinion, to what extent it is tenable may be judged
by surveying students' attitudes toward examinations, and in
particular discovering their perceptions of the functions of
examinations. The dissatisfaction and the assumed major

cause beg the question stated earlier as the problem for this
study.

It is suggested that the problem may be solved by empha-
sizing the learning function Of classroom examinations. One
way to achieve this end would be to require the student to
take the examination twice, first in class, and second outside
class; in addition he may be required to evaluate his per—
formances before having any feedback from his teacher. The
study is therefore concerned with testing the assumption
and the suggested solution.

The suggested solution also rests on the literature
evidence--that examinations are a learning device. The reader
is therefore introduced to a few of the previous studies
which arrived at the conclusion that examinations perform a
learning function. Jersild found this so in all but one of
his replicated eXperiments. Standlee's investigations of the
contribution of quizzes to learning produced results which

were clearly in the predicted direction though the statistical

57

tests revealed no significant difference among the treatment
conditions. In a similar study by Fitch (g£_al,) it was
concluded that frequent testing resulted in "superior achieve—
ment." Page-(op, gi£,) who carried out his studies in a
secondary school setting was interested in the "knowledge Of
results" aspect of examinations, and in particular on the
effect of "teachers comments" on one examination on perform-
ance in subsequent examinations. His conclusion is worth
quoting again: "when the average secondary teacher takes the
time and trouble to write comments . . . these apparently
have a measurable and potent effect upon student effort . . .
or whatever it is which causes learning to improve." Such
is the evidence on which the suggested solution to the stated
problem is based.

The purposes of this study may be re-stated in different
words, these were:

1) to test some methods which were supposed to be likely
to increase the learning value of examinations;

2) to survey and describe the attitudes towards examinav
tions and grading of those particular students in
which the methods were tested, and

5) to develop a scale battery for the purpose Of carry-
ing out the survey referred to in (2).

The hypotheses were that each of the experimental groups
would excel the control group in achievement of course Ob—
jectives and that one of the experimental groups--the one that
received double treatment, would excel all other groups in
the said achievement as measured by the final examinations.

It was further hypothesized that the "experimental" groups

58

would show a more favorable attitude towards examinations
and grading than do the Control group.

The second chapter deals with the methods of investiga-
tion. The three treatment conditions described correspond
to the means suggested for emphasizing the learning function
of examinations. The criterion measures were of two types——
the course end examinations and an attitude scale Specially
developed for the study. Other chief topics described in—
clude the population and the sample used, and the general
procedure. Subjects were sampled from two courses in the
College of Education, Michigan State University. They carried
out the treatment exercises by following appropriate instruc—
tions given to them by fheir instructors.

The results are presented in chapter III. To investigate
the hypotheses of the study the treatment effect was first
tested by the Friedman Xi. In experiment 1 the effect was
significant at the hypothesized level of five percent.

A second test of the same effect by the Covariance method was
not significant. The means for T1 and T3 were in the pre—
dicted direction; so were the means for T2 and T1. But none
of these were significantly different. However, the mean

for T2 was significantly larger than the one for T3 at the
hypothesized level of five percent, and the Covariance analy-
sis showed a risk less than ten percent for rejecting the
null hypothesis. NO treatment effects were found in experi-
ment 2. In both experiments the results on the attitude

measures were not always consistent with the hypothesis.

59

Chapter 4 discusses these results. The Friedman x:
test revealed a significant treatment effect on achievement
while the t-test comparisons located the source in T3. This
applied to experiment 1 only. The discussion brings out some
of the limitations that might have eclipsed the effects of
treatment, if any, when T1 and T3 were compared in experiment
1, and when all groups were compared in experiment 2. The
chief of these limitations was weakness of control: All groups
had the Opportunity to use the previous examinations as study
guide. Another possible source was the fact that the treat-
ment was misapplied in experiment 2 at the beginning. The
chapter goes on to explain the modi Operandi of the T3 condi—
tion. It stimulated extra effort; it served as a study guide;
it taught test—wiseness and develOped an examination set; it
provided a reinforcing practice session. Such conditions
would tend to increase students' achievement; but such tend—
ency was evident in experiment 1 only.

NO definite conclusions could be made on the effect of
treatment on achievement of course objectives when the two
experiments were viewed together. However, there was in ex-
periment 1 a weak evidence that the second treatment condition
tended to increase learning and achievement. It must be
emphasized that such conclusions are highly tentative, based,
as they are on a weak evidence.

The results of the Attitude measures were not definitive.
The chapter points out that this may be accounted for by the
total University environment. The attributes of the attitude

objects are not concentrated within one course. "If so,” o

60

quote from that chapter, "it would require a longer period
Of treatment within a wider range of situations in the Uni-
versity for the treatment to work, assuming that it was
potentially effective."

The Observation is made, however, that there is some
evidence of construct validity of the scale battery. It would
be expected that students "high" on the learning and motivat—
ing function factors would be low, relatively, on the dys—
function and pressure-anxiety factors. This is precisely
what the results indicated.

Questions are raised on behalf of the student. .Does the
treatment provide any benefits other than the learning func-
tion? It would appear it does, if applied in the manner
prescribed. The self-evaluation exercise may help in the
Objective develOpment Of self-concept and potentially it may
have a "mental hygiene" value which could be exploited in a
classroom situation. But this is an Opinion founded on meagre
evidence, though testable in a properly controlled study.
Could the time involved in carrying out the treatment not be
more profitably used in extra reading and other assignments?
Perhaps not; the "treatment" specifically asks the student to
make use of "all resources, excluding instructors and fellow
students." The student is guided, yet is left free to search
independently and the gains derived may be as much as, if not
more than what he would have had from extra assignment by the
teacher, without the treatment. .On the question Of student

load it is assumed that the majority of students will learn

61

to adjust their own load should such conditions as described
here be adOpted across the University.

The teacher critic may also have some legitimate prob-
lems. Time is limited for him to discuss discrepancies of
self-evaluations with students; time is limited for him to
write comments on students examination scripts. There is no
dispute of the fact that much time is involved, especially
where classes are large and where part-time teachers are in
charge. However, this burden has to be weighed against the
potential benefits of individualized instruction. The teacher
is also concerned about the "loss" of test items. But if the
test items are good valid items it seems doubtful that there
is any "loss" in view of the teacher's professional role.

In any case, he may be consoled by such research evidence as
provided by Ebel: "old" items may be remodelled and re-used
as valid test items even when they are exposed.

Table 6 shows that on the average students perceive grad-
ing more as a motivating factor than a learning function
factor; and this pattern is consistent across treatment con—
ditions. This is an interesting and useful finding for the
group in view Of the fact that most of them "hate" grading
as that table also shows.

The last section of the chapter considers the need for
further study. The evidence of this one does EQ£_establish
a case, but it has demonstrated a trend worth investigating.
But the would-be investigator is reminded of the limitations

of this study: in particular the control was weak, and the

62

instructors were not sufficiently involved. With these and
other limitations removed the study may be designed to last

at least two terms (or semesters) and to include other depart-
ments in the University, and possibly other Universities.
Furthermore, another criterion measure is suggested: this
might be a measure of retention after one term (or semester)
has elapsed following treatment.

Hitherto the review has focused on the main study. The
subsidiary one on attitude scale develOpment will now receive
some attention. Its significance stems from the fact that
it provided the measure for testing one of the main hypothe-
ses. Obviously, if the scale that resulted is not valid
little meaning, if any, can be attached to the section in
which the scale is used.

The scale was assumed valid before it was put into use,
and there were reasons for taking this position. As discussed
in Appendix A, it is based on a defined concept of attitude.
The multidimensional model provided is the outcome Of a
reasoned criticism of the traditional linear model. The steps
by which the attitude statements were constructed followed
closely the prOposed model. According to Thurstone, attitude
can be inferred from opinion statements. Such statements were
Obtained from a small sample of students and content analyzed.
The categories devised were in line with specific attributes
Of the Objects and the two dimensions of attitude defined in
the model. The significant statements derived from the

analysis formed the nuclei of statements which were rated by

65

a group of ten judges. Statements selected were fairly homo—
geneous within the four original scales. The responses
obtained from a representative sample of students were factor
analyzed. The factor output confirmed and clarified the
original categories into which the statements were grouped.
There was therefore a strong beginning evidence that the
items were measuring some homogeneous trait, the limitations
Of "response-sets" notwithstanding. The statements selected
had loadings of .40 or greater in the various "factors" re-
vealed. It is relevant to Observe that no assumption is made
that the factor analysis results are enough to demonstrate
the validity of the scale; but they are necessary in a pro-
gram of construct validation of which this scale provides a
type.

The theoretical model that forms the basis for the scale
is hypothesis--generating, and this would be another direc-
tion in which the validity of this scale may be investigated.
AS mentioned earlier it may be hypothesized that individuals
who score high on the learning function factors would score
low on the pressure—anxiety factors. The results of the main
study would tend to support such an hypothesis. In short,
the methods by which the scale was develOped, the factor
analysis results and the pattern of responses when the scale
was first put to use-~all these are evidences that the scale

is fairly valid.

64

Contributions to Knowledge

Now that the whole study has been reviewed it might be
asked: what contributions does it make to the field of edu~
cation? What contributions does it make to the general
field of knowledge? This is a legitimate question; but it
is the reader that must have the last say; he must judge
whether this study, in parts or in whole is a contribution to
knowledge: that is his prerogative.

Even so, the writer has as a duty to set down specifical-
ly what he considers to be the areas in which the study may
be considered an extension of knowledge. The attitude scale
comes first to mind. In these pages has been presented a
scale battery to measure student's attitude towards such
crucial issues in the educational curriculum as classroom
achievement testing and grading. No educator will doubt the
impact of these aspects of the curriculum in the learning
situation. The argument pressed here is that attitudes are
determined by the attributes of the psychological Object.

If, and when the validity of this scale is attested beyond
all reasonable doubt the teacher may use it as a barometer

to check the quality of his examinations and grading policies
as reflected by the perceptions of his students. Such use
will benefit education if it should by any way subscribe to
the fulfillment of its objects.

In the general field of knowledge the theoretical model

of attitude may be considered an extension. The existing

65

attitude scales eXpressly consider attitude as bipolar. The
view taken here is that in the domain of attitude "good“ may
not be the Opposite of “bad“; they may very well be on corre-
lated dimensions: the relationship is inverse, but not per—
fectly so. Furthermore, the view expressed by other critics
of the theory of attitude is accepted that attitude is multi-
dimensional in character. The present study goes further

to locate these dimensions in the attributes of the psycho-
logical Objects, and these are incorporated into the theoreti-
cal model Of attitude--incorporated, that is, in a way that
is of practical value. As has been illustrated in the
present scale the model provides a scheme for writing atti-
tude statements and the resulting scale does spell out as it
were areas in which the attitude Object may be manipulated
should one wish to effect attitude change. If, and when this
approach is tried by other investigators and found to work,
the model would then indicate its claim to be a contribution
to knowledge.

There is nothing new if the main study merely harps upon
two of the roles Of the teacher-educator--as a promoter of
learning and as an evaluator of learning. Apparently, these
roles are conflicting. In fact, there is literature evidence
(Fitch, 99, gig.) suggesting that each is best served when
divorced from the other. This study would reject such a sug-
gestion. Instead, it argues that the apparent conflict may
be resolved if the teacher—educator puts emphasis on the

learning function of classroom examinations. Let us put aside

66

for the moment any question on how the emphasis is to be
laid. The prescription: "emphasize the learning function

of classroom achievement examinations" is brief, simple, and,
one may add, easy to apply. Should the weak evidence pro—
vided by this study be collaborated and strengthened by

other investigators such that educators will deliberately
seek to derive this potential value of examinations, then it
would be a little contribution to have drawn the attention

of educators to a common sense and practical way of resolving
an apparent conflict in their roles.

Some methods have also been provided to translate the
said emphasis into practice. As a matter of fact, there is
nothing new or original in the methods. But they are pre-
sented as a package deal, and in a way is a unique combina-
tion; the incorporation of students'.self-evaluations deserves
Special mention in this respect. Above all, the study has pro—
vided an evidence in one case that the treatment would tend
to increase achievement; other evidences are necessary to
throw more light on the issues involved. Here again the
methods may be considered a contribution to the practical
problems that face the classroom teacher, if and only if,
when tested in a variety of situations, the results Should

overwhelmingly point to the desired direction.

Summary of Tentative Conclusions

1. The repetition of examination performance in non

"examination conditions" would tend to increase students'

67

achievement of course Objectives, provided the examination
is made up of good valid test items.

2. The requirement that students evaluate their examina-/
tion performance before receiving the teachers' feedback
would tend to increase their achievement of course objectives,
provided the examination is made up of good valid items.
These tentative conclusions are in essence hypotheses for in-
vestigation, since they are based on weak evidence. In any
case they are made with particular reference to the type of
situations as described in experiment 1.

5. It may be concluded from the results of the attitude
survey that most students in the population studied perceive

grading as performing some useful function: it motivates

learning.

Summary of Testable Hypotheses

1. There is an inverse relationship between the attitude
responses on the learning function factors on the one hand and
on the dysfunction and pressure-anxiety factors on the other.

2. There is a direct relationship between students' per-
ception of the learning function of examinations and their
achievement of course objectives as measured by examinations.

5. If students are required to evaluate their examina-
tion performances before having their instructors' feedback
then it would follow that:

a) their attitudes would tend to be more positive than

negative towards examinations and grading,

68

b) they would develop a more positive than negative

self-concept.

Summary of Conditional Contributions

1. The field of education is presented a scale for
measuring students attitudes on such crucial aspects of the
curriculum as examinations and grading.

2. The results of the scale may be used as basis for
effecting attitude change in a desired direction-—to promote
students' learning.

5. The results of the scale may also be used by the
teacher as a barometer to check the quality of his examination
and grading policies.

4. Theoretically the multidimensional character of
attitude is defined with reference to the anchoring attributes
of the psychological Object.

5. Theoretically the evidence provided by the maiden
use of the scale does not support the traditional linear
continuum and bipolar model of attitude.

6. The multidimensional model proposed may be applied
by social scientists in the study of attitudes.

7. Lastly: the field Of education is provided with a
prescription which may resolve the apparent conflict in the
roles of the teacher as a promoter and as an evaluator of
learning. "Emphasize"--the prescription says--"emphasize

the learning function of classroom achievement examinations."

69

It must be emphasized that these contributions are con-
ditional: they are conditional on collaborating evidence

from other investigators.

Conclusion

The problem of this study is stated in question form:
How may achievement examinations be improved in use so as
to better define and attain the objectives of classroom
instruction? Tyler (Op, 913,) is voicing the same problem
in different words: "We who are concerned with the improve—
ment of education and the effectiveness of learning must
consider how to achieve the maximum good potential in test-
ing and to minimize and eliminate the bad. . . ."

This study suggests some methods that may be applied
to meet the need expressed in this quotation, and in the
problem statement. There was some evidence in one of the
experiments that the methods may lead to increase in students'
achievement of course Objectives. Probably there would also
be effects on students attitudes if the treatment is widely
applied, and is allowed sufficient long time--to work.
However, all such conclusions will remain tentative until
enough collaborating evidence is available. But suppose the

wanted evidence turns out to be contrary, then the problem

remains--unsolved, but not beyond solution.

LIST OF REFERENCES

Ebel, R. L. Measuring Educational Achievement.
Englewood Cliffs, New Jersey: Prentice-Hall, Inc.,
1965.

. Personal communications (1968).

 

Edwards, A. L. Techniques of Attitude Scale Construction.
New York: Appleton-Century Crofts, 1957.

Fishbein, M. "Attitude and the Prediction of Behavior," a
chapter in the book he edits, Attitude Theoryyand
Measurement. .New York: John Wiley and Sons, 1967.

Fitch, M. L., Drucker, A. J. and Norton, J. A. "Frequent
Testing as a Motivating Factor in Large Lecture Classes,"
J. Educ. Psychol., 42, 1951, pp. 1-20.

Kerlinger, F. N. and Kaya, B. "The Construction and Factor
Analytic Validation Of Scales to Measure Attitudes
Towards Education," Ed. and Psychol. Meas., 19, 1959,
15f.

Freeman, L. (Ed.) "The Measurement of Opinion and Attitude
in Young," Handbook of Social Psychology.

Hoffman, B. The Tyranny_of Testing. New York: Crowell-
Collier Press, 1962.

Jersild, A. T. "Examination As an Aid to Learning,"
J. Educ. Psychol., 20, 1929, pp. 602-609.

Magnusson, D. Test Theory. Reading: Addison-Wesley Publish-
ing CO., 1966.

Meyer, G. "The Effect on Recall and Recognition of the
Examination Set in Classroom Situations," J. Educ.
Psychol., 57, 1956, pp. 81-99.

MSU CISSR. Michigan State University, Technical Report
No. 54 issued by the Computer Institute for Social
Science Research. 1967.

70

71

Page, E. B. "Teacher Comments and Student Performance:
A Seventy-Four Classroom EXperiment in School Motiva-
tion," J. Educ. Psychol., 49, 1958, pp. 175-181.

Schefféz H. The Analysis of Variance. New York: Wiley,
1959.

Siegel, S. Nonparametric Statistics for the Behavioral
Sciences. New York: McGraw-Hill Book Company, Inc.,
1956.

Standlee, L. S. and POpham, W. J. "Quizzes' Contribution
to Learning," J. Educ. Psychol., 51, 1960, pp. 522-525.

Stone, G. "The Training Function of Examinations: Retest
Performance as a Function of the Amount of Critique
Information." Research Report No. AFPTRC-TN-55-8.
United States Air Force Personnel Training Research
Center, Lackland Air Force Base, San Antonio, Texas,
1955.

Tyler, R. W. "What Testing Does to Teachers and Students,"
in Anne Anastasi (Ed.) Testing Problems in Perspective,
American Council on Education, 1966.

Vernon, P. H. The Structure of Human Abilities. London:
Methuen, 1961. -

Warrington, W. G. Personal communications (1968).

Wrigley, C. Personal communications (1968).

General References

Aiken, L. R. and Dreger, R. M. "The Effect of Attitudes on
Performance in Mathematics," J. Educ. Psychol, 52,
1961, pp. 19-24.

Bateman, R. M. "The Construction and Evaluation of a Scale
to Measure Attitude Toward Any Educational Program,"
J. Educ. Res., 56, 1945, 502f.

Cronbach, L. J. "Further Evidence on Re5ponse Sets and Test
Design," Educ. and Psychol. Meas., 10, 1950. PP. 5-51.

Green, B. F. "Attitude Measurement," in Lindzey, G. (Ed.)
Handbook of Social Psychology, Cambridge, Mass.:
Addison-Wesley, 1954.

72
Thurstone, L. L. The Measurement of Values. Chicago,
Illinois: University of Chicago Press, 1959.

Wang, C. K. A. "Suggested Criteria for Writing Attitude
Statements," J. Soc. Psychol., 5, 1952, 567f.

APPENDIX A

The Students Attitudes Towards
Examinations and Grading
Scale Battery (SATEG)

 

Abstract

SECTION HEADINGS

General Introduction . . . . . . . . . . . . . . . .

The Open-ended Questionnaire . . . . . . . . . . . .

Content Analysis . . . . . . . . . . . . . . . . . .

Attitude Statements. . . . . . . . . . . . . . . . .

Statement Values . . . . . . . . . . . . . . . . . .

Preliminary Try-Out. . . . . . . . . . . . . . . . .

The Main Try-Out . . . . . . . . . . . . . . . . . .

The Factor Experiment. . . . . . . . . . . . . . . .

Selection Of Items and Presentation Of the Battery .

Test Statistics. 0 O O O O O O O O O O O O O O O O 0

Interpretation of the Scores . . . . . . . . . . . .

Implications and Conclusion. . . . . . . . . . . . .

Sub—Appendices:

a)
b)
c)

a)

e)
f)

g)
h)

i)

j)

The Open-ended Questionnaire . . . . . . . .
Scheme for the Content Analysis. . . . . . .
Instructions for Judgment and Q-sorting of
Statements . . . . . . . . . . . . . . . . .
"Judgment" On and Median Values of the Orig—
inal 65 Items. . . . . . . . . . . . . . . .
Composition of the Main Try-out Sample . . .
The Factor Patterns and Loadings . . . . . .
A Comparison Of the Varimax Factors Across
the Three Samples and the Four Scales. . . .
Scale-item and Inter-item Correlations in-
fluding Relevant Item Standard Deviations

8' . . . . . . . . . . . . . . . . . . . .
Summary of Mean Scores and Standard Devi-
ations (S) . . . . . . . . . . . . . . . . .
Varimax Rotation Analysis-—Main Try-Out Data

75

Page
74
78
84
85
87
88
92
95
96

107
111
114
119

124
150

154
156
159
140

146

148

150
151

ABSTRACT

The test battery proposed in this Appendix is an instru—
ment for measuring students' attitudes towards examinations
and grading.

Attitude itself has evolved from a unidimensional to a
multidimensional concept. However authorities are not agreed
on what its relevant dimensions are; nor has a clear attempt
been made to reflect this multidimensional character in
current Attitude Measures. Here attitude is defined as a
predispositional set of like and dislike feelings towards a
psychological object. Outwardly its dimensions are two--the
like (positive) and the dislike (negative) feelings, which
are not necessarily bipolar since they are anchored on dif—
ferent attributes of the psychological Object. A model is
presented in which the attitudinal disposition is depicted
as a basal plane. From an origin on this plane emanate two
separate vectors representing the positive and negative di—
mensions Of attitude. The growth of these is determined by
the attributes of the psychological Objects. These attributes
are called anchors, and serve to elucidate the multidimension—
al character of attitude.’ This is the theoretical basis

against which the proposed battery is to be appreciated.

74

75

Examinations refer to all written forms of classroom
achievement testing in which the results are used in making
academic decisions on students. Grading on the other hand
refers to a system of evaluation of academic performance in
which some ranking procedures are used to reflect either
the relative standing of a student as compared to his peers,
or his achievement in a defined content area.

First, free reactions of a small number of students on
these Objects were Obtained through the administration of
an Open-ended questionnaire. The returns were content
analyzed, and significant statements extracted to form the
nuclei of attitude statements. These were written to focus
on specified attributes of the objects, and classified under
two main directions--positive and negative. Ten raters then
judged and Q-sorted the statements on an eleven-point scale.
The final statements were selected on the basis of their
median values, with preference for high extreme values only.
Such statements would tend to be homogeneous and would dis-
criminate satisfactorily. The selected items were grouped
under four scales: examination-positive (EP), examination-
negative (EN), grading-positive (GP), and grading negative
(GN). Finally they were administered to a sample of 585
students. Subjects responded on a specially defined Likert-
type scale. The results were factor analyzed.

In line with the theoretical model six factors were

hypothesized. But the analysis produced eight Varimax

76

factors. Loadings on these are the only criteria used in
the final selection of items to comprise the battery. The
bulk of the test statistics and details of the methods
described are presented in the sub-appendices which form an
integral part of this paper. The body text has some of

the chief statistics: for example the K-Rgo reliability
coefficients are reported as .798, .791,.812 and .746 for
each of the four original scales respectively.

An illustration is also provided for interpreting the
scores as "low" or "high" on the reference factors. This
is based largely on the mean item responses for each factor
sub-scale. A profile of the samples attitudes is also
presented. Generally students are "low" on the positive
learning function factors, and relatively “high" on the
pressure-anxiety factors. This fact would suggest a general
hypothesis that the higher a student is on the learning
function factors the lower he would be on the pressure—
anxiety factor and consequently the higher the amount and
quality of learning he would attain. The testing of such
an hypothesis may be incorporated into a program of construct
validation of this scale battery.

This study provides evidence which tends to support
the model and the way it defines the multidimensional char-
acter Of Attitude. An instrument like this may be used for
attitude survey purposes. It is also Of diagnostic value

and suggestive of ways of effecting attitude change.

77

However the battery would remain a research instrument—m

till further evidence is available from other investigators.

GENERAL INTRODUCTION

The concept of attitude has undergone some evolution.
Formerly it was unidimensional; now this View will not
satisfy all. There are some authorities who consider atti-
tude as made up of several components, including the affec-
tive, the cognitive and the conative. Fishbein (1967) does
not favor this type of approach however; rather he would
emphasize the affective aspect as the essence of attitude
and the cognitive and conative as its "determinants" or
"consequences." Thus like Edwards (1957) he would prefer
Thurstone's notion of attitude as "the degree Of positive
or negative affect associated with some psychological
object." The writer is inclined to accept the affective
aspect as the essence of attitude; but he does not find the
unidimensional approach very satisfactory if this means that
attitude is to be represented in some linear continuum.

A more realistic picture would be that Of a predispositional
base of like-dislike feelings towards an "object." From

such a base or "set of reaction tendency“ (to quote Freeman,
1957) emanate what may be described as "vectors" symbolizing
different directions or dimensions of the attitude. Some of

these directed vectors are towards and favorable while others

78

79

may be away from and unfavorable to the Object. The length
Of each vector would symbolize the strength of the reSpec-
tive dimension of attitude. It would be convenient for
measurement to focus attention on the two most significant

- attitude vectors- one is directed towards and the other

 

away from the object; they may be described as "positive" and

I'negative" respectively, provided no assumption is made that }_
they are diametrically Opposed. These two vectors in turn

each have sets of subsidiary branching-—vectors which sym- i.

bolize the attributes of the psychological Object.

These branching vectors in effect determine the lengths
and strengths and hence the dominance of the respective
dimensions whose existences they maintain. In this sense they
may be referred to as the ultimate anchors of the attitude.
These may well be the foci of attitude statements.

The use of "vectors" here is somewhat loose, but would
serve for the purpose Of analogy. Both the "positive" and
"negative" vectors are in the same (attitude) "space"; the
attribute vectors may be in a different (Object) "space",
but have direct links with the former, and both converge upon
the attitudinal base. Perhaps a pictorial account may help
to clarify the model. Figure 1, on the following page,

serves this purpose.

80

 

 

 

A B

Figure 1. The Attitude Model.

In the above diagram ABCD represents the predisposi-
tional base Of like-dislike feelings, with O as its outgrowth
point. OP and ON represent the "positive" and "negative"
dimensions respectively: their lepes reflect their nature.
The attribute vectors are P1, P2 as well as N1, N3. They
sustain or serve as anchors to the "dimension" vectors-—
that is to say: the attribute of an Object accounts for the
direction or "dimension" of attitude developed about the
Object. It is to be Observed that the lengths of the vectors
vary; moreover the number of the attribute vectors on each
dimension also vary. Furthermore it is conceivable that for
certain Objects the attributes may support growth along one
dimension only. In such a case the attitude may be described
as all-out "positive" or all out "negative" as the case may
be. The traditional model makes no provision for these

attitude anchors; besides the dimension vectors are made to

81

collapse on the base, end to end thus producing a bipolar
continuum. The view taken here is that such a model is an
oversimplification.

The present model may be justified. Given a psycho-
logical Object an individual's attitude in most cases is not
an all out feeling of like or dislike. It may well be a
mixture of the two. Certain attributes of the Object may
stimulate and sustain like feelings while others induce dis-
like feelings. In other words while an individual may
profess a favorable attitude towards an Object he may be
found to have some unfavorable attitude also. This is no
inconsistency, but the hard fact of human experience.

It is necessary to reiterate what has been said of the
proposed multidimensional model. The scale battery to be
presented cannot be fully appreciated without this model.

For this study attitude will be defined as a predispositional
set of like and dislike‘feelings. Two significant vectors
emanating from this set may be described as "positive" and
"negative". But these are themselves sustained by the
attributes of the psychological Object. Measurement of atti-
tude would therefore be concerned with the problem of placing
individuals on the attribute vectors. A description of a
person's attitude on a profile of such significant "vectors"
would be nearer to reality than the one on the traditional
linear continuum.

The psychological objects Of interest also deserve some

comment. Examinations refer to classroom achievement testing

82

involving the administration of quizzes, mid-terms, and
finals, made up of Objective or essay items--provided that
students performances form the basis on which academic de-
cisions are made, viz Pass/Fail, Credit/No Credit or the
award of grades. Grading refers to evaluation of student's
academic performance in examinations and/Or other aspects
of the curriculum by using letters (e.g., A, B, . . . F)

or numbers (e.g., 4:5 4.0 . . . 0.0) to classify students
according to their achievement relative either to their peers
or to a defined content area, provided the grading system
.also involves the report of "grade-point averages".

As the objects are defined above it is evident that the
reference pOpulation to which the prOposed scale battery may
be applied consists of "students". In its initial develop—
ment University students were used, but it is hOped that the
language adOpted in the final form is such as to make it
applicable to High School students, either directly or with
minor alterations. Thus, the purpose Of this scale battery
is to ascertain the predispositional set of like-dislike
feelings of students towards examinations and grading, to map,
as it were in a profile, the relative strengths Of these like-
dislike feelings. The ultimate aim is to provide a means
for a fairly accurate description of the attitude.

It may be asked: "What is the use of knowing student's
attitudes towards examinations and grading?" "What can one
'predict' by having such information?" Fishbein (pp, 913,)

feels that “the most important determinants of behavior may

85

be variables other than attitude.“ This may be so with most
psychological Objects. But in the learning situation the
attitude Of the learner may exert quite a significant effect,
if not the most important effect on the learning outcome.
It may be that as yet psychologists have not been able to
develOp a tool to identify the effect of attitudes on learn-
ing. Provision Of a valid scale to discern the attitudes
on crucial issues in the curriculum may be a necessary first
step. One would venture to hypothesize that students' atti-
tudes on issues like examinations and grading partly determine
the amount and quality of learning attained. It may also be
argued that in a student or child-centered system Of educa-
tion the attitude of the student or the child should not be
ignored. -Furthermore, findings from.an instrument which
specifies the attributes Of the attitude Object will suggest
which aspects Of the Object to manipulate, as it were, should
one wish to effect an attitude change. This brief discus-
sion Of the predictive and other uses of this instrument is
part and parcel of the overall purpose for which it is de-
signed.

.The plan followed in the develOpment of this Scale Battery
may be listed:

1. Administration Of an Open-ended questionnaire to
a small sample Of students

2. Content analysis of responses Obtained in step 1
above.

5. Development of attitude statements from the results
of step 2 (above).

84

4. Determination of statement values by the use of
judges employing Q-sort and Rating techniques, and
selection of statements with values at or above the
median

5. Preliminary try-out of the selected statements

6. The main try-out, using a sample of 585 students

7. Factor Analysis—-to ascertain the factorial validity
of the scales and use factor loadings for item
selection.

These stages are discussed in full in the pages which follow.

THE OPEN-ENDED QUESTIONNAIRE

The plan was to administer an Open-ended questionnaire
to a small sample of students. There was no attempt to use
any controlled sample since the Objective was pg£_to make
generalizations from the returns. Considerations Of the im-
mediate use for which the intended scale was to be applied
necessitated a preference for students in Education. This
is evident from the tabular illustration below where the
respondents are broken down according to the courses in

which they were enrolled.

Course Number of Students
Ed. 200 11
Ed. 450 6
Ed. 867 16
Phys. 827 1
Mth 455 5
Mth 215 9
Psy. 512 1
CEM 511 _5

TOTAL 50

85

It must be emphasized that the purpose behind this step
was to elicit the attitude in question,.and to check whether
in fact students may be broadly categorized into those who
favor and those who do not favor examinations or grading.

The typical language of each group would then be ascertained
and used as a basis for developing attitude statements.

Fifteen items comprised the Questionnaire. Item 4 reads:
"What reactions have you had to the examinations you have
taken in your College and University experience?" The other
items and the exact format of the Questionnaire may be found

in Sub-appendix (a).

CONTENT ANALYSIS

There was one and only one purpose for the content analy-
sis Of the free reSponse in the returned questionnaire: to
ascertain the typical language of students with positive atti-
tude and those with negative attitude towards the psycho-
logical Objects investigated. The search was therefore for
significant affective words, phrases, clauses and sentences
in the unit of analysis, which in this case was the whole re-
sponse to each item. These "significant" words, etc., were
categorized according to the scheme to be illustrated
presently.

The setting up Of content categories posed some problems.
The guiding principle was the theoretical model described as

the basis for the intended scale. Attitudes have dimensions

86

maintained by the attributes of the psychological Object.
The content categories should in turn reflect the attributes
Of the object. Following this reasoning nine categories
were arrived at as follows:

1. Perceived function/meaning of Object

2. Statement on efficiency or inefficiency

5. Expressed statement of preferences

4. Expressed Opinion with emotional overtones

5. Expressed Opinion with very intense emotion

6. Indication of satisfaction

7. Indication of dissatisfaction

8. Suggestions Of alternatives

9. Miscellaneous (unclassified) reasons stated.
A scheme for coding and general procedural steps in the analy-
sis were prepared. Two questionnaire copies were then con-
tent analyzed. It was feared that the content categories
lacked the qualities of Objectivity, reliability and valid-
ity. The fears were confirmed when another analyst1 was
engaged. Discussions that followed led to the reduction of
the number of categories to four. These were:

1. Statement of function (e.g., feed-back)

2. Statement of preferences

5. Statement expressing or implying emotion, and

4. Statement offering suggestions.

 

1The writer is very grateful to Ogunniyi Omotosho for
the role he played as analyst in this aspect of the study.
"Tosho" is currently finishing his Ph.D. dissertation.

87

Based on these reduced categories and on improved
instructions another questionnaire was analyzed independ-
ently by the writer and the engaged student. Agreement
was perfect on every item. The rest Of the scripts were
then analyzed by the writer only following the improved
scheme, which is given in full in the sub-appendix (b).

The words or phrases selected as "significant" tended
to the extreme: for example, "examinations should be com—
pletely abolished." It was thought that extreme statements
would discriminate better than moderately affective ones;

moreover they would help to make the scales homogeneous.

ATTITUDE STATEMENTS

The content analysis exercise revealed that two cate-
gories were the richest and most apprOpriate as sources for
attitude statements. These were (1) Statement of function,
and (5) statement expressing or implying emotion. The first
clearly focuses on one dimension of the psychological Ob-
jects while the second touches on varied aspects, including
for examples, administration procedures of both examinations
and grading, types of the examination or grading system and
quality Of test items. The "significant" statements selected
under the two categories were then tabulated and the fre-
quency of each statement across the fifty respondents was
determined. The following extract from the work-sheet illus-

trates this point with respect to grading:

88

Frequency of Significant Statements
Undergrad Graduate
Education Education All Other
Statement Courses Courses Courses Total

"motivates the
student" 4 4 . 2 10

"abolish" 5 4 6 15

A majority Of the significant statements selected were

the most freqpent. However judgment was exercised to include

 

those not necessarily the most frequent but thought to be
referring to important attributes Of the psychological Ob-
jects. Finally sentences were constructed using the signifi-
cant statements as nuclei. The sixty-five initial attitude

statements are shown in Sub-appendix (d).

STATEMENT VALUES

The next step was aimed primarily at assigning numeri—
cal values to the statements. The secondary aim was to use
other people to judge whether the statements were meaningful,
clear and unambiguous. The original plan was to use two
groups of advanced graduate students for this exercise. One
group would "judge" and "Q-sort" the statements while the
other would "judge" and "rate" them. In judging the subject
had to say whether the statement reflected a positive favor—
able attitude or an unfavorable negative attitude towards
the object. Rating involved weighing each statement singly

and assigning a value to it. In Q-sorting all the statements

 

89

within a group were viewed together and assigned relative
values, so as to produce a near-normal or rectangular distri-
bution. The eleven-point scale was to be used in both cases.
Full details Of the instructions may be found in Sub-appendix
(c).

In a pre-session it was found that most Of the raters
assigned extreme values to the statements. There was very
little discrimination. The use of this technique was there-
fore discarded. Incidentally it should be mentioned that the
results from the two techniques were to be compared and
averaged. As it turned out, such averages would have been
meaningless.

A forced-choice "Q-sort" technique was adOpted because
discrimination among the items was possible. Ten judges1
were engaged. The value of each statement was the median Of
the values assigned by the ten. The extract from the work-
sheet, on the following page, illustrates the procedure.

In an eleven-point scale the median value is six. The
criterion for selecting an item was therefore a minimum value

Of six. There was however another requirement that at least

—A_

lThe writer is grateful to the persons listed below for
the role they played in this aspect Of the study. .The first
two hold Doctorate degrees and are Assistant professors in
the Departments of Education and Psychology respectively.
0f the rest six are advanced graduate students working for
their Ph.D. degree, and two are Master's candidates (M).
.Dr. D. Freeman (Education), Dr. A. M. Barclay (Psychology).
Miss Gisila Dieze (M), Miss Jody Anderson, Terry Almquist,
William E. Martin, William S. Beavers, Paul David Goff (M),
John Hoogstra, and Dick Bate.

90

.HH mpum3ou manmno>mm

.AUV xHUSTQQMIQDm SH cm>Hm mum musmE
Imumum nmnuo map How MOSHM> SMHOOE mnuv usmamumum mnu HO ODHM> may mmEOown MHsau

.UmuMHSOHmo no: mm3 Om has usm>mHmHuH mH Hauou mHna

50m

A

.H .O>HuHmom mH umnu TUDuHuum as men
Imwnmxm USN .sOHumcHmem Ou mcHuHmmmH mH ucwﬁwumum may menus mmmpsn Ham wmmo MHz»
SH .usmﬁmumum may SH wuHsmHnﬁm mo mocmmmsm HO mocmmnm mnu muumamms HauOu MHsam

 

 

 

Um

HH

HH

HH

 

 

OH

.Hm>
.mmz
.mom

.Hm>
.mmz
.mom

mad

lemme

Edvn

 

.Hm>
6cm
. “HQ

 

.Hno

 

.O—ﬁ

EwuH

 

SMHOOZ

 

 

Hmuoe

 

Odm

mm

 

 

mm pm am PM

 

 

 

«m

 

 

0m

 

mm

 

Hm

 

ETOOSD

 

 

 

mpcmﬁmumum

Mo OSHm> cam SOHuumHHQ

wuuwﬂno mosmnmmmm

91

fifty-five items should be selected from the original sixty—
five. To meet this requirement also five items were select-
ed, each with a value Of fiye, There was no absolute need to
calculate and use indices Of dispersion since the aim was

not to produce a scale purely on the Thurstone model.

It is necessary to explain the purpose of statement
"values" at these early stages in the develOpment of the
scale battery. The values were g9£_to be used in the
Thurstone style: this must be emphasized. They were to be
used as aids to developing a homogeneous scale. Suppose
for example, that a statement has been judged to be Of
"negative" direction; if it is further assigned the value of
Qg§_then it represents a statement that is tending towards
a positive direction; if on the other hand it is assigned
the value of eleven it can be safely assumed to represent an
extremely negative statement. Ideally only items, each with
value eleven would be selected since as stated earlier there
was reason to prefer high extreme statements. To attain this
ideal is not impossible; all it involves is increasing the
original pool to at least 500 items, carefully written with
the same goal in mind. Actually the median value of the
selected items turned out to be seven, and there were two
items with value £§2.and another two with value eleven each.
.This is admittedly a poor approximation to the ideal, but
was accepted as fairly satisfactory in the present circum-

stances.

92

.Another comment is in place. Each statement was assigned
a relative value within its own group. The direction of the
attitude implied in the statement was p2£_taken into account
in assigning values. Thus, these values are not to be con-
fused with the five-point-Likert—scale used in the final
scale battery, as will be shown presently. In fact the exer-
cise thus described is an elaborated example of stimuli scal-
ing--the attitude statements are scaled; on the other hand
the Likert technique scales persons. The two methods were

therefore combined in the present study.

PRELIMINARY TRY-OUT

The preliminary try-out was necessary to check on the
suitability of the format in which the battery is to be
presented, to check also on the clarity of instructions, and
again on the quality of the items. Moreover it would provide
an opportunity to test the scoring procedure before a full
scale try-out was launched. This last need emerged from
.discussions with Warrington (1968).

.In view of the purposes just stated the "sample"--if it
could be called one at all--was confined to three advanced
graduate students1 invited to respond to the items in their
role as University students. Later they were expected to pro-

vide and did provide written comments as they felt necessary.

 

lThe writer is grateful to the advanced graduate students
named below for the role they played in this part of the study:
Jack Hruska, W. Russel Harris, Glenn L. Sterner.

95

As mentioned in the last section fifty-five items were
selected. Respondents were expected to Show the degree of
their agreement/disagreement with the statements by assign—
ing values using a five-point Likert scale. The points were
defined as follow:1

,1. Np agreement whatsoever

2.-Disagreement most of the time; agreement at few
occasions.

5. Opinion hovers between agreement and disagreement
equally.

4. Agreement most of the time; disagreement at few
occasions.

5. Complete agreement.

There were four groups of items: Examination-Positive,
Examination—Negative, Grading-Positive and Grading—Negative.
From now on these will be referred to as EP, EN, GP and GN
respectively. They are the four scales Which constitute the
scale battery. To simplify notations further they will also
be referred to as Scales 1, 2, 5, and 4 respectively. No
systematic order was employed in arranging the items in each

scale; but the scales were chosen alternately, and no more

than six items in the same scale were presented successively.

.The results from this investigation were as shown in tabular

form on the following page.

 

 

J'These definitions may be cumbrous; but the aim is to
avoid the stereotype and thus hopefully minimixe response
sets.

_ 3'11

 

94

ATTITUDE SCORES PRELIMINARY TRY-OUT

 

 

Scales

Possible score EP EN GP GN
Possible score

range 11-55 12-60 14-70 16-80
Cutting score* 55 59 42 48

Respondents

S; 17 52 55 57
Se 57 25 46 55
S3 14 46 17 59

 

 

 

 

 

 

 

*These scores are determined from the Likert point value
of 5 as defined above. ReSpondents with scores above
the cell entries here can generally be classified as
being "high" on the attitude measured by the scale. :The
values vary with the number of items in each scale; no
final selection of items was made, as yet.

RANK-DIFFERENCE CORRELATION COEFFICIENT*

 

 

EP EN GP GN
EP —1.00 +1.00 -1.00
EN -1 . 00 +1 . 00
GP -1.00 .
GN

 

 

 

 

 

 

 

*The high values are certainly an artifact of the sample
size; does this also apply to the direction?

95

The preceding pattern of scores and the direction of
the coefficients would be expected from the theoretical
model; the absolute values were of no significance. This
part of the exercise was therefore very valuable in that it
also led to the improvement in the diction of some of the
items and in the format of the instructions--all based on
the comments from the respondents and other consultants.

Of the fifty—five items used, forty-eight were retained—-

twelve each for the four scales EP, EN, GP and GN.

 

THE MAIN TRY-OUT

Two considerations determined the characteristics of

thegpample drawn for the main try-out phase. The first was

 

the immediate pOpulation for which the Scale is designed.

The Scale is directly applicable to a pOpulation of college
and university students. It is assumed that the students

of Michigan State University form such a typical pOpulation.
The sample was drawn in such a way that the main departments
of the University are represented. However, it was not
random; judgment was exercised to make the selection include
“juniors" and graduate students as shown in Sub—appendix (e).

The second consideration was the intention to factor

 

analyze the returns--in an effort to test the validity of

the theoretical model conceived as the basis for the scale

 

battery. Accordingly, the size of the sample was planned at

600 at least. As Sub-appendix (e) shows, the actual returns

96

were 585 (incidentally twelve data cards were destroyed in

process so that the final output involved 575 Observations).
.The questionnaire was administered by the instructors1
responsible for the classes selected. Subjects responded
to all items on a five-Option IBM answer sheet. About fif-

teen minutes were sufficient to respond to all items. The

scoring was done by the Office of Evaluation Services.

THE FACTOR EXPERIMENT

Both the theoretical basis for the battery and Eh;
hypothe§§§_that may be deduced from the model may sound a
little radical. It is therefore necessary to put them through
a somewhat rigorous test as may be provided by factor analysis.
In the first place the view is expressed that like and dis-
like attitudinal feelings are not necessarily on a linear
continuum. Accordingly it was hypothesized that EP and EN
scales represent two distinguishable "factors" and not one
bipolar factor. Similarly GP and GN scales also represent
separate factors. The model also depicts attributes of the
psychological Object as the anchors for attitudinal feelings.
It would follow therefore that where a number of attitude

statements focus on a well defined attribute of the object

 

lSpace forbids the listing of the twenty-and-two profes-
sors who were not only willing to permit the use of their
classes but also agreed to administer the questionnaire to
their students in an effort to help keep "the experimenter
out of the scene." The writer is deeply grateful to these
professors and their students for their COOperation.

97

factor analysis would bring out a "factor" symbolizing such
attribute. In the present battery develOpment it was pos-
sible to focus a number of statements on the functions of
the objects of interest. The content analysis exercise pro-
vided for this catetory. .The second hypothesis was there-
fore that a "functional factor" would emerge from the
analysis.

As mentioned above one of the richest content categories
on which the attitude statements were based was the one in
which emotion was expressed on diverse aspects of the objects.
It was therefore not possible to formulate a well defined
hypothesis in this area. At best it was hypothesized that-a
general attitudinal factor would also emerge.

The six types of factors discussed were clearly antici—
pated. But perhaps there might be another factor or factors
engulfed in the general factor. ~With such reasoning the
raw data was submitted for analysis in the hOpe that there

would emerge "at least five factors".

The Rotation Technigues

The analytic procedures were repeated three times.
In the first and second, half the Observations were used--
randomly divided; the third repeat involved all the Observa-
tions. The Kiel-Wrigley criterion (MSU CISSR, 1967) was used
in the rotation Of factors for the two half samples, but the

full sample data was rotated to ten factors.

98

Both the Quartimax and the Varimax methods of rotation
were applied. Extracts from the final outputs are given in
Sub-appendix (f). Only the loadings with value 0.40 or
greater are shown on that table. The lower values may not be
significant. The sample was split so that the factor pat-
terns may be compared. Such comparison would throw light on
the stability of the factors.

The full data analysis resulted in six Quartimax factors
each of which has loadings on at least three variables.

Three of these factors each account for at least five percent
of the common variance. The other four factors may not be
significant. The corresponding distribution for the Varimax
factors is as follows: nine factors--with at least three
variables, five factors, each accounting for at least five
percent of the common variance and only one factor that may
not be significant. Following Wrigley's (1968) suggestion
the Varimax factors are adOpted as the more appropriate in
the present case. In fact there are also evidences in the
literature (e.g., Vernon 1959, Kerlinger and Kaya, 1959) to
justify this preference. But it is worth observing that both
techniques Of rotation produce more factors than were hypothe—
sized. If the traditional model applied in this case there
would have been at most three factors. Furthermore, the
patterns across the three samples though not in perfect agree-
ment are sufficiently similar, and tend to show the factors in
the third analysis are stable. .A full comparison of the Vari-

max factors across the three samples and the four Scales is

99

 

 

The Naming of the Varimax Factors in the Full Data Analysis

Factor 1:
Var. Quest.
NO. NO.
1 EP,
6 EP3
27 EP24
28 EPgs
29 EP33
56 EP33
58 EP35
5 GP
16 GP13
25 GPgO
24 G931
25 Gng
26 GP23
59 GP33
40 GP37
21 EN13
51 GN33

(16.08% of Common Variance)

"(General) Learning

Function"
Attitude Statement Loading
Sum of scores on 12 items comprising 0.7262
Exam Positive Scale.
Of all teaching devices, examinations 0.4008
provide the most useful feedback.
Examinations provide the most satis- 0.5654
factory means for assessing learning.
Examinations are an indispensable 0.7194
feature of the University curriculum.
Without examinations, academic stand- 0.7911
ards fall.
The discipline Of examinations is vital 0.6628
to learning.
Abolition of examinations will in the 0.6548
long run lead to chaos in graduate
education.
Sum of scores on 12 items comprising 0.7071
Grading Positive Scale
Grades provide a necessary incentive to 0.5269
hard work.
The grading system should be an inte- 0.5212
gral part of the curriculum in higher
education.
For the student, grades are a desirable 0.5055
aid to self-evaluation.
Abolition of grading would jeopardize 0.7572
learning at the University level.
Grading is a necessity if standards 0.7726
have to be maintained in University
education.
I would campaign vigorously against 0.5912
any attempt to abolish grading at the
University level.
Without grading the motivational func— 0.5559
tion of examinations would be impaired.
Examinations should be abolished at the —0.4158
University level.
Grading should be abolished at the -0.5152

University level.

 

100

given in Sub—appendix (g). The conclusion from that table
is that the stability of the factors is not in doubt.

Seventeen variables have "significant" loadings on this
factor; of these there are seven each belonging to the orig-
inal EP and GP scales, and one each to the EN and GN scales.

0n the positive side the theme is that both examinations
and grading are relevant in the curriculum; the negative side
is also clear: these aSpects of the curriculum are not rele-
vant and "should be abolished".

This factor shows up as bipolar, but very few negative
items load on it and these negative loadings may reflect the
particular wordings in variables 21 and 51. Perhaps a bipolar
attitudinal factor may be an artifact of the language used
in the statement. This will therefore be called the General
Learning Function Factor. Future revisions will discard

variables 21 and 51 and all such types.

 

 

Factor 2: (5.45% of CommOn Variance) "Examination Type"
Var. Quest. Attitude Statement/Description Loading
NO. No. .

22 EN Sum of scores on 12 items comprising 0.5425

Examination Negative Scale.

18 EN15 Objective examinations are nothing 0.7024
more than a guessing game.

44 EN41 Examinations are nothing more than 0.6649
trickery.

 

Apart from the EN scale only two other variables load

significantly on this factor. One of them suggests this may

101

be an "Examination-Type" Factor. Further studies may in-
vestigate whether there is any such factor. It is worth
noting that no items on grading load significantly on this
factor. It is therefore peculiar to examinations, and pro-
vides another evidence that negative attitude may be on a

distinct attribute of the psychological object.

 

 

Factor 5: (7.06% of Common Variance) "Pressure-Anxiety"
Var. Quest. Attitude Statement/Description Loading
NO. NO.

2 EN Sum of scores on 12 items comprising 0.4576

Examination Negative Scale.

19 ENie Examinations provide the student a 0.5790
frustrating experience.

20 EN17 I resent the pressure which examina- 0.7110
tions bring on me.

45 EN4O Examinations generate too much anxiety 0.7808

50 GN37 Grades induce too much worry. 0.7459

 

Here again the only items that load significantly on
this factor belong to the negative EN and GN scales. All the
items provide "pressure" or "worry" or "anxiety" stimuli.
This will therefore be called the "pressure-anxiety" factor.
Examinations and grading go together, once again suggesting
some common frame of mind, or reflecting the fact that the
attitude dimensions and the supporting attributes are the

same for both objects.

102

 

 

Factor 4: (7.69% of Common Variance) "Grade-Measure"
Var. Quest. Attitude Statement/Description Loading
No. NO.

5 GP Sum of scores on 12 items comprising 0.6159

Grading-Positive Scale

14 GP11 Grades are very effective for indicat- 0.6595
ing students achievements of the course _
objectives.

15 GP;3 Grades are a good estimate of the quality 0.6262
of learning that has taken place.

17 GP14 Given the word "meaningful" as indicat- 0.5815 g
ing your Opinion of grading, rate it
according to the strength of this Opinion.

25 GPgO The grading system should be an integral 0.4292
part of the curriculum in higher education.

24 6P3; For the student, grades are a desirable 0.4175
aid to self-evaluation.

41 GP37 The finer the grading system, the better 0.5125
it reflects the students' competence
level.

42 GP33 Given the word "relevant" as indicating 0.5499

your Opinion of grades, rate it to show
the strength of this Opinion.

55 GN3O Grades are no indication of what the 0.4467
student has learned in a course.

 

With the exceptions of variables 25 and 24 (which also
load high on factor 1) these items focus on the effectiveness
of grading as a measuring instrument. That variable 55 loads
with an Opposite sign may be just an artifact of its wording
("no indication") and not necessarily that the factor is bi-

polar.

105

This shall be called the Grade—Measure Factor. It is
hard to explain why a similar item on examinations does not
load high on this factor. Are the perceptions Of these ob—

jects as measuring tools on different dimensions?

 

 

Factor 5: (6.54% of Common Variance) "Hate"
Var. Quest. Attitude Statement/Description Loading
No. No.

4 GN Sum of scores of 12 items comprising 0.5705

Grading—Negative Scale.

54 GN31 Given the word "evil" as reflecting 0.4555
your Opinion Of grading, rate it to
show the strength of this Opinion.

49 GN43 I have nothing for grades but pure 0.6459
hate.

50 GN47 Whoever put more grades into the 0.7578
scale should be hanged.

51 GN43 It is grossly unfair to award a gradu— 0.5969
ate student a "D" or an equivalent
grade.

47 EN44 Given the phrase ("a farce" as indi- 0.4594

cating your Opinion of examinations
rate it to show the strength of this
Opinion.

48 EN43 In my experience as a university 0.4550
student, examinations are the instruc-
tors' make—shift without any real
value.

 

Here as in Factor 1 the attitudinal disposition is the
same for examinations and grading. That this is a distinct
factor is further evidence that a negative attitudinal dis—

position may exist on a separate dimension.

104

This is named the Hate Factor; it is somewhat general

in that the determinants of the "Hate" are not specified.

 

 

Factor 6:(5.42 of Common Variance)

Var. Quest. Attitude Statement Loading
No. No.
11 EP3 Examinations make me feel happy and 0.5787

confident.

55 EP33 Examinations Should be given more empha- 0.4249
sis in the University curriculum.

 

57 EP34 Examinations make study exciting. 0.6515

 

This may be a general satisfaction factor--in Opposition
to the PressureeAnxiety factor. Perhaps if similar items

were included on grading they would also load on this factor.

 

 

Factor 7: (5.95 of Common Variance)

Var. Quest. Attitude Statement Loading
NO. No.
46 EN34 The examination system is entirely 0.5209

lacking in precision.

47 EN44 Given the phrase "a farce" as indicat- 0.4547
ing your Opinion of examinations, rate
it to Show the strength of this Opinion.

51 GN43 It is grossly unfair to award a graduate 0.5551
student a “D" or an equivalent grade.

 

It is difficult to explain why these items should com-

prise a separate factor. The last two also load significantly

105

on the "Hate" factor. It may not be a stable factor.
Further investigations may reveal the nature of this factor,
if at all it exists on a separate dimension. Meanwhile it

will be ignored.

 

 

Factor 8: (4.91% Of Common Variance) "Motivating Function
Var. Quest. Attitude Statement Loading
NO. NO.

5 EP3 Examinations are the best means for 0.5770

motivating students to learn.

7 EP4 I Examinations enforce my desire to 0.5960
learn.
8 EP5 Given the word "favorable" as refer- 0.5551

ring to your feeling about examinations,
rate it to indicate the degree of this
feeling.

16 GP13 Grades provide a necessary incentive to 015984
hard work.

 

The central thought in the first three items is that
examinations are perceived to motivate learning. The loading
of the last item on grading is below the criterion value of
.40; however it is so close as to justify its inclusion here.
This shall be called the "Motivating-Function" factor.

The statements which load on factor 9 (see the follow—
ing page) seem to say that the psychological objects are
worthless, or that they perform some undesirable function.
This will therefore be called the Dysfunction Factor in

Opposition to the relevant Function Factors 1 and 4.

106

 

 

Factor 9: (4.52 of Common Variance)

Var. Quest. Attitude Statement/Description Loading
NO. No.
12 EN3 There is very little of instructional 0.6558
value in the content of examinations.
15 EN10 Examinations are redundant in the edu- 0.6806
cational process at the University
level.
10 GN7 Grading encourages students to cheat 0.4559

in examinations.

 

It is worth Observing that these variables do not load
significantly On the first factor. There their loadings are
0.0806, 0.2254 and -0.0022 respectively. In other words the
evidence is not very strong that either Factor 1 or Factor 9
is bipolar.

Generally the hypotheses were confirmed. Most of the
"positive" statements came out under separate and identifiable
factors; and so did the negative statements. Furthermore
their identities have references or anchors in the attributes
of the attitude Objects. These attributes are reflected in
the factor names suggested. However only limited success was
achieved in separating the examination from the grading
factors. Perhaps there is a natural linkage between them.

It may also be that attitude factors are similar and parallel

as shown in Figure 5, page 121-

107

SELECTION OF ITEMS AND PRESENTATION
OF THE BATTERY

The table on the following page, shows the scheme used
in making a selection of eight items for each of the four
scales. The numbers appended refer to the items with the
highest loadings on the respective factors. The table serves
to emphasize the aims of the present battery. If attitude
statements are anchored on well-defined attributes Of the
psychological object separate "factors" will emerge to
symbolize these attributes. Furthermore the general nature
of the attribute determines the direction of attitude, that
is whether it is "positive" or "negative"--for or against.
It may be added that this table also provides a scheme for
writing new items. Ideally only unidimensional factors
would serve in this scheme-~to agree with the theoretical
model, but factors 1 and 4 fail to meet this ideal.

The battery in the final form is reproduced on pages

109 and 110. Where groups of items belong to one factor
they are arranged in descending order of the magnitude of

their loadings, which were given earlier.

108

 

 

 

 

 

 

 

 

 

 

 

 

 

 

>20 mannedzm AuouommISOHuosnmmmnv m
name Nmmuemm Auouomm :oHuossm msHum>Huozv m
Anouomm
Foam coHuUNNMHummISOHumsHmemv m
Hﬂzwnmﬁzo
oazoikvzw massiv+zm Auouomm mummv m
Tumouaamw Anouomm
onzw mumwuaamw. SOHuocSmlmcHHSmmmz OONHOV H
OHZN
rmzw hazmnovzm AHOuomm wumecmrmnsmmmumv m
mazm AHOHONm mmhalsoHumsHmemv N
TNmW
Dmmw mammummmﬂ
mmzw mmmuummmo mummiommm Anouomm SOHpussmumchHmmAV H
20 am an an

 

 

HOHOMJ

 

id

 

ZOHBUmnﬂm ZHBH mom

MEmmUm

109

 

.MSOHumSHEsxm mo

 

 

 

 

 

 

usmucoo OS» SH mSHm> HSSOHSUSHumSH mo OHuUHH wum> EH THOSE .m
.Hm>mH huHmHm>HSD on»
us mmwuoum HSSOHuNOSpm mnp SH uSmpSSUmH mum MSOHumSHmem..> SOHuOSSmme
uMHSmmme .mnouUSHHMSH may mum
mSOHumSHﬁmxm .uSmosum muHmum>HSD m mm TOSOHHmmxm ME SH .m
.SOHSHmo EHSu mo Sumsmnum may 30Sm ou uH mums mSOHuMSHmem mumm
mo SOHSHQO H50» mSHumUHUSH mm =monmm m: Tasman may SO>HO .m
.mOSmHmexm mSHumHumSHm m uSTUSum TS» m©H>OHm MSOHumSHSmxm .H
.mE So mSHHQ mSOHumSHmew SUHS3 OHSmmmHm SSH uSmme H .m huwHXS<|mHSmmmum zm
.mumesm SUSS oou mumHOSmm mSoHumSHmem .N
.mﬁmm
mSHmmmsm m SSS» SHOE mSHnuOS mum mSoHHMSHmew T>Huomnno .H mmhalﬁmxm
.SHSOH ou
muSmUSum mSHum>HuOE How MSmmE ume may mum mSOHumSHEmNm .m
SoduoSsmlmSHum>Huoz
.SHSOH ou muHmmp he TUMOMST mSoHumSHmem .s . . .
.mSHuHUxm mpSum TESS MSOHumSHmem..m SOHuommmemmlﬁmxm mm
.mSHSHmmH mSHmmmmnm
How mSmmE NHOHOSHMHumm umOE mnu OUH>OHQ MSOHumSHmem .m
.SOHumospm mumnomnm SH momno
Ou UmmH SSH mSOH Onu SH HHH3 MSOHumSHmem mo SOHuHHon< .H
.mSHSHmmH Op HSHH> MH MSOHHMSHmeO mo OSHHQHOch SSE .m
.ESHSUHHHSU wuHmum>HSD
may no musummm THAMESTQMHUSH Sm mum mSoHumSHmem .N
.HHmm OHSO3 mpnmpcmum OHﬁmomum .mSOHHMSHﬁmxw uSOSqu..H SOHHOSSMImSHSwaq

 

EOHH

 

Houomm

 

mHmom

 

 

 

110

 

 

 

 

 

 

 

 

 

.mSoHpmSHmeo SH umwSO Ou muSmpSum mmmmHSOOSO mSHpme .m SOHSUSSHmmn
.SOHSHQO MHSu Ho Sumamuum mSu 30Sm 0» SH mums .mSH
IUmHm Ho SOHSHmo snow mSHuomHHwH mm =HH>m= UHO3 SSS Sm>HO .s
.wpmnm uSmHm>HSUm Sm HO
gm: m uSwUSum mumspmum m OHNBM Ou HHSHSS memOHm EH SH .m mumm 20
.mumS Tuna SSS mmpmum HOH mSHSSOS T>MS H .m
.pmemS TS OHSOSm mHmom SSH ouSH mmnmum mHoE use HO>TOS3 .H
.mmnSOO S SH SoHuOSsm
SOSHmmH MSS uSmUSum SSS SSSB mo SOHumOHUSH OS mum mmnmnw .m ImSHHSmmmzlSoz
.mHHOB SUSS oou TUSOSH mwpmsw .N mumeSmrmHSmmmHm
SOHSOSSm
.Hm>mH wuHmum>HSD TS» um OmSmHHOSm OS UHSOSm mSHpme .H mSHSHmmHISOz
.Suo3 UHSS Ou T>HuSTOSH hummmmomS m TUH>OHm mwpmsw .m SOHSOSShImSHpm>Huoz
.Hm>mH OUSOSTQEOU .muSmtsum
TS“ muowHHmn uH HmuuTS mSu .Ewummm mSHanm TS» HOSHH TSB .s
.SOHSHmO
mHSu Ho Sumcmnpm mSu Ou mSHUnouom SH mums .mSHUmum Ho
SOHSHmO “Dom mSHumOHpSH mm =HSHmSHSmmE= UHO3 SSS SO>HO .m
.womHm smxmu mmS HMSS
mSHSHmmH Ho muHHmSU wSu Ho mumEHumm 000m m mum mmpmno .m SOHSOSSMImSHHSmmmz
.mm>HuomnSO mmusoo HO muSoEm>mHSUm .
.muSmpsum mSHumUHUSH How O>Hpowmmm msm> mum mmpmuw .H mm
.Hm>wH muHmHO>HSD mSu um mSvanm SmHHonm
ou umEmuum mSm umSHmmm mHmsouomH> smHmmﬁmO pHsoz H,.m
.Hm>mH wuHmHm>HSD
mSu um mSHSHNTH TNHnHmmomn OHSO3 mSHpmum Ho SoHuHHOSS .N SOHSOSzmImSHSHSOH.

.SOHHMUDUT SUHSH0>HSD SH
UwchuSHmE 09 OH w>MS mUHMUSMUM NH thmmmowG m mH mGHUwHO

 

 

111

A few comments are necessary. In administering the bat-
tery the items would be thrown into some random order. Future
revision will aim at ten items for each scale, at least three
and at most four factors under each scale, and two or four
items within each factor. The increase in the total number
Of items will hOpefully lead to increase in validity, while
the use of even number of items under each factor will make

it convenient to compute split—half reliability coefficients.

TEST STATISTICS

In the present case where there were five alternative
weighted responses the product moment correlation of item
scores with the total scores in their appropriate scales may
be used in determining items which belong to the Scale. But
such coefficients are inflated since the item scores are also
included in the scale scores. Even so these coefficients
are diSplayed in Sub-appendix (h) together with the standard
deviations for each item—variable, and also the inter—item
correlations. The latter may safely be interpreted as indices
prbelonging. To facilitate their comprehension Table 8
summarizes the relevant data. It is worth noting that all
coefficients are positive. Furthermore GP is the most homo-
geneous as its inter-item coefficients are all above .20. By
the same standard GN is the poorest scale, and needs much

revision.

112

TABLE 8

GROUPED FREQUENCIES, RANGE AND MEDIAN OF
INTER-ITEM CORRELATIONS

 

 

 

 

EP EN GP GN

Categories (f) (f) (f) (f)
.6000-.6999 1 1 1 -
.5000-.5999 4 6 7 1
.4000-.4999 16 12 29 12
.5000-.5999 24 22 17 15
.2000-.2999 15 20 12 20
.1000-.1999 6 5 - — 15
Below .1000 - - - 5
Total (f) 66 66 66 66

Range .159-.621 .129—.609 .228-.641 .004-.545
Median .554 .520 .410 .277

 

Intercorrelation Among the Scales

Logically the total scores for the "positive" and
"negative" scales should reveal an inverse relationship be-
tween them. But this may not be perfect since the"dimensions"
are not necessarily on the same linear continuum. In fact
the inverse relationship may be conceived to be an intrinsic
property of the "negative" and "positive" dimension vectors.
The absolute sizes of the coefficients as presented below
also Show an interesting pattern: the positive scales
(EP-GP) and the negative scales (EN-GN) correlate more highly

within their like-pairs than they do within unlike pairs

115

(EP-EN or EP—GN; similarly GP-GN or GP-EN). This may be
interpreted as another evidence against bipolarity of the
attitude factors. The correlation between "positive" and
"negative" scales is negative; if the scales were on the

same linear continuum, if they represented Opposite ends of

a bipolar factor then the absolute value of the correlation
coefficient would be as close to unity as possible. The evi-
dence of this study does not seem to support such a position.

In the sample the correlations were as follows:

EP EN GP GN
EP 1.00
EN - .589 1.00
GP .796 - .562 1.00
GN - .550 .800 - .624 1.00

The directions of these coefficients agree with those

illustrated on page 94.

Reliability of the Scales

An estimate of the reliabilities Of the scales was com—
puted by the Kuder-Richardson method. In the present case
where responses are weighted the apprOpriate formula accord-

ing to Magnusson (1966) is

 

2 _ 2
r = D (St >3 Si)
tt n-1 82
t
where rtt is the reliability coefficient (K-Rgo)

n is the number of observations

114

s2 is the variance of the test

2‘s: is the sum of the item variances

The reliabilities shown below were based on this formula.
The relevant data for calculations will be found in Sub-
appendix (h).

EP EN GP GN

.798 .791 .812 .746

INTERPRETATION OF THE SCORES

Ostensibly four scales make up this battery. However
factor analysis has brought out sub-scales which are fairly
easy to interpret. From the general instructions to the
Questionnaire a value of 5.13 to be assigned to a statement
if "Opinion hovers between agreement and disagreement equally."
It will therefore follow that a mean score less than 5 or a
mean score higher than 5 will be interpreted to indicate that
the group or the individual is "low" or "high" on the particu-
lar dimension of attitude. The mean total score for a group
of items may also be interpreted accordingly. .Thus if there
are four items in the sub-scale a mean total score of 12
'would form the dividing line between the "lows" and the
"highs" on the dimension reflected by that sub-scale.

The scheme for interpretation outlined implies a built—in
meaning for the scores, and not a meaning to be determined
with reference to any group. It seems logical that the mean-

ing of scores should be similar to the Likert values as here

115

defined. The only assumptions are that the subject under-
stands the instructions, and that he responds to the items
honestly. These may be somewhat limited by "response-set"
tendencies. The extent of such tendencies were not deter-
mined, but the percentage Of‘reSpondents choosing each
Option shown in Table 9 would lead one to say that the
effect of such sets may not have been very serious. The
choices are fairly spread out except that respondents tend
to avoid the high extreme value.

The above Observations will now be illustrated for the
try-out sample.

There are three factors in the EP Scale. In the first--
the learning-function factor--there are five items; the mean
total on these for the 575 Observations is 12.8081. This
places the group on the "low" end of this sub—scale with
respect to their perception of examinations as a learning
device. The mean item response on this and the other factors

may be set out as follows:

Range of
Mean Item Inter-item
Factor Responsea Correlations
Learning Function 2.562
Examination—Satisfaction 1.887 .159-.621
Motivating Function 2.584

aThe means for all items are given in the sub-appendix.

bThese may be taken as estimates of the reliabilities
of the factor scales.

 

116

These results also read "Low", or "Very Low", as on
the Examination-Satisfaction factor.
The break-down of the other scales is as follows:

Mean Item Range of Inter—item

Scale Factor Response Correlations
Examination-Type 2.615
PressureHAnxiety 5.528 _

EN Hate 2.551 .129 .609
Dysfunction 2.705
Learning-function 2.686

GP Measuring-function 2.640 .228—.641
Motivating function 5.077
Pressure-Anxiety 5.415
Hate 2.512

GN Dysfunction 5.059 .004-.545
Non-learning function

(bipolar) 2.670

Non-measuring function 5.158

The meaning that may be read into the above results is
that the group tends to be "high" on the following factors:
PressureeAnxiety, Grade-Motivating function, Grade-Dysfunction
and Grade-Non-Measuring function. On the other factors it is
"low". The point needs emphasis. The scores for an individu—
al (or group) on the Scales in this battery should be broken
down into "factor" scores, and then interpreted in terms of
“low" or "high" on the respective factors. The aim is to
present a profile mapping of the individual in the defined
attitude factors. Such a profile is presented in Figure 2
on the following page. The Pressure-Anxiety factors are
prominent in both Examination and Grading scales, while the

learning function factors are "low".

 

 

-
ﬂu (4

117

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5
very High
4
High
s:
5 \\ ‘ * $51
\ S RN is.
Low i {SLS§[§§DF< §§ :§? 5% 5§ §S_ ’SS
* §§\’\‘\‘ § § \?\:§. iE;
2 L \‘ \\ \\ t§é§mw§x
WT ‘ m, # 7
Very Low SE §§ § EI S $§§\i§
\Hhxh h \ \\‘\,\§’
§‘\§§’§’ §’§§§§\§
14 V. ‘§\ \\~‘ §§\\\\ L451.
T 3‘
Factors'LF ESMFET PAH DY LF M MFPAH DY NL NM—w
1 4|— ] 1- !JV :_:I
Scales EP EN GP GN
Key: EP = Exam.-Positive M = Measuring (function)
EN = Exam.-Negative ET = Exam. Type
GP = Grade-Positive PA = PressureeAnxiety
GN = Grade-Negative H = Hate
LF = Learning Function DY = Dysfunction
ES = Exam. Satisfaction NL = Non—learning Function
MF = Motivating Function NM = Non-measuring Function

Figure 2. Attitude profile of the try-out sample (N = 575)
Students attitudes towards Examination and
Grading Scale Battery (SATEG SB).

118

A frequency count was made of respondents choosing
each Option, and converted into percentages. Table 9

shows these mean percentages under each factor sub-scale.

 

 

 

 

 

TABLE 9
MEAN PERCENT OF RESPONDENTS CHOOSING OPTION
IN THE FACTOR SUB-SCALES H
l
Likert-Font Valgg§______

Scale and Factor 1 2 5 4 5 .
E. . E

Learning Function 19 52 26 18 5

Exam. Satisfaction 45 27 14 7 2

Motivating Function 14 55 26 26 5
EN

Exam. Type 14 59 22 19 5

Pressure-Anxiety 6 20 26 29 18

Hate 24 54 21 12 4

Dysfunction 10 56 5O 18 5
GP

Learning Function 21 24 25 21 7

Measuring Function 15 52 27 21 5

Motivating Function 10 21 28 54 7
GN

Non-Learning Function 20 25 29 17 9

Pressure—Anxiety 5 19 22 54 19

Non-Measuring 7 28 24 25 16

Hate 29 27 22 11 6

Dysfunction 12 25 25 26 14

 

 

 

 

 

 

 

 

The picture shown may be easily comprehended if the 0p-
tions 4 and 5 are combined and summarily described as "high".
(Similarly 1 and 2 may be combined and described as low.)

On this basis the following statements may be made of this

sample:

119

1) 22 percent are high on the learning function factor
in the EP scale
2) 29 percent are high on the motivating function
factor
In contrast only 9 percent are highly satisfied with exami—
nations. The EN scale throws some light on this contrast.
Here 47 percent are high on the Pressure-Anxiety factor
and 25 percent on the Dysfunction factor.
A similar analysis may be made of the Grade Scales.
For the GP scale the percentages on the high group are:
learning function, 28; measuring function, 24; and motivating
function 41. Thus the group perceives grades more as a
motivating than as a measuring or learning device. The
figures for the pressure—anxiety and dysfunction factors are
55 and 40 respectively. This would mean that more than half
the sample perceive grades as generating pressure and anxiety,

and about a half also feel grades perform no useful function.

IMPLICATIONS AND CONCLUSION

The manner of describing attitude on a psychological
object as being "high" or "low" along specified attribute
"factors" and the dimensions they support has some diagnostic
value. At least it is a step beyond a global conception of
attitude. Moreover it makes it comparatively easy to "control"
attitude. Suppose for example that this scale is valid and

that with its aid the attitude of a group in the learning-

=‘:..J ; .xn—I‘ﬁ—‘ﬁimum
‘. g .
I
.

120

function.aspect of examinations is diagnosed to be "low";

an area is thus clearly specified for "treatment" should
one desire to influence attitude on this positive dimension.
In other words, control of attitude towards a psychological
object becomes feasible if the anchors of the attitude are
identified. It is reasonable to think that attitude change
may be effected through some manipulation of the attributes
of the psychological objects.

A cursory look at the pattern of the figures in Table 9
may lead one to suppose an inverse relationship between the
learning and motivating function factors on the one hand and
pressure-anxiety and dysfunction factors on the other. This

suggests that the attitudes may be "changed" to be more

"positive" if effort is concentrated on developing the learn-

ing and motivating function attributes of both examinations
and grading. A general hypothesis may therefore be set out
as follows: the more students perceive examinations and
grading as promoting learning the less they will feel the
pressure and anxiety which these twin aspect of the curricu-
lum also generate, and hence the more positive will their
attitudes be towards these objects, and consequently the

higher the amount of learning that will take place. This

general hypothesis may be broken up and tested, among others,

in a program of construct validation of this scale battery.
The results of this study provide evidence which tend
to agree with the theoretical model. Figure 5 reproduces

the model with specific reference to the present study.

 

 

121

 

   

 

 

(a) (b)

 

"a;

Figure 5. The attitude model with specific reference to
Examinations (a) and Grading (b). (The reader
is now familiar with the abbreviations used;
the words they stand for are displayed in the
Key to Figure 2; the general model is presented
in Figure 1.)

In the figure ABCD still represents the attitude pre-
dispositional base, which remains the same for all attitudes
of an individual. In fact both parts (a) and (b) would be
shown on the same diagram; they are separated here to aid
clear presentation. It should be noted that the growth
points are now defined with reference to the attitude ob-
jects (E: Examinations: G: Grading). Furthermore the posi-
tive dimemsions (EP and GP) are parallel; so also the nega-
tive dimensions (EN and GN). The reader is reminded of the
high and positive correlation between the scales in brackets,
and of their loadings on the various factors discussed
earlier. The last observation would lead one to suggest

that positive attitudes, irreSpective of the attitude objects

122

would correlate highly and positively with one another;
similarly negative attitudes would correlate highly and
positively.

The attribute vectors shown in Figure 5 represent the
factors revealed in the factor analysis. The figure shows
that the upward growth of attitude along each dimension-
positive or negative is supported by the number and strength
(reflected by length) of the attributes. The model and the
evidence provided by this study would lead one to doubt that
attitude is bipolar. A linear continuum model for attitude
may not be apprOpriate.

An instrument like this can serve two purposes. It may
be used for an attitude survey and in studies of relations
between attitude and other variables. Furthermore it may
be used to plan "treatment" measures to bring about attitude
change. The traditional attitude measures do not seem to
suggest this diagnostic and treatment use. In the writer's
mind if social scientists survey attitude and always report
it in the global form they are unwittingly perpetuating the
attitude; and this may not always be desirable. If on the
other hand their reports make evident the anchoring factors,
someone's attention will be easily arrested to examine the
basis of the attitude.

It must be added however that the model needs further
supporting evidence to be worth considering. It is there-

fore suggested that the battery be used as a research

125

instrument--to investigate how stable the factors are across
different student pOpulations. Other workers may of course

wish to test the model and the approach using different

attitude objects.

 

SUB-APPENDIX (a)

THE OPEN-ENDED QUESTIONNAIRE

Course NO. and Title:
Course Instructor:
Student's Name & NO. (Optional)

Date: W

STUDENTS' OPINIONS AND ATTITUDES ON THE
EXAMINATION-GRADING CONTROVERSY

 

 

 

 

 

Introduction and Instructions Q'

The debate—-"to examine or not to examine, to grade or
not to grade"-~is a very crucial one in college and univer-
sity education today. TO be democratic and also to help
create a healthy_gtm9§phe£§ for carrying out our educational
objectives it would be desirable for students to take part
not only on this debate but in the formulation of policies
on this issue. OA survey is therefore being conducted to tap
students' opinions and attitudes. Your response to the
following questions will be of great importance in future de-
cisions on examination and grading practices in this University.
Consider it therefore a grand Opportunity now Offered you to
influence policies in these areas. It is up to 193 in particu-
lar to utilize such a rare Opportunity to express your views
for your good and for theggood of future generations of stu-
dents.

 

TO underline the importance of this survey to you in
particular, you are to take this questionnaire home; respond
to it independently and candidly and then return it to your
instructor the following day.

Feel free to use the blank pages Of this questionnaire
to write as much as you like on any of the questions.

Thank you for your COOperation.

124

DO not write
on this margin

125

The Questions Do not write

1.

on this margin
How important are exami-
nations in the instruc-
tional process? Defend
your opinion.

How important is "grading"
(involving the use of A, B-—
or 0,1) in the instructional
process? Defend your
Opinion.

Some say examinations and
grading are a necessary evil
while others believe they are an
important aspect of the instruc-
tional process. How do you feel
about these aspects Of the cur-
riculum? Defend your answer.

DO not write
on this margin

126

4. What reactions have ygg_
had to the examinations
you have taken in your
college and university
experience?

5. In your college and uni-
versity eXperience, what
reactions have you had
over your grades in parti—
cular and over the grading
system in general?

6. DO you have suggestions for
change that should be made
in the examination practice
at the college level?
Defend your suggestions.

Do not write
on this margin

Do not write
on this margin

127

7.-DO you have suggestions DO not write

for change that should on
be made in the grading
practice at the college
level? Defend your sug-
gestions.

8. Which examination type do
you prefer more-—the essay
9£_the Objective? State
reasons for your preference.

9. Which of the following item
types do you most prefer--
True-False, Multiple Choice
or Completion Type? State
reasons for your preference.

this margin

 

.128

DO not write 10. Which of the following

on this margin item types do you least
prefer-—True-False,
Multiple Choice or Com-
pletion Type? State

reasons for your prefer-
ence.

DO not write
on this margin

11. Would you favor a more or
a less emphasis on exami-
nations at the university
level? .State your reasons
for your answer.

.12. Would you favor a more or
a less emphasis on grading
at the University level?

State your reasons for your
answer.

DO not write
on this margin

15.

14.

129

It has been suggested
that students should be on
involved directly and
actively in the decisions
determining their grades.
Would you support this
suggestion? State reasons
for your Opinion.

If you can, suggest and
defend concrete ways in
which students might be
directly and actively in-
volved in the determination
of their grades.

15. Would you, or would you not,

support a student motion urg-
ing the completegabolition of
examinations and grading at
the college level? State
reasons for the position you
take.

DO not write

this margin

 

‘I_.I

(i)

(ii)

SUB-APPENDIX (b)

SCHEME FOR THE CONTENT ANALYSIS

Coding and Categorization

 

Coding
Description of Item Symbol
Positive1 direction of attitude toward
examination + ex
Negative2 direction Of attitude against
examination - ex
Positive direction of attitude toward grading + gr
Negative direction Of attitude against grading - gr

Content Categories:
1. Statement of function (e.g., feedback;
stifles learning) A
2. Statement of preferences--either in
direct answer to “which . . . prefer?"

or implied in statement B
5. Statement expressing or implying emotion
(e.g., very important, less emphasis) C
4. Statement Offering suggestions directly
(e.g., term paper) D
Question number--Use Roman numerals I, II...XV

Respondent: assigned Arabic numerals to be
written after the course number, and
separated by a colon:-- ED200:4

General Procedural Steps in the Analysis

1. Read through the response to each question.

2. Re—read, and underline significant words, etc., which
may be put into one of the content categories, and
append the appropriate code symbol.

5. Judge direction of attitude as either positive or
negative and append the code (+ ex, for example) be-
side the content code Of every underlined word, etc.

4. Transfer the coding symbols to the right margin (use
the left margin for writing comments, if any).

5. On the outline summary blank provided prepare the
"Summary of Analysis" table (as shown below) and trans-
fer the results Of the analysis.

6. All work is to be done on pencil.

J'Examples of words, etc.: "feedback"; "very important"

zExamples: "stifles learning"; "less emphasis"

150

” g.”

 

.1...

151

(iii) Specific Hints on the Analysis of Each_Question

Item No.* Hints

1. Perhaps this Q. will prove the best stimulus eli-
citing responses illustrating "statement of
function"--e.g.,

1. motivates learning

2. assesses performance

5. reveals weaknesses in learning
4. guides learning.

2. Perhaps best stimulus eliciting 1) incentive to
study, work hard; 2) reward. Category A or C may
abound, but others not excluded. This comment
also applies for number 1 and other items.

5. On the surface this Q seems.a repetition of number
1 and 2 but a new stimulus is subtly introduced in
"necessary evil." If respondents agree with this
stimulus then the direction of their attitude tends
to be negative. Look out for attitudinal and emo-
tional overtones.

4. Some reactions will reflect positive attitudes,
others negative. Rate (judge) each key word, etc.,
.appropriately. Perhaps "statement on efficiency/
inefficiency" will be elicited--(Category C).

5. Same remarks as in number 4.

6. The attitude object is written examination.at The
following therefore reflect negative attitude (-)Ex
1. Oral exams
2. Term papers
5. Reports of projects, etc.

On the other hand the following are positive

1. More emphasis on essay exams

2. More emphasis on Objective exams

5. More quizzes, etc. (Open-book, take-home)

*Involving a series of test items--Objective or essay,
taken in class or at home, closed-look or Open-book.

7. The attitude Object is the grading system involving
at least three levels-~whether letters or numerals,
and "GPA." Therefore suggestion Of
1. Pass--Fail
2. Pass--No credit, etc.
show negative attitude. Positive attitude is
reflected by
1. a finer system
2. a broader system
5. a narrower, etc.

 

*
See Sub-appendix (a).

Item NO.

8.

10.

11.

12.
15.

.14.

15.

152

Hints

-Main response here is in category 8 expressed

"statement of preference"; direction is positive.

<Judge direction of stated reasons and their cate-

gories separately. Score negative the response
'none'.

Same remarks as for number 8.

Main response is in stated preference category (8)
and direction is negative. Judge category and
direction of reasons separately.

Response of "more emphasis" reflects "clear indica-
tion of satisfaction-—category C; direction is posi-
tive. On the contrary "less emphasis" is negative
and in category C--Judge reasons separately.

Same remarks as for number 11.

Mere Yes or NO response is not scored. Base direc-
tion and category on the reasons--some will be pro-
grading, others against, e.g., mutual discussion

of grades determination is +ve. Pass-Fail is -ve.

Any suggestion reflects positive attitude and is
scored under category D, e.g., discussion with stu—
dents of what goes into the grade.

"Complete abolition" has emotional-attitudinal over-
tone and so reSponse is to be scored in category C.
The nature Of reasons helps to determine direction
also. Support shows negative attitude.

”""””::7
. .
.

c

' V ' ' ‘ ‘1 n1.
' v
x

155

(iv) §umm§£y of Analysis Table

(An actual entry is provided from
Course NO. and ReSpondent‘s NO. a respondent--Student ED450:1)

 

 

 

+ .-
Examination Grading Examination Grading
A 1 1. Little more 1. Stifles
1. Feedback to than the learning
instructor instructor‘s
scape goat
B 2. Essay-type 2. Multiple-
5. Completion Guess
type
C 4. Essay is 1. Possibly 5. Full of 2. NO importance
personalized it is im- errors for under-
5. Essay allows portant 4. Inadequate graduates
one's ex- in the coverage 5. Brings
pression Graduate 5. Idiocy of pressure
6. Completion School choosing 4. What does
is direct 6. Less em— it show?
phasis 5. Fosters
7. Entirely cramming
unnecessary 6. Abolish
8. Abolish
D 2. Award 7. Pass—Fail
grades
on papers
and Class
Discussions
Co lumna 6 2 a 7
Score
Place-
ment3 x x
Comments4
if_necess§ry,

 

lThe cell entries are the significant words, etc., marked
each entry is to be numbered and the item reSponse number indi-
cated. Number down a column only.

2TO be determined by a count Of all entries.

3This is determined by the direction of the difference be-
tween the column scores under "examination" and "grading" where
the difference is zero place according to reSponse to item XV.

4Note entries thought to be useful for an attitude state—
ment.

SUB-APPENDIX (c)
INSTRUCTIONS FOR JUDGMENT AND Q-SORTING OF STATEMENTS

1.2Name:

 

2..Academic Qualifications:

 

5. Present Degree Program:
4. Date

 

 

Examinations refer to classroom achievement testing in- ‘
volving the administration of quizzes, mid-terms and finals. Fl

Grading refers to evaluation of students' academic per-
formance in examinations and/Or other aspects Of the curricu-
lum by using letters (e.g., A, B, . . . F) or numbers (e.g.,
4, 5. 4.0 . . . 0.0; etc.) to classify students according to
their relative achievement levels, provided that the system W
also involves the report of"grade—point-averages." r"

This exercise involves two stages:

(a) Judgment--
in which you say whether the statement reflects a
favorable positive attitude (3) or an unfavorable
negative attitude (N) towards the named object.

and either

(b) Rating--
in which you assign the statement a place on an
eleven-point scale in which 1 represents the lowest
degree and 11 the highest degree of the judged
direction.

or

(c) Sorting--
in which you (1) group the statements under two main
headings: EXAMINATIONS, GRADING, (2) form sub—
groups Of positive and negative statements under each
main group, (5) arrange the statements in each £227
group on an eleven-point scale in which 1 represents
the lowest degree and 11 the highest degree of the
judged direction. To do this, View all the state-
ments in the sub—group as a whole; then decide which
among them will have the lowest value and place it
(or them) above the value 1; further decide which
has the highest value and place it (or them) above
the value 11. Finally, arrange the other statements
and place them above any of the values 2 to 10 as you
judge them apprOpriate. Your final results will look
something like this:

154

155

so
— 54 .—
_. — 51 _ ..
_ - _ 47 _ _ _
- - - - 46 - - - - 5
15 - - - - 58 - - - - 54
10 - - - - 61 - - - - 56

 

1 2 5 4 5 6 7 8 9 1O 11
It is advisable to use rough paper at first.
Summarize your final results in the apprOpriate

spaces provided below.

Summary ongesults Of Judgment and Sorting

 

 

 

 

 

 

1 2 5 4 5 6 7 8 9.10 11 1 2 5 4 5 6 7 8 9 10 11
Positive Negative
EXAMINATIONS
1 2 5 4 5 6 7 8 9 10 11 1 2 5 4 5 6 7 8 9710 11
Positive Negative
GRADING

Pile up vertically above each scale value the number Of state-
ments assigned to the scale value.

SUB-APPENDIX (d)

"JUDGMENT" ON THE MEDIAN VALUES OF THE
ORIGINAL 65 ITEMS

 

 

 

Statement Judgment Value
1. Without examinations most students will
not study. -EP 5
2. Examinations are the best means for moti-
vating college students to learn. EP 8
5. The taking Of examinations brings about
a highly valued learning experience. EP 9
4. Of all teaching devices, examinations
provide the most useful feedback EP 6
5. Examinations enforce my desire to learn EP 6
6. Given the word "favorable" as referring
to your feeling about examinations, rate
it to indicate the degree of this feeling. EP 6
7. Grades stifle learning. GN 8
8. Grading encourages students to cheat in
examinations. GN 7
9. Grades sometimes make me feel helpless
and insecure. GN 5
10. There should be less emphasis on grading
at the university level. GN 4
11. Examinations force students to cram facts
without real understanding. EN 4
12. There is very little of instructional
value in examinations. EN 5
15. Examinations are the scapegoat for most
instructors. EN 5
14. Examinations are redundant in the educa-
tional process at the university level. EN 5
15. Grades are very effective for indicating
students' achievement of the course Ob-
jectives. GP 8
16. Grades surpass in usefulness other
measures Of academic progress. GP 8
17. Grades are a good estimate of the quality
Of learning that has taken place. GP 6
18. Grades provide a necessary incentive tO
work hard. GP 7
19. Grades differentiate the serious-minded
from the care-free student. GP 5
20. Given the word "meaningful" as indicating
your Opinion of grading, rate it accord-
ing to the strength of this Opinion. GP 6

156

157

 

Statement Judgment Value

21. Objective examinations are nothing more

than a guessing game. EN 5
22. Examinations provide the student a frus-

trating experience. EN 5
25. I resent the pressure which examinations

bring on me. EN 6
24. Examinations should be abolished at the

university level. EN 9
25. Most examinations pose stupid and ridicu—

lous questions. EN 8

26. Given the word "useless" as indicating

your opinion Of examinations rate it ac-

cording to the strength Of such Opinion. EN 7
27. The grading system should be an integral

part Of the curriculum in higher educa-

 

tion. GP 9
28. For the student, grades are a desirable

aid to self-evaluation. GP 7
29. Abolition Of grading would jeopardize

learning at the university level. GP 8
50. Grading is a necessity if standards have

to be maintained in university education. GP 6

51. Examinations provide the most satisfactory

means for assessing learning. EP 7
52. Examinations should be an indispensable

feature Of the university curriculum. EP 6
55. I am satisfied with the university exam-

inations system. EP 5
54. Without examinations, academic standards

would fall. KEP 6
55. Grades induce too much worry. GN 6
56. Grading should be abolished at the uni-

versity level. GN 10
57. Most students' interests are diverted

from learning to grades--as a goal. GN 8
58. Grades prove nothing. GN 5
59. Grades are no indication of what the stu-

dent has learned in a course. GN 6
40. Given the word "evil" as reflecting your

Opinion of grading, rate it to show the

strength Of this Opinion. GN 6
41. Examinations should be given more empha-

sis in the university curriculum. EP 6
42. I enjoy taking examinations. EP 5
45. The discipline of examinations is vital

to learning. EP 8
44. Examinations make study exciting. EP 7
45. Abolition of examinations will in the long

run lead to chaos in graduate education. EP 7

158

 

 

 

 

Statement Judgment Value

46. Given the word “acceptable" as reflect-

ing your feelings on examination, rate

it to indicate the intensity of your

feelings. EP 4
47. I would campaign vigorously against any

attempt to abolish grading at the uni-

versity level. GP 11
48. Without grading the motivational function

of examinations would be impaired. GP 5
49. In general I have no complaint against

my grades. GP 1
50. Guaranteeing graduate students an "A" or

a "B" in a course is insulting to them. GP 2
51. The finer the grading system the better

it reflects the students' competence level.GP 6
52. There is no conflict between working for

grades and gaining knowledge. GP 2
55. Given the word "relevant" as describing

your Opinion of grades, rate it to show

the strength of this Opinion. GP 5
54. In general, examinations appeal to rote

memory. EN 1
55. Very little learning, if any, is derived

from taking examinations. EN 8
56. Examinations generate too much anxiety. EN 7
57. Examinations are nothing more than trick-

ery. EN 10
58. Grades are Of no importance in the edu-

cational process at the University level. GN 7
59. The examination system is entirely lack-

ing in precision. EN 5
60. Given the phrase "a farce" as indicating

your Opinion Of examinations, rate it to

show the strength of this Opinion. EN 8
61. In my experience as a university student,

examinations are the instructors make-

shift, without any real value. GN 5
62. I have nothing for grades but pure hate. GN 11
65.-Whoever put more grades into the scale

should be hanged. GN 10
64. It is grossly unfair to award a graduate

student a "D. or an equivalent grade. GN 7
65. Given the word "inadeggate" as reflecting

your Opinion of grading, rate it to show

the strength of this Opinion. GN 6

SUB-APPENDIX (e)

COMPOSITION OF THE MAIN TRY-OUT SAMPLE

 

 

Undergraduate Graduate
Level Level
Course Course

College1 Department NO. Returns No. Returns Totals

 

Arts and Art 555 26 802 15 41
Letters English 402 55 811 15 48
Business Economics 524 25 811 51 56
Communica-
tion Arts Journalism 510 19 800 11 50
Education Education 525C 55 867 97 152
Natural Botany 501 55 945 6 61
Science Chemistry 551 22 811 26 48
Mathematics 521 54 847 17 51
Physics 595 52 857 59 71
Social
Science Psychology 510 22 800 15 57
Totals 505 280 585

 

1Other Colleges of the University not directly sampled include
(1) Agriculture, (2) Engineering, (5) Home Economics, (4) Human
Medicine and Veterinary Medicine. The loss is apparent, not
real. Some students in the department of Economics major in
Agriculture, some of those in Mathematics and Physics major in
Engineering. In Education, Home Economics majors are quite
common, and similarly majors in Medicine are to be found in

the Botany, Chemistry and Physics departments. In short, the
sample is fairly representative of the population of students
in Michigan State University.

159

 

SUB-APPENDIX (f)

THE FACTOR PATTERNS AND LOADINGS

Preliminary_Notes

1. Arbitrary conditions for a factor to be considered
"significant": either

(a) five percent (or more) of the common variance is
accounted for by the factor, or
(b) at least three variables are “significantly" loaded
on the factor.
(Here a significant loading 2%.40/)
Non-significant loadings are not recorded.
2. Excepting those in parentheses ( ) recorded loadings are
the highest in the row in relation to other loadings of
the variable on subsequent factors.

5. Note that the decimal point precedes every "loading"
entry.

4. Loadings which are not the highest in the row are enclosed
in brackets.

140

 

 

 

 

 

 

Variable IQEQItimaX 1 __Quartimax 2
NO. Description 1 2 5 4 5 1 2 5 4
1 EP -925 -928
2 EN 701 474 (518) 752
5 GP -945 -944
4 GN 712 526 -408 (558) 775
5 EP2 -666 -618
6 EP3 -565 -562
7 EP4 -547 -609
8 EP5 —599 -568
9 GNe 591 452
10 6N7 -422
11 .EPB 555
12 EN9 475 467
15 EN10 529 570
14 GP1; -585 -554
15 GP12 -570 -499
16 GPla -642 -706
17 GP14 -655 -614
16 EN15 617 410
19 EN16 -486 549
20 EN17 -622 512
21 ENla 684 615
22 EN19 616 555
25 6920 —701 -721
24 GP21 -745 -657
25 Gng -728 -705
26 GP23 -761 -752
27 61:24 -669 -686
26 EP35 -692 -794
29 EPge -652 -764
50 GN27 (408) -664 467
51 GNga 757 617
52 GNas (451) -456 415
55 GNao 528
54 GNSI 486 410 570
55 EPSZ -541
56 EP33 -701 -709
57 EP34 -471 -546
58 EPas -628 -588
59 epaﬁ -657 -509
40 GP37. -552 -580
41 GPsa -501 -484
42 GP39 -722 -672
45 EN40 -727 554 -561
44 ENAI 564 590
45 GN42 595 509
46 EN43 480
47 EN“ 556 421 654
48 EN45 554 652
49 GN43 (458) 489 685
50 GN47, 558 572
51 GN4é’ 465 400
52 GN.,9 ' 466
Percent of
Variance 55.05 5.84 6.15 4.45 5.19 27.88 15.17 5.7 5.56

4 factors significant 2 factors significant

 

141

H+A
Hommqmmpmmp

HPPHHHPP
mmqmmpmm

NNNNNNNNN
(DQCDU'IIPCNNI-‘O

(NN
0(0

51

52

EP*
EN*
GP*
GN*
EP
EP
EP
EP
GN
GN
EP
EN
EN
GP
GP
GP
GP
EN
EN
EN
EN
EN
GP
GP
GP
GP
EP
EP
EP
GN
GN
GN
GN
GN
EP
EP
EP
EP
GP
GP
GP
GP
EN
EN
GN
EN
EN
EN
GN
GN
GN
GN

Percent

variance 17.08 6.5

*Scale

-748
-706

-782
-454

—556

474

-497
-584
-775
-758
—574
-695
-764

525

-490
-648

-615
-547
-544

-447

Varimax 1
2 5 4 5
-519
-456
-452
544

-607

-519

-580

-552

—707
—716

-597

-754

-756

(487)
(427)

657
727
612
470

6.59 5.58 4.22

8 factors are significant

142

-661

-664
-424

-680

-512

-429

415

-644
-429

-490
-484

8.57

-459
551

542
-556

-465

-445

455
465
498
625

-454

456

-544
5.86 7.95

EP*
-EN*
GP*
GN*
EP
EP
EP
EP
GN
10 GN
11 EP
12 EN
15 EN
14 GP
15 GP
16 GP
17 GP
18 EN
19 EN
20 EN
21 EN
22 EN
25 GP
24 GP
25 GP
26 GP
27 EP
28 EP
29 EP
50 GN
51 GN
52 GN
55 GN
54 GN
55 EP
56 EP
57 EP
58 EP
59 GP
40 GP
41 GP
42 GP
45 EN
44 EN
45 EN
46 EN
47 EN
48 EN
49 GN
50 GN
51 GN
52 GN
Percent

(omxlODUIH-‘(NNP

variance 18.49

*Scale

—745
-858

—610

-660
-529
-727
-760
-525
—692
-757

577

-654
—485
-592
-544
-611
-512
—545

Varimax 2
2 5 ' 4

445 698

586 426

448
564
654

511

-454
495
454

—715
425
421 “ 428

-721
587
515

474 465
548 418
627
755
569

7.72 5.55

7 factors are significant

145

9.55

-551

-457
-519
-554
—428

-547

-61O

-601
-555

488

488

5.97 5.27

-410
499
586

5.84

Variable

 

 

6 factors are "significant"

144

Descrip- Quartimax 5
NO. tion 1 2 5 4 6 7 8 9 10
1 EP 912
2 EN 615 498 -409 -400
5 GP -945
4 GN” 644 589
5 EP2 -644
6 EP3 -542
7 EP4 -569 404
8 EP5 -576
9 GNe 542
10 GN7 598
11 EPa 667
12 ENS -705
15 ENlo (420) “555
14 GPll -556 458
15 GP12 -510 461
16 GPla -685
‘17 GP14 -621 412
18 EN15 '448
19 ENle -588
20 EN17 -651
21 ENle 669
22 Eng 575
25 GPgo -724
24 GP21 '698
25 GPag -748
26 GPga -792
27 EP24 -666
28 EP25 -744
29 EP23 -721
50 GN27 -690
51 GNge 707
52 GN29 444
55 GNso (460) -480
.54 GN31 (465) 470
55 EP32 -424 -511
56 EP33 -705
57 EP34 -484 445
58 EPas -615
59 GPas -588
40 GP37 -585
41 GPae (401)
42 GP39 '684
45 EN4Q '758
44 EN41 465
45 GN42 528 (401)
46 EN43 491
47 EN44’ (485) 592
48 EN45 (450) 569
49 GN45 649
50 GN47 677
51 GN48 552
52 GN49 (401) 406
Percent
variance 50.55 7.81 5.24 2.92 2.45 2.77 2.48 5.4 2.58 2.40

Variable

 

 

9 factors are "significant"

145

Descrip- Varimax 5
NO. tion 1 2 5 4 5 6 7 8 9 10
1 EP* -726
2 EN* 542 -457
5 GP* —707 615
4 GN* (-441) -570
5 EP (~455) -577
6 EP -400
7 EP -596
8 EP -555
9 GN
10 GN -455
11 EP 578
12 EN -655
15 EN —680
14 GP 659
15 GP 626
16 GP -526 (-598)
17 GP 581
18 EN 702
19 EN -579
20 EN -711
21 EN 415
22 EN
25 GP -521 429
24 GP -505 417
25 GP -757
26 GP -772
27 EP -565
28 EP -719
29 EP -791
50 GN -745
51 GN 515
52 GN
55 GN -446
54 GN -455
55 EP 424
56 EP -662
257 EP 651
58 EP -654
59 GP -591
40 GP -555
41 GP 512
42 GP (—450) 549
45 EN -780
44 EN 664
45 GN
46 EN 520
47 EN -459 454
48 EN -455
49 GN -645
50 GN -757
51 GN -596 555
52 GN
Percent
variance 16.08 5.45 7.06 7.69 6.54 5.42 5.95 4.91 4.52 2.58

146

.uxwu came ocp mo coﬂuoom usm>mamu mnu CH ocsom on HHN3 muouomm ommnu

mcHEmc um mumEmuud .OHQmﬂHOH ocm magnum manﬂmm mum wmnu OBOnm OOOOOH>O one
.mcowuﬂmom nouw3m mawumﬁ OHOuowm may no 080m .HmuaucmoH mum mnouomm umuﬂm one
.ucmﬁmmumm uommnmm cﬂ uoc nmsozu HmHHEwm mum mOHmEMm mmouom mcumpumm HODOMM one

cowmsaocou

.m wamﬁmm CH m HOuomm £ua3 Hmowucmoﬂ mﬂ N can a mmHmEMm suon cw N Houumm

.Am ocm m mHoQEscv muouomm OBu mm mm: a mamﬁmw umn3 Aw Honescv Houomm 0:0
Ouca masonm m mamﬁmm usm .mHOuomm “waaﬁﬂm wuﬂsv uno UGHHQ N can H mmamﬁmm

.m mamﬁmm OH snow Ops“ ooumnmmom ma umnu Houomm
was» ma pH .AN OHQEMmV d “Ouomm mm OEmw may manmnonm OH Ad OHQEMOV m Houomm

.mnouomm 03p Oucﬂ cmxonn coon mm: mm mmHmEmm Ham CH .N

.OEMm onu mum m mamﬁmm mo 0
HOpOMm ocm N mamﬁmm mo m Houomm can a mamemm mo 5 Houomm Hmmmmm oH503 pH .d

ucmEEOO

 

20

m6

2m

mm

meum

.omcwaumocz mum mOHmEmm mounu map mmouum HOuumm m CH COEEOO meQMHHm> an

.m xHOOOQQMIQDm ca omcﬂmmo mm mmcﬁom0H uGOUHMHcmHm m>mn omonoowu mmHQmHHm> one AN

.m xHOSOQQMInsm CH cm>ﬂm wommuam mm mnmnﬁsc OHQMHHM> on» mum mmﬁuucm HHOU one Ad

mmuoz humcﬂﬁﬂaoum

mmﬂdum Mbom WEB 92¢ mmqm2¢m mmmma
mmB mmomvd WMOBUﬁm N¢2Hmm> HEB m0 ZOmHm¢NSOU d

13 552255

.005000 snow 0:u 0:0 mxmﬁu0umm cuHB m0anmﬂnm>
*

 

147

 

Nm.5 5m.5 mm.m N5.m 5m.m mm.> 00.5 m5.m mo.m5 m 0HmEmm
IIII 5m.m wN.m hm.m mm.m mm.m N>.> m5.mﬁ N 0HmEmm .Hm>
mm.s mm.m sm.m mm.5 mm.m mm.m m.m mo.s5 5 mamsmm .906
am
om.m5
05 0:0: Hm 0:0: .5m.*5 mm on 0:0: am m 0amamm
5m.om
. .m5.m5
6:66 6:06 05.6 660: mm 5m.mm.*5 on .5m.*5 5m m mamemm 20
m5
.mn.Nm Nm
.5m.om Nm 5m.om
.m.*5 mm.05 maoc 6:06 660: om .m5.*5 5m 5 mamsmm
N5.55 O5.mm
.5N.mN.>5 .mN.mN.5N
0:0: 0:0: 0:0: 0:0: 0:0: .md.5d.*m 0:0: 0:0: .MN.wd.*m m 0HmEmm
N5
.55.05.mm
pa . .mN.mN.5N
0:0: 0:0: 0:0: 0:0: .md.5ﬁ 0:0: 0:0: 0:0: .mN.md.*m N 0amﬁmm m0
N5
.55.05.mm
N5.MN. 55 .mN.mN.5N
0:0: .wd.*m 0:0: 0:0: .md.*m 0:0: 0:0: 0:0p .MN.md.*m a 0HmEmm
m5.om 55
m5.m5 0:06 55.55 0606 55.55 660: .m5.*m .m5.*m 5m m mamsmm
55.55.55
.NN.5N.m5 m5
0:0: 0:0: 0:0: 0:0: N5 .ma.Na.*N m5.ON .h5.*N 0:0: N 0HmEmm zm
m5.>5.m5
.55.NN.m5 m5.0N
0:0: 0:0: 0:0: .MH.NH.*N 0:0: 0:0: .mﬁ.*N 0:0: «N a 0HQE0m
mm
.mm.mN.mN
0:0: m.>.m 0:0: >n.mm.aﬁ 0:0: 0:0: 0:0: 0:0: .nN.w.*H m 0amﬁmm
mm
m.> .um.mm.mN
0:0: 0:0: m m.m.*ﬁ 0:0: 0:0: 55 0:0: .mN.>N.*H N 0HmEmm mm
mm.mm
55 .mm.mm.mm
0:0: 0:0: mm 0:0: 0:0: .m.>.*d 0:0: 0:0: .hN.w.m.*d a 0amﬁmm
m m w m m d m N H 0HQEmm 0H00m

muouomm

148

.:05505>0o oumocmumamv*

 

 

 

 

 

000. 555. 500. 550. 005. 005. 050. .500. 050. 500. 000. 555. 000.5 05
000. 000. 000. 000. 005. 000. 500. 000. 050. 050. 505. 500.5 55
500. 050. 500. 000. 050. 000. 550. 005. 005. 050. 555.5 05
500. 005. 055. 500. 055. 505. 500. 500. 000. 000.5 55
000. 500. 000. 005. 500. 500. 005. 000. 055.5 05
050. 050. 000. 500. 000. 050. 000. 005.5 00
055. 050. 500. 005. 050. 055. 500.5 50
505. 500. 000. 005. 500. 000.5 00
000. 000. 005. 550. 050.5 05
000. 050. 000. 055.5 05
500. 000. 000.5 05
005. 550.5 05
05 55 05 55 05 00 50 00 05 05 05 05 20 5
0509552 0590550> 0500m mwm H0QEDZ
.550 050 A * m5nm5um>
05600 20 was
050. 005. 050. 000. 005. 005. 000. 500. 500. 000. 550. 000. 505.5 00
000. 050. 500. 500. 000. 000. 000. 000. 000. 000. 500. 500.5 50
050. 550. 000. 005. 505. 000. 000. 550. 005. 005. 005.5 00
000. 000. 050. 005. 505. 000. 050. 000. 005. 050.0 00
000. 005. 005. 000. 000. 000. 505. -055. 005.5 00
500. 000. 000. 005. 055. 005. 005. 005.5 00
005. 550. 000. 055. 005. 000. 550.5 50
500. 050. 000. 055. 505. 550.5 55
005. 000. 055. 000. 000.5 0
000. 005. 500. 000.5 5
055. 000. 555.5 0
000. 500.5 0
00 50 00 00 00 00 50 55 0 5 0 0 mm 5
WHQQESZ 0HQMﬂHM> QHMUW Amwm HTQESZ
HAM HﬂH * 0HQMHHM>

0H00m mm 0:8

 

MCOHHMHTHHOU EwuﬂleHGH USN EGHHITHMOm Amy

2: 05320000150

mZOHB¢AmmmOU SmBHlmmBZH 92¢ SHEHIHHdUm

.:05505>0o oumocmumﬁmv*

 

 

149

 

 

 

005. 050. 050. 500. 000. 050. 550. 000. 000. 005. 550. 500. 000.5 00

050. 550. 005. 055. 000. 050. 005. 505. 500. 550. 050. 000.5 50

005. 550. 000. 005. 050. 050. 055. 505. 005. 505. 050.5 00

055. 505. 000. 000. 000. 500. 555. 050. 000. 005.5 05

505. 500. 000. 550. 050. 055. 000. 000. 005.5 05

505. 000. 555. 050. 050. 000. 050. 005.5 50

055. 055. 000. 505. 000. 000. 505.5 00

005. 005. 050. 050. 000. 000.5 00

005. 550. 505. 005. 000.5 50

050. 000. 000. 055.5 00

500. 055. 050.5 05

000. 005.5 0
00 50 00 05 05 50 00 00 50 00 05 0 20 5
5 0900a m.m 509552
A V* 0H905Hm>
05600 20 005

050. 005. 005. 005. 505. 000. 0504, 000. 005. 000. 505. 005. 000.5 05

500. 500. 050. 500. 000. 000. 550. 000. 000. 000. 050. 000.5 55

000. 005. 005. 005. 005. 000. 550. 050. 000. 000. 500.5 05

005. 505. 005. 005. 050. 500. 050. 000. 000. 055.5 00

050. 050. 050. 500. 050. 550. 000. 005. 005.5 00

505. 005. 505. 005. 500. 050. 555. 000.5 00

000. 005. 055. 055. 005. 005. 005.5, 50

505. 005. 550. 555. 505. 555.5 00

505. 005. 505. 050. 000.5 55

500. 500. 000. 055.5 05

555. 000. 500.5 05

000. 000.5 55

05 55 05 05 00 00 50 00 55 05 05 55 00 5

0509852 0590550> m 0Hmom mwm 509552
.550 050 A * m5nm5um>

 

mHMUm m0 058

SUMMARY OF MEAN SCORES AND STANDARD DEVIATIONS

SUB-APPENDIX (i)

 

 

Variable Mean S.D. Variable Mean S.D.
1 29.5141 8.5172 27 2.5864 1.0477
2 55.2496 8.5111 28 2.5812 1.1528
5 55.4295 9.0265 29 2.6500 1.1886
4 54.0995 8.5214 50 5.4154 1.1780
5 2.6405 1.0815 51 2.6702 1.2298
6 2.8098 1.1149 52 5.8554 1.0555
7 2.5525 1.2298 55 5.1579 1.1949
8 2.7469 1.0620 54 2.4295 1.1951
9 2.7245 1.1290 55 1.7260 0.8165

10 5.0595 1.2491 56 2.6667 1.1095
11 2.2129 1.0147 57 1.8866 1.0040
12 2.7016 1.0748 58 2.5458 1.1219
15 2.7086 1.0554 59 2.1798 1.1751
14 2.6405 1.0551 40 5.0855 1.2271
15 2.5665 1.0511 41 2.7855 1.2284
16 5.0768 1.1152 42 2.8604 1.0625
17 2.7784 1.0051 45 5.5794 1.1759
18 2.6126 1.1165 44 2.1850 1.0050
19 5.2216 1.0756 45 2.4154 1.1296
20 5.1815 1.2550 46 2.9564 1.1440
21 2.5759 1.2578 47 2.5462 1.2079
22 2.4904 1.1265 48 2.5166 1.0594
25 2.7958 1.1171 49 2.0858 1.1055
24 5.0175 1.1020 50 2.1518 1.5484
25 2.8569 1.2008 51 2.5812 1.5802
26 5.0227 1.1924 52 5.5864 1.5296

 

150

151

OOmM.OI

 

500N.OI mmmo.o ommm.o hwﬁd.01 wmmo.o mwwa.01 mhbm.OI mmwo.o mmmm.o mm
0050.0- 0000.0- 0005 0 0000.0 5500 0- 0000.0- 5055.0- 5000.0- 0055.0 0050.0 50
0050.0- 5000.0- 0000.0- 0500.0 0500.0- 5555.0- 0555.0- 0055.0- 0500.0 0055.0 00
5050.0- 5000.0 5005.0- 0005.0- 0050.0 0500.0 5550.0 0000.0 5005.0- 5505.0- 00
0000.0- 0000.0 0055.0- 0000.0- 0005.0 5050.0 0000.0 0505.0 0055.0- 5055.0- 00
0505.0- 0505.0 5505.0- 0000.0- 5050.0 0500.0- 5050.0 0500.0 0005.0- 5000.0- 50
0000.0 5050.0 0005.0- 0050.0 0000.0- 5500.0 0005.0 0555.0 0000.0- 0055.0- 00
5505.0 0500.0- 0005.0- 0000.0- 0550.0- 0500.0 0505.0 5000.0 5500.0- 0505.0- 00
0550.0- 5005.0 5055.0- 0050.0- 5005.0 0555.0 0555.0 0005.0 0055.0- 0000.0- 50
5505.0 0005.0 0005.0- 0000.0 5550.0 5005.0 0005.0 5005.0 5050.0- 0500.0- 00
0550.0- 0500.0- 0050.0 0000.0 0005.0- 0050.0- 0505.0- 5000.0- 0000.0 0000.0 00
5000.0- 0050.0- 0000.0 0500.0 5555.0- 0500.0- 0505.0- 5050.0- 0500.0 0055.0 50
0505.0 0555.0- 5000.0 0050.0 0000.0- 5055.0- 0500.0- 0555.0- 0005.0 0055.0 00
0505.0 0000.0- 5000.0 0000.0 0000.0- 5005.0- 5550.0- 0050.0- 5550.0 0500.0 05
0000.0- 5005.0- 0000.0- 5550.0- 5000.0- 5500.0- 0005.0- 0005.0- 5005.0 5055.0 05
5005.0 5005.0 0000.0- 0005.0- 0000.0 0000.0 0500.0 5005.0 0000.0- 0000.0- 55
5555.0 0000.0- 5000.0- 0000.0- 0050.0- 0055.0 0500.0 0000.0- 5550.0- 0000.0- 05
0500.0- 0000.0 0000.0- 0050.0- 0500.0 0000.0 0000.0 0055.0 0505.0- 5000.0- 05
0050.0 0500.0 0055.0- 0005.0- 5005.0 5500.0 0000.0 0055.0 5500.0 0000.0- 55
0000.0- 0000.0- 0050.0 0500.0- 5500.0- 5055.0- 5000.0- 0005.0- 0005.0 5000.0 05
5000.0 0000.0- 0500.0 0000.0 5555.0- 0050.0 5500.0- 0000.0 0550.0 0000.0 05
0000.0- 0005.0- 0000.0- 0005.- 5050.0 0505.0 5000.0 5000.0 5500.0- 0005.0- 55
0005.0 0005.0- 0000.0 0500.0 0005.0 0000.0- 0005.0- 0000.0- 0000.0- 0000.0- 05
5055.0- 5500.0- 5050.0 0500.0 0005.0 0055.0- 0500.0- 0050.0- 0500.0 0500.0 0
0050.0- 5505.0 5000.0- 0050.0 0005.0 0055.0 0000.0 0550.0 0500.0- 5000.0- 0
5005.0 0500.0- 0000.0- 5500.0- 5000.0 5500.0 5555.0 0005.0 5005.0- 0550.0- 5
0000.0- 0055.0 0500.0- 0005.0- 5505.0 5500.0- 0005.0 0005.0 0500.0- 0005.0- 0
5050.0- 0005.0 0550.0- 0500.0- 5500.0 0000.0- 0055.0 5000.0 0500.0- 0005.0- 0
0000.0- 0000.0- 0005.0 0550.0 0000.0- 0050.0- 0000.0- 5555.0- 5005.0 0000.0 5
5500.0 0005.0 0005.0- 5000.0- 0000.0 5005.0 0050.0 0555.0 0005.0- 5505.0- 0
0000.0- 5000.0- 0000.0 0000.0 0505.0- 0050.0- 5505.0- 0505.0- 0050.0 0050.0 0
0000.0- 5005.0 0000.0- 0505.0- 5050.0 0500.0 0000.0 5505.0 0005.0- 0005.0- 5

05 0 0 5 0 0 5 0 0 5

¢B¢Q BDOIMmB ZHdZIImHmNA¢Z¢ ZOHBdBOd Ndsz¢>

1.3 055520000050

152

 

0000.0!
0000.0!
0000.0
0000.0!
0000.0
0000.0!
0000.0!
0000.0!
0000.0
0000.0!
0000.0
0000.0!
0000.0
0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0!
0000.0!

0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0!
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!

0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!
0000.0
0000.0
0000.0

0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0

0000.0!
0000.0!
0000.0
0000.0
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!

0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!

0000.0!
0000.0
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!

0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!

0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!
0000.0!
0000.0
0000.0!
0000.0!
0000.0!
0000.0
0000.0
0000.0

0000.0
0000.0
0000.0!
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0!
0000.0
0000.0

00
50
00
05
05
55
05
05
55
05
05
55
05
00
00
50
00
00
50
00

APPENDIX B

SPECIFIC INSTRUCTIONS AS ORIGINALLY DESIGNED
FOR THE TREATMENT CONDITIONS

(a) SPECIFIC INSTRUCTIONS FOR CLASS I (T1)

(These instructions are to be given in class, and woven into
the instructors design of the "class activity." ghey are to
beAgiven orally.)

1. You will be expected to repeat each of the two within-term
examinations at home. You may take up to four days before
submitting this second attempt for scoring.

2. You will be free to make use of all resources, excluding
instructors and fellow students. Your aim is to come out
with all answers correct, working independently.

5. Part of your "class activity" score will be based on your
performance in this examination repeat, and account will
be taken of the gains you make in the number of correct
responses.

4. (i) This part of the class activity is to count 10% of the
instructor's grade, in other words it is worth 10
"points" out of a total of 100 "points" which make up
the instructor's grade.

(ii) Award 2 points to all subjects-—for having carried out
the exercise.

(iii) Award the remaining 8 points according to the table
below.

155

154

 

 

 

Initial Score Maximum Score
On First On Second Maximum Points to be
Performance gfgerformance Gain Awarded
80 80 o 81
70+ 80 10 1 pt. for 1 gain
60+ 80 20 1 pt. for 2 gains
50+ 80 50 1 pt. for 5 gains
2
40+ 80 40
50+ 80 50 21 pt. for 4 gains
20+ 80 60
10+ 80 70

 

lNote that the t0p scoring student apparently makes no gains
but is awarded the total maximum points for "gains." He de-
serves it for maintaining his position in both performances.
However, if he slips, his score on the second performance
becomes the base and he is awarded points in the last cate—
gory. For example, suppose second score is 68; the differ-
ence is 12 and his "points" 5. The instructor will be ex-
pected to comment on the practicality of this sCheme after it
had been used.

2Note that the rate is Changed-—to the favor of low scoring
students on the first performance.

.155

(b) SPECIFIC INSTRUCTIONS FOR CLASS 2 (T2)

(These instructions are to be given in class and woven into
the instructor's design of the "class activity." They are
to be given orally.)

1.

7.

You will be expected to repeat each of the two within-term
examinations at home. You will be free to make use of all
resources excluding instructors and fellow students. Your
aim is to come out with all answers correct, working inde—
pendently.

You will also be expected to score and grade your two per-
formances. Score, using your best judgments on what you
feel are the correct answers. Evaluate your scores by
assigning grades to yourself (0......4.5), using some
criteria you feel to be objective.

 

You may take up to four days before submitting your second
performance for machine scoring.

Later when you receive the feedback, check your scoring
and self evaluation and discuss the discrepancies with
your instructor, until you are satisfied.

Finally prepare your Progress Chart and return it to your
instructor for comments.

Part of your class activity score will be based on your
performance in this exercise. Account will be taken both
of the gains you make in_the number of correct responses
and in particular of the Size of your mean discrepancies
between your scorings and self evaluations and those of
the instructor.

(i) This part of the-class activity is to count 10% of
the instructor's grade, in other words it is worth 10
points out of a total of 100 "points" which make up
the instructors' grade.

(ii) Award 2 "points" to all subjects—-for having carried
out the exercise.

(iii) Award the remaining 8 points according to the mean

discrepancy score as illustrated in the table on
the following page:

156

TABLE OF POINTS TO BE AWARDED

 

 

Mean Discrepancy_$core Points to be Awarded

0 (Zero) 8
1 - 2 7
5 - 4 6
5 - 6 5
7 - 8 4
9 — 10 5

11 - 12 2

15 - 16 1

Above 16* 0 (Zero)

*16 (i.e., 20% of 80--the total maximum score) is the maxi-
mum discrepancy score that is to be rewarded.

The instructor will be expected to comment on the practical-
ity of this scheme after it has been used.

157

8. The following is the Progress Chart to be introduced and
explained to the student after the meeting to discuss
discrepancies. The student will use ng_page of a Graph
paper to prepare his Chart as illustrated.

PROGRESS CHART

Aim: To Remove Discrepancies Between Evaluations

   
       

   

 

 

 

 

 

 

 

 

 

 

 

 

04.5
40 P b d
56 a \‘ \‘4o0
b §§ 0 5.5
52_ \V d .\1 C \050
280 Q 1 §
§§ §§ 0 2.5
24- 0 5
a \ \020
§ § '
20" 0 0
\\ c §§ E: - 1.5
16 5 ‘V§ Nﬁ ‘~
\§§ \—1.0
12- \%§ §
§§ §§ - 0.5
8- §§ \ \
0% 5-0-0
4- § §
5 5
Test 1 Test 2
Key:
a = Self evaluation--in-class performance
b = Self evaluation-—repeat
c = Instructor's evaluation in—class performance
d = Instructor's evaluation repeat
0 4 8 12 16 20 24 28 52 56 40 - Raw Score
Scale : 3 %* i %* %—— % %* i 45 %0
0 .5 1 1.5 2 2.5 5 .5 4 4.5 - Grade
DETERMINATION OF MEAN DISCREPANCY SCORE
Item Test 1 Test 2 Totals Mean TN=4_pairs)
(a) minus (c) 4 4 8 i$%-= +5
(b) minus (d) 4 o I%. = 3*
*absolute value

 

 

 

 

 

 

 

158

(c) SUPPLEMENTARY INSTRUCTIONS TO CLASS 2 (T2)

In administering your "treatment" the steps listed below

should be followed closely:

1)

2)

5)

4)

5)

6)

Ask your students to

a) write their names on their test booklets--to help them
recover their copies

b) mark their in-Class performance on both the test booklet
and the answer sheets provided; the answer sheets will
be handed in but they will keep (or pick up later) their
test booklets to score and grade the markings at home as
described below.

Give to every student a spare answer sheet and a pencil for
the repeat performance described below.

Emphasize that every student is to rework the test making
use of all possible resources excluding fellow students
and instructors. To prevent any embarrassment of wide
discrepancies this exercise must be done first and with
care.

When and only when the student has established enough con—
fidence in his/her answers on the second performance
(without any consideration of the first), then and only
then should he/she proceed to score and grade this second
repeat performance. Emphasize that guessing in any form
will result in wide "discrepancies".

With the scoring and grading of his/her repeat performance
as the "Key" the student then turns over to his marked test
booklet to score and grade that performance also.

The student retains in his/her records his/her estimated
score and grade. Then on a piece of paper, with his/her
name on the paper, the following information is to be pro-
vided--ready to be handed in together with the repeat
performance. Thus: ‘

Name of Student
Test Mid-term Test 2
In-class Repeat

 

Estimated score
Estimated grade

This information will be used to check the accuracy of the
graph.

7)

8)

9)

10)

159

In the following discussion class period the instructor
collects the student's self-evaluations, and the repeat
performance. Both must be collected before test results
are to be made known in the times prescribed by the
Course Coordinator.

When all the machine scores are returned to the student,
the student prepares the graph (two COpies of each) and
returns them to the instructor.

The instructor then adds appropriate comments--the same
on both graphs, one of which he/she keeps and the other
returned to the student.

The instructor emphasizes that the graph is a Progress
Chart--to give the student a visual image of his/her
genuine progress. The graph also discourages guessing
as it has been shown that this is the chief factor in
wide "discrepancies".

 

160

 

 

 

00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0500x5¢r055mm05m
05 05 55 00 00 50 00 00 00 05 50 50 5 0 0 50550055005 1
20
50 50 50 00 50 00 50 00 00 55 05 00 0 0 5 0505xc0r05900050
0 0 5 00 00 00 00 00 50 00 00 50 0 0 5 00550005005
50
0 0 0 00 00 00 00 00 00 00 00 00 00 00 00 50050550 05050>0502
0 0 0 00 00 00 00 00 00 00 00 00 00 00 00 5005055m 05055009
00
0.5 0.5 0.0 05 05 05 50 05 00 00 00 50 00 50 50 :O55ocsm 0:550>55oz,
0 0 0 05 55 55 50 50 00 00 00 50 00 50 00 00550000 00505005
mm
.00 -Ma 55 .|ua -MH,.50 my «a 59 me «a .59 me me as

 

 

 

 

 

 

 

 

 

 

 

 

mmﬂﬁUmlmDm MOBU¢W HEB ZO

ZOHBmO UZHmOOEU mBZMQZOmmmM m0 Bzmummm 20m:

0 NHQZHQQﬁ

MICHIGAN STATE UNIVERSITY IBRARIES

L

3 1293 03071 2297