
 

A STUDY TO DETERMINE THE
EFFECTIVENESS OF A TECHNIQUE
EMPLOYING AN AMBIGUOUS STIMULUS
FOR ASSESSING A CHILD'S LEVEL OF
SKILL AND CONCEPT DEVELOPMENT
IN THE AREAS OF ADDITION
AND SUBTRACTION

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
JACQUELINE RESH LONG
1975

 

This is to certify that the

thesis entitled

A Study to Determine the Effectiveness
of a Technique Employing an Ambiguous Stimulus
for Assessing a Child's Level of Skill and
Concept Development in the Areas of Addition
and Subtraction

presented by

Jacqueline Resh Long

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Elementary Education

Major professor
ABSTRACT
A STUDY TO DETERMINE THE EFFECTIVENESS OF A TECHNIQUE
EMPLOYING AN AMBIGUOUS STIMULUS FOR ASSESSING A

CHILD'S LEVEL OF SKILL AND CONCEPT DEVELOPMENT
IN THE AREAS OF ADDITION AND SUBTRACTION

By

Jacqueline Resh Long

The contributions of Skinner, Bruner, and Piaget have influenced
new goals in education and new approaches to instruction. These new
goals and approaches to instruction have created problems and needs
for teachers.

A technique of evaluation was developed in pilot studies to help
resolve the following problems and needs experienced by teachers in
evaluating student learning:

1. Validate a method of measuring student achievement at the
symbolic level of concept representation which would then
open the way for researching this technique at the concrete
and pictorial-diagrammatic levels of concept representation.

2. Drastically reduce the time required for preparing, administering,
and correcting tests.

3. Drastically reduce the time students would spend in being
evaluated.

4. Offer a record of individualized growth by affording a teacher
a collection of evaluations individually submitted which shows
what a child regards as "hard" on a daily basis. This, then,
can be placed in a folder for the child, parent, or teacher
to examine.

5. Place an emphasis on a child's ability to assess his own
knowledge and recognize self-growth by asking him to submit
an example of what he can do. This technique of evaluation
is consistent with the goals of a behavioral philosophy of
self and environmental assessment.

The purpose of this research is to evaluate the researched
technique for assessing a child's level of skill and concept development
in the areas of addition and subtraction. The assessment technique to
be employed in this instance is limited to the symbolic representation
of the mathematics concepts and skills being examined. The limitation
was placed on the study because of the lack of instruments available in
the concrete or pictorial-diagrammatic modes of concept representation
with which to compare the newly researched technique. Currently
accepted instruments of evaluation are tests primarily written to
measure symbolic representation.

Several examiners used the technique in this study and admin-
istered the diagnostic tests to groups and individual children attending
public schools. The testing technique employed an ambiguous verbal
stimulus to which a child was asked to respond. The response of the
student being evaluated was then correlated with a traditional
diagnostic test written for this study for validation of the results.

Using a Pearson product-moment correlation, values of r = .85 for
addition and r = .81 for subtraction were found. Confidence intervals
constructed for these two correlations (P = .99) place ρ between .75
and .91 for addition and between .66 and .90 for subtraction.
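The interval estimate reported above can be illustrated with a short sketch. It assumes the Fisher z-transformation, a common way to build a confidence interval for a correlation; the abstract does not say which method was actually used. The sample size `n = 90` is purely hypothetical, since no sample size is given here, and the function name `fisher_ci` is an invention for this example.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.99):
    """Approximate confidence interval for a population correlation.

    Uses the Fisher z-transformation (an assumed method, not one
    confirmed by the study): z = atanh(r) is treated as roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    # Transform the interval end points back to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# n = 90 is a made-up sample size used only for illustration.
low, high = fisher_ci(0.85, n=90, confidence=0.99)
print(f"99% interval for r = .85 with hypothetical n = 90: {low:.2f} to {high:.2f}")
```

Because the interval narrows as `n` grows, the same observed r can yield quite different bounds; any comparison with the figures in this abstract would require the study's actual sample sizes.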

The following hypotheses were tested using a series of t-tests
with an α level of .05 to determine if there were differences between
groups in their ability to use the testing technique in this study.

1. There will be no significant differences between the high,
average, and low achievers as determined by the Iowa Achievement
tests in their ability to assess their level of abstract
achievement.

2. There will be no significant differences between the high,
average, and low achievers as determined by teacher judgment
in their ability to assess their level of abstract achievement.

3. There will be no significant differences between Blacks and
Caucasians in their ability to assess their level of abstract
achievement.

4. There will be no significant differences between girls and
boys in their ability to assess their level of abstract
achievement.

5. There will be no significant differences between children from
high, average, and low family incomes in their ability to
assess their level of abstract achievement.

No significant differences between groups were noted. Therefore, it
appears that all groups in the study can use the testing technique
equally well.

The following hypothesis was tested to determine if there was
a racial bias with respect to what a child perceives as "hard."

There will be no significant differences between racial groups
in what they perceive as "hard."

A series of chi-square tests was used with an α level of .05. Holding
achievement constant, no racial bias was found with respect to what is
considered "hard."

A STUDY TO DETERMINE THE EFFECTIVENESS OF A TECHNIQUE
EMPLOYING AN AMBIGUOUS STIMULUS FOR ASSESSING A

CHILD'S LEVEL OF SKILL AND CONCEPT DEVELOPMENT
IN THE AREAS OF ADDITION AND SUBTRACTION

By

Jacqueline Resh Long

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1975

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter

I. INTRODUCTION
    The Problem
        The New Perspectives
        The Effect on Curriculum
        The Effect on Instruction
        The Effect on Mathematics Instruction
        The Effect on Teacher Roles
        Resulting Problems and Needs for Teachers
    Purpose of the Study
    General Evaluation Procedures of the Partial Solution
    Anticipated Outcomes of the Study
    Assumptions
    Limitations of the Study
    Definition of Terms
    The Pilot Studies

II. REVIEW OF THE LITERATURE
    Introduction
    Role of Evaluation in Teaching Models
    Role of Evaluation in Mathematics
    Historical Development of Standardized Testing
    Historically Developed Criteria for Judging Evaluation
      Instruments and Measurements
        Validity
        Reliability
        Usability
    Review of the Research in Math Instruction
    Evaluation Methods in Assessing Learning in a Math Lab
        Anecdotal Records
        Rating Scales
        Checklists
        Interview
    Thresholding
    The Use of Ambiguous Stimuli in Testing

III. PROCEDURE AND METHODOLOGY
    Setting and Sample
    Examiners
    Instruments and Methods Used for Validating the
      Technique in This Study
    Procedure
    Methods of Analyzing Data

IV. PRESENTATION AND ANALYSIS OF THE DATA
    Correlation Between the Technique in This Study
      and the Test Written for This Study
    Child's Ability to Assess Himself
    Analysis of the Data Concerning Hypotheses B1
      through B5 of the Study
    Analysis of the Data Concerning Hypothesis C
      in the Study

V. SUMMARY, GENERALIZATIONS, AND IMPLICATIONS FOR FUTURE
   RESEARCH
    Criteria for Judging Testing Instruments and
      Measurements
    Accuracy of a Child's Self-Assessment
    Different Groups' Ability to Use the Testing
      Technique in This Study
    A Racial Bias with Respect to What is "Hard"
    Analysis of the Distribution of Percentage of Correct
      Response Scores with Respect to the Technique in
      This Study
    A Review of the Stated Purpose of This Study
    Implications for Future Research
        Usability of the Technique in This Study
          in Other Areas
        Areas of Mathematics Education to Be Researched
          Using the Technique in This Study

Appendix

A. QUESTIONS USED IN PILOTS
B. PROCEDURE HANDOUT
C. TESTS

BIBLIOGRAPHY

LIST OF TABLES

Summary of the effect of activity and model
methodologies on the learning of mathematics
in kindergarten through third grade

Summary of studies to determine the effectiveness
of teaching with models and activities in grades
four through six

Summary of the effect of activity and model
methodologies on the learning of mathematics
in grades seven through twelve

F-tests for determining the differences in variance
of the groups in the study

Group differences in their ability to use the testing
technique in this study

Summary of the results of the chi-square tests with
addition

Summary of the results of the chi-square tests with
subtraction

LIST OF FIGURES

Figure

1. Scattergram of the results of the test written for
this study and the technique of this study

2. Spread of scores for addition and subtraction

CHAPTER I

INTRODUCTION

Recent acceptance of theories in the science of behavior,
cognitive development, and concept representation has created new
approaches to instruction. These, in turn, have created new problems
and needs for teachers. To better understand the dimensions of the
situation, this chapter will cover the following topics:

1. the new perspectives and their corresponding effect on
curriculum, instruction, and teacher roles, and the resultant
problems and needs that have arisen for teachers of mathematics;

2. a description of the purpose of this study, which attempts to
identify a partial solution to one of the problems;

3. a description of the general procedures that were undertaken to
evaluate the partial solution, including the procedure for both
administering and evaluating the technique;

4. a discussion of the anticipated outcomes of the study;

5. a presentation of the assumptions that undergird the research,
the limitations of the research, and the definitions of key
terms employed in this study; and

6. an examination of the pilot studies which helped to develop
the technique.

The Problem

 

The New Perspectives

 

The new perspectives affecting educational goals have their
origin in the recently defined nature of man. The simplistic view
theorized by B. F. Skinner offers man an opportunity to achieve a
relative freedom heretofore unknown to him because of his past
ignorance and refusal to recognize the factors in his environment
which limit or destroy his freedom. Skinner, contrary to the gen-
erally accepted theory of internal control, has hypothesized that
man is born with a differentiated ability to respond to stimuli, and
through continuous conditioning the probability for any given behavior
is changed. Acceptance of the concept that behavior arises primarily
from conditioning requires that man learn to assess which environmental
factors affect him, and in what way, before he can achieve maximum
freedom from environmental control.

Skinner has also contributed a method of determining rela-
tionships between man and his environment through the observation of
behavior, its stimuli, and reinforcers, without theorizing about un-
observable factors. Thus, any individual with skill in assessing his
milieu is able to determine the behavioral cause-and-effect relation-
ships that exist for him, personally, and thereby possibly change the
portions of his environment which adversely affect his desired behavior.

The work of Jerome Bruner, Jean Piaget, and many math educators
has clearly demonstrated that learning needs to begin with concrete
models and progress to symbolic models. Van Engen (1949), supporting
the theories of both Bruner and Piaget, pointed out that the "meaning
of words cannot be thrown back on the meaning of other words. When
the child has seen the action and performed the act for himself, he
is ready for the symbol for the act."

Piaget has been the major contributor of theoretical support
for the use of concrete before symbolic models. He has proposed a
comprehensive theory of cognitive development that encompasses
individual growth from birth to maturity. Fennema (1972) describes
Piaget's concept:

According to Piaget's theory, schemas (mental structures)
are formed by a continual process of accommodation to and
assimilation of the individual's environment. This adap-
tation (accommodation and assimilation) is possible because
of the actions performed by the individual upon his environ-
ment. These actions change in character and progress from
overt, sensory actions done almost completely outside the
individual to partially internalized actions that can be
done with symbols representing previous actions, to com-
pletely abstract thought done entirely with symbols. This
development in cognitive growth involves, first the use of
physical actions to form schemas. Learners change from a
predominant reliance on physical action to a predominant
reliance on symbols.

Bruner has theorized that a learner utilizes, in order, three
representations in the process of acquiring a given concept. The first
is the enactive or manipulative stage in which an understanding of a
concept can be gained only as far as the actions in correspondence to
an object possess the attribute of the idea to be learned. In the
second stage, ikonic representation, a child can represent the world
by an image of the original object or action performed on the object,

without the object being present. The final representation is symbolic.

The Effect on Curriculum

 

Educating an individual both formally and informally to live
effectively within society has been the primary role of schools.
Unfortunately, past efforts have entailed the imparting of "factual"
knowledge without emphasizing the origin of these facts, thereby
concealing the structure of the subject area studied. Hilda Taba
(1967) is critical of a curriculum emphasizing the learning of facts
without structuring their implications: "Because specific facts become
obsolete more rapidly than basic concepts or main ideas, they are not
significant in themselves. Their chief function is to explain, illus-
trate, and develop main ideas."

Bruner (1960), by pointing out the historic problem of how
to teach the basic structure of a subject area, gives evidence of the
cafeteria style, fact-teaching of the past. He maintains that since
so little is known about teaching the fundamental structure, facts
rather than structure have been emphasized in the education of an
individual.

Studies done by Lankford (1974), Swart (1974), and Peck and
Jencks (1974) have attempted to determine what is being taught in
today's traditional math classes. These studies found classrooms
of children memorizing number facts, definitions, rules, and algorithms.

A curriculum consistent with a behavior-oriented philosophy
of education should have an emphasis which
fosters its goals. The education of an individual should now afford

him the opportunity to develop the skills necessary to maximize his

ability to perceive cause-and-effect relationships by helping him to
order and structure his milieu, thus enabling him to become as inde-
pendent as possible of both his physical and human environments. The
essence of this freedom remains less than absolute because of man's
inability to exist outside of an environment with controlling stimuli

and reinforcers. John Holt, in Freedom and Beyond, refers to man's
relative freedom as a constrained life.

We are all and always constrained, bound in, limited by
a great many things, not least of all the fact that we
are mortal. We are limited by our animal nature, by
our model of reality, by our relations with other people,
by our hopes and fears.

This "constrained" life can only have an individually achieved maximum
freedom based on an individual's unique genetic make-up and unique
sequence of experiences.

The Effect on Instruction

 

Fennema (1972), in summarizing a multitude of studies which
tended to support Piaget's theory of cognitive development and Bruner's
theory of concept representation, states:

Collectively, these data tend to support the hypothesis
that a learning environment embodying representational
models suited to the developmental level of the learner
facilitates learning better than a learning environment
that ignores the developmental level of the learner.

The acceptance of Bruner's and Piaget's theories suggests that
models be present in a learning environment if conceptual learning is
to take place. Through the use of such models each child would be able

to test the correctness of his perceived generalizations for himself or

with other students, thereby placing the authority for learning on each

child or his group. This type of learning environment would foster
individual growth in the ability to perceive relationships and
encourage a child to be dependent on his own perceptions rather
than on those of a teacher or some other authority. The child

is thus weaned from his dependent state to one of independence.

Taba (1967) states:

In order to develop autonomy of thought, students need
opportunities to organize their own conceptual systems
and to develop their skills for independent processing
of information. Consequently, the nature and the orga-
nization of learning experiences should be calculated to
encourage the learner to inquire, to do his own thinking,
to develop his own ways of working out problems, and to
try out his own ideas. Faced with the temptation to pro-
vide the answers and solutions, the teacher must grant
the learner the right to come to grips with the learning
process, even though the products may be less refined
than the teacher would wish.

Skinner's postulation that individuals are born with a differ-
entiated ability to respond to stimuli, Piaget's theory of cognitive
development, and Bruner's theorized stages of concept representation
all point out a need for the individualization of instruction. By
postulating a genetic component to individual response, a uniqueness
of response is implied. Piaget's theorized stages of cognitive devel-
opment and Bruner's modes of representation also imply a variety of
levels of cognitive functioning and modes of concept representation
within any given group of children, necessitating the creation of a
learning environment which offers a variety of learning situations
designed to accommodate the uniqueness of each individual.

This individualized instruction could be achieved within a
classroom laboratory with concrete, pictorial or diagrammatic, and
symbolic models for the children to use in the attainment of concepts.
Each child would use a model most meaningful to him and would progress
at his own pace. The concepts to be learned could be determined for
the child by his teacher with a sequenced exposure to models to ensure
the eventual learning of the concept, or a nondirected laboratory
exposure to large collections of models could be used. Students in
this type of a milieu can grow in their ability to learn through student
interactions which could broaden their perceptions, or they can learn
through solitary experimenting. Both of these situations permit indi-
viduals to differ in the selection of meaningful models and in their
ability to perceive relationships while being a member of a learning
group. Lab-oriented experiences which use individual or small group
explorations, with materials and teachers as resources, would foster
the type of learning situation consistent with the goal of teaching

children how to perceive relationships.

The Effect on Mathematics Instruction

The unique contribution which mathematics instruction offers
to the education of a person is the opportunity to observe relation-
ships directly through the use of mathematical models which range from
the concrete to the symbolic. A concrete model (Fennema, 1972)
represents a mathematical idea by means of three-dimensional objects. A
second type of model is the pictorial or diagrammatic. Through pic-
tures or diagrams, the attributes of certain mathematical concepts are
demonstrated. Finally, symbolic models represent a mathematical idea

by means of commonly accepted numerals and signs that denote mathematical

operations or relationships. From the use of such models children

and adults can experience the act of learning to learn in a math
laboratory with models which encourage growth in skills of observing,
systematizing, formulating, and testing generalizations. Mathematics
also offers the opportunity to develop the ability to quantify data
and tersely express relationships symbolically, so that patterns in
any given situation can be discerned more easily. These skills are
very necessary if individuals are to develop to their fullest capacity

their competency to determine cause-and-effect relationships.

The Effect on Teacher Roles

 

The role of the teacher in instruction can contribute to or
hinder the achievement of the educational goal of independence, for
the product or consequence of this instruction is a function of this
role and can be freeing or restricting with respect to an individual's
growth.

In the traditional instructional milieu, where authority for
learning rests solely with the instructor, two interrelated conditions
arise. First, a student becomes dependent on his instructor for the
"rightness" or "wrongness" of his generalizations rather than on his
own ability to prove to himself the truth of his conclusions. Second,
a student is limited by his instructor's knowledge rather than his own
concerning the relationships it is possible for him to perceive, and
he is then limited to perceiving only those relationships which his
teacher relates to him. Therefore, traditional expository teaching

violates the goal for achieving a maximum amount of independence for
each individual by limiting learning and forcing an individual to
depend on the perception of others. For similar reasons, programmed
instruction in areas of concept development and guided discovery where
only one outcome is acceptable are also deterrents to the goal of
independence.
Resulting Problems and Needs for
Teachers

The contributions of Skinner, Bruner, and Piaget have influenced
new goals in education and new approaches to instruction. Some of the
problems and needs which have resulted from these changes are the
following:

1. Teachers will be using a method of teaching that was not used
with them.

2. Teachers will need to learn how and when to use models in their
instruction.

3. Teachers will need to determine the student's stage of develop-
ment, as defined by Piaget, and the appropriate model for
depicting a particular concept best suited to the intellectual
needs of the student.

4. Teachers will need to find models for concepts that they wish
to teach and all the modes of representation for these concepts.

5. Teachers will need to learn how to organize their teaching days
so that they can offer individualized instruction.

6. Teachers will need a system of daily record keeping to enable
individual growth to be discerned and planned for.

7. Inherent in any teaching situation, especially an
individualized lab approach to teaching mathematics,
is the problem of accurately assessing the entering
skill and mode of concept representation for each student.
In addition, an accurate evaluation following each learning
experience to redetermine the functioning level of the
student must be made.

8. Teachers will need sizable amounts of time to prepare,
administer, and grade tests for the myriad of levels in
an individualized lab milieu. Instructional time will be
significantly affected.

9. Teachers will need to set aside sizable amounts of student
time for taking tests.

10. Teachers will have to find commercial tests or design their
own to measure the concrete and pictorial-diagrammatic levels
of concept representation. Presently, most accepted evaluation
instruments test primarily the symbolic level of concept
representation.

Purpose of the Study

 

A technique of evaluation was developed in pilot studies by
this investigator which was intended to do the following:

1. Validate a method of measuring student achievement at the
symbolic level of concept representation which would then
open the way for researching this technique at the concrete
and pictorial-diagrammatic levels of concept representation.

2. Drastically reduce the time required for preparing,
administering, and correcting tests.

3. Drastically reduce the time students would spend in being
evaluated.

4. Offer a record of individualized growth by affording a teacher
a collection of evaluations individually submitted which shows
what work a child regards as difficult on a day-to-day basis.
This, then, can be placed in a folder for the child, parent,
or teacher to examine.

5. Place an emphasis on a child's ability to assess his own
knowledge and recognize self-growth by asking him to submit
an example of what he can do. This technique of evaluation
is consistent with the goals of a behavioral philosophy of
self- and environmental assessment.

The purpose of this research is to evaluate the pilot technique
for assessing a child's level of skill and concept development in
addition and subtraction. The assessment technique to be employed in this
research is limited to the symbolic representation of the mathematics
concepts and skills being examined. This limitation was placed on the
study because of the lack of instruments available in the concrete or
pictorial-diagrammatic modes of concept representation with which to
compare the results of the technique in this study.

The lack of such instruments was established by requesting and
subsequently reviewing the commercial diagnostic and achievement tests
cited in the twenty-sixth yearbook of the NCTM (National Council of
Teachers of Mathematics), Evaluation in Mathematics, and the NCTM
brochure, "Mathematics Tests Available in the United States."
Marilyn Suydam's annotated list of unpublished evaluation instruments
also was reviewed.

Since the concrete stage of concept representation is entirely
omitted from all test items, and since the pictorial-diagrammatic rep-
resentation is omitted from all tests for middle and upper elementary
schools for most concepts, it is apparent that currently accepted
instruments of evaluation are tests primarily written to measure the
symbolic representation of concepts and skills.

General Evaluation Procedures of
the Partial Solution

 

 

Pre- and in-service teachers used the pilot technique and
administered the diagnostic test written for the study to groups and
individual children attending public schools.

The test employed an ambiguous verbal stimulus to which a
child was asked to respond. This response was evaluated and then
correlated with the index of the diagnostic test written for this

study to validate the results.

Anticipated Outcomes of the Study

 

The following major hypothesis will be tested to determine
whether or not there is a correlation between the results of testing

a child by a diagnostic test and the testing technique being studied:

There will be a high correlation between the results
of testing using a diagnostic test and the results of
testing using the technique being studied.

The following five hypotheses will be tested to determine
whether or not there is a difference between groups in their ability
to use the testing technique in this study.

B1. There will be no significant differences between the
high, average, and low achievers as determined by the
Iowa Achievement tests in their ability to assess their
level of abstract achievement.

B2. There will be no significant differences between the high,
average, and low achievers as determined by teacher judgment
in their ability to assess their level of abstract
achievement.

B3. There will be no significant differences between Blacks and
Caucasians in their ability to assess their level of
abstract achievement.

B4. There will be no significant differences between girls
and boys in their ability to assess their level of
abstract achievement.

B5. There will be no significant differences between children
from high, average, and low income families in their
ability to assess their level of abstract achievement.

The following hypothesis will be tested to determine whether
or not there is a racial bias with respect to what a child perceives
as difficult or "hard."

C. There will be no significant differences between racial
groups in what they perceive as "hard."

Assumptions

 

Evaluation in mathematics instruction is based on several
assumptions. First, determination of a student's stage of cognitive
and mathematical development is a necessary task, regardless of the
teaching model being used. Second, current diagnostic tests are
relatively accurate in determining a student's competency level with
abstract models of concept representation. Third, thresholding is a
valid means of determining a level of students' functioning when using
diagnostic tests. Fourth, proper sequencing of levels within a diag-
nostic test is necessary if thresholding is to be used as a means of
determining a level of functioning. Fifth, there are three stages of
concept representation: the concrete, pictorial-diagrammatic, and

abstract.

Limitations of the Study

 

Three major limitations of this study should be noted. First,
only two of the four operations with whole numbers were used in the
study, and no other areas of mathematics which might be assessed by
the technique being evaluated were researched. Second, the abstract
stage of concept representation is the only stage considered because of
the problem of validating testing results for the concrete and
pictorial-diagrammatic stages. Finally, only children in grades 1
through 6 were studied.

Definition of Terms

 

In what follows, the major terms used in this study are
defined.

abstract (symbolic) models: Models which represent a mathematical
idea by means of commonly accepted numerals and signs that
denote mathematical operations or relationships.

ambiguous stimulus: A stimulus which elicits a variable response
from a group of individuals.

behaviorism: The science of behavior which is attempting to understand
the relationships between and within the genetic endowment,
historical environment, and present environment of individuals
with the ultimate goal of accuracy in the prediction of
behavior.

commercial tests: Those tests prepared by various companies which
attempt to measure mathematics achievement.

concrete models: Models which represent a mathematical idea by means
of three-dimensional objects.

level of concept development: The level of model needed by a person in
order to attain the concept being presented. The model
representations are the concrete, pictorial-diagrammatic, and symbolic.

math lab milieu: A math learning environment having models that
represent mathematical ideas concretely, pictorially-
diagrammatically, and symbolically and a variety of instruc-
tional media, such as tape recorders, to enhance the learning
of mathematics in an individualized or small group situation.

pictorial-diagrammatic models: Models which represent a
mathematical idea by means of pictures, diagrams, or devices
such as a number line, which illustrate many of the attributes
of the idea.

proper sequencing: A sequencing of response categories which consists
of an "ascending" series carried far enough to locate the
transition point or threshold from one response category to
another.

quantitative understanding: The understanding that comes with numerals,
mathematical symbols, and operations which enables a child to
relate these mathematical ideas to his environment.

teacher prepared tests: Those tests prepared by a teacher to measure
the entering or terminal behavior of a student in mathematics.

teaching model: A set of associated ideas and concepts more or less
organized around a larger conception of what teaching should
be like. It enumerates the components of a teaching situation
and shows a general relationship between these components.

testing technique: A method of eliciting student responses which
indicates an achievement level without utilizing an instrument

or prepared list of objective questions.

 

thresholding: A level (threshold) of functioning ascertained by
observing where in a sequenced task a person begins to make
more errors than correct responses, or where this individual
stops participating in the task.
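
The definition of thresholding above can be read as a simple decision rule. The following is a minimal sketch of that rule only, not the procedure used in the study: the function name `find_threshold`, the representation of each level as a list of true/false responses, and the use of `None` or an empty list to mark nonparticipation are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_threshold(levels: Sequence[Optional[Sequence[bool]]]) -> Optional[int]:
    """Return the index of the first level at which the threshold is reached.

    Each entry of `levels` holds one child's responses at one level of a
    properly sequenced (ascending) task: True for a correct response,
    False for an error. None or an empty list means the child stopped
    participating at that level (an illustrative convention).

    Returns None when no threshold is reached within the levels given.
    """
    for index, responses in enumerate(levels):
        # The child stopped participating in the task at this level.
        if not responses:
            return index
        correct = sum(1 for response in responses if response)
        errors = len(responses) - correct
        # The child begins to make more errors than correct responses.
        if errors > correct:
            return index
    return None


# Hypothetical record: three levels of increasing difficulty.
example = [
    [True, True, True],     # level 0: all correct
    [True, True, False],    # level 1: more correct than errors
    [False, False, True],   # level 2: more errors than correct
]
threshold_level = find_threshold(example)
```

In this hypothetical record the threshold falls at the third level (index 2). A level with equal numbers of errors and correct responses does not count as the threshold under this reading, since the definition requires more errors than correct responses.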

The Pilot Studies

 

Pilot studies were conducted in Cornell School in Okemos,
Michigan, and in several schools in the Lansing, Michigan, area by
Elementary Intern Program students. Additional data were collected
at Ball State University by students in methods classes who are
required to tutor individual or small groups of elementary students.

These studies attempted to find out whether or not elementary
school children would respond to an open question posed in terms of
"hardness." Several forms of questions were used to determine the most
effective. See Appendix A for the questions used.

Many children in the pilot studies conducted for this research
responded to the assessment questions by giving a memorized problem and
answer, that is, 2000 + 2000 = 4000, 100 + 100 = 200. Since the prob-
lems always used large numbers, it would appear that this behavior was
intended to impress the examiner. To overcome this problem in the
validation study, youngsters were asked to write a problem without
zeros. This change in procedure appeared to give more dependable
results. Requesting a child to check his results with an aid also

eliminated memorized responses.

In addition, the pilot studies showed that, with further
testing, a child who would not submit a problem was not able to
respond to symbolic representation in the area being assessed.
However, if this nonrespondent was given a manipulative aid of his
choice, he could provide both problems and solutions.

Across operations, children indicated that "hardness" was
equivalent to large numbers. The majority (43 out of 72) gave examples
of "hard" problems using numbers greater than 100. When children were
given mathematical models to use, their responses seemed to be corre-
lated to the device used. If a model was used which limited the size
of numbers to a quantity under 70, then the hardest problems submitted
included numbers close to 70. If, as in Chip Trading, problems with
regrouping were treated no differently than those without, children
rarely cited problems with regrouping as "hard." These observations
were made with only 22 students.

Finally, the pilot studies revealed that errors in posing
assessment questions and interpretation of questions by children
resulted in some children offering problems that they could not solve.
These problems were generally solvable by the child who submitted the

problem after a short period of instruction.

CHAPTER II

REVIEW OF THE LITERATURE

Introduction

 

A review of the literature was made to establish the important
role of evaluation within the theoretical framework of teaching models
in general and within the teaching of mathematics in particular. In
this chapter, a review of the development of standardized tests dis-
closing the historically based need for objective evaluation to
ascertain a level of student cognitive functioning is followed by
a presentation of the historically established criteria for judging
evaluation instruments and measurements.

After examining the research conducted to determine the
effectiveness of teaching mathematics using concrete, pictorial-
diagrammatic, or symbolic models in a mathematics laboratory, numerous
ways of evaluating learning in a mathematics laboratory which are
currently being used are presented. The chapter concludes with a
theoretical basis for employing a thresholding technique, followed
by a presentation of the historical precedent of using an ambiguous
stimulus in testing, as is employed in the testing technique in this

study.

Role of Evaluation in Teaching Models

 

Both the behavior-modification teaching model and the discovery-
learning model consist of a set of associated ideas and concepts more or
less organized around a larger conception of what teaching should be and
how it should be viewed. Nuthall and Snook (1973) have described the
behavior-modification model: "[It] consists of that set of concepts and
claims about teaching which has arisen from the attempt to apply the
interpretive framework of behavioral psychology to the classroom." They
add that "the discovery-learning model incorporates those views of
teaching which place greatest emphasis on the self-directed activity
of the student."

Glaser (1962) has developed a simple basic teaching model
including the four essential components of any teaching situation.
DeCecco (1968) pointed out that these components are present in most
teaching models, especially in the models used to depict behavior-
modification and discovery-learning. A basic teaching model (Glaser,
1962) is as follows:

Instructional ------> Entering ------> Instructional ------> Performance
Objective             Behavior         Procedures            Assessment
    A                    B                  C                     D

Instructional objectives are measurable goals which a
student should obtain by the completion of a segment of
instruction. Entering behavior describes the student's
level of cognitive and affective development prior to
instruction. Instructional procedures refer to the input
of a teacher in the changing of a student's behavior and
is commonly called learning or achievement. Performance
assessment consists of tests and observations used to
determine how well the student has achieved the instruc-
tional objectives. Two of the four elements of the basic
model require that information from the student be
collected. In noting the entering behavior all past
experiences of a student deemed relevant to the new
teaching situation must be assessed, while performance
assessment is the portion of the model which deals with
determining what learning took place with respect to
the instructional objectives.

It is apparent from the literature that the evaluation of
student learning is a necessary component of most teaching models.
For the two models consistent with a behavioral philosophy, the
behavior-modification and the discovery-learning models, evaluation
has a definite role.

Role of Evaluation in Mathematics

In the NCTM's twenty-sixth yearbook, Evaluation in Mathematics,
Sueltz states emphatically the role of evaluation in mathematics:

Mathematics is an important part of the curriculum at all
school levels beginning in the kindergarten. It is orga-
nized in a sequence of topics and activities that are
associated with appropriate levels of maturity and ability
of the students. Evaluation can identify and define steps
and levels in the sequence that are appropriate for a given
grade or age level. Careful evaluation should show not only
how far a pupil has progressed in the major steps of a
sequence, but also how well he has understood and mastered

a particular step. Good evaluation will show the facts and
skills mastered (and those not mastered) by the student, his
attitude toward the subject, and the depth of understanding
and insight accompanying his work.

He adds:

Evaluation is useful in determining the relative ease or
difficulty of learning, applying, or remembering a topic,
and materials. We need to know how long it takes to master
a given concept, the suitable concepts for different grades,
the appropriate sequence of concepts, and the aids the
teacher needs to build mastery of each concept.

 

 

Reisman (1972) points out the importance of evaluation in
determining a mathematics curriculum for effective instruction. By
ascertaining a student's level of functioning, a curriculum can be
developed which will meet the needs of the students involved without
the negative ramification of an inappropriate curriculum. She states:

In looking at the mathematics curriculum, one must con-
sider the level of difficulty involved. If the curriculum
contains an abundance of material which is too advanced or
too difficult for the student, he may become frustrated and
give up; on the other hand, a curriculum that is too easy
leads to boredom and the student again may give up.

Reys (1971), in an article on manipulative materials, remarked
that to judge the effectiveness of materials, it would be wise to
evaluate learning following their use.

Do evaluate the effectiveness of materials after using
them. Immediately upon the completion of an activity,
it can be very helpful to note particular problem areas,
strengths, weaknesses, and suggestions and to define
areas of needed improvement as well as possible areas
of modification. A continuous reevaluation of manipu-
lative materials ultimately results in better materials
as well as more effective use of them.

Ewbank (1971), in an article on mathematics labs, discussed the
inherent problem of evaluating mathematics learning in a laboratory
milieu.

Some people use standard methods, that is, teacher-made
or standardized tests. But the results of these tests
may be deceptive, as it is very difficult to measure
understanding and grasp of concepts in this way. . . .

One way to measure progress in the mathematics labora-
tory is to look at the quality of written reports.

A high standard of written reports should be required,
but in the primary grades it is a mistake to force chil-
dren to write reports until they are ready to do so.

Mathematics can be learned by manipulating devices such as
the equalizer balance and colored rods without any writing
at all. Small children need to play with containers of sand,
water, and so on, and in the process they grasp very important
concepts such as the conservation of quantity. I do not see
how you can evaluate this in the orthodox way. . . . For chil-
dren at this early stage of development, subjective evaluation
may be the best means. However, subjective evaluation should
be based on the teacher's notes, anecdotal records, and a
scrutiny of the child's progress in his written recording.
Short periodic quizzes may be useful to show up those who
cannot do certain processes or who obviously have not grasped
the relevant concepts.

Sueltz summarizes the basic functions of evaluation in the
total mathematics program in the following way:

1. Evaluation can establish levels of learning and locate
a student at a level suitable for his current status in
mathematics.

2. Evaluation is useful in improving the mathematics pro-
gram in terms of curriculum, content, and organization,
selection of materials for learning, and modes of in-
struction and learning. It can furnish data which
should be used in making value judgments.

3. The place of mathematics in modern society can be
studied and appraised in its many ramifications, and
the results of such appraisal can be used in an appre-
ciative way and also as a factor in determining the
curriculum.

4. Competent evaluation of the mathematics program of a
school is useful in keeping the clientele of the school
informed and in answering questions raised by critics.

5. The information and data collected in evaluation form
the substance of a student's record in school. These
data are useful not only for records and reports, but
also for research.

6. Evaluation is much concerned with helping the student
learn mathematics more effectively. Hence, it seeks
answers to many questions dealing with the kind of
mathematics, the level of learning, motivation, and
aspiration.

7. Different modes of learning and their effectiveness
when applied to mathematics should be evaluated.

This applies to various types of materials, various
levels of learning, and various types of students.

8. Finally, evaluation itself provides valuable learning
experiences that a good teacher will capitalize on to
enhance the work of the students.

The importance of evaluation in mathematics education is
clearly stated in Sueltz's summary. In order accurately to perform
the evaluations he cites, new instruments and techniques of evaluation
will have to be developed which take into account Piaget's theory of
cognitive growth and Bruner's theory of concept representation within

a behavioral philosophy of education.

Historical Development of Standardized Testing

A review of the historical development of testing reveals that
testing did not originate in the pursuit of educational ideals, but,
rather, stemmed from personal and political considerations. Mehrens
and Lehmann have described the historical setting:

When Binet developed his first scale, he was concerned

with devising a means of removing dull pupils from the
overcrowded schools in Paris rather than with constructing
an instrument specifically designed to help the classroom
teacher relate certain intellectual qualities to the learn-
ing process. Horace Mann really did not intend to devise

an objective measure of pupil accomplishment. His criticism
of the public schools in Massachusetts infuriated a group of
teachers and lay citizens in Boston. This group were intent
in resisting and refuting Mann's opinions. In the end, as a
solution to the problem, it was agreed to prepare written
examination questions in history, geography, vocabulary,
science, arithmetic, astronomy, and grammar. This survey
instigated by Horace Mann, was the first instance in which
the same written examination was given to a sample of all
pupils at the same school level, and where the papers were
scored under uniform conditions. Although the findings con-
firmed Mann's contention that the public schools were not as
good as claimed, it would appear that the findings did not
serve as a stimulus to more objective and refined evaluation
techniques in American public schools.

Green (1970) noted the following about Mann's achievement tests:

It is interesting to note that these same examinations
were given to all eighth graders in the Boston schools
following World War I in order to compare the results with
the scores of the original pupils. The children in 1919
excelled their 1845 predecessors by a considerable degree
in all areas except arithmetic problem solving. Another
examination given in Springfield, Massachusetts, in 1846,
and a retest in 1906 gave results similar to those in
Boston (Cubberley, 1934).

At the time of the American Civil War a little known man in
the field of education constructed the first objective educational test.
Reverend George Fisher, an English schoolmaster, devised a series of
tests to measure accomplishment in spelling, grammar, handwriting,
composition, mathematics, and other school subjects. This series of
tests was referred to as a Scale Book. Mehrens and Lehmann (1969)
have described its contents.

Thorndike made a major contribution in 1904 when he pub-
lished the first comprehensive book in the field, Mental
and Social Measurement. In this book he proposed several
of the principles which are still used in constructing
standardized tests. Among these principles were (1) test
items should be scaled according to difficulty, (2) tests
should be objectively scored, and (3) tests should have
statistical norms. Thorndike gave further impetus to the
field by publishing the 1909 "Scale for Handwriting of
Children" and by encouraging students to do further work
in the field. During this period there were several new
tests which helped turn the tide of schoolmen in favor of
the movement. These tests included C. W. Stone's 1908
edition of a standardized achievement test in arithmetic,
the arithmetic scales by Courtis in 1910, and the
"Composition Scale" by Ayres in 1912.

The impetus for the continued development of standardized tests
came from three sources: (1) unreliability of school marks as an indi-
cator of school achievement, (2) a group of city school surveys con-
ducted between 1910 and 1917 in which standardized tests were used to
measure student achievement, and (3) the results of three noteworthy

studies.

Mehrens and Lehmann (1969) have pointed out the problem of
unreliable teacher grading:

In 1912 and 1913, Starch and Elliott had a group of
teachers independently grade an English essay, a geometry
paper, and a history paper. They found considerable varia-
tion in grades assigned (even with the geometry paper, which
we would assume to be more amenable to objective evaluation).
In 1928, Falls had 100 English teachers grade an essay
written by a high school senior (who, incidentally, wrote
for a newspaper). The teachers were required to assign
both a numerical grade to the essay as well as indicate
the grade level of the student. Once again, as in Starch
and Elliott's study, there was marked variation in both
the numerical grades assigned and the estimated grade level
of the writer. The grades varied from 60 to 98 percent and
the grade level from 5 to 15. These kinds of studies led
to the search for, and development of, more objective
procedures for testing and grading students.

From the school surveys done using standardized tests, the
economic value of producing an acceptable test battery became apparent.
In 1919 the Stanford Achievement Battery was published. It was designed
primarily for use at the elementary level. Green (1970) has stated:

Although achievement tests changed very little after the
publication of this battery, numerous test publishing
companies were established, and standardized tests were
developed in all fields. An idea of the rapid expansion
in the field can be gained from Hildreth's bibliography
of mental tests and rating scales. Hildreth listed 3500
titles in 1935, 4279 titles in 1939, and 5294 titles in
1945.

Three influential studies which showed the major development in
standardized achievement testing in the 1940s and 1950s, as listed by
Mehrens and Lehmann (1969), were: (1) the Eight-Year Study of the
Progressive Education Association in 1942, (2) the College Entrance
Examination Board long-range study initiated in 1952, and (3) the
Cooperative Study of Evaluation in General Education completed in 1954.
These studies showed an increased use of standardized achievement
tests in our public schools, a beginning inclusion of critical thinking,
application of knowledge, synthesis, and evaluation, and the refinement
of techniques used to construct and standardize achievement tests.

Ayres (1918) prophesied the importance and subsequent growth
of the educational measurement movement in the seventeenth yearbook of
the National Society for the Study of Education, Part II: "Knowledge
is replacing opinion, and evidence is supplanting guesswork in education
as in every other field of human activity." In the final chapter of
that yearbook, Judd (1918) noted:

The time is rapidly passing when the reformer can praise

his new devices and offer as the reason for his satisfaction,
his personal observation of what was accomplished. The super-
intendent who reports to his board on the basis of mere
opinion is rapidly becoming a relic of an earlier unscientific
age. There are indications that even the principals of ele-
mentary schools are beginning to study their schools by exact
methods and are basing their supervision on the results of
their measurements of what teachers accomplish.

Merwin (1969) pointed out that the changes in educational
evaluation have evolved through interaction with (1) accepted theories
and practices of education, (2) the role accepted for evaluation in the
educational process, and (3) technical developments in educational
evaluation.

Dobbin (1956), citing evidence of the effect of learning
theories and practices in education on evaluation, noted that not only
fundamental changes in learning theory, but also sweeping changes in
enrollment and school organization patterns, have led to changing
concepts of assessing achievement since the early 1930s. Starch (1916)
suggested that evaluation concern itself with determining individual
differences in what pupils learn. Educational practices which evolved
from this general idea ranged from "homogeneous grouping" to and includ-
ing individualized instruction. Dressel (1950) pointed out that testing
cannot avoid influencing instruction.

The role of evaluation in educational changes and the resultant
changes in evaluation require examination.

1. The role in general school planning.--Efforts by Haggerty
(1917) to determine the effect of evaluation on school planning gave
evidence that, as a result of testing, changes occurred in (a) classi-
fication of pupils, (b) school organization, (c) courses of study,
(d) methods of instruction, (e) time devoted to subject, and (f)
methods of supervision. Twenty years later, Reaves commented: "The
development of the measuring movement and the perfection of tests for
the measurement of achievement and mental capacity have made possible
great advances in educational administration."

2. The role in instruction.--Merwin (1969) pointed out that
during the 1930s there were a number of proposals suggesting that school
testing programs should be conducted in the fall of the year as a basis
for evaluating the level of achievement following instruction. Troyer
(1947) proposed that pretesting be used to determine the degree of
knowledge and skills the students possessed which were prerequisites

to the concepts to be taught. In the forty-fifth yearbook, Douglass
and Spitzer wrote: "For many years we believed that good teaching
begins where the child is, at the point to which his achievement has
brought him. We realize that we must take into consideration what
the pupil already knows if we are to guide his learning from then
on in an effective manner."

3. The role of student decision making.--Simpson (1953)
cogently argued that most learning takes place outside the classroom
and that much more learning could take place if students developed
skills for realistically planning and evaluating their own educational
experiences.

4. Changing concepts and the content of evaluation.--
(a) Merwin (1969) pointed out that we apparently are in the process
of completing a cycle approximately fifty years in length. Monroe's
book, Measuring the Results of Teaching, described evaluation as focus-
ing on very detailed objectives related to skills. Glaser (1967), at
the Invitational Conference on Testing Problems, presented graphical
descriptions of the accomplishments of individual students over time
on relatively minute units of learning. Between the publications of
these two reports, there has been considerable emphasis on more gen-
eral outcomes. (b) Acceptance of the philosophical position that the
teacher should take each child "where he is" and move him as far as
possible toward his maximum potential development calls for a measure
of status at two points in time as a basis for determining change, or
"growth." (c) Bloom (1956) gave considerable impetus to the broadening
of evaluation efforts to include the measuring of "higher mental
processes." A publication by Krathwohl, Bloom, and Masia (1964)
holds promise for broadening evaluation procedures to take into

account very important educational objectives that fall in the
affective area. Environmental factors affecting learning have long
been recognized, but only in recent years, with the work of Pace and
Stern (1959), Wolf (1965), and Coleman (1966), have there been serious
attempts to obtain measures of perceptions of environmental factors.
(d) Early emphasis on evaluation focused on individual achievement.
In more recent years the focus has been on the evaluation of group
achievement to determine the effectiveness of teaching materials,
instruction, and curriculum. The work of Rice (1897), Arnold (1916),
Cronbach (1963), and Scriven (1967) testifies to these changes in emphasis.
(e) With the expansion of educational involvement in the areas of the
military, colleges and professional schools, and early childhood edu-
cation, the need for an accompanying new evaluation concept has arisen.

Merwin (1969), in the sixty-eighth NSSE yearbook, states that
changing concepts in evaluation have grown out of the technical develop-
ment and the modes of interpretation which have developed to accompany
new testing techniques. He showed that there are three major areas of
concern.

1. The published Stanford Achievement Tests in 1923 by Terman,
Ruch, and Kelley offered the first battery approach to
testing across subjects. This approach has been generally
accepted as a source of achievement information for many
years. The most prudent time to administer a test battery
has been a point of controversy. School administrators
have argued that the tests offer a measure of individual
and group accomplishments and should be given at the end
of a school year. Others have argued for fall testing to
provide information to teachers as a basis for planning
instruction.

2. When achievement tests were shown to be a more efficient
and objective measure of achievement when compared to
"essay" tests, the use of absolute (percentage) scores
resulted in the development of a normative approach to
testing. For several decades evaluation focused on the
development of instruments which reliably differentiate
between individuals and interpreted the results of these
instruments in terms of norms. Recently the focus has
been to establish standards, as in the Oak Leaf Project
at Pittsburgh (Glaser, 1968), which uses "mastery" testing.
This type of testing is based on a child showing that he
has accomplished a particular task or behavior to a cer-
tain degree of proficiency as required. Additional types
of evaluation which have come from competency-based
education are those which Burns (1972) speaks of: "When
the method or way of performing (behaving) is important,
a process measuring situation can be thought of as a test
item. If the end result is more important than the method,
a product measuring situation is required. Products can
include plans, blueprints, drawings, paintings, tables,
charts, diagrams, models, photographs, collections,
specimens, stories, poems, and an infinite number of other
real things. In many instances much can be inferred about
a process from observing a product, the two are interrelated.
Evaluations using processes and products are commonly more
valid than merely testing at the verbal level, which may or
may not indicate competence."

3. The interpretation of achievement in terms of potential has
been used by educators for many years for identifying
selected norm groups. Schudson (1972) has described one
of these established norm groups as a "meritocracy." He
states that through the use of College Boards to determine
"admissions to certain selective colleges, an additional
simultaneous choice is made in the selection of those indi-
viduals in a society who are to be the future rulers of that
society and the holders of the wealth." The report of the
Commission on Tests (1970) described the situation in the
following manner: "Certainly it is particularly unfortunate
that the characteristics that make for success in school work
as it is commonly conducted are, if not specific to some seg-
ments of society, at least disproportionately distributed
among its social classes and its racial and ethnic groups:
Bowdoin College's admissions director, Richard Moll, told
the press that the tests could not escape cultural bias and
so 'tend to work in favor of the more advantaged elements of
our society, while handicapping others.' Problems of inter-
pretation have arisen when achievement scores have been
regressed on aptitude scores giving 'expectancies.' A lack
of understanding of the meaning of 'expectancy' has led to
the ideas that 'underachievers' can come up to their pre-
dicted level of performance if they would just apply them-
selves, and an 'overachiever' is doing better than he is
capable of doing. As a result of labelling children,
teachers when expecting low achievement will often
get just what they expect, resulting in a phenomenon
which has been called the 'self-fulfilling prophecy.'"

A major consideration of educational evaluation in the beginning
was the provision of information for the teacher's use in working with
students. The resultant effect of the use of standardized tests in
the early part of the century was a new potential for considering the
outcomes of different groups on a common examination. The use of a
common test to evaluate learning has spread from a schoolwide basis,
to a statewide consideration, and currently to a national assessment.

Lewy (1973) has raised some serious questions concerning the
use of achievement tests to discriminate both among individuals and
among classes.

Item selection procedures which are recommended for con-
structing tests for individual differentiation may not
be adequate for tests for discrimination among classes.
In spite of the practical difference between discrimination
of these two types, educational research has not paid enough
attention to the existence of such differences, and there-
fore little systematic study has been devoted to its
implications for the planning of educational studies,
for the construction of instruments, and for analyzing
educational data.

Carver (1975), in reviewing the findings in the Coleman Report (Equality
of Educational Opportunity Survey, 1966), pointed out that the Coleman data
were designed to be biased against finding significant educational
effects for the same reasons cited by Lewy. He stated:

Given the impact of the Coleman Report on federal policy
and the allocation of federal funds, it is important that
the basis for such policy be on firm ground. It would be
unfortunate if the data did not reflect what they were
purported to reflect.

With the advent of district, state, and federal testing and the
resultant use of these results to make decisions concerning the
funding of educational projects, the necessity for continued research
in evaluation to answer the problems cited has been mandated.

A review of the historical development of testing disclosed
the necessity of developing reliable objective tests to measure student
achievement. This need has continued and grown as the evaluation of
learning has been used to research the effectiveness of certain cur-
ricula, instruction, and learning environments as well as to simply
measure individual achievement. Based on the assumption that measures
of evaluation should be objective, the technique in this study offers
a means of evaluation which retains the well-established need for
objective measures. In addition, the testing technique also emphasizes
the measurement of individual growth and self assessment.

Historically Developed Criteria for Judging
Evaluation Instruments and Measurement

 

The need for objective evaluation instruments and measurements
has existed for a relatively long time, acting as an impetus for the
development of criteria to determine whether or not any given instrument
or measurement did what it was purported to do. These criteria will be
used in Chapter V to help evaluate the study's testing technique.

The first of a series of publications designed to help test
makers refine their instruments was Statistical Methods Applied in
Education written by Harold Rugg in 1917. From Rugg's work came a
series of criteria for judging the desirability of accepting a testing
instrument and its results. Gronlund (1971) lists and defines these
criteria as validity, reliability, and usability.

Validity
Validity refers to the extent to which the results of an
evaluation procedure serve the particular uses for which they are
intended. Three types of validity have been identified and are now
commonly used in educational and psychological measurement: (1) con-
tent validity, (2) criterion related validity, and (3) construct
validity.
Gronlund has defined these concepts:
1. Content validity may be defined as the extent to which
a test measures a representative sample of the subject-
matter content and the behavioral changes under
consideration.
2. Criterion-related validity may be defined as the extent
to which test performance is related to some other
valued measure of performance.
3. Construct validity may be defined as the extent to
which test performance can be interpreted in terms
of certain psychological constructs.
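
The relation between test performance and "some other valued measure of performance" is commonly summarized by a correlation coefficient. The following is a minimal sketch only, assuming a Pearson correlation is the chosen index; the function name and the five pairs of scores are hypothetical and do not come from Gronlund or from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: test scores and a criterion measure for five pupils.
test_scores = [3, 5, 6, 8, 9]
criterion = [2, 4, 7, 7, 10]
print(round(pearson_r(test_scores, criterion), 3))
```

A coefficient near 1 would suggest that the test ranks pupils much as the criterion does; how large a coefficient is acceptable depends on the use to be made of the results.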

 

Gronlund has pointed out additional factors found in the test instrument
which, if ignored, will lower the validity of the test results.

1. Unclear directions.
2. Reading vocabulary and sentence structure too difficult.
3. Inappropriate level of difficulty of test items.
4. Poorly constructed test items.
5. Ambiguity.
6. Test items inappropriate for the outcomes being measured.
7. Test too short.
8. Improper arrangement of items.
9. Identifiable pattern of answers.

Factors which influence validity that can be found in the
administration and scoring of a test are the following:

1. Cheating.
2. Failure to follow directions.
3. Ignoring time limits.
4. Giving pupils unauthorized assistance.
5. Errors in scoring.
6. Poor physical environment.

Conditions that might adversely affect test validity which
are due to personal factors are:

1. Motivation.
2. Anxiety.
3. Fatigue.
4. Illness.
5. Test-wiseness (ability to discern cues to correct responses
from the test itself).
6. Response set (consistent tendency to follow a certain pattern
in responding to test items).

Gronlund summarizes the nature of validity thus:

the validity of test results is based on the extent to which
the behavior elicited in the testing situation is a true
representation of the behavior being evaluated. Thus,
anything in the construction or the administration of the
test which causes the test results to be unrepresentative
of the characteristics of the person tested contributes to
lower validity. In a very real sense, then, it is the user
of the test who must make the final judgment concerning the
validity of the test results. He is the only one who knows
how well the test fits his particular use, how well the
testing conditions were controlled and how typical the
responses were to the test situations.

Reliability

 

Reliability refers to the results obtained with an evaluation
instrument and not to the instrument itself. According to Gronlund
(1971),

Reliability refers to the consistency of measurement. That
is, to how consistent test scores or other evaluation results
are from one measurement to another. . . . A closely related
point is that an estimate of reliability always refers to a
particular type of consistency. Test scores are not reliable
in general. They are reliable (or able to be generalized)
over different periods of time, over different samples of
questions, over different raters, and the like. It is pos-
sible for test scores to be consistent in one of these
respects and not in another. The appropriate type of
consistency in a particular case is dictated by the use
to be made of the results. . . . Treating reliability as
a general characteristic can only lead to erroneous
interpretations.

Gronlund adds that reliability merely provides the consistency
which makes validity possible. A highly reliable measure may have
little or no validity.

Factors which may influence reliability are:

1. Length of test--In general, the longer the test the higher the
reliability.

2. Spread of scores--In general, the larger the spread of scores,
the higher the estimate of reliability.

3. Difficulty of test--Tests which are too easy or too difficult
for the group members taking it will tend to provide scores
of low reliability.
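
The effect of test length on reliability can be illustrated with the Spearman-Brown formula, a standard result in measurement theory that the passage above does not itself name. The sketch below assumes the added items are comparable to the existing ones; the function and parameter names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor.

    reliability: current reliability estimate, between 0 and 1.
    length_factor: new length divided by old length (2 means doubled).
    Assumes the added items behave like the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: doubling a test whose reliability is 0.50.
print(round(spearman_brown(0.50, 2), 2))
```

Under this assumption the projected reliability rises with length but with diminishing returns, which is consistent with the general statement in item 1.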

Usability
Usability refers to the practical considerations of selecting
an evaluation instrument. Some of these are:
1. Ease of administration.
2. Time required for administration.
3. Ease of scoring.
4. Ease of interpretation and application.
5. Availability of equivalent or comparable forms.
6. Cost.

Review of the Research in Math Instruction

 

The definition of a math lab contributed by Kerr (1974)
identifies the areas of research to be reviewed if math labs can be
thought of as effective environments for learning.

The mathematics laboratory is a strategy of instruction
in which the learner himself interacts with mathematics
and its real-world applications. The techniques used in
a laboratory strategy may be varied; they may include
discussion, discovery activities, model construction or
even some directed teaching. Likewise the interaction
of the learner with mathematics and its applications may
vary. But the laboratory strategy focuses the learner's
attention and activities on the relationship between
mathematics and its real-world applications.


The real world applications of mathematics take the form of
models which demonstrate the mathematical concepts in a meaningful
manner to the learner. On the basis of the research evidence put
forth by the 20 studies conducted to determine the effectiveness of
using models and activity oriented classrooms in teaching mathematics
in kindergarten through third grades, it does appear that the use of
mathematical models and activities contributed to effective teaching.
Table 1 presents a summary of these studies. Aurich (1963), Hollis
(1964), Crowder (1965), Nasca (1966), Williams (1967), Howard (1969),
and Wynrath (1970) found significance in favor of the experimental
groups using models and activities. Weber (1969) did not find
significance, but did find a trend favoring the use of manipulatives.
Two additional studies, by Norman (1955) and Ekman (1966), did not
find significance for either the control or experimental groups at
the end of the instructional period, but did find the experimental
group showed superior retention two weeks and three weeks, respectively,
after the instructional period had ended. Only one of the 20 studies
showed the "traditional" method of instruction produced significance
in achievement. This study, conducted by Passy (1963), used Cuisenaire
rods and offered the only evidence that a traditional approach can be
more effective than teaching with models and activities.

From the research charted in Table 2, it seems apparent that
using models does not hurt the learner's ability to comprehend mathe-
matical concepts. Studies by Dawson and Ruddell (1955), Carmody (1970),

Bisio (1970), and Nickel (1971) show significant results for the use of

Table 1. Summary of the effect of activity and model methodologies on the learning of mathematics
in kindergarten through third grade

Author | Grade Level | Model | Test Used | Significant Difference in Favor Of | Mathematical Content
Norman (1955) | third | concrete and semiconcrete models | author constructed | neither group at the end of instruction; concrete and semiconcrete at the end of two weeks | division of whole numbers
Eidson (1956) | early elementary | many multisensory aids | standardized achievement | neither | arithmetic in lower grades
Sole (1957) | early elementary | manipulative aids | standardized achievement | neither | arithmetic in lower grades
Seick (1959) | second and third | multisensory aids | author constructed | neither | computation and arithmetic reasoning
Aurich (1963) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Haynes (1963) | third | Cuisenaire rods | author constructed | neither | multiplication
Passy (1963) | third | Cuisenaire rods | standardized achievement | traditional treatment | computation and arithmetic reasoning
Lucow (1963) | third | Cuisenaire rods | author constructed | neither | multiplication and division
Hollis (1964) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Crowder (1965) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Nasca (1966) | second | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of second grade work
Lucas (1966) | first | Dienes arithmetic blocks | standardized achievement and author constructed | Dienes treatment for conservation of number and conceptualization of mathematical principles; traditional for computation and solving of verbal problems | identified in projection terms: multiplication of relations and addition-subtraction relations
Ekman (1966) | third | counters | author constructed | neither at end of instruction; concrete model group on a retention test | addition and subtraction algorithms
Weber (1969) | first | manipulative and concrete | standardized achievement and author constructed | neither, but a trend favored manipulatives | total range of first through third grades
Howard (1969) | early elementary | concrete materials | author constructed | concrete materials | sorting, counting, classifying and patterning sets
Wynrath (1970) | kindergarten | games | standardized achievement | games | total range of kindergarten and first grade work
Moody, Abell & Bausell (1971) | third | manipulative and concrete materials | standardized achievement | neither | multiplication
Ropes (1972) | second | multisensory aids | standardized achievement and author constructed | neither | total range of second grade work

Table 2. Summary of studies to determine the effectiveness of teaching with models and activities
in grades four through six

Author | Grade Level | Models Used | Test Used | Significant Difference in Favor Of | Mathematical Content
Price (1950) | fifth and sixth | multisensory aids | author constructed | neither | division of fractions
Howard (1950) | fifth and sixth | concrete and semiconcrete | author constructed | neither at end of instruction; semiconcrete three months later | total range of fifth and sixth grade work
Dawson & Ruddell (1955) | fourth | many diverse models | author constructed | concrete-model group | division of whole numbers
Anderson (1957) | eighth | various visual tactile devices | author constructed | neither | area, volume and Pythagorean theorem
Mott (1959) | fifth and sixth | many multisensory aids | standardized achievement | neither | measurement
Spross (1962) | fifth and sixth | concrete aids that had cultural significance | standardized achievement | neither | total range of fifth and sixth grade work
Trueblood (1967) | fifth and sixth | manipulation of aids and demonstration of aids | standardized achievement | demonstration of aids | fractions
Toney (1968) | fourth | manipulation of aids and demonstration of aids | standardized achievement | neither | fourth grade content
Green (1969) | fifth | diagrams, cardboard sticks | standardized achievement | neither | multiplication of fractions
Carmody (1970) | sixth | concrete and semiconcrete | author constructed | concrete and semiconcrete | sixth grade work
Bisio (1970) | fifth | demonstrated manipulatives | author constructed | manipulatives | fractions
Wilkinson (1970) | sixth | laboratory materials | standardized achievement | neither | metric geometry
Nickel (1971) | fourth | abstract picture and diagrams; concrete | standardized achievement | multi-model approach | fourth grade work
Ropes (1972) | sixth | laboratory materials | standardized achievement | neither | sixth grade work

models in teaching. Howard (1950) showed that there was no significant
difference between treatment groups until a test was administered three
months later to make a determination on retention. On the retention
test the group using the models did significantly better.

The summary of results shown in Table 3 appears to reverse the
findings in the early elementary studies. Instruction using models is
less effective than traditional approaches. This finding was borne out
by the work of Johnson (1970), Cohen (1970), Schwartz (1971), and
Shoecraft (1971). Low achievers showed a need for aids in instruction
in the Shoecraft (1971) study by showing significant results in group
achievement. Waslyk (1970) showed significant results for his
experimental group when working with measurement concepts using
concrete models.

In reviewing this research, several questions occurred to
this reader concerning the wisdom of accepting many of the results
as an accurate measure of the effectiveness of model and activity
teaching. Two such reservations are noted below.

1. Key words and procedures in the study lacked operational
definitions. Therefore, variables which might have affected
the results remain undisclosed. This lack of definition also
affects replicability.

2. Concepts taught at the concrete, pictorial-diagrammatic level
of representation were primarily evaluated at the abstract
level of representation. This cannot help but place the
results of teaching which uses concrete and semiconcrete
mathematical aids and models at a disadvantage.

 

Table 3. Summary of the effect of activity and model methodologies on the learning of mathematics
in grades seven through twelve

Author | Grade Level | Model Used | Test Used | Significant Difference in Favor Of | Mathematical Content
[Table body illegible in the source scan.]

Despite the criticisms which might be leveled at the research
cited, the results certainly can be accepted as strong evidence in
support of model and activity learning. In the majority of cases these
studies still found significant results in favor of such learning, even
though the instruments used to measure learning placed them at a dis-
advantage. These instruments measured learning using symbolic concept
representation, whereas a child using models and activities experiences
concrete or pictorial-diagrammatic concept representations.

If math labs which place an emphasis on model and activity
learning are themselves to be more accurately evaluated in terms of
their effectiveness in teaching math, it is necessary for new methods
of evaluation to be devised which will incorporate the objective nature
of standardized tests and offer a means of evaluating learning at the
concrete and pictorial-diagrammatic representation of concepts. With
this purpose in mind this study was undertaken.

Evaluation Methods in Assessing Learning
in a Math Lab

In addition to standardized tests in evaluating learning in
a math lab, a few other methods have been employed.

Anecdotal Records

 

Anecdotal records are the objective, as opposed to interpretive,
descriptions of pupil behavior written by the teacher on a daily or
frequent basis. Gronlund made the following suggestions concerning
the keeping of these records:

1. Confine observations to those areas of behavior that
cannot be evaluated by other means.

2. Limit observations of all pupils at any given time
to just a few types of behavior.

3. Restrict the use of extensive observations of behavior
to those few pupils who are most in need of special
help.

Rating Scales

 

Rating scales provide a systematic procedure for obtaining and
reporting the judgments of observers. A rating scale consists of a set
of characteristics or qualities to be judged and some type of scale for
indicating the degree to which each attribute is present. According to
Gronlund, the rating scale is valuable only to the extent it is care-
fully prepared and appropriately used. It should be constructed in
accordance with the learning outcomes to be evaluated, and its use
should be confined to those areas where there is a sufficient oppor-
tunity to make the necessary observations. If these two principles
are properly applied, a rating scale serves several important
evaluative functions: (1) It directs observation toward specific
and clearly defined aspects of behavior; (2) it provides a common
frame of reference for comparing all pupils on the same set of
characteristics; and (3) it provides a convenient method for recording
the judgment of the observers.

The following principles were listed in Gronlund as important
characteristics to be considered in the preparation or selection of a
rating scale:

1. Characteristics should be educationally significant.
2. Characteristics should be directly observable.

3. Characteristics and points on the scale should be
clearly defined.

4. Between three and seven ratings should be provided and
raters should be permitted to mark at intermediate
points.

5. Raters should be instructed to omit ratings where they
feel unqualified to judge.

6. Ratings from several observers should be combined,
whenever possible.
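
Principles 5 and 6 can be illustrated together. The sketch below is an assumption-laden illustration, not Gronlund's procedure: it treats "combined" as a simple average, records an omitted rating as None, and uses hypothetical characteristic names.

```python
def combine_ratings(ratings_by_observer):
    """Average each characteristic over the observers who actually rated it.

    ratings_by_observer: list of dicts, one per observer, mapping a
    characteristic name to a numeric rating, or to None when the observer
    felt unqualified to judge and omitted the rating.
    """
    totals = {}
    counts = {}
    for ratings in ratings_by_observer:
        for characteristic, rating in ratings.items():
            if rating is None:
                continue  # omitted ratings do not enter the average
            totals[characteristic] = totals.get(characteristic, 0) + rating
            counts[characteristic] = counts.get(characteristic, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


# Hypothetical ratings on a five-point scale from three observers.
observers = [
    {"follows directions": 4, "works independently": 3},
    {"follows directions": 5, "works independently": None},
    {"follows directions": 3, "works independently": 4},
]
print(combine_ratings(observers))
```

Averaging is only one way of combining judgments; a median or a count of ratings at each scale point would serve the same principle.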
Checklists
According to Gronlund,
A checklist is similar in appearance and use to the rating
scale. A rating scale provides an opportunity to indicate
the degree to which a characteristic is present or the
frequency with which a behavior occurs. The checklist,
on the other hand, calls for a simple "yes-no" judgment.
It is basically a method of recording whether a character-
istic is present or absent, or whether an action was taken
or not taken. Checklists are especially useful in eval-
uating those performance skills that can be divided into
a series of clearly defined, specific actions.
In summary, the major points to be considered in developing
a checklist, according to Gronlund, are: (1) Identify and describe
clearly each of the specific actions desired in the performance;
(2) add to the list those actions which represent common errors,
if they are limited in number and can be clearly identified; (3)
arrange the desired actions and likely errors in the approximate
order in which they are expected to occur; and (4) provide a simple
procedure for numbering the actions in sequence or for checking each
action as it occurs.

Interview

An interview is an evaluation situation in which an examiner
faces a student and asks questions to which the student is expected
to respond. Suydam (1974) suggested the following procedure for a
mathematics evaluation: (1) Face the student with a problem; (2) let
him find a solution, as he tells you what he is doing; and (3) challenge
him, to elicit his highest level of understanding.

All of the methods cited in this chapter to evaluate learning
in a math lab are very time consuming in their preparation, administra-
tion, or both. These methods do offer a teacher a means of evaluating
learning using concrete and pictorial-diagrammatic representations of
concepts. Teachers using interviews or anecdotal records are able to
judge whether or not a child has understood a concept which has been
presented concretely by observing the behavior of the child using the
concrete model and either writing down what has been observed or by
asking the child questions about his behavior and recording the
questions and responses.

The methods of evaluation cited here have the inherent problem
of being subjective. The ability accurately to observe, record, and
pose meaningful questions to determine the depth of learning being
observed is highly dependent on the talents of the teacher doing the
evaluating. This subjectivity may well bring back into the educational
scene the kind of criticism which historically was shown to be valid
with respect to the accuracy of measurement.

It is apparent that with all the methods and instruments
available to evaluate learning, additional means are needed which
(1) can measure learning with the myriad of levels of learning present
in any given math lab, (2) require only a short time to prepare, admin-
ister, and correct, and (3) offer objective measures. This study offers
a beginning in the research needed to establish the effectiveness of a
testing technique which can accomplish these three necessary tasks.

Thresholding

 

Methods of evaluating student learning vary, but there is an
emphasis on achievement tests, which are used to determine a level of
functioning with respect to a norm. These norms are determined by testing
a sample of youngsters and ascertaining levels of expectancy for
children of a particular age or grade. Buswell and John, in Manual of
Directions for Use with Diagnostic Charts for Individual Difficulties
in Fundamental Processes in Arithmetic, state:

A standardized test in arithmetic will indicate whether a
pupil is doing satisfactory or unsatisfactory work for a
given school grade. It enables the teacher to identify
those pupils who need special attention. However, the
marked limitation of such a test is that it does not tell
why the pupil fails nor how he has made his errors.

Since these tests do not attempt to determine a student's level
of functioning within an area of arithmetic or mathematics, additional
types, called diagnostic or inventory tests, have been developed.
Meyers (1959) pointed out that there were 37 achievement and 10
diagnostic tests available in the area of arithmetic. The latter
have a varied format, with a portion of them offering a sequenced
test from simple to complex problems within a computational skill area.
To determine the level of functioning within a diagnostic test of this
kind, a threshold of functioning is ascertained by observing at what
point in this test a child either begins making more errors than
correct responses or stops answering questions.
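
The rule just described can be sketched as follows. This is an illustration only, assuming the test is organized into levels of several problems each and that a level with equal numbers of errors and correct responses still counts as passed; the data layout and function name are hypothetical.

```python
def threshold_level(responses_by_level):
    """Return the number of the last level passed before the threshold.

    responses_by_level: one list per level, easiest level first. Each entry
    is True (correct), False (error), or None (problem not attempted).
    The threshold is reached at the first level where errors outnumber
    correct responses or where nothing was attempted. Returns 0 if the
    first level already meets that condition.
    """
    last_passed = 0
    for level_number, responses in enumerate(responses_by_level, start=1):
        attempted = [r for r in responses if r is not None]
        correct = sum(1 for r in attempted if r)
        errors = len(attempted) - correct
        if not attempted or errors > correct:
            break
        last_passed = level_number
    return last_passed


# Hypothetical record: two problems per level, four levels.
record = [[True, True], [True, False], [False, False], [None, None]]
print(threshold_level(record))
```

How ties are treated, and whether one unanswered level should end the count, are choices an examiner would need to fix in advance.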

This method of determining the functioning level of an indi-
vidual has a history beginning in 1860 with Fechner, who was the chief
precursor of experimental psychology. He published a voluminous
treatise on "Psychophysics" entitled Elemente der Psychophysik.
Initially a physicist who sometimes published philosophical works
under a pseudonym, Fechner, because of his interest in philosophy,
may have abandoned physics and been attracted to psycho-physics when
he suffered from a nervous breakdown. He wanted to demonstrate the
identity of mind and matter which to him were two faces of the same
reality, and either of which was apparent according to whether one
took an internal or an external point of view. His background in
physics made him denounce reasoning as a valid source of knowledge.
Seeking a scientific foundation for his knowledge, he hoped to determine
a quantitative relationship between a physical stimulus and resulting
conscious sensation. In his search for the scientific laws governing
psycho-physics he devised suitable methods of experimentation and
statistical treatment of data.

In his search for the relationship between mind and body,
Fechner had to measure as accurately as possible the different thresholds
of his subject. Threshold and its Latin equivalent, limen, mean,
essentially, a boundary separating the stimuli that elicit one response
from the stimuli that elicit a different response. Thresholds must be
repeatedly tested, for they vary due to the nature of the senses.
Therefore, a threshold is always a statistical value; customarily, the
lower threshold is defined as the value of the stimulus which evokes a
positive response on 50 percent of the trials.
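
The 50 percent definition can be made concrete with a small calculation. The sketch below assumes that the proportion of positive responses has been tabulated at several stimulus values and that linear interpolation between neighboring values is acceptable; both the method and the numbers are illustrative assumptions, not Fechner's own computation.

```python
def fifty_percent_threshold(stimulus_values, positive_proportions):
    """Interpolate the stimulus value at which positive responses reach 50%.

    stimulus_values: stimulus intensities that were tested.
    positive_proportions: proportion of trials with a positive response
    at each intensity (same order as stimulus_values).
    Returns None when the data never cross 0.5.
    """
    pairs = sorted(zip(stimulus_values, positive_proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1 and p1 != p0:
            # Straight-line interpolation between the two bracketing points.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Hypothetical data: proportion of "yes" responses at four intensities.
print(fifty_percent_threshold([1, 2, 3, 4], [0.0, 0.2, 0.8, 1.0]))
```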
The threshold technique developed by Fechner is a method of
serial exploration. It consists of "descending" and "ascending" series,
each carried far enough to locate the momentary transition point or
threshold from one response category to another.
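
A minimal sketch of serial exploration follows. It assumes that the transition point of a series is taken as the midpoint between the last stimulus drawing one response and the first stimulus drawing the other, and that the threshold estimate is the mean of the transition points over all series. These conventions are common in textbook accounts but are not stated in the passage above, and the names and data are hypothetical.

```python
def transition_point(series):
    """Midpoint between the two stimuli where the response first changes.

    series: list of (stimulus_value, responded) pairs in presentation order,
    ascending or descending. Returns None if the response never changes.
    """
    for (x0, r0), (x1, r1) in zip(series, series[1:]):
        if r0 != r1:
            return (x0 + x1) / 2
    return None


def serial_exploration_threshold(series_list):
    """Mean transition point over all series that contain a transition."""
    points = [transition_point(s) for s in series_list]
    points = [p for p in points if p is not None]
    if not points:
        return None
    return sum(points) / len(points)


# Hypothetical data: one ascending and one descending series.
ascending = [(1, False), (2, False), (3, True)]
descending = [(5, True), (4, True), (3, False)]
print(serial_exploration_threshold([ascending, descending]))
```

Alternating ascending and descending series in this way is meant to balance out the tendency to keep giving the previous response.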
Using Fechner's technique, Binet attempted to measure a total
intelligence by measuring its individual aspects. Terman (1917) has
noted:
It was this point of view which long controlled the work
of Binet, who, like others, began by attempting to get at
intelligence by measuring memory, attention, sense
discrimination and other individual functions.

Terman adds:
The assumption that it is easier to measure a part, or one
aspect of intelligence than all of it, is fallacious in
that the parts are not separate parts and cannot be separated
by any refinement of experiment. They are interwoven and
intertwined. Each ramifies everywhere and appears in all
other functions. Memory, for example, cannot be tested
separately from the associative processes. After vainly
trying to disentangle the various intellective functions,
Binet decided to test their combined functional capacity
without any pretense of measuring the exact contribution
of each to the total product. Intelligence tests have been
successful just to the extent to which they have been guided
by that aim.

Terman concluded: "The proof of the Binet method is the fact that it
works so well."

The technique of determining a threshold for the functional
level of a sense with any individual, which began in psycho-physics
with Fechner, was used by Binet in his initial experiments with the
measurement of intelligence. When his first efforts failed, he con-
tinued using this technique, assuming that measuring sense functioning
in combination would not diminish the effectiveness of the technique.

With the establishment of this technique in determining
intelligence, thresholding has been employed in diagnostic inventory
testing to ascertain a level of functioning within an arithmetic
operation. Based on the assumption that thresholding is valid in
diagnostic testing, the proposed research will attempt to shortcut
this technique by demonstrating a more efficient method of determining

a level of performance within an arithmetic operation.

The Use of Ambiguous Stimuli in Testing

 

Ambiguous stimuli were first employed in the area of projective
techniques for identifying emotional problems of individuals. By plac-
ing a stimulus, which could have many responses, before an individual,
much was learned about the person's inner thoughts. Rorschach's inkblot
projective approach was a precursor to a variety of projective
techniques, including interpretation of drawing, painting, handwriting,
stories, fantasies, play, and drama. Exner (1974) has noted:

Although Rorschach first became interested in the use of
inkblots to study psychopathology about 1911, it is doubtful
that he undertook any serious investigation of their usefulness
until 1917. In that he died in 1922, he probably spent
no more than between 3 and 4 years working intensively with
them.

Before his death, Rorschach did offer a variety of postulates
concerning specific test features, especially form, color, and human
movement. He did not formulate a global theory of the test and was
quite conservative in discussing its potential usefulness. After his
death, five major systems or approaches in using the Rorschach
developed. These five systems have caused much controversy in the use and
interpretations of the instrument and its results. Despite all the
controversy, Exner (1974) pointed out that 60 percent of all patients
in a clinical situation in 1971 were administered the test.

Aside from measuring psychopathology with projective techniques,
attitudes have also been measured using ambiguous visual stimuli.
Alberts (Suydam 1974) has developed a test using 21 cartoon-like
drawings. Children are asked to respond to these by associating
themselves with the character portrayal.

Self-reports which request that a student relate what he has
learned in a given class or with a given instructor are common examples
of uses of an ambiguous verbal stimulus.

To test a person's mathematical creativity, Evans (Suydam 1974)
has designed a test for late elementary and early junior high school
students which presents an ambiguous math situation. The student is
expected to respond in as many different ways as possible. Responses
are scored with respect to number, number of different kinds, and
degree of uncommonness.
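
The three scoring dimensions can be illustrated with a short sketch. Evans's actual scoring rules are not given here, so everything below is an assumption made for illustration: each response is tagged with a kind, exact repeats are counted once, and uncommonness is taken as one minus the proportion of a hypothetical comparison group that gave the same response.

```python
def score_responses(responses, norm_frequency):
    """Score one student's responses on three dimensions.

    responses: list of (response_text, kind) pairs.
    norm_frequency: hypothetical mapping from response_text to the proportion
    of a comparison group giving that response; unseen responses count as 0.
    """
    unique = list(dict.fromkeys(responses))  # drop exact repeats, keep order
    number = len(unique)
    kinds = len({kind for _, kind in unique})
    uncommonness = sum(1 - norm_frequency.get(text, 0.0) for text, _ in unique)
    return {"number": number, "kinds": kinds, "uncommonness": uncommonness}


# Hypothetical responses to "show ways to make 5".
responses = [
    ("2+3", "addition"),
    ("1+4", "addition"),
    ("10-5", "subtraction"),
    ("2+3", "addition"),
]
norms = {"2+3": 0.5, "1+4": 0.5, "10-5": 0.25}
print(score_responses(responses, norms))
```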


The evaluation of academic achievement as employed in this
study appears to be a new area for using ambiguous stimuli. But
the technique has a long history in the field of psychological
assessment, where the Rorschach and Thematic Apperception Test have
been used for diagnostic purposes in mental health for more than half
a century.

CHAPTER III

PROCEDURE AND METHODOLOGY

The setting, the sample, the examiners who used the proposed
technique, and the instrument used for validating the technique are
described in this chapter. In addition, the procedure for determining
a child's ability accurately to assess and communicate what he knows
about addition and subtraction using symbolic models of concept rep-
resentation, as well as the methods of analyzing the collected data,

will be discussed.

Setting and Sample

 

This study was conducted in the Muncie, Indiana, school system
and at the laboratory school at Ball State University using 161 ele-
mentary students. The Muncie schools in the study are located in an
area of mixed socioeconomic populations. The predominant races rep-
resented in Muncie are Negroid and Caucasian. Burris, the Ball State
University laboratory school, has a mixed cultural, racial, and economic
population, and 30 percent of the students have learning disabilities.
These children are channeled into the regular school classrooms.

The Muncie schools were selected in consultation with the
office of the superintendent of schools and members of the administration
who were familiar with the type of school populations. Schools
with the most diverse composition with respect to racial groups
and economic levels were selected.

The testing procedure was administered both to groups of
children and to individual children. At Burris children were grouped
in classrooms with three grades in each class. All classrooms in the
elementary portion of the school were either a 1-3 grade group or a
4-6 grade group. There were four classrooms of each grouping. Six
Muncie classrooms in six different schools were chosen. There were
two first grades, three second grades, and one fifth grade used. The
entire classroom of children in the Muncie schools and the entire
population of Burris youngsters in grades 1-6 were evaluated using the
technique in the study.

Examiners

The examiners were both preservice and in-service teachers.
The former came from the student body of Ball State University and were
majoring in elementary or special education. Sections of college
juniors and seniors taking methods classes and who were scheduled to
tutor were asked to use the technique in this study to determine the
level of development of their child or small group of children within
an operation prior to tutoring for fall and winter quarter (1974-1975).
The in-service teachers were from the Muncie school system. They were
selected by their principals from the schools recommended by the Muncie
school administration. All six teachers who were asked to participate
accepted.


Instruments and Methods Used for Validating
the Technique in This Study

Two sequenced tests were written for the study. A subject's
level of functioning on either or both of these tests was determined
by using Fechner's technique of thresholding. Another measure of the
child's level of functioning was taken using the technique of this
study. This level was determined by comparing the child's submitted
problem to the level of the test designed for the study. The number
of the level which most nearly corresponded to the submitted problem
was then given to the submitted problem. This resulted in each child
having two scores in the form of two level numbers--one from the
sequenced achievement test prepared for the study, and one from the
technique being researched.

A commercially prepared test, Fundamental Processes in
Arithmetic, devised by Buswell and John and published by Bobbs-Merrill
Company, Inc., was used as a guide for sequencing problem levels in
the tests written for the study. A copy of the commercial test may
be found in Appendix C. One additional problem per level was added
to increase reliability, but no more than one was added in an effort
to minimize test fatigue. Copies of the tests prepared for the study

are found in Appendix C.

Procedure

The examiners were given a procedure sheet (Appendix B)

explaining what they were to do. This sheet requested the following:


1. Ask the child to be evaluated to "Show me the hardest
problem that you have learned to do in ____________
and write the answer." (The participating college students used the
technique with all four operations in whole numbers and fractions,
but only the addition and subtraction data were analyzed.)

2. If the child, when writing an addition problem, wrote one
having all zeros except for one digit in each addend, for example,
1000 + 2000 = 3000, then the examiner was to request that the child
write a problem with no zeros, except possibly in the answer. (In
the pilot studies, when children submitted memorized responses, the
level of functioning was not discernible to the examiner. Sometimes
the child, when giving a (1000 + 1000 = 2000) response, indicated that
he could only add a one-digit number to another one-digit number. In
other instances, the problem indicated that he could add numbers in the
thousands.)
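The check described in step 2 can be sketched in code. This is a minimal illustration, not part of the original procedure sheet; the function names are hypothetical, and the rule is assumed to mean that every addend consists of one nonzero leading digit followed only by zeros.

```python
def is_one_digit_with_zeros(number: int) -> bool:
    """True for numbers such as 200 or 1000: one leading digit, then only zeros."""
    digits = str(abs(number))
    return len(digits) > 1 and set(digits[1:]) == {"0"}


def needs_second_problem(addends) -> bool:
    """True when the examiner should ask for a problem with no zeros.

    Assumption: the request is made only when every addend in the
    submitted problem has all zeros except for one digit.
    """
    addends = list(addends)
    return bool(addends) and all(is_one_digit_with_zeros(a) for a in addends)


if __name__ == "__main__":
    print(needs_second_problem([1000, 2000]))  # the example from the procedure sheet
    print(needs_second_problem([435, 362]))
```

A problem such as 1000 + 2000 triggers the follow-up request, while 435 + 362 does not.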

3. After the child submitted his problem and answer, the test
written for the study in addition or subtraction was given to him.

4. Last, the test was to be collected when the child wished
to hand it in.

5. A request to fill out a data sheet concluded the directions

on the procedure sheet.

To compile the data, two students from Ball State University,
one in graduate school in elementary education and one a senior in
secondary math education, determined the level of the submitted
"hardest" problem by comparing the problem to the tests written
for this study in the appropriate operation and selecting the level
that most corresponded to the submitted problem. This was done for
all submitted problems first. Then the tests written for the study
were scored using the thresholding technique. The
thresholding technique of scoring was used in the following way: When
a child missed all three problems at a given level, his functioning
level was determined to be one level before the missed group of
problems.
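The thresholding rule can be sketched as follows. This is an illustration under stated assumptions, not the scoring sheet used in the study: the function name and the data layout (one list of right/wrong marks per level, in sequence) are hypothetical, and a child who never misses a whole level is assumed to be placed at the highest level of the test.

```python
def threshold_level(results_by_level) -> int:
    """Return the functioning level under the thresholding rule.

    results_by_level[k] holds the True/False marks for the problems at
    level k + 1. The functioning level is one level before the first
    level at which every problem was missed.
    """
    for level, marks in enumerate(results_by_level, start=1):
        if not any(marks):
            return level - 1
    # Assumption: no fully missed level means the child reached the top level.
    return len(results_by_level)


if __name__ == "__main__":
    child = [
        [True, True, True],     # level 1
        [True, False, True],    # level 2
        [False, False, False],  # level 3: all three missed
        [True, True, True],     # level 4 is not considered
    ]
    print(threshold_level(child))
```

Under this reading, a single correct answer at a level is enough for scoring to continue to the next level.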
To test the following hypothesis, a criterion for determining

high, average, and low achievers was established.

B1 There will be no significant differences between the high,
average, and low achievers as determined by the Iowa
Achievement tests in their ability to assess their level
of abstract achievement.

A child was judged to be a high achiever if his score on the Iowa
Achievement test was in the 85th percentile or above, an average
achiever if his score on the Iowa Achievement test was between the
30th and 85th percentile, and a low achiever if his score on the Iowa
Achievement test was on the 30th percentile or below.
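The percentile criterion can be written as a small classification rule. The sketch below is illustrative only; the function name is hypothetical, and the cut points are taken directly from the paragraph above.

```python
def achievement_group(percentile: float) -> str:
    """Classify a child by Iowa Achievement test percentile.

    High: 85th percentile or above; low: 30th percentile or below;
    average: strictly between the two.
    """
    if percentile >= 85:
        return "high"
    if percentile <= 30:
        return "low"
    return "average"


if __name__ == "__main__":
    for score in (92, 85, 50, 30, 12):
        print(score, achievement_group(score))
```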

To test hypothesis B2, which reads as follows:

B2 There will be no significant differences between the high,
average, and low achievers as determined by teacher judgment
in their ability to assess their level of abstract achievement.

children were determined to be high, average, or low achievers simply
on the basis of how a teacher viewed their achievement.

Hypothesis B5 states:

B5 There will be no significant differences between children from
high, average, and low family incomes in their ability to
assess their level of abstract achievement.

To test this hypothesis, the following criteria were used to determine the
category of family income which most nearly corresponded to each
child: Scale of family incomes--high, over $25,000; average,
$4,681 to $24,999; and low, below $4,681.

Methods of Analyzing Data

 

To establish a measure of validity with respect to the testing
technique in this study, a comparison of results was made between the
test written for this study, using the concept of thresholding to
determine the level of functioning, and the technique in this study.
The comparison took the form of a correlation, which was hypothesis A
of this study. It states:

A There will be a high correlation between the results of
testing using a diagnostic test and the results of testing
using the technique being studied.

When a scattergram was constructed from the results of the test written
for this study together with the results of the technique in this
study, a linear relationship was noted for both operations. (See
accompanying scattergram, Figure 1.) On the vertical axis of the
scattergram are listed all possible levels (1 through 22) that a child
could attain on the tests designed for the study. The horizontal axis
lists levels 1 through 22, which are all the possible scores attainable
by the testing technique in this study. Each pair of scores which a
child acquired through testing is used as the coordinates of a point in
the scattergram.

Since a linear relationship was apparent from the data, a
decision to use the Pearson product-moment correlation coefficient
was made. This correlation coefficient is denoted by $r_{xy}$. It can
be expressed as the covariance of two variables, divided by the
standard deviation of each of the variables:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y}. \]

The computational formula which was used is:

\[ r_{xy} = \frac{n \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[n \sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n \sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \]

where X and Y are the variables to be correlated, and n is the total
number of subjects.
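The computational formula for the Pearson product-moment coefficient can be sketched directly from the sums it names. The function name and the sample level pairs below are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys) -> float:
    """Pearson product-moment correlation from raw sums.

    Uses n*sum(XY) - sum(X)*sum(Y) over the square root of
    [n*sum(X^2) - sum(X)^2] * [n*sum(Y^2) - sum(Y)^2].
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length of at least 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical pairs: (level of submitted problem, level on written test).
    submitted = [3, 7, 12, 15, 20]
    tested = [4, 6, 13, 14, 21]
    print(round(pearson_r(submitted, tested), 3))
```

The function raises ZeroDivisionError when either variable is constant, since the correlation is undefined in that case.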

In an effort to test the following hypotheses it was necessary
to establish a criterion for determining which children were successful
in communicating their level of functioning by submitting a problem in
addition or subtraction which they thought was the "hardest" that they

could do.

Figure 1. Scattergram of the results of the test written for
this study and the technique of this study. [Vertical axis: Level of
Test Written for This Study. Horizontal axis: Level of Submitted
Problem. Addition and subtraction data are plotted with separate symbols.]

B1 There will be no significant differences between the high,
average, and low achievers as determined by the Iowa
Achievement tests in their ability to assess their level
of abstract achievement.

B2 There will be no significant differences between the high,
average, and low achievers as determined by teacher judgment
in their ability to assess their level of abstract achievement.

B3 There will be no significant differences between Blacks and
Caucasians in their ability to assess their level of abstract
achievement.

B4 There will be no significant differences between girls and boys
in their ability to assess their level of abstract achievement.

B5 There will be no significant differences between children from
high, average, and low income families in their ability to
assess their level of abstract achievement.

The level of the problem submitted was compared with the results of the
test written for this study, which the children took in the same testing
session. The criterion for a successful self-assessment was established
as follows: When a child submitted a problem which was within two
levels above or two levels below the level of functioning established
by the test written for the study, he was judged to be successful in
his ability to assess himself. In tabulating the results, dichotomous
data were collected, with a "1" being given to successful students and
a "0" to nonsuccessful students.
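The success criterion reduces to a comparison of two level numbers. A minimal sketch follows; the function names and the sample pairs are hypothetical.

```python
def self_assessment_score(submitted_level: int, test_level: int, tolerance: int = 2) -> int:
    """Return 1 when the submitted problem is within two levels, above or
    below, of the level established by the written test; otherwise 0."""
    return 1 if abs(submitted_level - test_level) <= tolerance else 0


def proportion_successful(pairs) -> float:
    """Mean of the dichotomous scores for (submitted, tested) level pairs."""
    scores = [self_assessment_score(s, t) for s, t in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (submitted level, tested level) pairs.
    pairs = [(10, 12), (10, 13), (5, 5), (18, 15)]
    print([self_assessment_score(s, t) for s, t in pairs])
    print(proportion_successful(pairs))
```

Because the scores are coded 1 and 0, a group mean is simply the proportion of successful self-assessors in that group.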


As a precaution to the subsequent use of t-tests to determine
whether or not there were differences in group means in their ability
to assess themselves (hypotheses B1 through B5), an F-test was used to
check sample variances. When the tests showed no differences in sample
variances, the following two-tailed t-test was used:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]

where

$\bar{X}_1$ = mean of one group;
$\bar{X}_2$ = mean of second group;
$n_1$ = number of responses in first group; and
$n_2$ = number of responses in second group; and

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]

where

$S_p^2$ = total population variance;
$S_1^2$ = variance of first group; and
$S_2^2$ = variance of second group.

The limits were:

Upper = $t_{1 - \alpha/2}$;
Lower = $t_{\alpha/2}$;
d.f. = $n_1 + n_2 - 2$; and
$\alpha$ = .05.

The assumptions which were made by using this test statistic

were:


1. X1 and X2 are normally distributed;

2. homoscedasticity; and

3. samples were randomly selected and independent.
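The pooled-variance t statistic and the variance ratio used as a preliminary check can be sketched as follows. This is an illustration only: the function names and the two small groups of 1/0 scores are hypothetical, and the F ratio is assumed to be formed as the larger sample variance over the smaller.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(group1, group2) -> float:
    """F ratio of the two sample variances (larger over smaller, by assumption)."""
    s1, s2 = variance(group1), variance(group2)
    return max(s1, s2) / min(s1, s2)


def pooled_t(group1, group2):
    """Two-sample t statistic with pooled variance; returns (t, d.f.)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = variance(group1), variance(group2)  # sample variances (n - 1 divisor)
    pooled_variance = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / (sqrt(pooled_variance) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


if __name__ == "__main__":
    # Hypothetical dichotomous scores (1 = successful self-assessment).
    first = [1, 1, 0, 1, 0]
    second = [0, 0, 1, 0, 0]
    print(variance_ratio(first, second))
    print(pooled_t(first, second))
```

The computed t is then compared with the tabled limits for the stated degrees of freedom at the .05 level.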

In the determination of a racial bias with respect to what a
child evaluates as "hard," as suggested by hypothesis C1 (there will
be no significant differences between racial groups in what they
perceive as "hard"), the submitted problems were studied in an attempt
to ascertain appropriate groupings for the analyses. If a submitted
problem fitted into more than one category, then a tally mark was placed
in all appropriate categories. The addition data were grouped in the
following manner:

1. addition with regrouping;
2. addition without regrouping;
3. problems with three digits or less;
4. problems with more than three digits; and
5. problems with multiple addends (more than two).

Subtraction was grouped into the following categories:

1. subtraction with borrowing;
2. subtraction without borrowing;
3. problems with three digits or less; and
4. problems with more than three digits.

The nature of the data collected to test hypothesis C suggested
that a series of chi-square tests be used with $\alpha$ = .05. The following
test statistic was used:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i} \]

where $n_i$ is the observed cell frequency, n is the sum of
$n_1 + n_2 + \cdots + n_k$, and $p_i$ is the expected relative frequency.
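A chi-square statistic consistent with the symbols defined above (observed cell frequencies, their sum, and expected relative frequencies) can be sketched as follows. The function name and the counts are hypothetical, and the form shown is the common Pearson statistic, assumed here to be the one intended.

```python
def chi_square(observed, expected_proportions) -> float:
    """Pearson chi-square: sum over cells of (n_i - n*p_i)^2 / (n*p_i)."""
    n = sum(observed)
    return sum(
        (n_i - n * p_i) ** 2 / (n * p_i)
        for n_i, p_i in zip(observed, expected_proportions)
    )


if __name__ == "__main__":
    # Hypothetical tallies for one category in two groups, with the
    # expected proportions assumed equal for illustration.
    print(chi_square([30, 10], [0.5, 0.5]))
```

With one degree of freedom and the .05 level, the computed value would be compared with the tabled critical value of 3.84.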

CHAPTER IV

PRESENTATION AND ANALYSIS OF THE DATA

The results of this investigation using the procedures and
data analysis described in Chapter III are presented in this chapter.
A presentation of the data demonstrating the correlation between the
technique in this study and that of the test written for this study will
be given first. A discussion of the results of determining whether a
child can assess himself by the criteria established in this research
will follow. Finally, a presentation of the data showing the different
groups' ability to use the testing technique in this study, cited as
hypotheses in the preceding chapter, and the data used to determine
whether or not a racial bias exists with respect to what a child
considers "hard" will be discussed.

Correlation Between the Technique in This Study
and the Test Written for This Study

 

 

The test and the children's submitted problems were collected
as described in the procedure sheet in Appendix B. After the collection
of these papers, a senior student in secondary math education and
a graduate student in elementary education from Ball State University
determined the level of the submitted "hardest" problem by comparing
the problem to the test written for this study in the appropriate
operation and selecting the level that most corresponded to the
submitted problem. This was done for all submitted problems first.
Then the tests written for this study and taken by the children were
scored using Fechner's thresholding technique to determine the child's
level of performance on the test. The thresholding technique of scoring
a test was used in the following way: When a child missed all three
problems at a given level, his functioning level was determined to be
one level before the missed group of problems.

Each child in this study thus has two scores--one from his
submitted problem and one from the test designed for the study. A
Pearson product-moment correlation was used to test hypothesis A.

A There will be a high correlation between the results of
testing a child by a diagnostic test and the testing technique
being studied.

A value of r = .85 for addition and r = .81 for subtraction was
computed. The results do show that a high correlation was found
between the diagnostic test designed for the study and the testing
technique in this study. When confidence intervals were constructed for these
two correlations (P = .99), ρ was found to be between .75 and .91 for
addition and .66 and .90 for subtraction. Therefore, it can be concluded
that the technique in this study gave results which correlated
quite well with the results of the tests designed for this study for
both operations.
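The text does not state how the confidence intervals were constructed. One standard approach is Fisher's z transformation, sketched below as an assumption rather than as the method actually used. The sample sizes (89 for addition, 63 for subtraction) are also assumptions inferred from counts reported elsewhere in this chapter.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_interval(r: float, n: int, confidence: float = 0.99):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z = atanh(r)                     # transform r to an approximately normal scale
    standard_error = 1 / sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = tanh(z - critical * standard_error)
    upper = tanh(z + critical * standard_error)
    return lower, upper


if __name__ == "__main__":
    for label, r, n in (("addition", 0.85, 89), ("subtraction", 0.81, 63)):
        low, high = correlation_interval(r, n)
        print(label, round(low, 2), round(high, 2))
```

Under these assumptions the computed limits agree, to two decimal places, with the intervals reported above.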


Child's Ability to Assess Himself

The percentage of students who submitted problems within
two levels above or below the level of functioning indicated by the
diagnostic test was calculated to be 62 percent with addition and
57 percent with subtraction. A breakdown of the addition data shows
that 33 of the 91 students were nonassessors by the criteria stated
in Chapter III. It was not possible to assess two of the students in
the study because they refused to submit a problem, stating that they
could not think of one. The nonassessors could be broken down into
the following categories: (1) submitted a problem incorrectly solved;
(2) submitted a problem below (less difficult) the level of functioning
as determined by the diagnostic test; and (3) submitted a problem above
(more difficult) the level of functioning as determined by the
diagnostic test.

Two students solved their submitted problem incorrectly,
making errors that they also made on their test. Of the remaining
nonassessors, 16 achieved a higher level score on the diagnostic test
than their submitted problem indicated that they could do. Of these
16, 8 submitted problems which placed them in levels 1-12. All the
1-12 levels require little understanding of place value, and children
could use their fingers to give a correct answer to the problems.
Therefore, the 8 children who suggested by their submitted problems
that they considered a one-digit number plus a one-digit number as the
"hardest" problem that they could do, correctly answered many problems
by treating a multi-digit number as a series of one-digit number
problems, that is, 435 + 362 equals: five plus two, three plus six,
and four plus three. This became apparent by observing the errors in
the problems that they had missed. All of these children had the
following type of error:

    738
  + 436
  11614

It would appear that the submitted problem more accurately depicted
their level of functioning.
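The error pattern can be reproduced by adding each column as a separate one-digit problem and writing every column sum in full, with no regrouping. The sketch below is illustrative; the function name is hypothetical.

```python
def add_without_regrouping(a: int, b: int) -> str:
    """Add column by column, writing each column sum without carrying."""
    a_digits, b_digits = str(a), str(b)
    width = max(len(a_digits), len(b_digits))
    a_digits, b_digits = a_digits.zfill(width), b_digits.zfill(width)
    return "".join(str(int(x) + int(y)) for x, y in zip(a_digits, b_digits))


if __name__ == "__main__":
    print(add_without_regrouping(435, 362))  # no column reaches ten, so the answer is correct
    print(add_without_regrouping(738, 436))  # reproduces the error shown above
```

The strategy gives a correct answer whenever no column sum reaches ten, which is why these children could score well on multi-digit problems without regrouping.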

Of the 16 students, 2 submitted problems without regrouping,
and on their tests they indicated, by correctly working problems with
regrouping, that they could regroup. Another 2 of the 16 could do
multiple addend problems, but did not submit one. Four of the children
submitted three- or four-digit numbers with regrouping in their problem,
but went on to solve the five-digit number problems with regrouping on
their diagnostic test.

Of the 35 children who were not evaluated as self-assessors,
15 submitted problems which were on higher levels than they had scored
on the diagnostic test. Of the 15, 4 appeared to suffer from test
fatigue, boredom, or some other condition which stopped the child
from working all the problems up to the level of the submitted problem.
Six of the students submitted problems which had many zeros, that is,
200 + 300 = 500. This type of problem was shown in the pilot studies
preceding this investigation to be an unreliable indicator of the
level of functioning. The addition of a one-digit number to a two-digit
number was sequenced by the traditional diagnostic test written for
this study as three levels above the addition of two two-digit numbers.
By solving the one-digit problems and missing the addition of two
two-digit problems, 5 youngsters indicated that the sequencing was
incorrect for them.

In the data obtained using the operation of subtraction,
29 children did not correctly assess their level of functioning as
defined by the researcher in Chapter III. The nonassessors could be
distributed into the following categories: (1) submitted problems with
incorrect answers; (2) submitted a problem below (less difficult) the
level of functioning indicated by the traditional diagnostic test written
for this study; (3) submitted a problem above (more difficult) the
level of functioning indicated by the traditional diagnostic test
written for this study; and (4) had difficulty with the sequencing
used in constructing the diagnostic test written for this study.

Five of the nonassessors wrote problems with incorrect answers,
thereby giving no level of functioning. Another 9 students simply
stopped answering test problems or missed problems with fewer digits
and borrowing, which in earlier parts of the test they had answered
correctly. It appears that test fatigue or lack of reinforcement may
have influenced this behavior. These students submitted problems on
a more difficult level than their diagnostic test indicated that they
could do. Seven students submitted problems which were easier than

they actually could do as determined by the diagnostic test.


In the sequencing provided by the test written for this study,
levels containing problems with borrowing were intermixed with levels
without. The emphasized criterion for adding a level in the test
written for this study was the number of digits in a number, that is,

a three-digit number with borrowing was considered more difficult than
a four-digit number without borrowing. This emphasis in sequencing
caused problems for some youngsters. A child who submitted a problem
made up of three-digit numbers without borrowing would miss all borrow-
ing problems at levels with smaller numbers, causing him to be judged

a nonassessor. This was the case for 8 of the 29 nonassessors.

Analysis of the Data Concerning Hypotheses B1
through B5 of the Study

To test the following hypotheses of this study, a series of
t-tests was used:

B1 There will be no significant differences between the high,
average, and low achievers as determined by the Iowa
Achievement tests in their ability to assess their level
of abstract achievement.

B2 There will be no significant differences between the high,
average, and low achievers as determined by teacher judgment
in their ability to assess their level of abstract achievement.

B3 There will be no significant differences between Blacks and
Caucasians in their ability to assess their level of abstract
achievement.

B4 There will be no significant differences between girls and
boys in their ability to assess their level of abstract
achievement.

B5 There will be no significant differences between children
from high, average, and low income families in their ability
to assess their level of abstract achievement.

Several F-tests were run first in order to determine whether or not
there were equal variances in the sample populations. The results of
those tests are presented in Table 4.

The number of subjects used for the F-tests was 152; 9 students
were omitted from the analysis because they either did not submit a
problem or answered their problem incorrectly. In either case, it was
impossible to determine a level of functioning from the use of the
technique in this study. Using an α level of .05, no significant
differences were found between the variances of the groups.

After the data sheets handed out with the procedure
sheet were collected, it was noted that no teacher in the study evaluated a child
in a different category of achievement than the category in which the
child had been placed by the Iowa tests. Therefore, hypothesis B2 was
not analyzed separately. Since no differences in variances were indicated
by the F-tests, the following two-tailed t-test was used:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]

where

$\bar{X}_1$ = mean of one group;
$\bar{X}_2$ = mean of second group;
$n_1$ = number of responses in first group; and
$n_2$ = number of responses in second group;

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]

where

$S_p^2$ = total population variance;
$S_1^2$ = variance of first group;
$S_2^2$ = variance of second group;

Upper = $t_{1 - \alpha/2}$;
Lower = $t_{\alpha/2}$;
d.f. = $n_1 + n_2 - 2$.

Table 4. F-tests for determining the differences in variances of the groups in the study

[The table lists the number of subjects in each group and the
significance of the difference in variances for four groupings:
economic level (high, average, low), achievement level (high, average,
low), sex (boys, girls), and race (Black, White). Significance is
"none" for all four groupings. The group sizes are illegible in the
source. A footnote marks the groups used in the F-test.]

The results of the two-tailed t-tests are shown in Table 5.
The two means (.44 and .60) for the high and average family income
levels, respectively, were used in the t-test. No significant differences
were found with an α level of .05. The t-statistic was .989,
with 141 degrees of freedom. Therefore, it was concluded that there
were no differences between the high, average, and low income family
children in their ability to use the testing technique in this study.

Table 5. Group differences in their ability to use the testing technique in this study

                          t-test                 α
                          Value       d.f.     Level    Significance

Economic level:            .989       141       .05        none
Achievement level:        1.199       141       .05        none
Sex:                       .165       150       .05        none
Race:                      .555       150       .05        none

[The columns giving the number of subjects, mean, and variance for
each group are illegible in the source. A footnote marks the groups
used in the t-tests.]


The means with the widest spread for ascertaining whether or
not there was a difference between high, average, and low achievers
in their ability to use the testing technique of this study were .44
and .63 (high and average, respectively). No significant differences
were found with an α level of .05. The t-statistic computed was 1.199,
with 141 degrees of freedom. It was concluded, therefore, that children
who are high, average, or low achievers are all equally able to use the
testing technique in this study.

To test whether or not boys and girls were equal in their
ability to use the testing technique in this study, a t-test with an
α level of .05 was used. A t-statistic of .165, with 150 degrees of
freedom, was computed. No significant differences were found.

A t-test with an α level of .05 was used to determine whether
or not there was a difference between Black and Caucasian children in
their ability to use the testing technique in this study. The
t-statistic was found to be .555, with 150 degrees of freedom. It was
concluded that Black and Caucasian children were equally able to use
the technique in this study. It would, therefore, appear from the
data that all groups in the study are equally able to respond to the
open question with a self-assessment which has a high degree of
accuracy.


Analysis of the Data Concerning
Hypothesis C in the Study

The child-submitted addition problems were studied, and a
decision was made to use the following categories as a basis for
grouping to determine whether or not a racial bias exists with respect
to what a child considered "hard." If a submitted problem fitted into
more than one category, then a tally mark was placed in all the appro-
priate categories. The categories for addition and subtraction are
given below.

Addition:

1. addition with regrouping;
2. addition without regrouping;
3. problems with three digits or less;
4. problems with more than three digits; and
5. problems with multiple addends.

Subtraction:

1. subtraction with borrowing;
2. subtraction without borrowing;
3. problems with three digits or less; and
4. problems with more than three digits.

If a child submitted the following problem in addition,
638 + 494 + 863 = ____, then a tally mark would be placed in the
following categories: addition with regrouping, problems with three
digits or less, and problems with multiple addends.
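The tallying rule for addition can be sketched as a function that returns every category a submitted problem falls into. The function name is hypothetical, and two details are assumptions: regrouping is taken to mean that some column sum (including any carry) reaches ten, and the digit count is taken from the longest addend.

```python
def addition_categories(addends) -> set:
    """Return the set of tally categories for a submitted addition problem."""
    addends = list(addends)
    width = max(len(str(a)) for a in addends)

    # Assumption: regrouping occurs when any column total, carry included, is 10 or more.
    carry, regrouped = 0, False
    for place in range(width):
        column_total = carry + sum((a // 10 ** place) % 10 for a in addends)
        carry = column_total // 10
        if carry:
            regrouped = True

    categories = {
        "addition with regrouping" if regrouped else "addition without regrouping",
        "problems with three digits or less" if width <= 3
        else "problems with more than three digits",
    }
    if len(addends) > 2:
        categories.add("problems with multiple addends")
    return categories


if __name__ == "__main__":
    for category in sorted(addition_categories([638, 494, 863])):
        print(category)
```

For the example in the text, the function returns the same three categories named above.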


A chi-square test was used to analyze each category. The
results of these tests can be found in Table 6 (addition) and Table 7
(subtraction).

In the addition category, two children did not submit problems,
two children incorrectly solved their problems, and one child was
Chinese, a category not considered in this research. Omitting these
subjects, 88 children were left to be used for testing hypothesis C
with respect to addition. The chi-square values were very low and
nonsignificant. The values ranged from .0004 to .3450. No cultural bias
in addition was found with respect to what a child perceived as "hard."

Table 6. Summary of the results of the chi-square tests with addition

                           Number of
                           Subjects      χ²                 α
                           in Group     Value     d.f.    Level    Significance

No regrouping:                           .0004      1      .05        none
   Blacks                     13
   Whites                     75

Regrouping:                              .0009      1      .05        none
   Blacks                     13
   Whites                     75

Multiple addends:                       3.4500      1      .05        none
   Blacks                     13
   Whites                     75

Three digits or less:                    .0015      1      .05        none
   Blacks                     13
   Whites                     75

More than three digits:                  .0207      1      .05        none
   Blacks                     13
   Whites                     75



In the subtraction category, five subjects incorrectly solved
their submitted problems, thus limiting the number of subjects to 63
for the analysis. Very low nonsignificant values for chi-square were
found, the values ranging from .0144 to .8900. It therefore was
concluded that no racial bias was found with respect to what is

considered "hard" by a child within the operation of subtraction.

Table 7. Summary of the results of the chi-square test with subtraction

                           Number of
                           Subjects      χ²                 α
                           in Group     Value     d.f.    Level    Significance

No borrowing:                            .7830      1      .05        none
   Blacks                      9
   Whites                     54

Borrowing:                               .8900      1      .05        none
   Blacks                      9
   Whites                     54

Three digits or less:                    .0114      1      .05        none
   Blacks                      9
   Whites                     54

More than three digits:                  .0160      1      .05        none
   Blacks                      9
   Whites                     54


CHAPTER V

SUMMARY, GENERALIZATIONS, AND IMPLICATIONS
FOR FUTURE RESEARCH

The effectiveness of a testing technique which employs an
ambiguous stimulus to ascertain a level of functioning within the
operations of addition and subtraction was the primary question which
this study attempted to explore. Historically developed criteria for
evaluating testing instruments and measurements taken from Chapter II
will be used to summarize and generalize the findings on the effectiveness
of the technique in this study. A summary and the resultant
generalizations concerning the data on a child's self-assessment as
well as the different groups' ability to use the technique in this
study will be presented. An additional analysis of the distribution
of percentage of correct-response scores with respect to the technique
in this study, which lent support to the conclusions concerning the
effectiveness of this technique, will be offered. A review of the
stated purpose of this study and the implications for future research
will conclude the chapter.

Criteria for Judging Testing
Instruments and Measurements

 

 

Criteria for judging testing instruments and measurements cited
in Chapter II will now be used to evaluate the testing technique in this
study. By comparing the results of the test designed for this study
with the results of the new technique, a measure of criterion-related
validity was made. A correlation of r = .85 for addition and r = .81
for subtraction was found. Using a confidence interval to examine the
combined correlations, it can be assumed that with a probability of .99,
the correlation between the results of the test designed for this study
and the results of the technique of this study will be in the interval
of r = .72 and r = .90 for both operations.

A testing instrument with content validity should ask questions
covering all levels of representation for all concepts which the examiner
deems necessary to an understanding of the area being tested.
Since the technique in this study has the specific questions concerning
content being posed and answered by the individuals being tested, the
content validity is dependent on the examinee's ability to pose valid
questions.

Does the testing technique measure a child's depth of understanding
and reasoning ability, or does it measure a memorized or rote
learned piece of information or rule? The construct validity of the
test which comes from the testing techniques in this study has not been
explored. To say that the construct validity of a test derived from the
technique in this study is, in general, the same as a sequenced diagnostic
test might not be true, for no research has been conducted to show
this.

Looking at additional factors found in the instrument, which if
ignored would lower validity, there are several which are minimized by

the technique in this study.

1. unclear directions--The directions were tested in pilot studies,
and few children in those studies indicated that they did not
know what was being asked of them. Confused children either
asked questions or did not respond.

2. reading vocabulary and sentence structure too difficult--No
child is asked to read anything more than he, himself, writes.
The directions for the test are read aloud by the examiner.

3. inappropriate level of difficulty of test items--The level of
difficulty is judged by the examinee. From the data it appears
that most children submit the "hardest" problem that they can do.

4. poorly constructed test items--The examinee writes what is
understandable to him, and any poorly constructed items offer
to the examiner information about the examinee's level of
understanding.

5. ambiguity--The questions are posed and answered by the examinee,
thus eliminating ambiguity of specific questions.

6. test items inappropriate for the outcomes being measured--The
examinee, by posing his own question in an area designated by
the examiner, minimizes this problem. By submitting inappropriate
questions, information concerning the level of functioning
of a child is still made available to the examiner.

7. test too short--By asking the examinee to submit the "hardest"
problem that he can do, the necessity for a lengthy test was
minimized. By correlating the results with a lengthy test, as
was done in this study, the validity was, to a large measure,
substantiated.

8. improper arrangement of items--Since the child submits only
one problem per area to be measured, no arrangement of items
is necessary.

9. identifiable pattern of answers--This category does not apply
to the technique in this study.

Several comments can be made concerning factors which influence

validity that can be found in the administration and scoring of a test.

1. cheating--Since each child submits his own problem and answer,
cheating could be easily detected and minimized.

2. failure to follow directions--The only directions given are
oral. Since there is only one direction, it is very easy for
an examiner to clarify any misconceptions.

3. ignoring time limits--No time limits are imposed by the
technique in this study.

4. giving pupils unauthorized assistance--This problem could
apply to the technique in this study.

5. errors in scoring--Since there is only one problem per area,
the number of errors is minimized. But each problem is unique.
Therefore, no general answer sheet is available.

6. poor physical environment--A poor physical environment could
affect the results of the technique in this study. But the
time needed to complete this test is minimized, so the effects
of the environment would be minimized.


Concerning conditions that might adversely affect test
validity which are due to personal factors, the following may be
noted:

1. motivation--Motivation would be increased, for children would
be asked to show what they can do without being confronted
with tasks that they cannot do.

2. anxiety--Anxiety would be minimized, for the child is asked
only to demonstrate what he can do.

3. fatigue--The initial fatigue that the child has when entering
the testing situation would remain with this technique, but any
additional fatigue would be minimized due to the shortness of
the testing period.

4. illness--Illness would still affect the child's ability to
function, but its effects would be minimized due to the
shortness of the testing period.

5. test-wiseness--This does not apply, since the child writes
his own exam.

6. response-set--This does not apply, since the test is only
one-problem-per-area long.

In conclusion, it appears that the test has good general validity
using the criteria cited to make the judgment. Additional research
should be done to establish the construct validity of the response
which each examinee submits. Categories of responses, as with
psychological testing using ambiguous stimuli, may offer different
constructs.

The reliability of the testing technique in this study was
measured, in part, when it was shown that two ways of measuring
a level of functioning had a high correlation. This correlation
indicates a consistency of response in a single testing situation.
Several other factors which may influence reliability were pointed
out in Chapter II.

1. The length of the test is a factor in reliability. Since
the testing technique in this study requires only one problem
per area for achievement evaluation, reliability might be
questioned. The correlation data offer support for the
reliability of the measurement, as does the analysis of the
percentage of problems correctly answered up to and including
the level of the submitted problem.

2. Scores with a large spread are indicators of good reliability.
The scores collected in this study have a very wide spread,
as can be seen in Figure 2. (The horizontal axis of the
figure lists the levels of the operations on the tests designed
for the study. The vertical axis has a series of numbers from
1 through 22, which represents the number of students who
submitted a problem. The coordinates of the points represent the
level of the problem submitted and the number of students who
submitted a problem at that level.)

3. If a test is too easy or too difficult, the reliability of the
results is threatened. The technique in the study asks that a
child write a problem that he thinks is the hardest he can do.
The data support that the child does just that. Therefore,
it seems reasonable to assume that the test is neither too
difficult nor too simple.

Figure 2. Spread of scores for addition and subtraction.
(Horizontal axis: Levels of Problems; vertical axis: Number of
Students.)

Usability is the last major factor to consider when making a
decision about the advisability of using a particular test. The
technique in this study has the following points in its favor: (1) It is
easy to administer; (2) it requires a very short time to administer;
(3) it is easy to score; (4) each child supplies equivalent forms of
the test by identifying his level of performance with his own unique
problem; and (5) little cost is involved.

The major problem that the technique in this study poses is
one of interpretation of the results. If operations are tested using
whole numbers and fractions, the problem is simplified. Materials are
available which offer a sequencing of the skills involved in solving
problems in these areas. But if the testing technique is to be used
in other areas, analyses of what a child most likely knows in order
to pose and answer a question in the chosen area will have to be done
in order to interpret the results.

Accuracy of a Child's Self-Assessment

Of the children in the study, 60 percent, according to the
criteria established in Chapter III, could assess their level of
functioning. Of the 64 youngsters who were categorized
as nonassessors, 40 may well have assessed themselves.
These children met the following problems with the criteria established
for assessment:

1. Eight youngsters indicated that they regarded a one-digit plus
a one-digit number as the "hardest" problem that they could do.
On the test written for the study, they treated several multi-
digit problems with the algorithm they claimed to know for one-
digit addition and solved the problems correctly. From their
errors on the test, the algorithm used was made apparent.
Therefore, it appears that these eight children did indicate
their level of functioning.

2. Thirteen youngsters submitted problems which were more difficult
than those they completed correctly on the test written for the
study. These children either quit solving problems or made errors
on problems of a kind that they had shown earlier in the testing
situation to be within their scope of knowledge. For example,

       23               234                    2359
      +47    later     +478    still later    +6874
       70             61012                    9233

It appears that these youngsters may well have indicated their
level of functioning, but were judged as nonassessors because
of test fatigue, boredom, or some other similar problem.

3. Thirteen of the children appeared to have problems with the
way the test was sequenced. They submitted problems which were
considered easier or more difficult than the problem which they
answered on the test designed for the study. The discrepancy
proved to be enough to have them evaluated as nonassessors.

4. Six children submitted problems with zeros despite the attempt
by a specific direction on the procedure sheet to negate the
possibility of this happening. More care should be taken to
avoid this type of error in the administration of the testing
technique. With proper questioning, these children may well
have assessed themselves correctly.

In considering the additional data just cited, apart from the criteria
cited for successful assessing, it is questionable whether the 40
children just reviewed really could not assess their level of
functioning.
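The worked example given for the second group of nonassessors can be checked mechanically. The sketch below is only an illustration and is not part of the study's procedure: it assumes that the middle answer (61012 for 234 + 478) came from writing each column sum in full without regrouping, which is one reading consistent with the digits shown. The function names `add_without_regrouping` and `classify_answer` are hypothetical.

```python
def add_without_regrouping(a: int, b: int) -> str:
    """Add column by column, writing each column sum in full (no carrying)."""
    width = max(len(str(a)), len(str(b)))
    top, bottom = str(a).zfill(width), str(b).zfill(width)
    return "".join(str(int(x) + int(y)) for x, y in zip(top, bottom))


def classify_answer(a: int, b: int, answer: int) -> str:
    """Label a written answer as correct, a no-regrouping slip, or other."""
    if answer == a + b:
        return "correct"
    if str(answer) == add_without_regrouping(a, b):
        return "no-regrouping error"
    return "other error"


# The three problems from the example, in the order the child worked them.
for a, b, answer in [(23, 47, 70), (234, 478, 61012), (2359, 6874, 9233)]:
    print(f"{a} + {b} = {answer}: {classify_answer(a, b, answer)}")
```

Under this assumption the first and third problems are classified as correct and only the middle one as an error, which fits the interpretation that the child had the skill but slipped during a long testing session.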

It would appear that for the children who seemed unable to
use the technique in this study several procedural considerations
might be noted:

1. Some children in the study refused to submit a problem because
they could not think of a "hard" one, for all problems within
the operation being tested were considered simple by them. An
examiner may, when noting the absence of a response caused by
the cited difficulty, encourage a child to relate the fact in
writing that all problems seem simple, thereby encouraging an
honesty of response and a possible accurate assessment.

2. If a child were to submit more than one "hard" problem, an
incorrect response may be more accurately evaluated by observing
whether the error occurs again or whether it is a simple
"foolish" inaccuracy.

Different Groups' Ability to Use the
Testing Technique in This Study

The data in the study concerning the different groups'
ability to assess themselves (boys-girls, high-average-low achievers,
children from high-average-low income families, and Blacks-Whites) show
that all groups were able to use the testing technique equally
effectively. The thought of using a test which has no built-in advantages
or disadvantages for those children who in the past have suffered unfair
discrimination from evaluation methods is very exciting. The possible
use of the technique in this study to measure achievement in other
content areas or even in intelligence testing may well offer a solution to
the biased results present in testing today.

A Racial Bias with Respect to What Is "Hard"

No bias was found among Black and White children with respect
to what is considered hard within the operations of addition and
subtraction. Additional investigations may find biases where operations
or realms of numbers are more complex.

Analysis of the Distribution of Percentage
of Correct Response Scores with Respect
to the Technique in This Study

When a child submits a problem as the "hardest" that he can do,
can it be assumed that the levels considered simpler or less difficult
are mastered? Using the sequencing of the test designed for this study
and identifying the level of this test to which the submitted problem
best corresponds, an analysis of the percentage of correct responses was
made. All those problems correctly answered up to and including the
problem on the level of the submitted one were counted, and the
percentage of correct responses was calculated. For addition, the mean
was .86, with a standard deviation of .21 and a variance of .04. The
subtraction data had a mean of .84, with a standard deviation of .18
and a variance of .03. The data show that about two-thirds of a group
of children who submit a problem as the "hardest" one that they can do
have mastered at least 65 percent of those problems sequenced
as simpler and may have 100 percent of the simpler problems mastered.

Examining the percentage of problems answered correctly five
levels above (more difficult than) the submitted problem, the mean,
variance, and standard deviation for addition were .21, .08, and .28,
respectively. For subtraction, a mean of .20 with a variance of .10
and a standard deviation of .31 was found. It appears from the data
that 68 percent of a group of children, when submitting a "hardest-
they-can-do" problem, are able to work about one in five of a series
of problems sequenced as more difficult.
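The arithmetic behind these statements can be retraced with a short sketch. It is offered only as an illustration under stated assumptions: the "two-thirds" and "68 percent" figures are read here as the share of scores expected within one standard deviation of the mean of a roughly normal distribution, proportions are capped to the range 0 to 1, and the sample child's results passed to `fraction_correct_through` are hypothetical, not data from the study. All function names are illustrative.

```python
# Reported (mean, standard deviation, variance) for each analysis.
REPORTED = {
    ("addition", "through submitted level"): (0.86, 0.21, 0.04),
    ("subtraction", "through submitted level"): (0.84, 0.18, 0.03),
    ("addition", "five levels above"): (0.21, 0.28, 0.08),
    ("subtraction", "five levels above"): (0.20, 0.31, 0.10),
}


def one_sd_band(mean: float, sd: float) -> tuple[float, float]:
    """Mean plus or minus one standard deviation, capped to [0, 1]."""
    return max(0.0, mean - sd), min(1.0, mean + sd)


def variance_consistent(sd: float, variance: float) -> bool:
    """True when sd squared agrees with the variance to two decimals."""
    return abs(sd * sd - variance) < 0.005


def fraction_correct_through(results: dict[int, list[bool]], level: int) -> float:
    """Share of problems answered correctly up to and including `level`."""
    counted = [ok for lvl, answers in results.items() if lvl <= level for ok in answers]
    return sum(counted) / len(counted)


for (operation, analysis), (mean, sd, variance) in REPORTED.items():
    low, high = one_sd_band(mean, sd)
    print(f"{operation}, {analysis}: band {low:.2f} to {high:.2f}, "
          f"variance consistent: {variance_consistent(sd, variance)}")

# Hypothetical child: three problems per level, submitted problem at level 2.
sample = {1: [True, True, True], 2: [True, False, True], 3: [False, False, False]}
print(fraction_correct_through(sample, 2))
```

For addition, the lower end of the band is .86 - .21 = .65 and the upper end reaches the ceiling of 1.00, which matches the "at least 65 percent" and "100 percent" figures in the text.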

A Review of the Stated Purpose of This Study

The purposes of this study were stated in Chapter I. How well
these purposes were met will now be discussed.

1. The validation of the testing technique of this study has
been, to a large measure, accomplished. Both the correlation and
additional analyses concerning how well children individually and
in groups can use this technique have yielded encouraging results.

2. The time required to prepare, administer, and correct the
test in this technique is, indeed, minimized. The time required to
think of the areas which need to be assessed and, possibly, to list
them, is all the time required to use this technique. The
administration and correcting time is also shortened, because the test
itself is very short (one problem per area).

3. The shortness of the testing procedure directly affects
the time that the student must spend in having his achievement
evaluated.

4. The technique of this study indeed offers, on a daily
basis, a collection of individual evaluations which will show the
changes in what a child perceives as "hard" in his daily learning
environment. If his environment has manipulatives or models, he can
offer a problem which he can solve using these. Either he or his
teacher can note on his paper what was used to help solve the problem.

5. The testing technique in this study places an emphasis on
the examinee's ability to recognize what he can do. Through the
repeated use of this technique a child may well be able to improve
his ability to recognize self-growth; then, with guidance, he might
be able to recognize what fosters self-growth and what deters it.
With the emphasis on assessing what an individual knows instead of
what he does not know, a testing situation will pose less threat to
feelings of self-worth. With evaluation being done in terms of
individual growth, the threat of having to meet group goals is also
minimized. Both of these factors enhance the development of a good
self-concept.

Implications for Future Research

The research proposed falls into two categories. The first
is research on the usability of this technique in other areas besides
the symbolic representation of addition and subtraction. The second
is research on areas in mathematics education which might be
investigated using the technique of this study.

Usability of the Technique in This
Study in Other Areas

Does an individual have the ability to recognize the knowledge
and skills which he possesses? Can he relate what they are? These
questions were answered in the affirmative with respect to the skill
areas researched in this study. Studies to determine the effectiveness
of evaluating learning with other operations, such as multiplication,
and with different realms of numbers are also needed.

This research dealt primarily with measuring the level of skill
development in computation. Can this technique measure concept
learning? If college students were asked to note for themselves all the
concepts that they felt had been presented to them in a given lecture,
textbook chapter, laboratory manual, and so forth, could they then write
the "hardest" question that they could think of which would test the
understanding of each concept? By so doing, could a professor discern
the degree of learning which has taken place for the student?

The greatest need for evaluative instruments and techniques
is at the concrete and pictorial-diagrammatic representation of
concepts and skills. With the encouraging results of this study using
symbolic representation, additional research is now called for using
the technique in the evaluation of concept learning using other
representations.

If a child cannot assess his knowledge initially, can he learn
to do this? If he can assess himself and communicate his knowledge
fairly well, can this skill be developed to a high degree of accuracy
and broadened to include most of his learning experiences? Does the
skill in self-assessment increase with the number of times that it is
done? If a child cannot assess himself, can he be taught to do this?
These are many of the questions which must be answered if the technique
researched here is to be used with maximum understanding of its effects
upon the examinee.

Areas of Mathematics Education to Be
Researched Using the Technique in
This Study

The technique in this study may prove fruitful in researching
(1) the sequencing of mathematical models for the development of an
understanding of a concept, (2) the carefully ordered presentation of
concepts in learning a general area of mathematics, and (3) the
effective ordering of the attributes of a concept for maximum
clarification. Research will also have to determine whether there is a
general sequencing of models, concepts, and attributes, or whether the
orderings must take into account the background of each learner
who will use them.

The effectiveness of different mathematical models for teaching
concepts might also be explored with the technique in this study. In
the pilot studies, children appeared to select a "hard" problem on the
basis of the mathematical model that they were using at the time; that
is, multiple addend problems were frequently submitted by children
using Chip Trading to learn addition. Large numbers and problems with
regrouping are very simply added using Chip Trading, but addition with
several addends causes some problems. Studies to verify or negate the
relationship between "hard" problems and models may prove valuable.
When the most effective model is used to teach a particular concept
to a child who finds the model readily understandable, learning would
be greatly facilitated.

In general, the technique in this study offers a researcher
the opportunity to collect evaluation data on a daily basis because
of the simplicity of administration and the small amount of time
required to complete the testing task. The daily evaluations make
available information on the order in which skills and concepts are
learned.

The examination of nonassessors' test papers indicated that
some of these children found the sequencing of the test written for
the study incorrect for them. They learned how to correctly answer
levels on the test which were considered more difficult than the
ones that they had missed. Another group of children seemed to agree
with the sequencing by missing all the problems beyond a particular
level. These data raised the issue of whether a sequence of learning
tasks could be written whereby all children would find the sequence
correct for them, or whether the sequencing of learning tasks for
individuals requires that the learner's background be taken into
account. Since the testing technique in this study pointed out this
discrepancy, it may be a useful tool to help answer the sequencing
questions.

If the question used in the testing technique were altered
to read: "Write a 'hard' problem in ________ that you
cannot answer" (the area to be evaluated would be read in the blank),
the child would have to know enough about the area being evaluated to
write a question, but not enough to answer it. This may well prove
to be a way of ascertaining an appropriate "next" learning experience
which would enable a child to solve his posed problem.

APPENDIX A

QUESTIONS TESTED IN PILOTS

These questions are listed in order of greatest number of
positive responses. If a child could not think of a response to the
question, this was noted by the examiner. The question with the
fewest number of "no responses" was selected to be used for the study.

1. Show me the hardest problem that you have learned to do in
addition (subtraction, multiplication, or any other realm of
study about which the examiner wishes to gain information)
and write the answer.

2. Make up the hardest problem that you can in addition (subtraction,
multiplication, and so forth). Solve it and write the answer.

3. Write the two hard problems in addition (subtraction, multiplica-
tion, and so forth) that we can put on ditto for the class to solve.
Please include the answer.

4. Write down a problem that you can do in addition (subtraction,
multiplication, and so forth), but maybe no one else can, and
solve it.

5. Write a hard, tricky problem that only you can find the answer to.

Question 1 was amended to meet different assessment needs. If
the question were used to measure a daily growth learning situation, it
was worded: "Show me the hardest problem that you learned to do today
and write your answer."

If a concrete or diagrammatic mode was being assessed, the
question became: "Show me the hardest problem that you learned to
do today and use the aid that you were working with to check your
answer."


APPENDIX B

PROCEDURE HANDOUT

I wish to thank you for helping to collect data which will be
used to determine the effectiveness of this testing technique.

1. Select the child or group of children that you wish to test.

2. Read to the child or group the following question, substituting
the correct operation or area of mathematics that you would
like them to consider when answering the question. I have used
addition in the wording of this sample question. "Show me the
hardest problem that you have learned to do in addition and
write the answer."

3. When testing the area of addition, only, do the following: See
if a child submits a problem with all addends using zeros except
for the first digit. If he does, request that he write another
problem with no zeros except for a possible zero in the answer.
When the child indicates that the task is completed, collect
the problem. There is no time limit.

4. Pass out the diagnostic test appropriate to the area you are
testing.

5. Ask the child to complete as many of the problems as he can,
letting him know that there is no time limit.

6. Collect the diagnostic test.

7. Fill out the accompanying data sheet on the child.
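The zero check in the addition step of this procedure can be expressed as a small rule. The sketch below is one possible reading, not the author's wording: it treats a submitted problem as needing replacement only when every addend has at least two digits and every digit after the first is zero (for example, 300 + 500). The function names are hypothetical.

```python
def is_first_digit_then_zeros(addend: int) -> bool:
    """True for numbers such as 40 or 300: one leading digit, then only zeros."""
    digits = str(addend)
    return len(digits) > 1 and set(digits[1:]) == {"0"}


def needs_another_problem(addends: list[int]) -> bool:
    """True when all addends use zeros except for the first digit."""
    return bool(addends) and all(is_first_digit_then_zeros(a) for a in addends)


print(needs_another_problem([300, 500]))   # every addend is a leading digit plus zeros
print(needs_another_problem([234, 478]))   # ordinary multi-digit problem
print(needs_another_problem([300, 478]))   # only one addend is padded with zeros
```

An examiner applying a stricter reading (any zero in any addend) would replace the `all(...)` test accordingly; the source wording leaves that choice open.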


Data Sheet

 

 

Child's Name Age

 

Sex

Achievement level as measured by the last Iowa Test child has taken:

Circle one: high average low

Economic level:

Circle one: (Over $25,000) ($4,681-$24,999) (Below $4,681)
high average low

Race:

Circle one: Negroid Caucasian Other

Achievement level as measured by the child's classroom teacher:

high average low

APPENDIX C

TESTS

Prod. No. 77856


PUPIL'S WORK SHEET

Diagnostic Chart for Fundamental Processes in Arithmetic

Printed in U.S.A.

ADD: School Name._ ._ _ .__,__._ _, _ __
(l) (2) (3)
5 6 2 8 12 13
2 3 9 4 2 5
(4) (5) (6)
19 17 6+2: 52 4o
2 9 13 39
—— —— 3-F4== __
(7) (8) (9)
78 46 3 8 53 7
71 92 5 7 8 89
—— —— 8 9 —— __
2 7
(10) (11) (12)
2+5+1+8= 664 145 35 601
203 652 234 78
44+9-F4i-6=1 ——— ——— ___ ___
(13) (14) (15)
69 38 532 82 13 8
12 84 87 896 7 9
—— ——- ——— ——— 5 33
2 8
(16) (17) (18)
268 943 283 495 34 66
961 128 748 778 33 98
——— ——- ——— ——~ 55 68
94 49
(19) (20) (21)
13 66 9361825 3907598 1 6
587 989 8758785 785763 6 2
46 896 8 7
131 467 1 9
——- -—~ 3 4
O 9
7 8
1 6
8 6
4 9
O 8
2 4
2 3
(22) (23)
879 866 817 5134
266 969 7053 73045
498 986 42610 3
167 898 92 227528
137 449 938512 242


SUBTRACT:
(1) (2) (3)
6 8 7-1= 19 15
3 8 2 4
_ _ 9—o= —— ——
(4) (5) (6)
58 79 36 79 12 10
4 3 21 24 6 2
(7) (8) (9)
15 19 59-2= 346 836
13 12 215 302
—— —— 86-4= ———'
(10) (11) (12)
189 399 61 75 56 42
45 7O 2 9 48 36
(13) (14) (15)
92 42 528 292 1067 4498
64 19 64 94 237 825
(16) (17) (18)
624 852 431 963 950 507
193 308 162 594 376 221
(19) (20) (21)
9546 9653 5941 6805 132428 823533
8687 2954 968 978 38679 245838
(22)
10000 80030
8192 46759

 

 

 

 

Level

Level

Level

Level

Level

Level

Level

Level

Level

Level

10

12
+2

19
+6

20
:3_0

23
:2


Addition Test

50
:39.

34
:5;

36
+82

+
Iwmoom

4:00

Level

Level

Level

Level

Level

Level

Level

Level

Level

11

12

13

14

15

16

17

18

19

Addition Test (continued)

537
+122

35
+343

 

33
:25

532
:91
17
6

3
+8

349
+868

 

64
38
96

:31

12
466
83
+106

9416772

+6541334

603
+115

 

67
+112

 

42
:53;

94
+937

4
2
27
:12

914
+879

17
33
14
:72

343
8

14
+173

7634215

+4556148

232
+145

 

231
+64

75
:2_6_

643
:22.

406
+798

21
16
38
:22

684
16

9
+352

3716482

m

Level 20

Level 21

Level 22

Addition Test (continued)

domme'IChLONmO-ﬁd

+

688
964

874
+118

816
961453
4105
+ 63

 


hONhNOQO-DWLD-HN

+

816
37

9
4864

+718611

+ .
womeN-bON-DU'I-bd

Level

Level

Level

Level

Level

Level

Level

Level

Level

Level

Level

Level

10

11

12


Subtraction Test

16
-5

48
-2

28
-_11

13
-6

15
:11

346
-215

364
- 3

61
-2

36
-27

49
-_23

18
-9

19
:16.

836
-302

287
:_u_

75
-9

47
:29.

97
-35

15
-7

14
it

666
-422

574
-133

91
-8

75
_-_6§.


Subtraction Test (continued)

Level 13 37 48 93
:12 _-_2_9_ :51

Level 14 528 292 325
iii :5: -_3;

Level 15 1067 4498 9147
-237 -825 -735

Level 16 173 237 576
-89 -189 -398

Level 17 700 900 600
:15. :25. :_l_9_

Level 18 9546 8132 9758
-7325 -6021 -8543

Level 19 8535 9542 6543
-7986 -8786 -5754

Level 20 5941 6805 9762
-968 -978 -986

Level 21 132428 823533 173461
-38679 -245835 -96748

Level 22 10000 80030 60011

-8192 -46759 -8965

 

BIBLIOGRAPHY

Anderson, G. L. "Visual-Tactual Devices and Their Efficacy: An
Experiment in Grade Eight." The Arithmetic Teacher, November
1957, pp. 196-203.

 

Arnold, Felix. The Measurement of Teaching Efficiency. New York:
Lloyd Adams Noble, 1916.

 

Aurich, Sister M. R. "A Comparative Study to Determine the
Effectiveness of the Cuisenaire Method of Arithmetic
Instruction with Children of the First Grade Level."
Master's thesis, Catholic University of America, 1963.

Ayres, Leonard P. "History and Present Status of Educational
Measurement." The Measurement of Educational Products, in
Seventeenth Yearbook of the National Society for the Study
of Education, pt. 2. Bloomington, Ill.: Public School
Publishing Co., 1918, p. 9.

 

 

Bisio, Robert M. "Effect of Manipulative Materials on Understanding
Operations with Fractions in Grade V." Ed.D. dissertation,
University of California, Berkeley, 1970.

Bloom, Benjamin S., ed. Taxonomy of Educational Objectives: The
Classification of Educational Goals, Handbook 1: Cognitive
Domain. New York: Longmans, Green & Co., 1956.

Bruner, Jerome S. The Process of Education. New York: Vintage Books,
1960.

 

Bruner, Jerome S., et al. Studies in Cognitive Growth. New York:
John Wiley and Sons, 1966.

 

Burns, Richard W. "Achievement Testing in Competency-Based Education."
Educational Technology, November 1972, pp. 39-42.

Carmody, Lenora M. "A Theoretical and Experimental Investigation into
the Role of Concrete and Semi-Concrete Materials in the Teaching
of Elementary School Mathematics." Ph.D. dissertation, The Ohio
State University, 1970.

Carry, L. Ray. "A Critical Assessment of Published Tests for Elementary
School Mathematics." The Arithmetic Teacher 21 (1974): 14-18.

 


Carver, Ronald P. "The Coleman Report: Using Inappropriately Designed
Achievement Tests." American Educational Research Journal 12
(1975): 77-86.

Cohen, Louis. "An Evaluation of a Technique to Improve Space Perception
Abilities Through the Construction of Models by Students in a
Course in Solid Geometry." Ph.D. dissertation, Yeshiva
University, 1959.

Cohen, Martin S. "A Comparison of Effects of Laboratory and Conven-
tional Mathematics Teaching upon Underachieving Middle School
Boys." Ed.D. dissertation, Temple University, 1970.

Coleman, James S., et al. Equality of Educational Opportunity, 2 vols.
Publication of the National Center for Educational Statistics,
OE 38001. Washington, D.C.: Government Printing Office, 1966.

 

Cronbach, Lee J. "Course Improvement through Evaluation." Teachers
College Record 64 (May 1963): 672-683.

 

Crowder, A. B. "A Comparative Study of Two Methods of Teaching
Arithmetic in the First Grade." Ph.D. dissertation, North
Texas State University, 1965.

Dawson, D. T., and Ruddell, A. K. "An Experimental Approach to the
Division Idea." The Arithmetic Teacher 2 (February 1955):
6-9.

 

De Cecco, John P. The Psychology of Learning and Instruction:
Educational Psychology. Englewood Cliffs, N.J.: Prentice-Hall,
1968.

 

 

Dobbin, John E. "Measuring Achievement in a Changing Curriculum."
Proceedings 1956 Invitational Conference on Testing Problems.
Princeton, N.J.: Educational Testing Service, 1957, p. 103.

 

Douglass, Harl R., and Spitzer, Herbert R. "The Importance of Teaching
for Understanding." The Measurement of Understanding, in
Forty-Fifth Yearbook of the National Society for the Study
of Education, pt. 1. Chicago: University of Chicago Press,
1946, p. 24.

 

 

Dressel, Paul L. "Information Which Should Be Provided by Test
Publishers and Testing Agencies on the Validity and Use
of Their Tests: Achievement Tests." Proceedings, 1949
Invitational Conference on Testing Problems. Princeton,
N.J.: Educational Testing Service, 1950, p. 73.

 

Ebeid, William P. "An Experimental Study of the Scheduled Classroom
Use of Student Self-Selected Materials in Teaching Junior High
School Mathematics." Ph.D. dissertation, The University of
Michigan, 1964.

Ebel, Robert L. "Obtaining and Reporting Evidence on Content Validity."
Educational and Psychological Measurement 16 (Autumn 1956):
269-282.

 

Ebel, Robert L., ed. Encyclopedia of Educational Research. 4th ed.
New York: Macmillan, 1973.

 

Eidson, William P. "The Role of Instructional Aids in Arithmetic
Education." Ph.D. dissertation, The Ohio State University,
1956.

Ekman, L. G. "A Comparison of the Effectiveness of Different Approaches
to the Teaching of Addition and Subtraction Algorithms in the
Third Grade." Ph.D. dissertation, University of Minnesota,
1966.

Ewbank, William A. "The Mathematics Laboratory: What? When? How?"
The Arithmetic Teacher 18 (1971): 559-564.

 

Exner, John E., Jr. The Rorschach: A Comprehensive System. New York:
John Wiley and Sons, 1974.

 

Fennema, Elizabeth H. "Models and Mathematics." In Teacher-Made Aids
for Elementary School Mathematics. Edited by Seaton E. Smith
Jr. and Carl A. Backman. Reston, Va.: The National Council
of Teachers of Mathematics, Inc., 1974, pp. 17-22.

Fennema, Elizabeth H. "A Study of the Relative Effectiveness of a
Meaningful Concrete and a Meaningful Symbolic Model in Learning a
Selected Mathematical Principle." Technical Report No. 101. Madison:
Wisconsin Research and Development Center for Cognitive
Learning, 1969.

Fitzgerald, William M., and Higgins, Jon L., eds. Mathematics
Laboratories: Implementation, Research, and Evaluation.
Columbus, O.: Center for Sciences and Mathematics Education,
1974.

 

 

Glaser, Robert. "Adapting the Elementary School Curriculum to
Individual Performance." Proceedings of the 1967 Invitational
Conference on Testing Problems. Princeton, N.J.: Educational
Testing Service, 1968, pp. 3-36.

 

 

Glaser, Robert. "Psychology and Instructional Technology." In Training
Research and Education. Pittsburgh: University of Pittsburgh
Press, 1962.

Green, Geraldine A. "A Comparison of Two Approaches, Area and Finding
a Part of, and Two Instructional Materials, Diagrams and
Manipulative Aids, on Multiplication of Fractional Numbers in Grade
Five." Ph.D. dissertation, The University of Michigan, 1969.

Green, John A. Introduction to Measurement and Evaluation. New York:
Dodd, Mead and Company, 1970.

Gronlund, Norman E. Measurement and Evaluation in Teaching, 2nd ed.
New York: The Macmillan Company, 1971.

Haggerty, M. E. "Specific Uses of Measurement in the Solution of School
Problems." The Measurement of Educational Products, in
Seventeenth Yearbook of the National Society for the Study
of Education, pt. 2. Bloomington, Ill.: Public School
Publishing Co., 1918, p. 25.

 

Haynes, J. O. "Cuisenaire Rods and the Teaching of Multiplication to
Third Grade Children." Ph.D. dissertation, Florida State
University, 1963.

Hollis, Loye Y. "A Study to Compare the Effect of Teaching First and
Second Grade Mathematics by the Cuisenaire-Gattegno Method with
a Traditional Method." School Science and Mathematics 65
(November 1965): 683-687.

 

Holt, John. Freedom and Beyond. New York: Dell Publishing Company,
1972.

Howard, C. F. "Three Methods of Teaching Arithmetic." California
Journal of Educational Research 1 (January 1950): 25-29.

Howard, Vivian G. "Teaching Mathematics to the Culturally Deprived
and Academically Retarded Rural Child." Ph.D. dissertation,
University of Virginia, 1969.

Johnson, Donovan A., ed. Evaluation in Mathematics. Reston, Va.:
National Council of Teachers of Mathematics, 1965.

Johnson, Randall E. "The Effect of Activity Oriented Lessons on the
Achievement and Attitudes of Seventh Grade Students in
Mathematics." Ph.D. dissertation, University of Minnesota, 1970.

Judd, Charles H. "A Look Forward." The Measurement of Educational
Products, in Seventeenth Yearbook of the National Society for
the Study of Education, pt. 2. Bloomington, Ill.: Public
School Publishing Co., 1918, pp. 159-160.

 

Kieren, Thomas E. "Manipulative Activity in Mathematics Learning."
Journal for Research in Mathematics Education, May 1971,
pp. 228-233.

 

Kieren, Thomas E. "Review of Research on Activity Learning." Review of
Educational Research, October 1969, pp. 509-522.

Kerr, Donald R., Jr. in consultation with John F. Le Blanc.
"Mathematics Laboratory Evaluation." In Mathematics
Laboratories: Implementation, Research, and Evaluation.
Edited by William M. Fitzgerald and Jon L. Higgins.
Columbus, O.: ERIC, November 1974.

 

Krathwohl, David R., Bloom, Benjamin S., and Masia, Bertram B.
Taxonomy of Educational Objectives: The Classification
of Educational Goals, Handbook II: Affective Domain.
New York: David McKay Company, Inc., 1964.

 

Lankford, Frances G. "What Can a Teacher Learn About a Pupil's
Thinking Through Oral Interviews?" The Arithmetic Teacher
21 (January 1974): 26-32.

 

Lewy, Arieh. "Discrimination Among Individuals V. Discrimination
Among Groups." Journal of Educational Measurement 10 (1975):
19-24.

Lucas, J. S. "The Effect of Attribute-Block Training on Children's
Development of Arithmetic." Ph.D. dissertation, University
of California, Berkeley, 1966.

Lucow, William H. "An Experiment with the Cuisenaire Method in Grade
Three." American Educational Research Journal 1 (May 1964):
159-167.

McNemar, Quinn. Psychological Statistics. New York: John Wiley and
Sons, Inc., 1969.

 

Mehrens, W. A., and Lehmann, Irvin J. Standardized Tests in Education.
New York: Holt, Rinehart and Winston, Inc., 1969.

 

Merwin, Jack C. "Historical Review of Changing Concepts of Evaluation."
Educational Evaluation: New Roles, New Means, in The Sixty-Eighth
Yearbook of the National Society for the Study of Education,
pt. 2. Edited by Ralph W. Tyler. Chicago: The University
of Chicago Press, 1969, pp. 6-25.

Monroe, Walter S. Measuring the Results of Teaching. Boston:
Houghton Mifflin Co., 1918.

Moody, William B., Abdell, Roberta, and Bausell, Barker R. "The Effect
of Activity Oriented Instruction Upon Original Learning,
Transfer and Retention." Journal for Research in
Mathematics Education, May 1971, pp. 208-212.

 

 

Mott, E. R. "An Experimental Study Testing the Value of Using
Multisensory Experiences in the Teaching of Measurement
Units on the Fifth and Sixth Grade Level." Ph.D. dissertation,
Pennsylvania State University, 1959.


Myers, Shelton S. Mathematics Tests Available in the United States.
Washington, D.C.: National Council of Teachers of Mathematics,
April 1959.

 

Nasca, D. "Comparative Merits of a Manipulative Approach to
Second-Grade Arithmetic." The Arithmetic Teacher 13 (March 1966):
221-226.

 

Nickel, Anton P. "A Multi-Experience Approach to Conceptualization
for the Purpose of Improvement of Verbal Problem Solving in
Arithmetic." Ph.D. dissertation, University of Oregon, 1971.

Norman, M. "Three Methods of Teaching Basic Division Facts."
Ph.D. dissertation, University of Iowa, 1955.

Nutshall, E., and Snooh, R. "Teaching Models." In Encyclopedia of
Educational Research. 4th ed. Edited by Robert L. Ebel.
New York: Macmillan, 1973.

 

 

Pace, C. R., and Stern, G. G. "An Approach to the Measurement of
Psychological Characteristics of College Environments."
Journal of Educational Psychology 49 (1959): 269-277.

 

Passy, R. A. "The Effect of Cuisenaire Materials on Reasoning and
Computation." The Arithmetic Teacher 10 (November 1963):
439-440.

 

Peck, Donald M., and Jencks, Stanley M. "What the Tests Don't Tell."
The Arithmetic Teacher 21 (January 1974): 54-56.

 

Price, R. D. "An Experimental Evaluation of the Relative Effectiveness
of the Use of Certain Multi-Sensory Aids in Instruction in the
Division of Fractions." Ph.D. dissertation, University of
Minnesota, 1950.

Rankin, Paul T. "Environmental Factors Contributing to Learning."
Educational Diagnosis, in Thirty-Fourth Yearbook of the National
Society for the Study of Education. Bloomington, Ill.: Public
School Publishing Co., 1935.

Reavis, William C. "Contributions of Research to Educational
Administration." The Scientific Movement in Education, in
Thirty-Seventh Yearbook of the National Society for the Study of
Education, pt. 2. Bloomington, Ill.: Public School Publishing
Co., 1938, p. 27.

 

Reisman, Fredricka K. A Guide to the Diagnostic Teaching of Arithmetic.
Columbus, O.: Charles E. Merrill Publishing Company.

 

Reys, Robert E. "Considerations for Teachers Using Manipulative
Materials." The Arithmetic Teacher 18 (1971): 551-558.

 


Rice, Joseph M. "The Futility of the Spelling Grind." Forum 23
(April, June 1897): 163-172, 409-419.

Ropes, George H. "Multi-Sensory Aids in the Teaching of Arithmetic
to the Second Grade." Ph.D. dissertation, Teachers College,
Columbia University, 1973.

Rugg, Harold. Statistical Methods Applied in Education. Chicago:
University of Chicago Press, 1917.

 

Russell, Bertrand. Education and the Good Life. New York: Liveright
Paperbound Edition, 1926.

 

Schudson, Michael S. "Organizing the 'Meritocracy': A History of the
College Entrance Examination Board." Harvard Educational
Review 42 (1972): 34-69.

 

Schwab, Joseph J. "The Concept of Structure in the Subject Field."
Paper presented at the 20th Annual Meeting of the Council on
Cooperation in Teacher Education of the American Council on
Education, October 1961, Washington, D.C. Chicago: University
of Chicago.

Schwartz, Frederick J. "The Impact On Learning of COLAMADA Project
Materials on Low Achievers in Mathematics." Ph.D. dissertation,
University of Virginia, 1971.

Scriven, Michael. "The Methodology of Evaluation." Perspectives of
Curriculum Evaluation: American Educational Research
Association, Monograph Series on Curriculum Evaluation. Chicago: Rand
McNally & Co., 1967, pp. 39-83.

 

 

Seick, Dana F. "The Value of Multi-Sensory Learning Aids in the
Teaching of Arithmetical Skills and Problem Solving--An Experimental
Study." Ph.D. dissertation, Northwestern University, 1959.

Shoecraft, Paul J. "The Effects of Provisions for Imagery Through
Materials and Drawings on Translating Algebra Word Problems,
Grades Seven and Nine." Ph.D. dissertation, The University
of Michigan, 1971.

Simpson, Ray N. Improving Teaching-Learning Process. New York:
Longmans, Green & Co., 1953.

 

Sinclair, Hermine. "Piaget's Theory of Development: The Main Stages."
In Piagetian Cognitive-Development Research and Mathematical
Education. Edited by Myron F. Rosskopf. Reston, Va.: National
Council of Teachers of Mathematics, 1971.


Skinner, B. F. About Behaviorism. New York: Alfred A. Knopf, 1974.

________. Beyond Freedom and Dignity. New York: Alfred A. Knopf,
1971.

________. Science and Human Behavior. New York: The Free Press,
1965.

________. The Technology of Teaching. New York: Meredith Corporation,
1968.

Sole, David. "The Use of Materials in Teaching of Arithmetic."
Ph.D. dissertation, Columbia University, 1957.

Spross, P. M. "A Study of the Effect of a Tangible and Conceptualized
Presentation of Arithmetic on Achievement in the Fifth and Sixth
Grades." Ph.D. dissertation, Michigan State University, 1962.

Squire, A., and Applebee, J. "Language Education." In Encyclopedia of
Educational Research. Edited by Robert L. Ebel. New York:
Macmillan, 1966.

 

Stanley, J. C., and Glass, G. V. Statistical Methods in Education and
Psychology. Englewood Cliffs, N.J.: Prentice Hall, Inc., 1970.

 

Starch, Daniel. "Standard Tests as Aids in the Classification and
Promotion of Pupils." Standards and Tests for the Measurement
of the Efficiency of Schools and School Systems, in Fifteenth
Yearbook of the National Society for the Study of Education,
pt. 2. Chicago: University of Chicago Press, 1916, p. 143.

Suydam, Marilyn. "Evaluation in Mathematics Classrooms: From What and
Why to How and Where." ERIC. Columbus, O.: Information
Analysis Center for Science and Mathematics, 1974.

________. "Unpublished Instruments for Evaluation in Mathematics
Education: An Annotated Listing." ERIC. Columbus, O.:
Information Analysis Center for Science and Mathematics, 1974.

Swart, William L. "Evaluation of Mathematics Instruction in the
Elementary Classroom." The Arithmetic Teacher 21 (January
1974): 7-11.

 

Taba, Hilda. Teachers' Handbook for Elementary Social Studies. Palo
Alto: Addison-Wesley Publishing Company, 1967.

Terman, Lewis M., Lyman, Grace, Ordahl, George, Ordahl, Louise E.,
Galbraith, Neva, and Talbert, Wilford. The Stanford Revision
and Extension of the Binet-Simon Scale for Measuring Intelli-
gence. Baltimore: Warwick and York, Inc., 1917.

 


Toney, Jo Anne. "The Effectiveness of Individual Manipulation of
Instructional Materials as Compared to a Teacher Demonstration
in Developing Understanding in Mathematics." Ph.D. dissertation,
Indiana University, 1968.

Troyer, Maurice E. Accuracy and Validity in Evaluation Are Not Enough.
New York: Syracuse University Press, 1947.

 

Trueblood, Cedel R. "A Comparison of Two Techniques for Using Visual-
Tactual Devices to Teach Exponents and Non-Decimal Bases in
Elementary School Mathematics." Ed.D. dissertation, The
Pennsylvania State University, 1967.

Ullman, Neil R. Statistics--An Applied Approach. Lexington, Mass.:
Xerox College Publishing, 1972.

 

Vance, James H. "The Effects of a Mathematics Laboratory in Grade 7
and 8. An Experimental Study." Ph.D. dissertation, University
of Alberta, 1969.

________, and Kieren, Thomas E. "Laboratory Settings in Mathematics:
What Does Research Say to the Teacher?" The Arithmetic Teacher,
December 1971, pp. 585-589.

Van Engen, H. "Analysis of Meaning in Arithmetic." Elementary School
Journal 49 (February-March 1949): 321-329; 395-400.

 

Wasylyk, E. "A Laboratory Approach to Mathematics for Low Achievers:
An Experimental Study." A working paper, University of Alberta,
1970.

Weber, Andra W. "Introducing Mathematics to First Grade Children:
Manipulative vs. Paper and Pencil." Ed.D. dissertation,
University of California, Berkeley, 1969.

Wilkinson, Jack D. "A Laboratory Method to Teach Geometry in Selected
Sixth Grade Mathematics Classes." Ph.D. dissertation, Iowa
State University, 1970.

________. "Teacher-Directed Evaluation of Mathematics Laboratories."
The Arithmetic Teacher 21 (1974).

 

Wolf, Richard. "The Measurement of Environments." Proceedings of the
1964 Invitational Conference on Testing Problems. Princeton,
N.J.: Educational Testing Service, 1965, pp. 93-106.

 

 

Wynroth, Lloyd Z. "Learning Arithmetic by Playing Games." Ph.D.
dissertation, Cornell University, 1970.
