A STUDY TO DETERMINE THE EFFECTIVENESS OF A TECHNIQUE EMPLOYING AN AMBIGUOUS STIMULUS FOR ASSESSING A CHILD'S LEVEL OF SKILL AND CONCEPT DEVELOPMENT IN THE AREAS OF ADDITION AND SUBTRACTION

Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JACQUELINE RESH LONG
1975

This is to certify that the thesis entitled A Study to Determine the Effectiveness of a Technique Employing an Ambiguous Stimulus for Assessing a Child's Level of Skill and Concept Development in the Areas of Addition and Subtraction, presented by Jacqueline Resh Long, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Elementary Education.

Major professor

ABSTRACT

A STUDY TO DETERMINE THE EFFECTIVENESS OF A TECHNIQUE EMPLOYING AN AMBIGUOUS STIMULUS FOR ASSESSING A CHILD'S LEVEL OF SKILL AND CONCEPT DEVELOPMENT IN THE AREAS OF ADDITION AND SUBTRACTION

By Jacqueline Resh Long

The contributions of Skinner, Bruner, and Piaget have influenced new goals in education and new approaches to instruction. These new goals and approaches to instruction have created problems and needs for teachers.

A technique of evaluation was developed in pilot studies to help resolve the following problems and needs experienced by teachers in evaluating student learning:

1. Validate a method of measuring student achievement at the symbolic level of concept representation which would then open the way for researching this technique at the concrete and pictorial-diagrammatic levels of concept representation.

2. Drastically reduce the time required for preparing, administering, and correcting tests.

3. Drastically reduce the time students would spend in being evaluated.

4. Offer a record of individualized growth by affording a teacher a collection of evaluations individually submitted which shows what a child regards as "hard" on a daily basis. This, then, can be placed in a folder for the child, parent, or teacher to examine.

5.
Place an emphasis on a child's ability to assess his own knowledge and recognize self-growth by asking him to submit an example of what he can do. This technique of evaluation is consistent with the goals of a behavioral philosophy of self- and environmental assessment.

The purpose of this research is to evaluate this technique for assessing a child's level of skill and concept development in the areas of addition and subtraction. The assessment technique to be employed in this instance is limited to the symbolic representation of the mathematics concepts and skills being examined. The limitation was placed on the study because of the lack of instruments available in the concrete or pictorial-diagrammatic modes of concept representation with which to compare the newly researched technique. Currently accepted instruments of evaluation are tests primarily written to measure symbolic representation.

Several examiners used the technique in this study and administered the diagnostic tests to groups and individual children attending public schools. The testing technique employed an ambiguous verbal stimulus to which a child was asked to respond. The response of the student being evaluated was then correlated with a traditional diagnostic test written for this study for validation of the results. Using a Pearson product-moment correlation, values of r = .85 for addition and r = .81 for subtraction were found. Confidence intervals constructed for these two correlations (P = .99) place ρ between .75 and .91 for addition and between .66 and .90 for subtraction.

The following hypotheses were tested using a series of t-tests with an α level of .05 to determine if there were differences between groups in their ability to use the testing technique in this study.

1.
There will be no significant differences between the high, average, and low achievers as determined by the Iowa Achievement tests in their ability to assess their level of abstract achievement.

2. There will be no significant differences between the high, average, and low achievers as determined by teacher judgment in their ability to assess their level of abstract achievement.

3. There will be no significant differences between Blacks and Caucasians in their ability to assess their level of abstract achievement.

4. There will be no significant differences between girls and boys in their ability to assess their level of abstract achievement.

5. There will be no significant differences between children from high, average, and low family incomes in their ability to assess their level of abstract achievement.

No significant differences between groups were noted. Therefore, it appears that all groups in the study can use the testing technique equally well.

The following hypothesis was tested to determine if there was a racial bias with respect to what a child perceives as "hard."

There will be no significant differences between racial groups in what they perceive as "hard."

A series of chi-square tests was used with an α level of .05. Holding achievement constant, no racial bias was found with respect to what is considered "hard."

A STUDY TO DETERMINE THE EFFECTIVENESS OF A TECHNIQUE EMPLOYING AN AMBIGUOUS STIMULUS FOR ASSESSING A CHILD'S LEVEL OF SKILL AND CONCEPT DEVELOPMENT IN THE AREAS OF ADDITION AND SUBTRACTION

By Jacqueline Resh Long

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1975

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
I. INTRODUCTION
    The Problem
        The New Perspectives
        The Effect on Curriculum
        The Effect on Instruction
        The Effect on Mathematics Instruction
        The Effect on Teacher Roles
        Resulting Problems and Needs for Teachers
    Purpose of the Study
    General Evaluation Procedures of the Partial Solution
    Anticipated Outcomes of the Study
    Assumptions
    Limitations of the Study
    Definition of Terms
    The Pilot Studies
II. REVIEW OF THE LITERATURE
    Introduction
    Role of Evaluation in Teaching Models
    Role of Evaluation in Mathematics
    Historical Development of Standardized Testing
    Historically Developed Criteria for Judging Evaluation Instruments and Measurements
        Validity
        Reliability
        Usability
    Review of the Research in Math Instruction
    Evaluation Methods in Assessing Learning in a Math Lab
        Anecdotal Records
        Rating Scales
        Checklists
        Interview
    Thresholding
    The Use of Ambiguous Stimuli in Testing
III. PROCEDURE AND METHODOLOGY
    Setting and Sample
    Examiners
    Instruments and Methods Used for Validating the Technique in This Study
    Procedure
    Methods of Analyzing Data
IV. PRESENTATION AND ANALYSIS OF THE DATA
    Correlation Between the Technique in This Study and the Test Written for This Study
    Child's Ability to Assess Himself
    Analysis of the Data Concerning Hypotheses B1 through B5 of the Study
    Analysis of the Data Concerning Hypothesis C in the Study
V. SUMMARY, GENERALIZATIONS, AND IMPLICATIONS FOR FUTURE RESEARCH
    Criteria for Judging Testing Instruments and Measurements
    Accuracy of a Child's Self-Assessment
    Different Groups' Ability to Use the Testing Technique in This Study
    A Racial Bias with Respect to What is "Hard"
    Analysis of the Distribution of Percentage of Correct Response Scores with Respect to the Technique in This Study
    A Review of the Stated Purpose of This Study
    Implications for Future Research
        Usability of the Technique in This Study in Other Areas
        Areas of Mathematics Education to Be Researched Using the Technique in This Study

Appendix
A. QUESTIONS USED IN PILOTS
B. PROCEDURE HANDOUT
C. TESTS

BIBLIOGRAPHY

LIST OF TABLES

Table
Summary of the effect of activity and model methodologies on the learning of mathematics in kindergarten through third grade
Summary of studies to determine the effectiveness of teaching with models and activities in grades four through six
Summary of the effect of activity and model methodologies on the learning of mathematics in grades seven through twelve
F-tests for determining the differences in variance of the groups in the study
Group differences in their ability to use the testing technique in this study
Summary of the results of the chi-square tests with addition
Summary of the results of the chi-square tests with subtraction

LIST OF FIGURES

Figure
1.
Scattergram of the results of the test written for this study and the technique of this study
2. Spread of scores for addition and subtraction

CHAPTER I

INTRODUCTION

Recent acceptance of theories in the science of behavior, cognitive development, and concept representation has created new approaches to instruction. These, in turn, have created new problems and needs for teachers. To better understand the dimensions of the situation, this chapter will cover the following topics:

1. the new perspectives and their corresponding effect on curriculum, instruction, and teacher roles, and the resultant problems and needs that have arisen for teachers of mathematics;

2. a description of the purpose of this study, which attempts to identify a partial solution to one of the problems;

3. a description of the general procedures that were undertaken to evaluate the partial solution, including the procedure for both administering and evaluating the technique;

4. a discussion of the anticipated outcomes of the study;

5. a presentation of the assumptions that undergird the research, the limitations of the research, and the definitions of key terms employed in this study; and

6. an examination of the pilot studies which helped to develop the technique.

The Problem

The New Perspectives

The new perspectives affecting educational goals have their origin in the recently defined nature of man. The simplistic view theorized by B. F. Skinner offers man an opportunity to achieve a relative freedom heretofore unknown to him because of his past ignorance and refusal to recognize the factors in his environment which limit or destroy his freedom. Skinner, contrary to the generally accepted theory of internal control, has hypothesized that man is born with a differentiated ability to respond to stimuli, and through continuous conditioning the probability for any given behavior is changed.
Acceptance of the concept that behavior arises primarily from conditioning requires that man learn to assess which environmental factors affect him, and in what way, before he can achieve maximum freedom from environmental control.

Skinner has also contributed a method of determining relationships between man and his environment through the observation of behavior, its stimuli, and reinforcers, without theorizing about unobservable factors. Thus, any individual with skill in assessing his milieu is able to determine the behavioral cause-and-effect relationships that exist for him, personally, and thereby possibly change the portions of his environment which adversely affect his desired behavior.

The work of Jerome Bruner, Jean Piaget, and many math educators has clearly demonstrated that learning needs to begin with concrete models and progress to symbolic models. Van Engen (1949), supporting the theories of both Bruner and Piaget, pointed out that the "meaning of words cannot be thrown back on the meaning of other words. When the child has seen the action and performed the act for himself, he is ready for the symbol for the act."

Piaget has been the major contributor of theoretical support for the use of concrete before symbolic models. He has proposed a comprehensive theory of cognitive development that encompasses individual growth from birth to maturity. Fennema (1972) describes Piaget's concept:

    According to Piaget's theory, schemas (mental structures) are formed by a continual process of accommodation to and assimilation of the individual's environment. This adaptation (accommodation and assimilation) is possible because of the actions performed by the individual upon his environment.
    These actions change in character and progress from overt, sensory actions done almost completely outside the individual to partially internalized actions that can be done with symbols representing previous actions, to completely abstract thought done entirely with symbols.

This development in cognitive growth involves, first, the use of physical actions to form schemas. Learners change from a predominant reliance on physical action to a predominant reliance on symbols.

Bruner has theorized that a learner utilizes, in order, three representations in the process of acquiring a given concept. The first is the enactive or manipulative stage, in which an understanding of a concept can be gained only as far as the actions in correspondence to an object possess the attribute of the idea to be learned. In the second stage, ikonic representation, a child can represent the world by an image of the original object or action performed on the object, without the object being present. The final representation is symbolic.

The Effect on Curriculum

Educating an individual both formally and informally to live effectively within society has been the primary role of schools. Unfortunately, past efforts have entailed the imparting of "factual" knowledge without emphasizing the origin of these facts, thereby concealing the structure of the subject area studied. Hilda Taba (1967) is critical of a curriculum emphasizing the learning of facts without structuring their implications: "Because specific facts become obsolete more rapidly than basic concepts or main ideas, they are not significant in themselves. Their chief function is to explain, illustrate, and develop main ideas."

Bruner (1960), by pointing out the historic problem of how to teach the basic structure of a subject area, gives evidence of the cafeteria-style fact-teaching of the past.
He maintains that since so little is known about teaching the fundamental structure, facts rather than structure have been emphasized in the education of an individual.

Studies done by Lankford (1974), Swart (1974), and Peck and Jencks (1974) have attempted to determine what is being taught in today's traditional math classes. These studies found classrooms of children memorizing number facts, definitions, rules, and algorithms.

A curriculum consistent with a behavior-oriented philosophy of education should have an emphasis which fosters its goals. The education of an individual should now afford him the opportunity to develop the skills necessary to maximize his ability to perceive cause-and-effect relationships by helping him to order and structure his milieu, thus enabling him to become as independent as possible of both his physical and human environments. The essence of this freedom remains less than absolute because of man's inability to exist outside of an environment with controlling stimuli and reinforcers. John Holt, in Freedom and Beyond, refers to man's relative freedom as a constrained life.

    We are all and always constrained, bound in, limited by a great many things, not least of all the fact that we are mortal. We are limited by our animal nature, by our model of reality, by our relations with other people, by our hopes and fears.

This "constrained" life can only have an individually achieved maximum freedom based on an individual's unique genetic make-up and unique sequence of experiences.
The Effect on Instruction

Fennema (1972), in summarizing a multitude of studies which tended to support Piaget's theory of cognitive development and Bruner's theory of concept representation, states:

    Collectively, these data tend to support the hypothesis that a learning environment embodying representational models suited to the developmental level of the learner facilitates learning better than a learning environment that ignores the developmental level of the learner.

The acceptance of Bruner's and Piaget's theories suggests that models be present in a learning environment if conceptual learning is to take place. Through the use of such models each child would be able to test the correctness of his perceived generalizations for himself or with other students, thereby placing the authority for learning on each child or his group. This type of learning environment would foster individual growth in the ability to perceive relationships and encourage a child to be dependent on his own perceptions rather than on those of a teacher or some other authority. The child is thus weaned from his dependent state to one of independence. Taba (1967) states:

    In order to develop autonomy of thought, students need opportunities to organize their own conceptual systems and to develop their skills for independent processing of information. Consequently, the nature and the organization of learning experiences should be calculated to encourage the learner to inquire, to do his own thinking, to develop his own ways of working out problems, and to try out his own ideas. Faced with the temptation to provide the answers and solutions, the teacher must grant the learner the right to come to grips with the learning process, even though the products may be less refined than the teacher would wish.
Skinner's postulation that individuals are born with a differentiated ability to respond to stimuli, Piaget's theory of cognitive development, and Bruner's theorized stages of concept representation all point out a need for the individualization of instruction. By postulating a genetic component to individual response, a uniqueness of response is implied. Piaget's theorized stages of cognitive development and Bruner's modes of representation also imply a variety of levels of cognitive functioning and modes of concept representation within any given group of children, necessitating the creation of a learning environment which offers a variety of learning situations designed to accommodate the uniqueness of each individual.

This individualized instruction could be achieved within a classroom laboratory with concrete, pictorial or diagrammatic, and symbolic models for the children to use in the attainment of concepts. Each child would use the model most meaningful to him and would progress at his own pace. The concepts to be learned could be determined for the child by his teacher, with a sequenced exposure to models to ensure the eventual learning of the concept, or a nondirected laboratory exposure to large collections of models could be used. Students in this type of milieu can grow in their ability to learn through student interactions which could broaden their perceptions, or they can learn through solitary experimenting. Both of these situations permit individuals to differ in the selection of meaningful models and in their ability to perceive relationships while being members of a learning group. Lab-oriented experiences which use individual or small group explorations, with materials and teachers as resources, would foster the type of learning situation consistent with the goal of teaching children how to perceive relationships.
The Effect on Mathematics Instruction

The unique contribution which mathematics instruction offers to the education of a person is the opportunity to observe relationships directly through the use of mathematical models which range from the concrete to the symbolic. A concrete model (Fennema, 1972) represents a mathematical idea by means of three-dimensional objects. A second type of model is the pictorial or diagrammatic. Through pictures or diagrams, the attributes of certain mathematical concepts are demonstrated. Finally, symbolic models represent a mathematical idea by means of commonly accepted numerals and signs that denote mathematical operations or relationships.

From the use of such models children and adults can experience the act of learning to learn in a math laboratory with models which encourage growth in skills of observing, systematizing, formulating, and testing generalizations. Mathematics also offers the opportunity to develop the ability to quantify data and tersely express relationships symbolically, so that patterns in any given situation can be discerned more easily. These skills are very necessary if individuals are to develop to their fullest capacity their competency to determine cause-and-effect relationships.

The Effect on Teacher Roles

The role of the teacher in instruction can contribute to or hinder the achievement of the educational goal of independence, for the product or consequence of this instruction is a function of this role and can be freeing or restricting with respect to an individual's growth. In the traditional instructional milieu, where authority for learning rests solely with the instructor, two interrelated conditions arise. First, a student becomes dependent on his instructor for the "rightness" or "wrongness" of his generalizations rather than on his own ability to prove to himself the truth of his conclusions.
Second, a student is limited by his instructor's knowledge rather than his own concerning the relationships it is possible for him to perceive, and he is then limited to perceiving only those relationships which his teacher relates to him. Therefore, traditional expository teaching violates the goal of achieving a maximum amount of independence for each individual by limiting learning and forcing an individual to depend on the perception of others. For similar reasons, programmed instruction in areas of concept development and guided discovery where only one outcome is acceptable are also deterrents to the goal of independence.

Resulting Problems and Needs for Teachers

The contributions of Skinner, Bruner, and Piaget have influenced new goals in education and new approaches to instruction. Some of the problems and needs which have resulted from these changes are the following:

1. Teachers will be using a method of teaching that was not used with them.

2. Teachers will need to learn how and when to use models in their instruction.

3. Teachers will need to determine the student's stage of development, as defined by Piaget, and the appropriate model for depicting a particular concept best suited to the intellectual needs of the student.

4. Teachers will need to find models for concepts that they wish to teach and all the modes of representation for these concepts.

5. Teachers will need to learn how to organize their teaching days so that they can offer individualized instruction.

6. Teachers will need a system of daily record keeping to enable individual growth to be discerned and planned for.

7. Inherent in any teaching situation, especially an individualized lab approach to teaching mathematics, is the problem of accurately assessing the entering skill and mode of concept representation for each student. In addition, an accurate evaluation following each learning experience to redetermine the functioning level of the student must be made.
8. Teachers will need sizable amounts of time to prepare, administer, and grade tests for the myriad of levels in an individualized lab milieu. Instructional time will be significantly affected.

9. Teachers will need to set aside sizable amounts of student time for taking tests.

10. Teachers will have to find commercial tests or design their own to measure the concrete and pictorial-diagrammatic levels of concept representation. Presently, most accepted evaluation instruments test primarily the symbolic level of concept representation.

Purpose of the Study

A technique of evaluation was developed in pilot studies by this investigator which was intended to do the following:

1. Validate a method of measuring student achievement at the symbolic level of concept representation which would then open the way for researching this technique at the concrete and pictorial-diagrammatic levels of concept representation.

2. Drastically reduce the time required for preparing, administering, and correcting tests.

3. Drastically reduce the time students would spend in being evaluated.

4. Offer a record of individualized growth by affording a teacher a collection of evaluations individually submitted which shows what work a child regards as difficult on a day-to-day basis. This, then, can be placed in a folder for the child, parent, or teacher to examine.

5. Place an emphasis on a child's ability to assess his own knowledge and recognize self-growth by asking him to submit an example of what he can do. This technique of evaluation is consistent with the goals of a behavioral philosophy of self- and environmental assessment.

The purpose of this research is to evaluate the pilot technique for assessing a child's level of skill and concept development in addition and subtraction. The assessment technique to be employed in this research is limited to the symbolic representation of the mathematics concepts and skills being examined.
This limitation was placed on the study because of the lack of instruments available in the concrete or pictorial-diagrammatic modes of concept representation with which to compare the results of the technique in this study. The lack of such instruments was established by requesting and subsequently reviewing the commercial diagnostic and achievement tests cited in the twenty-sixth yearbook of the National Council of Teachers of Mathematics (NCTM), Evaluation in Mathematics, and in the NCTM brochure, "Mathematics Tests Available in the United States." Marilyn Suydam's annotated list of unpublished evaluation instruments also was reviewed.

Since the concrete stage of concept representation is entirely omitted from all test items, and since the pictorial-diagrammatic representation is omitted from all tests for middle and upper elementary schools for most concepts, it is apparent that currently accepted instruments of evaluation are tests primarily written to measure the symbolic representation of concepts and skills.

General Evaluation Procedures of the Partial Solution

Pre- and in-service teachers used the pilot technique and administered the diagnostic test written for the study to groups and individual children attending public schools. The technique employed an ambiguous verbal stimulus to which a child was asked to respond. This response was evaluated and then correlated with the index of the diagnostic test written for this study to validate the results.

Anticipated Outcomes of the Study

The following major hypothesis will be tested to determine whether or not there is a correlation between the results of testing a child by a diagnostic test and the testing technique being studied:

A. There will be a high correlation between the results of testing using a diagnostic test and the results of testing using the technique being studied.
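As an illustration only, the kind of computation this major hypothesis calls for can be sketched in a few lines of Python. The sketch computes a Pearson product-moment correlation between two sets of scores and a confidence interval for the population correlation. The Fisher z-transformation used for the interval is an assumption made here for illustration; the study does not state how its intervals were constructed. The function names, the scores, and the number of children are hypothetical and are not data from this study.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / math.sqrt(sxx * syy)


def fisher_confidence_interval(r, n, confidence=0.99):
    """Confidence interval for the population correlation.

    Assumption: the interval is built with the Fisher z-transformation,
    which may differ from the method used in the study.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired scores")
    z = math.atanh(r)                      # transform r to the z scale
    standard_error = 1 / math.sqrt(n - 3)
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.tanh(z - z_critical * standard_error)
    upper = math.tanh(z + z_critical * standard_error)
    return lower, upper


# Hypothetical level scores for eight children (not data from the study).
technique_levels = [3, 5, 2, 6, 4, 7, 1, 5]
diagnostic_levels = [3, 4, 2, 6, 5, 7, 2, 5]

r = pearson_r(technique_levels, diagnostic_levels)
low, high = fisher_confidence_interval(r, len(technique_levels), confidence=0.99)
print(f"r = {r:.2f}; 99% interval for rho: {low:.2f} to {high:.2f}")
```

With only a handful of paired scores the interval is very wide; it narrows as the number of children tested grows, which is why the size of the sample matters when a correlation is offered as evidence of validity.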
The following five hypotheses will be tested to determine whether or not there is a difference between groups in their ability to use the testing technique in this study.

B1. There will be no significant differences between the high, average, and low achievers as determined by the Iowa Achievement tests in their ability to assess their level of abstract achievement.

B2. There will be no significant differences between the high, average, and low achievers as determined by teacher judgment in their ability to assess their level of abstract achievement.

B3. There will be no significant differences between Blacks and Caucasians in their ability to assess their level of abstract achievement.

B4. There will be no significant differences between girls and boys in their ability to assess their level of abstract achievement.

B5. There will be no significant differences between children from high, average, and low income families in their ability to assess their level of abstract achievement.

The following hypothesis will be tested to determine whether or not there is a racial bias with respect to what a child perceives as difficult or "hard."

C. There will be no significant differences between racial groups in what they perceive as "hard."

Assumptions

Evaluation in mathematics instruction is based on several assumptions. First, determination of a student's stage of cognitive and mathematical development is a necessary task, regardless of the teaching model being used. Second, current diagnostic tests are relatively accurate in determining a student's competency level with abstract models of concept representation. Third, thresholding is a valid means of determining a level of students' functioning when using diagnostic tests. Fourth, proper sequencing of levels within a diagnostic test is necessary if thresholding is to be used as a means of determining a level of functioning.
Fifth, there are three stages of concept representation: the concrete, pictorial-diagrammatic, and abstract.

Limitations of the Study

Three major limitations of this study should be noted. First, only two of the four operations with whole numbers were used in the study, and no other areas of mathematics which might be assessed by the technique being evaluated will be researched. Second, the abstract stage of concept representation is the only stage considered because of the problem of validating testing results for the concrete and pictorial-diagrammatic stages. Finally, only children in grades 1 through 6 were studied.

Definition of Terms

In what follows, the major terms used in this study are defined.

abstract (symbolic) models: Models which represent a mathematical idea by means of commonly accepted numerals and signs that denote mathematical operations or relationships.

ambiguous stimulus: A stimulus which elicits a variable response from a group of individuals.

behaviorism: The science of behavior which is attempting to understand the relationships between and within the genetic endowment, historical environment, and present environment of individuals with the ultimate goal of accuracy in the prediction of behavior.

commercial tests: Those tests prepared by various companies which attempt to measure mathematics achievement.

concrete models: Models which represent a mathematical idea by means of three-dimensional objects.

level of concept development: The level of model needed by a person in order to attain the concept being presented. The model representations are the concrete, pictorial-diagrammatic, and symbolic.

math lab milieu: A math learning environment having models that represent mathematical ideas concretely, pictorially-diagrammatically, and symbolically and a variety of instructional media, such as tape recorders, to enhance the learning of mathematics in an individualized or small group situation.
pictorial-diagrammatic models: Models which represent a mathematical idea by means of pictures, diagrams, or devices such as a number line, which illustrate many of the attributes of the idea.

proper sequencing: A sequencing of response categories which consists of an "ascending" series carried far enough to locate the transition part or threshold from one response category to another.

quantitative understanding: The understanding that comes with numerals, mathematical symbols, and operations which enables a child to relate these mathematical ideas to his environment.

teacher prepared tests: Those tests prepared by a teacher to measure the entering or terminal behavior of a student in mathematics.

teaching model: A set of associated ideas and concepts more or less organized around a larger conception of what teaching should be like. It enumerates the components of a teaching situation and shows a general relationship between these components.

testing technique: A method of eliciting student responses which indicates an achievement level without utilizing an instrument or prepared list of objective questions.

thresholding: A level (threshold) of functioning ascertained by observing where in a sequenced task a person begins to make more errors than correct responses, or where this individual stops participating in the task.

The Pilot Studies

Pilot studies were conducted in Cornell School in Okemos, Michigan, and in several schools in the Lansing, Michigan, area by Elementary Intern Program students. Additional data were collected at Ball State University by students in methods classes who are required to tutor individuals or small groups of elementary students. These studies attempted to find out whether or not elementary school children would respond to an open question posed in terms of "hardness." Several forms of questions were used to determine which was most effective. See Appendix A for the questions used.
Many children in the pilot studies conducted for this research responded to the assessment questions by giving a memorized problem and answer, that is, 2000 + 2000 = 4000, 100 + 100 = 200. Since the problems always used large numbers, it would appear that this behavior was intended to impress the examiner. To overcome this problem in the validation study, youngsters were asked to write a problem without zeros. This change in procedure appeared to give more dependable results. Requesting a child to check his results with an aid also eliminated memorized responses.

In addition, the pilot studies showed that, with further testing, a child who would not submit a problem was not able to respond to symbolic representation in the area being assessed. However, if this nonrespondent was given a manipulative aid of his choice, he could provide both problems and solutions.

Across operations, children indicated that "hardness" was equivalent to large numbers. The majority (43 out of 72) gave examples of "hard" problems using numbers greater than 100. When children were given mathematical models to use, their responses seemed to be correlated to the device used. If a model was used which limited the size of numbers to a quantity under 70, then the hardest problems submitted included numbers close to 70. If, as in Chip Trading, problems with regrouping were treated no differently than those without, children rarely cited problems with regrouping as "hard." These observations were made with only 22 students.

Finally, the pilot studies revealed that errors in posing assessment questions and interpretation of questions by children resulted in some children offering problems that they could not solve. These problems were generally solvable by the child who submitted the problem after a short period of instruction.
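The definition of thresholding given under Definition of Terms can be illustrated with a short sketch. It is not the procedure used in the pilot studies or the validation study; it only restates the definition in executable form. The function name `find_threshold` and the representation of a child's responses as lists of correct/incorrect marks, one list per level of an ascending sequence, are assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_threshold(levels: Sequence[Sequence[bool]]) -> Optional[int]:
    """Return the index of the first level in an ascending sequence at which
    errors outnumber correct responses, or at which no responses are given.

    Each inner sequence holds one level's responses (True = correct).
    An empty inner sequence is read as the child no longer participating.
    Returns None when no level meets either condition.
    """
    for index, responses in enumerate(levels):
        if len(responses) == 0:
            return index  # child stopped participating at this level
        correct = sum(1 for r in responses if r)
        errors = len(responses) - correct
        if errors > correct:
            return index  # more errors than correct responses
    return None


# Hypothetical record: four levels of increasing difficulty.
example = [
    [True, True, True],     # all correct
    [True, True, False],    # mostly correct
    [False, True, False],   # errors now outnumber correct responses
    [False, False, False],
]
threshold_level = find_threshold(example)
```

In this hypothetical record the threshold falls at the third level (index 2), the first level at which errors outnumber correct responses. A level with equal numbers of errors and correct responses does not count as the threshold under the definition as worded.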
CHAPTER II

REVIEW OF THE LITERATURE

Introduction

A review of the literature was made to establish the important role of evaluation within the theoretical framework of teaching models in general and within the teaching of mathematics in particular. In this chapter, a review of the development of standardized tests disclosing the historically based need for objective evaluation to ascertain a level of student cognitive functioning is followed by a presentation of the historically established criteria for judging evaluation instruments and measurements. After examining the research conducted to determine the effectiveness of teaching mathematics using concrete, pictorial-diagrammatic, or symbolic models in a mathematics laboratory, numerous ways of evaluating learning in a mathematics laboratory which are currently being used are presented. The chapter concludes with a theoretical basis for employing a thresholding technique, followed by a presentation of the historical precedent of using an ambiguous stimulus in testing, as is employed in the testing technique in this study.

Role of Evaluation in Teaching Models

Both the behavior-modification teaching model and the discovery-learning model consist of a set of associated ideas and concepts more or less organized around a larger conception of what teaching should be and how it should be viewed. Nuthall and Snook (1973) have described the behavior-modification model: "[It] consists of that set of concepts and claims about teaching which has arisen from the attempt to apply the interpretive framework of behavioral psychology to the classroom." They add that "the discovery-learning model incorporates those views of teaching which place greatest emphasis on the self-directed activity of the student." Glaser (1962) has developed a simple basic teaching model including the four essential components of any teaching situation.
DeCecco (1968) pointed out that these components are present in most teaching models, especially in the models used to depict behavior-modification and discovery-learning. A basic teaching model (Glaser, 1962) is as follows:

Instructional Objective (A) --> Entering Behavior (B) --> Instructional Procedures (C) --> Performance Assessment (D)

Instructional objectives are measurable goals which a student should attain by the completion of a segment of instruction. Entering behavior describes the student's level of cognitive and affective development prior to instruction. Instructional procedures refer to the teacher's input in changing a student's behavior; this change is commonly called learning or achievement. Performance assessment consists of tests and observations used to determine how well the student has achieved the instructional objectives.

Two of the four elements of the basic model require that information from the student be collected. In noting the entering behavior, all past experiences of a student deemed relevant to the new teaching situation must be assessed, while performance assessment is the portion of the model which deals with determining what learning took place with respect to the instructional objectives.

It is apparent from the literature that the evaluation of student learning is a necessary component of most teaching models. For the two models consistent with a behavioral philosophy, the behavior-modification and the discovery-learning models, evaluation has a definite role.

Role of Evaluation in Mathematics

In the NCTM's twenty-sixth yearbook, Evaluation in Mathematics, Sueltz states emphatically the role of evaluation in mathematics:

Mathematics is an important part of the curriculum at all school levels beginning in the kindergarten. It is organized in a sequence of topics and activities that are associated with appropriate levels of maturity and ability of the students.
Evaluation can identify and define steps and levels in the sequence that are appropriate for a given grade or age level. Careful evaluation should show not only how far a pupil has progressed in the major steps of a sequence, but also how well he has understood and mastered a particular step. Good evaluation will show the facts and skills mastered (and those not mastered) by the student, his attitude toward the subject, and the depth of understanding and insight accompanying his work.

He adds:

Evaluation is useful in determining the relative ease or difficulty of learning, applying, or remembering a topic, and materials. We need to know how long it takes to master a given concept, the suitable concepts for different grades, the appropriate sequence of concepts, and the aids the teacher needs to build mastery of each concept.

Reisman (1972) points out the importance of evaluation in determining a mathematics curriculum for effective instruction. By ascertaining a student's level of functioning, a curriculum can be developed which will meet the needs of the students involved without the negative ramification of an inappropriate curriculum. She states:

In looking at the mathematics curriculum, one must consider the level of difficulty involved. If the curriculum contains an abundance of material which is too advanced or too difficult for the student, he may become frustrated and give up; on the other hand, a curriculum that is too easy leads to boredom and the student again may give up.

Reys (1971), in an article on manipulative materials, remarked that to judge the effectiveness of materials, it would be wise to evaluate learning following their use.

Do evaluate the effectiveness of materials after using them. Immediately upon the completion of an activity, it can be very helpful to note particular problem areas, strengths, weaknesses, and suggestions and to define areas of needed improvement as well as possible areas of modification.
A continuous reevaluation of manipulative materials ultimately results in better materials as well as more effective use of them.

Ewbank (1971), in an article on mathematics labs, discussed the inherent problem of evaluating mathematics learning in a laboratory milieu.

Some people use standard methods, that is, teacher-made or standardized tests. But the results of these tests may be deceptive, as it is very difficult to measure understanding and grasp of concepts in this way. . . . One way to measure progress in the mathematics laboratory is to look at the quality of written reports. A high standard of written reports should be required, but in the primary grades it is a mistake to force children to write reports until they are ready to do so. Mathematics can be learned by manipulating devices such as the equalizer balance and colored rods without any writing at all. Small children need to play with containers of sand, water, and so on, and in the process they grasp very important concepts such as the conservation of quantity. I do not see how you can evaluate this in the orthodox way. . . . For children at this early stage of development, subjective evaluation may be the best means. However, subjective evaluation should be based on the teacher's notes, anecdotal records, and a scrutiny of the child's progress in his written recording. Short periodic quizzes may be useful to show up those who cannot do certain processes or who obviously have not grasped the relevant concepts.

Sueltz summarizes the basic functions of evaluation in the total mathematics program in the following way:

1. Evaluation can establish levels of learning and locate a student at a level suitable for his current status in mathematics.

2. Evaluation is useful in improving the mathematics program in terms of curriculum, content, and organization, selection of materials for learning, and modes of instruction and learning.
It can furnish data which should be used in making value judgments.

3. The place of mathematics in modern society can be studied and appraised in its many ramifications, and the results of such appraisal can be used in an appreciative way and also as a factor in determining the curriculum.

4. Competent evaluation of the mathematics program of a school is useful in keeping the clientele of the school informed and in answering questions raised by critics.

5. The information and data collected in evaluation form the substance of a student's record in school. These data are useful not only for records and reports, but also for research.

6. Evaluation is much concerned with helping the student learn mathematics more effectively. Hence, it seeks answers to many questions dealing with the kind of mathematics, the level of learning, motivation, and aspiration.

7. Different modes of learning and their effectiveness when applied to mathematics should be evaluated. This applies to various types of materials, various levels of learning, and various types of students.

8. Finally, evaluation itself provides valuable learning experiences that a good teacher will capitalize on to enhance the work of the students.

The importance of evaluation in mathematics education is clearly stated in Sueltz's summary. To perform the evaluations he cites accurately, new instruments and techniques of evaluation will have to be developed which take into account Piaget's theory of cognitive growth and Bruner's theory of concept representation within a behavioral philosophy of education.

Historical Development of Standardized Testing

A review of the historical development of testing reveals that testing did not originate in the pursuit of educational ideals, but, rather, stemmed from personal and political considerations.
Mehrens and Lehmann have described the historical setting:

When Binet developed his first scale, he was concerned with devising a means of removing dull pupils from the overcrowded schools in Paris rather than with constructing an instrument specifically designed to help the classroom teacher relate certain intellectual qualities to the learning process. Horace Mann really did not intend to devise an objective measure of pupil accomplishment. His criticism of the public schools in Massachusetts infuriated a group of teachers and lay citizens in Boston. This group were intent in resisting and refuting Mann's opinions. In the end, as a solution to the problem, it was agreed to prepare written examination questions in history, geography, vocabulary, science, arithmetic, astronomy, and grammar. This survey instigated by Horace Mann, was the first instance in which the same written examination was given to a sample of all pupils at the same school level, and where the papers were scored under uniform conditions. Although the findings confirmed Mann's contention that the public schools were not as good as claimed, it would appear that the findings did not serve as a stimulus to more objective and refined evaluation techniques in American public schools.

Green (1970) noted the following about Mann's achievement tests:

It is interesting to note that these same examinations were given to all eighth graders in the Boston schools following World War I in order to compare the results with the scores of the original pupils. The children in 1919 excelled their 1845 predecessors by a considerable degree in all areas except arithmetic problem solving. Another examination given in Springfield, Massachusetts, in 1846, and a retest in 1906 gave results similar to those in Boston (Cubberley, 1934).

At the time of the American Civil War a little known man in the field of education constructed the first objective educational test.
Reverend George Fisher, an English schoolmaster, devised a series of tests to measure accomplishment in spelling, grammar, handwriting, composition, mathematics, and other school subjects. This series of tests was referred to as a Scale Book. Mehrens and Lehmann (1969) have described its contents.

Thorndike made a major contribution in 1904 when he published the first comprehensive book in the field, Mental and Social Measurement. In this book he proposed several of the principles which are still used in constructing standardized tests. Among these principles were (1) test items should be scaled according to difficulty, (2) tests should be objectively scored, and (3) tests should have statistical norms. Thorndike gave further impetus to the field by publishing the 1909 "Scale for Handwriting of Children" and by encouraging students to do further work in the field. During this period there were several new tests which helped turn the tide of schoolmen in favor of the movement. These tests included C. W. Stone's 1908 edition of a standardized achievement test in arithmetic, the arithmetic scales by Courtis in 1910, and the "Composition Scale" by Ayres in 1912.

The impetus for the continued development of standardized tests came from three sources: (1) unreliability of school marks as an indicator of school achievement, (2) a group of city school surveys conducted between 1910 and 1917 in which standardized tests were used to measure student achievement, and (3) the results of three noteworthy studies.

Mehrens and Lehmann (1969) have pointed out the problem of unreliable teacher grading:

In 1912 and 1913, Starch and Elliott had a group of teachers independently grade an English essay, a geometry paper, and a history paper. They found considerable variation in grades assigned (even with the geometry paper, which we would assume to be more amenable to objective evaluation).
In 1928, Falls had 100 English teachers grade an essay written by a high school senior (who, incidentally, wrote for a newspaper). The teachers were required to assign both a numerical grade to the essay as well as indicate the grade level of the student. Once again, as in Starch and Elliott's study, there was marked variation in both the numerical grades assigned and the estimated grade level of the writer. The grades varied from 60 to 98 percent and the grade level from 5 to 15. These kinds of studies led to the search for, and development of, more objective procedures for testing and grading students.

From the school surveys done using standardized tests, the economic value of producing an acceptable test battery became apparent. In 1919 the Stanford Achievement Battery was published. It was designed primarily for use at the elementary level. Green (1970) has stated:

Although achievement tests changed very little after the publication of this battery, numerous test publishing companies were established, and standardized tests were developed in all fields. An idea of the rapid expansion in the field can be gained from Hildreth's bibliography of mental tests and rating scales. Hildreth listed 3500 titles in 1935, 4279 titles in 1939, and 5294 titles in 1945.

Three influential studies which showed the major development in standardized achievement testing in the 1940s and 1950s, as listed by Mehrens and Lehmann (1969), were: (1) the Eight-Year Study of the Progressive Education Association in 1942, (2) the College Entrance Examination Board long-range study initiated in 1952, and (3) the Cooperative Study of Evaluation in General Education completed in 1954. These studies showed an increased use of standardized achievement tests in our public schools, a beginning inclusion of critical thinking, application of knowledge, synthesis, and evaluation, and the refinement of techniques used to construct and standardize achievement tests.
Ayres (1918) prophesied the importance and subsequent growth of the educational measurement movement in the seventeenth yearbook of the National Society for the Study of Education, Part II: "Knowledge is replacing opinion, and evidence is supplanting guesswork in education as in every other field of human activity." In the final chapter of that yearbook, Judd (1918) noted:

The time is rapidly passing when the reformer can praise his new devices and offer as the reason for his satisfaction, his personal observation of what was accomplished. The superintendent who reports to his board on the basis of mere opinion is rapidly becoming a relic of an earlier unscientific age. There are indications that even the principals of elementary schools are beginning to study their schools by exact methods and are basing their supervision on the results of their measurements of what teachers accomplish.

Merwin (1969) pointed out that the changes in educational evaluation have evolved through interaction with (1) accepted theories and practices of education, (2) the role accepted for evaluation in the educational process, and (3) technical developments in educational evaluation.

Dobbin (1956), citing evidence of the effect of learning theories and practices in education on evaluation, noted that not only fundamental changes in learning theory, but also sweeping changes in enrollment and school organization patterns, have led to changing concepts of assessing achievement since the early 1930s. Starch (1916) suggested that evaluation concern itself with determining individual differences in what pupils learn. Educational practices which evolved from this general idea ranged from "homogeneous grouping" to and including individualized instruction. Dressel (1950) pointed out that testing cannot avoid influencing instruction.

The role of evaluation in educational changes and the resultant changes in evaluation require examination.

1.
The role in general school planning.--Efforts by Haggerty (1917) to determine the effect of evaluation on school planning gave evidence that, as a result of testing, changes occurred in (a) classification of pupils, (b) school organization, (c) courses of study, (d) methods of instruction, (e) time devoted to subject, and (f) methods of supervision. Twenty years later, Reaves commented: "The development of the measuring movement and the perfection of tests for the measurement of achievement and mental capacity have made possible great advances in educational administration."

2. The role in instruction.--Merwin (1969) pointed out that during the 1930s there were a number of proposals suggesting that school testing programs should be conducted in the fall of the year as a basis for evaluating the level of achievement following instruction. Troyer (1947) proposed that pretesting be used to determine the degree of knowledge and skills the students possessed which were prerequisites to the concepts to be taught. In the forty-fifth yearbook, Douglass and Spitzer wrote: "For many years we believed that good teaching begins where the child is, at the point to which his achievement has brought him. We realize that we must take into consideration what the pupil already knows if we are to guide his learning from then on in an effective manner."

3. The role of student decision making.--Simpson (1953) cogently argued that most learning takes place outside the classroom and that much more learning could take place if students developed skills for realistically planning and evaluating their own educational experiences.

4. Changing concepts and the content of evaluation.--(a) Merwin (1969) pointed out that we apparently are in the process of completing a cycle approximately fifty years in length. Monroe's book, Measuring the Results of Teaching, described evaluation as focusing on very detailed objectives related to skills.
Glaser (1967), at the Invitational Conference on Testing Problems, presented graphical descriptions of the accomplishments of individual students over time on relatively minute units of learning. Between the publications of these two reports, there has been considerable emphasis on more general outcomes.

(b) Acceptance of the philosophical position that the teacher should take each child "where he is" and move him as far as possible toward his maximum potential development calls for a measure of status at two points in time as a basis for determining change, or "growth."

(c) Bloom (1956) gave considerable impetus to the broadening of evaluation efforts to include the measuring of "higher mental processes." A publication by Krathwohl, Bloom, and Masia (1964) holds promise for broadening evaluation procedures to take into account very important educational objectives that fall in the affective area. Environmental factors affecting learning have long been recognized, but only in recent years, with the work of Pace and Stern (1959), Wolf (1965), and Coleman (1966), have there been serious attempts to obtain measures of perceptions of environmental factors.

(d) Early emphasis on evaluation focused on individual achievement. In more recent years the focus has been on the evaluation of group achievement to determine the effectiveness of teaching materials, instruction, and curriculum. The work of Rice (1897), Arnold (1916), Cronbach (1963), and Scriven (1967) testifies to these changes in emphasis.

(e) With the expansion of educational involvement in the areas of the military, colleges and professional schools, and early childhood education, the need for an accompanying new evaluation concept has arisen.

Merwin (1969), in the sixty-eighth NSSE yearbook, states that changing concepts in evaluation have grown out of the technical development and the modes of interpretation which have developed to accompany new testing techniques.
He showed that there are three major areas of concern.

1. The Stanford Achievement Tests, published in 1923 by Terman, Ruch, and Kelly, offered the first battery approach to testing across subjects. This approach has been generally accepted as a source of achievement information for many years. The most prudent time to administer a test battery has been a point of controversy. School administrators have argued that the tests offer a measure of individual and group accomplishments and should be given at the end of a school year. Others have argued for fall testing to provide information to teachers as a basis for planning instruction.

2. When achievement tests were shown to be a more efficient and objective measure of achievement than "essay" tests, the use of absolute (percentage) scores resulted in the development of a normative approach to testing. For several decades evaluation focused on the development of instruments which reliably differentiate between individuals and interpreted the results of these instruments in terms of norms. Recently the focus has been to establish standards, as in the Oak Leaf Project at Pittsburgh (Glaser, 1968), which is a form of "mastery" testing. This type of testing is based on a child showing that he has accomplished a particular task or behavior to a certain degree of proficiency as required. Additional types of evaluation which have come from competency-based education are those which Burns (1972) speaks of: "When the method or way of performing (behaving) is important, a process measuring situation can be thought of as a test item. If the end result is more important than the method, a product measuring situation is required. Products can include plans, blueprints, drawings, paintings, tables, charts, diagrams, models, photographs, collections, specimens, stories, poems, and an infinite number of other real things.
In many instances much can be inferred about a process from observing a product, the two are interrelated. Evaluations using processes and products are commonly more valid than merely testing at the verbal level, which may or may not indicate competence."

The interpretation of achievement in terms of potential has been used by educators for many years for identifying selected norm groups. Schudson (1972) has described one of these established norm groups as a "meritocracy." He states that through the use of College Boards to determine "admissions to certain selective colleges, an additional simultaneous choice is made in the selection of those individuals in a society who are to be the future rulers of that society and the holders of the wealth." The report of the Commission on Tests (1970) described the situation in the following manner: "Certainly it is particularly unfortunate that the characteristics that make for success in school work as it is commonly conducted are, if not specific to some segments of society, at least disproportionately distributed among its social classes and its racial and ethnic groups. Bowdoin College's admissions director, Richard Moll, told the press that the tests could not escape cultural bias and so 'tend to work in favor of the more advantaged elements of our society, while handicapping others.' Problems of interpretation have arisen when achievement scores have been regressed on aptitude scores giving 'expectancies.' A lack of understanding of the meaning of 'expectancy' has led to the ideas that 'underachievers' can come up to their predicted level of performance if they would just apply themselves, and an 'overachiever' is doing better than he is capable of doing.
As a result of labelling children, teachers when expecting low achievement will often get just what they expect, resulting in a phenomenon which has been called the 'self-fulfilling prophecy.'"

A major consideration of educational evaluation in the beginning was the provision of information for the teacher's use in working with students. The resultant effect of the use of standardized tests in the early part of the century was a new potential for considering the outcomes of different groups on a common examination. The use of a common test to evaluate learning has spread from a schoolwide basis, to a statewide consideration, and currently to a national assessment.

Lewy (1973) has raised some serious questions concerning the use of achievement tests to discriminate both among individuals and among classes.

Item selection procedures which are recommended for constructing tests for individual differentiation may not be adequate for tests for discrimination among classes. In spite of the practical difference between discrimination of these two types, educational research has not paid enough attention to the existence of such differences, and therefore little systematic study has been devoted to its implications for the planning of educational studies, for the construction of instruments, and for analyzing educational data.

Carver (1975), in reviewing the findings in the Coleman Report (Equality of Educational Opportunity Survey, 1966), pointed out that the Coleman data were designed to be biased against finding significant educational effects for the same reasons cited by Lewy. He stated:

Given the impact of the Coleman Report on federal policy and the allocation of federal funds, it is important that the basis for such policy be on firm ground. It would be unfortunate if the data did not reflect what they were purported to reflect.
With the advent of district, state, and federal testing and the resultant use of these results to make decisions concerning the funding of educational projects, the necessity for continued research in evaluation to answer the problems cited has been mandated.

A review of the historical development of testing disclosed the necessity of developing reliable objective tests to measure student achievement. This need has continued and grown as the evaluation of learning has been used to research the effectiveness of certain curricula, instruction, and learning environments as well as to simply measure individual achievement. Based on the assumption that measures of evaluation should be objective, the technique in this study offers a means of evaluation which retains the well-established need for objective measures. In addition, the testing technique also emphasizes the measurement of individual growth and self-assessment.

Historically Developed Criteria for Judging Evaluation Instruments and Measurement

The need for objective evaluation instruments and measurements has existed for a relatively long time, acting as an impetus for the development of criteria to determine whether or not any given instrument or measurement did what it was purported to do. These criteria will be used in Chapter V to help evaluate the study's testing technique.

The first of a series of publications designed to help test makers refine their instruments was Statistical Methods Applied in Education, written by Harold Rugg in 1917. From Rugg's work came a series of criteria for judging the desirability of accepting a testing instrument and its results. Gronlund (1971) lists and defines these criteria as validity, reliability, and usability.

Validity

Validity refers to the extent to which the results of an evaluation procedure serve the particular uses for which they are intended.
Three types of validity have been identified and are now commonly used in educational and psychological measurement: (1) content validity, (2) criterion-related validity, and (3) construct validity. Gronlund has defined these concepts:

1. Content validity may be defined as the extent to which a test measures a representative sample of the subject-matter content and the behavioral changes under consideration.

2. Criterion-related validity may be defined as the extent to which test performance is related to some other valued measure of performance.

3. Construct validity may be defined as the extent to which test performance can be interpreted in terms of certain psychological constructs.

Gronlund has pointed out additional factors found in the test instrument which, if ignored, will lower the validity of the test results.

1. Unclear directions.
2. Reading vocabulary and sentence structure too difficult.
3. Inappropriate level of difficulty of test items.
4. Poorly constructed test items.
5. Ambiguity.
6. Test items inappropriate for the outcomes being measured.
7. Test too short.
8. Improper arrangement of items.
9. Identifiable pattern of answers.

Factors which influence validity that can be found in the administration and scoring of a test are the following:

1. Cheating.
2. Failure to follow directions.
3. Ignoring time limits.
4. Giving pupils unauthorized assistance.
5. Errors in scoring.
6. Poor physical environment.

Conditions that might adversely affect test validity which are due to personal factors are:

1. Motivation.
2. Anxiety.
3. Fatigue.
4. Illness.
5. Test-wiseness (ability to discern cues to correct responses from the test itself).
6. Response set (consistent tendency to follow a certain pattern in responding to test items).

Gronlund summarizes the nature of validity thus: the validity of test results is based on the extent to which the behavior elicited in the testing situation is a true representation of the behavior being evaluated.
Thus, anything in the construction or the administration of the test which causes the test results to be unrepresentative of the characteristics of the person tested contributes to lower validity. In a very real sense, then, it is the user of the test who must make the final judgment concerning the validity of the test results. He is the only one who knows how well the test fits his particular use, how well the testing conditions were controlled and how typical the responses were to the test situations.

Reliability

Reliability refers to the results obtained with an evaluation instrument and not to the instrument itself. According to Gronlund (1971),

Reliability refers to the consistency of measurement. That is, to how consistent test scores or other evaluation results are from one measurement to another. . . . A closely related point is that an estimate of reliability always refers to a particular type of consistency. Test scores are not reliable in general. They are reliable (or able to be generalized) over different periods of time, over different samples of questions, over different raters, and the like. It is possible for test scores to be consistent in one of these respects and not in another. The appropriate type of consistency in a particular case is dictated by the use to be made of the results. . . . Treating reliability as a general characteristic can only lead to erroneous interpretations.

Gronlund adds that reliability merely provides the consistency which makes validity possible. A highly reliable measure may have little or no validity.

Factors which may influence reliability are:

1. Length of test: in general, the longer the test, the higher the reliability.
2. Spread of scores: in general, the larger the spread of scores, the higher the estimate of reliability.
3. Difficulty of test: tests which are too easy or too difficult for the group members taking them will tend to provide scores of low reliability.
Usability

Usability refers to the practical considerations of selecting an evaluation instrument. Some of these are:

1. Ease of administration.
2. Time required for administration.
3. Ease of scoring.
4. Ease of interpretation and application.
5. Availability of equivalent or comparable forms.
6. Cost.

Review of the Research in Math Instruction

The definition of a math lab contributed by Kerr (1974) identifies the areas of research to be reviewed if math labs can be thought of as effective environments for learning.

The mathematics laboratory is a strategy of instruction in which the learner himself interacts with mathematics and its real-world applications. The techniques used in a laboratory strategy may be varied; they may include discussion, discovery activities, model construction or even some directed teaching. Likewise the interaction of the learner with mathematics and its applications may vary. But the laboratory strategy focuses the learner's attention and activities on the relationship between mathematics and its real-world applications.

The real-world applications of mathematics take the form of models which demonstrate the mathematical concepts in a meaningful manner to the learner.

On the basis of the research evidence put forth by the 20 studies conducted to determine the effectiveness of using models and activity-oriented classrooms in teaching mathematics in kindergarten through third grades, it does appear that the use of mathematical models and activities contributed to effective teaching. Table 1 presents a summary of these studies. Aurich (1963), Hollis (1964), Crowder (1965), Nasca (1966), Williams (1967), Howard (1969), and Wynrath (1970) found significance in favor of the experimental groups using models and activities. Weber (1969) did not find significance, but did find a trend favoring the use of manipulatives.
Two additional studies, by Norman (1955) and Ekman (1966), did not find significance for either the control or experimental groups at the end of the instructional period, but did find the experimental group showed superior retention two weeks and three weeks, respectively, after the instructional period had ended. Only one of the 20 studies showed the "traditional" method of instruction produced significance in achievement. This study, conducted by Passy (1963), used Cuisenaire rods and offered the only evidence that a traditional approach can be more effective than teaching with models and activities.

Table 1. Summary of the effect of activity and model methodologies on the learning of mathematics in kindergarten through third grade

Author | Grade Level | Model | Test Used | Significant Difference In Favor Of | Mathematical Content
Norman (1955) | third | concrete and semiconcrete models | author constructed | neither group at the end of instruction; concrete and semiconcrete at the end of two weeks | division of whole numbers
Eidson (1956) | early elementary | many multisensory aids | standardized achievement | neither | arithmetic in lower grades
Sole (1957) | early elementary | manipulative aids | standardized achievement | neither | arithmetic in lower grades
Seick (1959) | second and third | multisensory aids | author constructed | neither | computation and arithmetic reasoning
Aurich (1963) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Haynes (1963) | third | Cuisenaire rods | author constructed | neither | multiplication
Passy (1963) | third | Cuisenaire rods | standardized achievement | traditional treatment | computation and arithmetic reasoning
Lucow (1963) | third | Cuisenaire rods | author constructed | neither | multiplication and division
Hollis (1964) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Crowder (1965) | first | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of first grade work
Nasca (1966) | second | Cuisenaire rods | standardized achievement | Cuisenaire treatment | total range of second grade work
Lucas (1966) | first | Dienes arithmetic blocks | standardized achievement and author constructed | Dienes treatment for conservation of number and conceptualization of mathematical principles; traditional for computation and solving of verbal problems | identified in projection terms: multiplication of relations and addition-subtraction relations
Ekman (1966) | third | counters | author constructed | neither at end of instruction; concrete model group on a retention test | addition and subtraction algorithms
Weber (1969) | first | manipulative and concrete | standardized achievement and author constructed | neither, but a trend favored manipulatives | total range of first through third grades
Howard (1969) | early elementary | concrete materials | author constructed | concrete materials | sorting, counting, classifying and patterning sets
Wynrath (1970) | kindergarten | games | standardized achievement | games | total range of kindergarten and first grade work
Moody, Abell & Bausell (1971) | third | manipulative and concrete materials | standardized achievement | neither | multiplication
Ropes (1972) | second | multisensory aids | standardized achievement and author constructed | neither | total range of second grade work

From the research charted in Table 2, it seems apparent that using models does not hurt the learner's ability to comprehend mathematical concepts. Studies by Dawson and Ruddell (1955), Carmody (1970), Bisio (1970), and Nickel (1971) show significant results for the use of models in teaching.

Table 2. Summary of studies to determine the effectiveness of teaching with models and activities in grades four through six

Author | Grade Level | Models Used | Test Used | Significant Difference In Favor Of | Mathematical Content
Price (1950) | fifth and sixth | multisensory aids | author constructed | neither | division of fractions
Howard (1950) | fifth and sixth | concrete and semiconcrete | author constructed | neither at end of instruction; semiconcrete three months later | total range of fifth and sixth grade work
Dawson & Ruddell (1955) | fourth | many diverse models | author constructed | concrete-model group | division of whole numbers
Anderson (1957) | eighth | various visual tactile devices | author constructed | neither | area, volume and Pythagorean theorem
Mott (1959) | fifth and sixth | many multisensory aids | standardized achievement | neither | measurement
Spross (1962) | fifth and sixth | concrete aids that had cultural significance | standardized achievement | neither | total range of fifth and sixth grade work
Trueblood (1967) | fifth and sixth | manipulation of aids and demonstration of aids | standardized achievement | demonstration of aids | fractions
Toney (1968) | fourth | manipulation of aids and demonstration of aids | standardized achievement | neither | fourth grade content
Green (1969) | fifth | diagrams, cardboard sticks | standardized achievement | neither | multiplication of fractions
Carmody (1970) | sixth | concrete and semiconcrete | author constructed | concrete and semiconcrete | sixth grade work
Bisio (1970) | fifth | demonstrated manipulatives | author constructed | manipulatives | fractions
Wilkinson (1970) | sixth | laboratory materials | standardized achievement | neither | metric geometry
Nickel (1971) | fourth | abstract picture and diagrams; concrete | standardized achievement | multi-model approach | fourth grade work
Ropes (1972) | sixth | laboratory materials | standardized achievement | neither | sixth grade work
Howard (1950) showed that there was no significant difference between treatment groups until a test was administered three months later to measure retention. On the retention test the group using the models did significantly better.

The summary of results shown in Table 3 appears to reverse the findings in the early elementary studies: instruction using models appears to be less effective than traditional approaches. This finding was borne out by the work of Johnson (1970), Cohen (1970), Schwartz (1971), and Shoecraft (1971). In the Shoecraft (1971) study, however, low achievers showed a need for aids in instruction by showing significant results in group achievement. Waslyk (1970) showed significant results for his experimental group when working with measurement concepts using concrete models.

In reviewing this research, several questions occurred to this reader concerning the wisdom of accepting many of the results as an accurate measure of the effectiveness of model and activity teaching. Two such reservations are noted below.

1. Key words and procedures in the studies lacked operational definitions. Therefore, variables which might have affected the results remain undisclosed. This lack of definition also affects replicability.
2. Concepts taught at the concrete and pictorial-diagrammatic levels of representation were primarily evaluated at the abstract level of representation. This cannot help but place the results of teaching which uses concrete and semiconcrete mathematical aids and models at a disadvantage.
Table 3. Summary of the effect of activity and model methodologies on the learning of mathematics in grades seven through twelve

[Table rows are illegible in the source. Column headings: Author, Grade Level, Model, Test Used, Significant Difference In Favor Of, Mathematical Content.]

Despite the criticisms which might be leveled at the research cited, the results certainly can be accepted as strong evidence in support of model and activity learning. In the majority of cases these studies still found significant results in favor of such learning, even though the instruments used to measure learning placed them at a disadvantage. These instruments measured learning using symbolic concept representation, whereas a child using models and activities experiences concrete or pictorial-diagrammatic concept representations.
If math labs which place an emphasis on model and activity learning are themselves to be more accurately evaluated in terms of their effectiveness in teaching math, it is necessary for new methods of evaluation to be devised which will incorporate the objective nature of standardized tests and offer a means of evaluating learning at the concrete and pictorial-diagrammatic representation of concepts. With this purpose in mind this study was undertaken.

Evaluation Methods in Assessing Learning in a Math Lab

In addition to standardized tests in evaluating learning in a math lab, a few other methods have been employed.

Anecdotal Records

Anecdotal records are the objective, as opposed to interpretive, descriptions of pupil behavior written by the teacher on a daily or frequent basis. Gronlund made the following suggestions concerning the keeping of these records:

1. Confine observations to those areas of behavior that cannot be evaluated by other means.
2. Limit observations of all pupils at any given time to just a few types of behavior.
3. Restrict the use of extensive observations of behavior to those few pupils who are most in need of special help.

Rating Scales

Rating scales provide a systematic procedure for obtaining and reporting the judgments of observers. A rating scale consists of a set of characteristics or qualities to be judged and some type of scale for indicating the degree to which each attribute is present. According to Gronlund, the rating scale is valuable only to the extent it is carefully prepared and appropriately used. It should be constructed in accordance with the learning outcomes to be evaluated, and its use should be confined to those areas where there is a sufficient opportunity to make the necessary observations.
If these two principles are properly applied, a rating scale serves several important evaluative functions: (1) it directs observation toward specific and clearly defined aspects of behavior; (2) it provides a common frame of reference for comparing all pupils on the same set of characteristics; and (3) it provides a convenient method for recording the judgment of the observers.

The following principles were listed in Gronlund as important characteristics to be considered in the preparation or selection of a rating scale:

1. Characteristics should be educationally significant.
2. Characteristics should be directly observable.
3. Characteristics and points on the scale should be clearly defined.
4. Between three and seven ratings should be provided and raters should be permitted to mark at intermediate points.
5. Raters should be instructed to omit ratings where they feel unqualified to judge.
6. Ratings from several observers should be combined, whenever possible.

Checklists

According to Gronlund,

A checklist is similar in appearance and use to the rating scale. A rating scale provides an opportunity to indicate the degree to which a characteristic is present or the frequency with which a behavior occurs. The checklist, on the other hand, calls for a simple "yes-no" judgment. It is basically a method of recording whether a characteristic is present or absent, or whether an action was taken or not taken. Checklists are especially useful in evaluating those performance skills that can be divided into a series of clearly defined, specific actions.
In summary, the major points to be considered in developing a checklist, according to Gronlund, are: (1) identify and describe clearly each of the specific actions desired in the performance; (2) add to the list those actions which represent common errors, if they are limited in number and can be clearly identified; (3) arrange the desired actions and likely errors in the approximate order in which they are expected to occur; and (4) provide a simple procedure for numbering the actions in sequence or for checking each action as it occurs.

Interview

An interview is an evaluation situation in which an examiner faces a student and asks questions to which the student is expected to respond. Suydam (1974) suggested the following procedure for a mathematics evaluation: (1) face the student with a problem; (2) let him find a solution, as he tells you what he is doing; and (3) challenge him, to elicit his highest level of understanding.

All of the methods cited in this chapter to evaluate learning in a math lab are very time consuming in their preparation, administration, or both. These methods do offer a teacher a means of evaluating learning using concrete and pictorial-diagrammatic representations of concepts. Teachers using interviews or anecdotal records are able to judge whether or not a child has understood a concept which has been presented concretely by observing the behavior of the child using the concrete model and either writing down what has been observed or asking the child questions about his behavior and recording the questions and responses.

The methods of evaluation cited here have the inherent problem of being subjective. The ability accurately to observe, record, and pose meaningful questions to determine the depth of learning being observed is highly dependent on the talents of the teacher doing the evaluating.
This subjectivity may well bring back into the educational scene the kind of criticism which historically was shown to be valid with respect to the accuracy of measurement.

It is apparent that with all the methods and instruments available to evaluate learning, additional means are needed which (1) can measure learning with the myriad of levels of learning present in any given math lab, (2) require only a short time to prepare, administer, and correct, and (3) offer objective measures. This study offers a beginning in the research needed to establish the effectiveness of a testing technique which can accomplish these three necessary tasks.

Thresholding

Methods of evaluating student learning vary, but there is an emphasis on achievement tests, which are used to determine a level of functioning with respect to a norm. These norms are determined by testing youngsters to be normed and ascertaining levels of expectancy for children of a particular age or grade. Buswell and John, in Manual of Directions for Use with Diagnostic Charts for Individual Difficulties in Fundamental Processes in Arithmetic, state:

A standardized test in arithmetic will indicate whether a pupil is doing satisfactory or unsatisfactory work for a given school grade. It enables the teacher to identify those pupils who need special attention. However, the marked limitation of such a test is that it does not tell why the pupil fails nor how he has made his errors.

Since these tests do not attempt to determine a student's level of functioning within an area of arithmetic or mathematics, additional types, called diagnostic or inventory tests, have been developed. Meyers (1959) pointed out that there were 37 achievement and 10 diagnostic tests available in the area of arithmetic. The latter have a varied format, with a portion of them offering a sequenced test from simple to complex problems within a computational skill area.
To determine the level of functioning within a diagnostic test of this kind, a threshold of functioning is ascertained by observing at what point in this test a child either begins making more errors than correct responses or stops answering questions.

This method of determining the functioning level of an individual has a history beginning in 1860 with Fechner, who was the chief precursor of experimental psychology. He published a voluminous treatise on "Psychophysics" entitled Elemente der Psychophysik. Initially a physicist who sometimes published philosophical works under a pseudonym, Fechner, because of his interest in philosophy, may have abandoned physics and been attracted to psycho-physics when he suffered from a nervous breakdown. He wanted to demonstrate the identity of mind and matter which to him were two faces of the same reality, and either of which was apparent according to whether one took an internal or an external point of view. His background in physics made him denounce reasoning as a valid source of knowledge. Seeking a scientific foundation for his knowledge, he hoped to determine a quantitative relationship between a physical stimulus and resulting conscious sensation. In his search for the scientific laws governing psycho-physics he devised suitable methods of experimentation and statistical treatment of data.

In his search for the relationship between mind and body, Fechner had to measure as accurately as possible the different thresholds of his subject. Threshold and its Latin equivalent, limen, mean, essentially, a boundary separating the stimuli that elicit one response from the stimuli that elicit a different response. Thresholds must be repeatedly tested, for they vary due to the nature of the senses. Therefore, a threshold is always a statistical value; customarily, the lower threshold is defined as the value of the stimulus which evokes a positive response on 50 percent of the trials.
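The 50 percent definition of the lower threshold can be illustrated with a small calculation. The sketch below is only an illustration of that statistical definition, not Fechner's own procedure: it assumes a fixed number of trials at each of several ascending stimulus values and uses straight-line interpolation between the two values whose response rates bracket 50 percent. The function name, its parameters, and the sample figures are all hypothetical.

```python
def fifty_percent_threshold(stimulus_values, positive_counts, trials_per_value):
    """Estimate the stimulus value that evokes a positive response on 50% of trials.

    Assumptions (not from the source): stimulus_values are in ascending order,
    each value was presented trials_per_value times, and the response rate is
    treated as changing linearly between neighbouring stimulus values.
    Returns None when the observed rates never bracket 50 percent.
    """
    proportions = [count / trials_per_value for count in positive_counts]
    for i in range(len(proportions) - 1):
        low_p, high_p = proportions[i], proportions[i + 1]
        if low_p <= 0.5 <= high_p and low_p != high_p:
            low_x, high_x = stimulus_values[i], stimulus_values[i + 1]
            # Linear interpolation to the point where the rate equals 0.5.
            return low_x + (0.5 - low_p) * (high_x - low_x) / (high_p - low_p)
    return None


# Hypothetical data: five stimulus intensities, ten trials each.
estimate = fifty_percent_threshold([1, 2, 3, 4, 5], [0, 2, 4, 8, 10], 10)
print(estimate)
```

With these made-up counts the response rate passes 50 percent between the third and fourth stimulus values, so the estimate falls between 3 and 4.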
The threshold technique developed by Fechner is a method of serial exploration. It consists of "descending" and "ascending" series, each carried far enough to locate the momentary transition point or threshold from one response category to another.

Using Fechner's technique, Binet attempted to measure a total intelligence by measuring its individual aspects. Terman (1917) has noted:

It was this point of view which long controlled the work of Binet, who, like others, began by attempting to get at intelligence by measuring memory, attention, sense discrimination and other individual functions.

Terman adds:

The assumption that it is easier to measure a part, or one aspect of intelligence than all of it, is fallacious in that the parts are not separate parts and cannot be separated by any refinement of experiment. They are interwoven and intertwined. Each ramifies everywhere and appears in all other functions. Memory, for example, cannot be tested separately from the associative processes. After vainly trying to disentangle the various intellective functions, Binet decided to test their combined functional capacity without any pretense of measuring the exact contribution of each to the total product. Intelligence tests have been successful just to the extent to which they have been guided by that aim.

Terman concluded: "The proof of the Binet method is the fact that it works so well."

The technique of determining a threshold for the functional level of a sense with any individual, which began in psycho-physics with Fechner, was used by Binet in his initial experiments with the measurement of intelligence. When his first efforts failed, he continued using this technique, assuming that measuring sense functioning in combination would not diminish the effectiveness of the technique.
With the establishment of this technique in determining intelligence, thresholding has been employed in diagnostic inventory testing to ascertain a level of functioning within an arithmetic operation. Based on the assumption that thresholding is valid in diagnostic testing, the proposed research will attempt to shortcut this technique by demonstrating a more efficient method of determining a level of performance within an arithmetic operation.

The Use of Ambiguous Stimuli in Testing

Ambiguous stimuli were first employed in the area of projective techniques for identifying emotional problems of individuals. By placing a stimulus, which could have many responses, before an individual, much was learned about the person's inner thoughts. Rorschach's inkblot projective approach was a precursor to a variety of projective techniques, including interpretation of drawing, painting, handwriting, stories, fantasies, play, and drama. Exner (1974) has noted:

Although Rorschach first became interested in the use of inkblots to study psychopathology about 1911, it is doubtful that he undertook any serious investigation of their usefulness until 1917. In that he died in 1922, he probably spent no more than between 3 and 4 years working intensively with them.

Before his death, Rorschach did offer a variety of postulates concerning specific test features, especially form, color, and human movement. He did not formulate a global theory of the test and was quite conservative in discussing its potential usefulness. After his death, five major systems or approaches in using the Rorschach developed. These five systems have caused much controversy in the use and interpretations of the instrument and its results. Despite all the controversy, Exner (1974) pointed out that 60 percent of all patients in a clinical situation in 1971 were administered the test.
Aside from measuring psychopathology with projective techniques, attitudes have also been measured using ambiguous visual stimuli. Alberts (Suydam 1974) has developed a test using 21 cartoon-like drawings. Children are asked to respond to these by associating themselves with the character portrayal.

Self-reports which request that a student relate what he has learned in a given class or with a given instructor are common examples of uses of an ambiguous verbal stimulus.

To test a person's mathematical creativity, Evans (Suydam 1974) has designed a test for late elementary and early junior high school students which presents an ambiguous math situation. The student is expected to respond in as many different ways as possible. Responses are scored with respect to number, number of different kinds, and degree of uncommonness.

The evaluation of academic achievement as employed in this study appears to be a new area for using ambiguous stimuli. But the technique has a long history in the field of psychological assessment, where the Rorschach and Thematic Apperception Test have been used for diagnostic purposes in mental health for more than half a century.

CHAPTER III

PROCEDURE AND METHODOLOGY

The setting, the sample, the examiners who used the proposed technique, and the instrument used for validating the technique are described in this chapter. In addition, the procedure for determining a child's ability accurately to assess and communicate what he knows about addition and subtraction using symbolic models of concept representation, as well as the methods of analyzing the collected data, will be discussed.

Setting and Sample

This study was conducted in the Muncie, Indiana, school system and at the laboratory school at Ball State University using 161 elementary students. The Muncie schools in the study are located in an area of mixed socioeconomic populations. The predominant races represented in Muncie are Black and Caucasian.
Burris, the Ball State University laboratory school, has a mixed cultural, racial, and economic population, and 30 percent of the students have learning disabilities. These children are channeled into the regular school classrooms.

The Muncie schools were selected in consultation with the office of the superintendent of schools and members of the administration who were familiar with the type of school populations. Schools with the most diverse composition with respect to racial groups and economic levels were selected.

The testing procedure was administered both to groups of children and to individual children. At Burris children were grouped in classrooms with three grades in each class. All classrooms in the elementary portion of the school were either a 1-3 grade group or a 4-6 grade group. There were four classrooms of each grouping. Six Muncie classrooms in six different schools were chosen. There were two first grades, three second grades, and one fifth grade used. The entire classroom of children in the Muncie schools and the entire population of Burris youngsters in grades 1-6 were evaluated using the technique in the study.

Examiners

The examiners were both preservice and in-service teachers. The former came from the student body of Ball State University and were majoring in elementary or special education. Sections of college juniors and seniors who were taking methods classes and were scheduled to tutor were asked to use the technique in this study to determine the level of development of their child or small group of children within an operation prior to tutoring for the fall and winter quarters (1974-1975).

The in-service teachers were from the Muncie school system. They were selected by their principals from the schools recommended by the Muncie school administration. All six teachers who were asked to participate accepted.

Instruments and Methods Used for Validating the Technique in This Study

Two sequenced tests were written for the study.
A subject's level of functioning on either or both of these tests was determined by using Fechner's technique of thresholding. Another measure of the child's level of functioning was taken using the technique of this study. This level was determined by comparing the child's submitted problem to the level of the test designed for the study. The number of the level which most nearly corresponded to the submitted problem was then given to the submitted problem. This resulted in each child having two scores in the form of two level numbers: one from the sequenced achievement test prepared for the study, and one from the technique being researched.

A commercially prepared test, Fundamental Processes in Arithmetic, devised by Buswell and John and published by Bobbs-Merrill Company, Inc., was used as a guide for sequencing problem levels in the tests written for the study. A copy of the commercial test may be found in Appendix C. One additional problem per level was added to increase reliability, but no more than one was added in an effort to minimize test fatigue. Copies of the tests prepared for the study are found in Appendix C.

Procedure

The examiners were given a procedure sheet (Appendix B) explaining what they were to do. This sheet requested the following:

1. Ask the child to be evaluated to, "Show me the hardest problem that you have learned to do in ________ and write the answer." (The participating college students used the technique with all four operations in whole numbers and fractions, but only the addition and subtraction data were analyzed.)

2. If the child, when writing an addition problem, wrote one having all zeros except for one digit in each addend, for example, 1000 + 2000 = 3000, then the examiner was to request that the child write a problem with no zeros, except possibly in the answer. (In the pilot studies, when children submitted memorized responses, the level of functioning was not discernible to the examiner.
Sometimes the child, when giving a (1000 + 1000 = 2000) response, indicated that he could only add a one-digit number to another one-digit number. In other instances, the problem indicated that he could add numbers in the thousands.)

3. After the child submitted his problem and answer, the test written for the study in addition or subtraction was given to him.

4. Last, the test was to be collected when the child wished to hand it in.

5. A request to fill out a data sheet concluded the directions on the procedure sheet.

To compile the data, two students from Ball State University, one in graduate school in elementary education and one a senior in secondary math education, determined the level of the submitted "hardest" problem by comparing the problem to the tests written for this study in the appropriate operation and selecting the level that most corresponded to the submitted problem. This was done for all submitted problems first. Then the tests written for the study were scored using the thresholding technique. The thresholding technique of scoring was used in the following way: when a child missed all three problems at a given level, his functioning level was determined to be at one level before the missed group of problems.

To test the following hypothesis, a criterion for determining high, average, and low achievers was established.

B1 There will be no significant differences between the high, average, and low achievers as determined by the Iowa Achievement tests in their ability to assess their level of abstract achievement.

A child was judged to be a high achiever if his score on the Iowa Achievement test was in the 85th percentile or above, an average achiever if his score on the Iowa Achievement test was between the 30th and 85th percentile, and a low achiever if his score on the Iowa Achievement test was on the 30th percentile or below.
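The thresholding rule for scoring and the percentile criterion for grouping achievers, both stated above, can be sketched in code. This is a minimal illustration and not the scorers' actual procedure: the function names and the data layout (one list of right/wrong marks per level) are hypothetical, unanswered problems are assumed to count as missed, a child who never misses a whole level is assumed to be placed at the highest level tested, and a child who misses every problem at level 1 is assumed to be placed at level 0.

```python
def threshold_level(results_by_level):
    """Return the functioning level under the thresholding rule.

    results_by_level[0] holds the marks for level 1, results_by_level[1]
    for level 2, and so on; each mark is True (correct) or False (missed).
    The functioning level is one level before the first level at which
    every problem was missed.
    """
    for level, marks in enumerate(results_by_level, start=1):
        if marks and not any(marks):
            return level - 1
    # Assumption: no fully missed level means the top level was reached.
    return len(results_by_level)


def achiever_group(percentile):
    """Classify a child by Iowa Achievement test percentile.

    High: 85th percentile or above; low: 30th percentile or below;
    average: anything in between.
    """
    if percentile >= 85:
        return "high"
    if percentile <= 30:
        return "low"
    return "average"


# Hypothetical child: levels 1 and 2 partly or fully correct, level 3 all missed.
marks = [[True, True, True], [True, False, True], [False, False, False]]
print(threshold_level(marks), achiever_group(62))
```

Treating the boundary percentiles (30 and 85) as belonging to the low and high groups follows the wording of the criterion; the handling of the other edge cases is a choice made only for this sketch.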
To test hypothesis B2, which reads as follows:

B2 There will be no significant differences between the high, average, and low achievers as determined by teacher judgment, in their ability to assess their level of abstract achievement.

children were determined to be high, average, or low achievers simply on the basis of how a teacher viewed their achievement.

Hypothesis B5 states:

B5 There will be no significant differences between children from high, average, and low family incomes in their ability to assess their level of abstract achievement.

To test this hypothesis, the following criteria were used to determine the category of family income which most nearly corresponded to each child: Scale of family incomes--high, over $25,000; average, $4,681 to $24,999; and low, below $4,681.

Methods of Analyzing Data

To establish a measure of validity with respect to the testing technique in this study, a comparison of results was made between the test written for this study, using the concept of thresholding to determine the level of functioning, and the technique in this study. The comparison took the form of a correlation, which was hypothesis A of this study. It states:

A There will be a high correlation between the results of testing using a diagnostic test and the results of testing using the technique being studied.

When a scattergram was constructed from the results of the test written for this study together with the results of the technique in this study, a linear relationship was noted for both operations. (See accompanying scattergram, Figure 1.) On the vertical axis of the scattergram are listed all possible levels (1 through 22) that a child could attain on the tests designed for the study. The horizontal axis lists levels 1 through 22, which are all the possible scores attainable by the testing technique in this study. Each pair of scores which a child acquires through testing is used as the coordinates of a point in the scattergram.
Since a linear relationship was apparent from the data, a decision to use the Pearson product-moment correlation coefficient was made. This correlation coefficient is denoted by $r_{xy}$. It can be expressed as the covariance of two variables, divided by the standard deviation of each of the variables:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

The computational formula which was used is:

\[ r_{xy} = \frac{n \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[n \sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n \sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \]

where X and Y are the variables to be correlated, and n is the total number of subjects.

[Figure 1. Scattergram of the results of the test written for this study and the technique of this study. Vertical axis: level of test written for this study. Horizontal axis: level of submitted problem. Separate symbols indicate addition data and subtraction data.]

In an effort to test the following hypotheses it was necessary to establish a criterion for determining which children were successful in communicating their level of functioning by submitting a problem in addition or subtraction which they thought was the "hardest" that they could do.

B1 There will be no significant differences between the high, average, and low achievers as determined by the Iowa Achievement tests in their ability to assess their level of abstract achievement.

B2 There will be no significant differences between the high, average, and low achievers as determined by teacher judgment in their ability to assess their level of abstract achievement.

B3 There will be no significant differences between Blacks and Caucasians in their ability to assess their level of abstract achievement.

B4 There will be no significant differences between girls and boys in their ability to assess their level of abstract achievement.

B5 There will be no significant differences between children from high, average, and low income families in their ability to assess their level of abstract achievement.
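As an aside on the correlation used for hypothesis A, the computational formula for the Pearson product-moment coefficient given earlier in this section can be sketched in a few lines. This is a minimal illustration only: the function name is hypothetical, and the paired level scores in the example are invented for demonstration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson r from raw sums, following the computational formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Invented example: level of submitted problem vs. level on the written test.
submitted = [1, 2, 3, 4, 5]
written = [2, 1, 4, 3, 5]
print(round(pearson_r(submitted, written), 2))  # 0.8
```

The raw-sum form is convenient for hand or desk-calculator work, which is presumably why a computational formula was stated; it gives the same value as the covariance definition.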
The level of the problem submitted was compared with the results of the test written for this study, which the children took in the same testing session. The criterion for a successful self-assessment was established as follows: When a child submitted a problem which was within two levels above or two levels below the level of functioning established by the test written for the study, he was judged to be successful in his ability to assess himself. In tabulating the results, dichotomous data were collected, with a "1" being given to successful students and a "0" to nonsuccessful students.

As a precaution to the subsequent use of t-tests to determine whether or not there were differences in group means in their ability to assess themselves (hypotheses B1 through B5), an F-test was used to check sample variances. When the tests showed no differences in sample variances, the following two-tailed t-test was used:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]

where $\bar{X}_1$ = mean of one group; $\bar{X}_2$ = mean of second group; $n_1$ = number of responses in first group; and $n_2$ = number of responses in second group; and where

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]

with $S_p^2$ = total population variance; $S_1^2$ = variance of first group; and $S_2^2$ = variance of second group.

The limits were: Upper = $t_{1-\alpha/2}$; Lower = $t_{\alpha/2}$; d.f. = $n_1 + n_2 - 2$; and $\alpha$ = .05.

The assumptions which were made by using this test statistic were:

1. $X_1$ and $X_2$ are normally distributed;
2. homoscedasticity; and
3. samples were randomly selected and independent.

In the determination of a racial bias with respect to what a child evaluates as "hard," as suggested by hypothesis C (there will be no significant differences between racial groups in what they perceive as "hard"), the submitted problems were studied in an attempt to ascertain appropriate groupings for the analyses.
If a submitted problem fitted into more than one category, then a tally mark was placed in all appropriate categories. The addition data were grouped in the following manner:

1. addition with regrouping;
2. addition without regrouping;
3. problems with three digits or less;
4. problems with more than three digits; and
5. problems with multiple addends (more than two).

Subtraction was grouped into the following categories:

1. subtraction with borrowing;
2. subtraction without borrowing;
3. problems with three digits or less; and
4. problems with more than three digits.

The nature of the data collected to test hypothesis C suggested that a series of chi-square tests be used with $\alpha$ = .05. The following test statistic was used:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i} \]

where $n_i$ is the observed cell frequency, $n$ is the sum of $n_1 + n_2 + \ldots + n_k$, and $p_i$ is the expected relative frequency, so that $n p_i$ is the expected cell frequency.

CHAPTER IV

PRESENTATION AND ANALYSIS OF THE DATA

The results of this investigation using the procedures and data analysis described in Chapter III are presented in this chapter. A presentation of the data demonstrating the correlation between the technique in this study and that of the test written for this study will be given first. A discussion of the results of determining whether a child can assess himself by the criteria established in this research will follow. Finally, the data showing the different groups' ability to use the testing technique in this study, cited as hypotheses in the preceding chapter, and the data used to determine whether or not a racial bias exists with respect to what a child considers "hard" will be discussed.

Correlation Between the Technique in This Study and the Test Written for This Study

The test and the children's submitted problems were collected as described in the procedure sheet in Appendix B.
After the collection of these papers, a senior student in secondary math education and a graduate student in elementary education from Ball State University determined the level of the submitted "hardest" problem by comparing the problem to the test written for this study in the appropriate operation and selecting the level that most nearly corresponded to the submitted problem. This was done for all submitted problems first. Then the tests written for this study and taken by the children were scored using Fechner's thresholding technique to determine the child's level of performance on the test. The thresholding technique of scoring a test was used in the following way: When a child missed all three problems at a given level, his functioning level was determined to be one level before the missed group of problems. Each child in this study, thus, has two scores--one from his submitted problem and one from the test designed for the study.

A Pearson product-moment correlation was used to test hypothesis A.

A There will be a high correlation between the results of testing a child by a diagnostic test and the testing technique being studied.

A value of r = .85 for addition and r = .81 for subtraction was computed. The results show that a high correlation was found between the diagnostic test designed for the study and the testing technique in this study. Constructing confidence intervals for these two correlations (P = .99), $\rho$ was found to be between .75 and .91 for addition and between .66 and .90 for subtraction. Therefore, it can be concluded that the technique in this study gave results which correlated quite well with the results of the tests designed for this study for both operations.

Child's Ability to Assess Himself

The percentage of students who submitted problems within two levels above or below the level of functioning indicated by the diagnostic test was calculated to be 62 percent with addition and 57 percent with subtraction.
A breakdown of the addition data shows that 33 of the 91 students were nonassessors by the criteria stated in Chapter III. It was not possible to assess two of the students in the study because they refused to submit a problem, stating that they could not think of one. The nonassessors could be broken down into the following categories: (1) submitted a problem incorrectly solved; (2) submitted a problem below (less difficult than) the level of functioning as determined by the diagnostic test; and (3) submitted a problem above (more difficult than) the level of functioning as determined by the diagnostic test.

Two students solved their submitted problem incorrectly, making errors that they also made on their test. Of the remaining nonassessors, 16 achieved a higher level score on the diagnostic test than their submitted problem indicated that they could do. Of these 16, 8 submitted problems which placed them in levels 1-12. All the 1-12 levels require little understanding of place value, and children could use their fingers to give a correct answer to the problems. Therefore, the 8 children who suggested by their submitted problems that they considered a one-digit number plus a one-digit number as the "hardest" problem that they could do, correctly answered many problems by treating a multi-digit number as a series of one-digit number problems, that is, 435 + 362 equals: five plus two, three plus six, and four plus three. This became apparent by observing the errors in the problems that they had missed. All of these children had the following type of error: 738 + 436 = 11614. It would appear that the submitted problem more accurately depicted their level of functioning.

Of the 16 students, 2 submitted problems without regrouping, and on their tests they indicated, by correctly working problems with regrouping, that they could regroup. Another 2 of the 16 could do multiple addend problems, but did not submit one.
Four of the children submitted three- or four-digit numbers with regrouping in their problem, but went on to solve the five-digit number problems with regrouping on their diagnostic test.

Of the 35 children who were not evaluated as self-assessors, 15 submitted problems which were on higher levels than they had scored on the diagnostic test. Of the 15, 4 appeared to suffer from test fatigue, boredom, or some other condition which stopped the child from working all the problems up to the level of the submitted problem. Six of the students submitted problems which had many zeros, that is, 200 + 300 = 500. This type of problem was shown, in the pilot studies preceding this investigation, to be an unreliable indicator of the level of functioning. The addition of a one-digit number to a two-digit number was sequenced by the traditional diagnostic test written for this study as three levels above the addition of two two-digit numbers. By solving the one-digit problems and missing the addition of two two-digit problems, 5 youngsters indicated that the sequencing was incorrect for them.

Looking at the data obtained using the operation of subtraction, 29 children did not correctly assess their level of functioning as defined by the researcher in Chapter III. The nonassessors could be distributed into the following categories: (1) submitted problems with incorrect answers; (2) submitted a problem below (less difficult than) the level of functioning indicated by the traditional diagnostic test written for this study; (3) submitted a problem above (more difficult than) the level of functioning indicated by the traditional diagnostic test written for this study; and (4) had difficulty with the sequencing used in constructing the diagnostic test written for this study.

Five of the nonassessors wrote problems with incorrect answers, thereby giving no level of functioning.
Another 9 students simply stopped answering test problems or missed problems with fewer digits and borrowing, which in earlier parts of the test they had answered correctly. It appears that test fatigue or lack of reinforcement may have influenced this behavior. These students submitted problems on a more difficult level than their diagnostic test indicated that they could do. Seven students submitted problems which were easier than they actually could do as determined by the diagnostic test.

In the sequencing provided by the test written for this study, levels containing problems with borrowing were intermixed with levels without. The emphasized criterion for adding a level in the test written for this study was the number of digits in a number, that is, a three-digit number with borrowing was considered more difficult than a four-digit number without borrowing. This emphasis in sequencing caused problems for some youngsters. A child who submitted a problem made up of three-digit numbers without borrowing would miss all borrowing problems at levels with smaller numbers, causing him to be judged a nonassessor. This was the case for 8 of the 29 nonassessors.

Analysis of the Data Concerning Hypotheses B1 through B5 of the Study

To test the following hypotheses of this study, a series of t-tests was used:

B1 There will be no significant differences between the high, average, and low achievers as determined by the Iowa Achievement tests in their ability to assess their level of abstract achievement.

B2 There will be no significant differences between the high, average, and low achievers as determined by teacher judgment in their ability to assess their level of abstract achievement.

B3 There will be no significant differences between Blacks and Caucasians in their ability to assess their level of abstract achievement.

B4 There will be no significant differences between girls and boys in their ability to assess their level of abstract achievement.
B5 There will be no significant differences between children from high, average, and low income families in their ability to assess their level of abstract achievement.

Several F-tests were run first in order to determine whether or not there were equal variances in the sample populations. The results of those tests are presented in Table 4. The number of subjects used for the F-tests was 152; 9 students were omitted from the analysis because they either did not submit a problem or answered their problem incorrectly. In either case, it was impossible to determine a level of functioning from the use of the technique in this study. Using an $\alpha$ level of .05, no significant differences were found between the variances of the groups.

After collecting the data sheet handed out with the procedure sheet, it was noted that no teachers in the study evaluated a child in a different category of achievement than the category in which the child had been placed by the Iowa tests. Therefore, hypothesis B2 was not analyzed separately. Since no differences in variances were indicated by the F-tests, the following two-tailed t-test was used:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]

[Table 4. F-tests for determining the differences in variances of the groups in the study. Columns: number of subjects in group; significance. Rows: economic level (high, average, low); achievement level (high, average, low); sex (boys, girls); race (Black, White).]

$\bar{X}_1$ = mean of one group; $\bar{X}_2$ = mean of second group; $n_1$ = number of responses in first group; and $n_2$ = number of responses in second group; where

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \]

$S_p^2$ = total population variance; $S_1^2$ = variance of first group; $S_2^2$ = variance of second group; Upper = $t_{1-\alpha/2}$; Lower = $t_{\alpha/2}$; d.f. = $n_1 + n_2 - 2$.

The results of the two-tailed t-tests are shown in Table 5. The two means (.44 and .60) for the high and average family income levels, respectively, were used in the t-test.
No significant differences were found with an $\alpha$ level of .05. The t-statistic was .989, with 141 degrees of freedom. Therefore, it was concluded that there were no differences between the high, average, and low income family children in their ability to use the testing technique in this study.

[Table 5. Group differences in their ability to use the testing technique in this study. Columns: number of subjects in group; mean; variance; t-test value; d.f.; $\alpha$ level; significance. Rows: economic level (high, average, low); achievement level (high, average, low); sex (boys, girls); race (Black, White).]

The means with the widest spread for ascertaining whether or not there was a difference between high, average, and low achievers in their ability to use the testing technique of this study were .44 and .63 (high and average, respectively). No significant differences were found with an $\alpha$ level of .05. The t-statistic computed was 1.199, with 141 degrees of freedom. It was concluded, therefore, that children who are high, average, or low achievers are all equally able to use the testing technique in this study.

To test whether or not boys and girls were equal in their ability to use the testing technique in this study, a t-test with an $\alpha$ level of .05 was used. A t-statistic of .165, with 150 degrees of freedom, was computed. No significant differences were found.

A t-test with an $\alpha$ level of .05 was used to determine whether or not there was a difference between Black and Caucasian children in their ability to use the testing technique in this study. The t-statistic was found to be .555, with 150 degrees of freedom. It was concluded that Black and Caucasian children were equally able to use the technique in this study.
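The pooled-variance t statistic used for these group comparisons can be sketched as follows. This is a minimal illustration of the formula stated in the text, not a reproduction of the study's computations: the function name is hypothetical, and the group sizes, means, and variances in the example are invented, since the per-group figures behind the reported statistics are not restated here.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, var1: float, n1: int,
             mean2: float, var2: float, n2: int) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a pooled two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its n - 1.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error, df


# Invented example: two groups of 50 with success proportions .60 and .40.
t_value, df = pooled_t(0.60, 0.24, 50, 0.40, 0.24, 50)
print(round(t_value, 3), df)  # 2.041 98
```

In a two-tailed test at the .05 level, the computed t would then be compared with the tabled critical values for the given degrees of freedom; that lookup is omitted from the sketch.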
It would, therefore, appear from the data that all groups in the study are equally able to respond to the open question with a self-assessment which has a high degree of accuracy.

Analysis of the Data Concerning Hypothesis C in the Study

The child-submitted addition problems were studied, and a decision was made to use the following categories as a basis for grouping to determine whether or not a racial bias exists with respect to what a child considered "hard." If a submitted problem fitted into more than one category, then a tally mark was placed in all the appropriate categories. The categories for addition and subtraction are given below.

Addition:

1. addition with regrouping;
2. addition without regrouping;
3. problems with three digits or less;
4. problems with more than three digits; and
5. problems with multiple addends.

Subtraction:

1. subtraction with borrowing;
2. subtraction without borrowing;
3. problems with three digits or less; and
4. problems with more than three digits.

If a child submitted the following problem in addition, 638 + 494 + 863 = ____, then a tally mark would be placed in the following categories: addition with regrouping, problems with three digits or less, and problems with multiple addends.

A chi-square test was used to analyze each category. The results of these tests can be found in Table 6 (addition) and Table 7 (subtraction).

In the addition category, two children did not submit problems, two children incorrectly solved their problems, and one child was Chinese, a category not considered in this research. Omitting these subjects, 88 children were left to be used for testing hypothesis C with respect to addition. The chi-square values were very low and nonsignificant. The values ranged from .0004 to .3450. No cultural bias in addition was found with respect to what a child perceived as "hard."

Table 6. Summary of the results of the chi-square tests with addition

Category                  Number of Subjects in Group   χ² Value   d.f.   α Level   Significance
No regrouping             Blacks 13, Whites 75          .0004      1      .05       none
Regrouping                Blacks 13, Whites 75          .0009      1      .05       none
Multiple addends          Blacks 13, Whites 75          3.4500     1      .05       none
Three digits or less      Blacks 13, Whites 75          .0015      1      .05       none
More than three digits    Blacks 13, Whites 75          .0207      1      .05       none

In the subtraction category, five subjects incorrectly solved their submitted problems, thus limiting the number of subjects to 63 for the analysis. Very low nonsignificant values for chi-square were found, the values ranging from .0144 to .8900. It therefore was concluded that no racial bias was found with respect to what is considered "hard" by a child within the operation of subtraction.

Table 7. Summary of the results of the chi-square test with subtraction

Category                  Number of Subjects in Group   χ² Value   d.f.   α Level   Significance
No borrowing              Blacks 9, Whites 54           .7830      1      .05       none
Borrowing                 Blacks 9, Whites 54           .8900      1      .05       none
Three digits or less      Blacks 9, Whites 54           .0114      1      .05       none
More than three digits    Blacks 9, Whites 54           .0160      1      .05       none

CHAPTER V

SUMMARY, GENERALIZATIONS, AND IMPLICATIONS FOR FUTURE RESEARCH

The effectiveness of a testing technique which employs an ambiguous stimulus to ascertain a level of functioning within the operations of addition and subtraction was the primary question which this study attempted to explore. Historically developed criteria for evaluating testing instruments and measurements, taken from Chapter II, will be used to summarize and generalize the findings on the effectiveness of the technique in this study. A summary and the resultant generalizations concerning the data on a child's self-assessment, as well as the different groups' ability to use the technique in this study, will be presented. An additional analysis of the distribution of percentage of correct-response scores with respect to the technique in this study, which lent support to the conclusions concerning the effectiveness of this technique, will be offered.
A review of the stated purpose of this study and the implications for future research will conclude the chapter.

Criteria for Judging Testing Instruments and Measurements

Criteria for judging testing instruments and measurements cited in Chapter II will now be used to evaluate the testing technique in this study. By comparing the results of the test designed for this study with the results of the new technique, a measure of criterion-related validity was made. A correlation of r = .85 for addition and r = .81 for subtraction was found. Using a confidence interval to examine the combined correlations, it can be assumed that with a probability of .99, the correlation between the results of the test designed for this study and the results of the technique of this study will be in the interval of r = .72 and r = .90 for both operations.

A testing instrument with content validity should ask questions covering all levels of representation for all concepts which the examiner deems necessary to an understanding of the area being tested. Since the technique in this study has the specific questions concerning content being posed and answered by the individuals being tested, the content validity is dependent on the examinee's ability to pose valid questions.

Does the testing technique measure a child's depth of understanding and reasoning ability, or does it measure a memorized or rote-learned piece of information or rule? The construct validity of the test which comes from the testing technique in this study has not been explored. To say that the construct validity of a test derived from the technique in this study is, in general, the same as that of a sequenced diagnostic test might not be true, for no research has been conducted to show this.

Of the additional factors found in the instrument which, if ignored, would lower validity, several are minimized by the technique in this study.
1. unclear directions--The directions were tested in pilot studies, and few children in those studies indicated that they did not know what was being asked of them. Confused children either asked questions or did not respond.

2. reading vocabulary and sentence structure too difficult--No child is asked to read anything more than he, himself, writes. The directions for the test are read aloud by the examiner.

3. inappropriate level of difficulty of test items--The level of difficulty is judged by the examinee. From the data it appears that most children submit the "hardest" problem that they can do.

4. poorly constructed test items--The examinee writes what is understandable to him, and any poorly constructed items offer to the examiner information about the examinee's level of understanding.

5. ambiguity--The questions are posed and answered by the examinee, thus eliminating ambiguity of specific questions.

6. test items inappropriate for the outcomes being measured--The examinee, by posing his own question in an area designated by the examiner, minimizes this problem. Even when inappropriate questions are submitted, information concerning the level of functioning of a child is still made available to the examiner.

7. test too short--By asking the examinee to submit the "hardest" problem that he can do, the necessity for a lengthy test was minimized. By correlating the results with a lengthy test, as was done in this study, the validity was, to a large measure, substantiated.

8. improper arrangement of items--Since the child submits only one problem per area to be measured, no arrangement of items is necessary.

9. identifiable pattern of answers--This category does not apply to the technique in this study.

Several comments can be made concerning factors which influence validity that can be found in the administration and scoring of a test.

1. cheating--Since each child submits his own problem and answer, cheating could be easily detected and minimized.
2. failure to follow directions--The only directions given are oral. Since there is only one direction, it is very easy for an examiner to clarify any misconceptions.

3. ignoring time limits--No time limits are imposed by the technique in this study.

4. giving pupils unauthorized assistance--This problem could apply to the technique in this study.

5. errors in scoring--Since there is only one problem per area, the number of errors is minimized. But each problem is unique. Therefore, no general answer sheet is available.

6. poor physical environment--A poor physical environment could affect the results of the technique in this study. But the time needed to complete this test is minimized, so the effects of the environment would be minimized.

Concerning conditions that might adversely affect test validity which are due to personal factors, the following may be noted:

1. motivation--Motivation would be increased, for children would be asked to show what they can do without being confronted with tasks that they cannot do.

2. anxiety--Anxiety would be minimized, for the child is asked only to demonstrate what he can do.

3. fatigue--The initial fatigue that the child has when entering the testing situation would remain with this technique, but any additional fatigue would be minimized due to the shortness of the testing period.

4. illness--Illness would still affect the child's ability to function, but its effects would be minimized due to the shortness of the testing period.

5. test-wiseness--This does not apply, since the child writes his own exam.

6. response-set--This does not apply, since the test is only one problem per area long.

In conclusion, it appears that the test has good general validity using the criteria cited to make the judgment. Additional research should be done to establish the construct validity of the response which each examinee submits. Categories of responses, as with psychological testing using ambiguous stimuli, may offer different constructs.
The reliability of the testing technique in this study was measured, in part, when it was shown that two ways of measuring a level of functioning had a high correlation. This correlation indicates a consistency of response in a single testing situation. Several other factors which may influence reliability were pointed out in Chapter II.

1. The length of the test is a factor in reliability. Since the testing technique in this study requires only one problem per area for achievement evaluation, reliability might be questioned. The correlation data offer support to the reliability of the measurement, along with the analysis of the percentage of problems correctly answered up to and including the level of the submitted problem.

2. Scores with a large spread are indicators of good reliability. The scores collected in this study have a very wide spread, as can be seen in Figure 2. (The horizontal axis of the figure lists the levels of the operations on the tests designed for the study. The vertical axis has a series of numbers from 1 through 22, which represents the number of students who submitted a problem. The coordinates of the points represent the level of the problem submitted and the number of students who submitted a problem at that level.)

3. If a test is too easy or too difficult, the reliability of the results is threatened. The technique in the study asks that a child write a problem that he thinks is the hardest he can do. The data support that the child does just that. Therefore, it seems reasonable to assume that the test is neither too difficult nor too simple.

[Figure 2. Spread of scores for addition and subtraction. Vertical axis: number of students. Horizontal axis: levels of problems.]

Usability is the last major factor to consider when making a decision about the advisability of using a particular test.
The technique in this study has the following points in its favor: (1) It is easy to administer; (2) it requires a very short time to administer; (3) it is easy to score; (4) each child supplies equivalent forms of the test by identifying his level of performance with his own unique problem; and (5) little cost is involved.

The major problem that the technique in this study poses is one of interpretation of the results. If operations are tested using whole numbers and fractions, the problem is simplified. Materials are available which offer a sequencing of the skills involved in solving problems in these areas. But if the testing technique is to be used in other areas, analyses of what a child most likely knows in order to pose and answer a question in the chosen area will have to be done in order to interpret the results.

Accuracy of a Child's Self-Assessment

Of the children in the study, 60 percent, according to the criteria established in Chapter III, could assess their level of functioning. An analysis of the 64 youngsters who were categorized as nonassessors suggests that 40 of these may well have assessed themselves. These children met the following problems with the criteria established for assessment:

1. Eight youngsters indicated that they regarded a one-digit plus a one-digit number as the "hardest" problem that they could do. On the test written for the study, they treated several multi-digit problems with the algorithm they claimed to know for one-digit addition and solved the problems correctly. From their errors on the test, the algorithm used was made apparent. Therefore, it appears that these eight children did indicate their level of functioning.

2. Thirteen youngsters submitted problems which were more difficult than they completed correctly on the test written for the study. These children either quit solving problems or made errors on problems that they had indicated earlier in the testing situation were within their scope of knowledge.
For example,

     23               234                     2359
    +47    later     +478    still later     +6874
     70             61012                     9233

It appears that these youngsters may well have indicated their level of functioning, but were judged as nonassessors because of test fatigue, boredom, or some other similar problem.

3. Thirteen of the children appeared to have problems with the way the test was sequenced. They submitted problems which were considered easier or more difficult than the problem which they answered on the test designed for the study. The discrepancy proved to be enough to have them evaluated as nonassessors.

4. Six children submitted problems with zeros despite the attempt by a specific direction on the procedure sheet to negate the possibility of this happening. More care should be taken to avoid this type of error in the administration of the testing technique. With proper questioning, these children may well have assessed themselves correctly.

In considering the additional data just cited, apart from the criteria cited for successful assessing, it is questionable whether the 40 children just reviewed really could not assess their level of functioning.

It would appear that for the children who seemed unable to use the technique in this study several procedural considerations might be noted:

1. Some children in the study refused to submit a problem because they could not think of a "hard" one, for all problems within the operation being tested were considered simple by them. An examiner may, when noting the absence of a response caused by the cited difficulty, encourage a child to relate the fact in writing that all problems seem simple, thereby encouraging an honesty of response and a possible accurate assessment.

2. If a child were to submit more than one "hard" problem, an incorrect response may be more accurately evaluated by observing whether the error occurs again or whether it is a simple "foolish" inaccuracy.
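The count of nonassessors who may nonetheless have assessed themselves can be checked with a short calculation. The sketch below is illustrative only: the dictionary keys are hypothetical shorthand for the four groups reviewed in this section, and the counts are the ones reported in the text.

```python
# Counts of nonassessors who may well have assessed themselves,
# as reported in the text. The key names are hypothetical shorthand.
possible_assessors = {
    "one_digit_algorithm_applied_consistently": 8,
    "quit_or_erred_on_known_material": 13,
    "disagreed_with_test_sequencing": 13,
    "submitted_problems_with_zeros": 6,
}

total_nonassessors = 64  # reported in the text

reclassifiable = sum(possible_assessors.values())
share = reclassifiable / total_nonassessors

print(f"Possibly misjudged: {reclassifiable} of {total_nonassessors} ({share:.1%})")
```

The four groups sum to the 40 children cited, which is a little under two-thirds of the 64 nonassessors.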
Different Groups' Ability to Use the Testing Technique in This Study

An examination of the data in the study concerning the different groups' ability to assess themselves (boys-girls, high-average-low achievers, children from high-average-low income families, and Blacks-Whites) showed that all groups were able to use the testing technique equally effectively. The thought of using a test which has no built-in advantages or disadvantages for those children who in the past have suffered unfair discrimination from evaluation methods is very exciting. The possible use of the technique in this study to measure achievement in other content areas or even in intelligence testing may well offer a solution to the biased results present in testing today.

A Racial Bias with Respect to What Is "Hard"

No bias was found among Black and White children with respect to what is considered hard within the operations of addition and subtraction. Additional investigations may find biases where operations or realms of numbers are more complex.

Analysis of the Distribution of Percentage of Correct Response Scores with Respect to the Technique in This Study

When a child submits a problem as the "hardest" that he can do, can it be assumed that the levels considered simpler or less difficult are mastered? Using the sequencing of the test designed for this study and identifying the level of this test to which the submitted problem best corresponds, an analysis of the percentage of correct responses was made. All those problems correctly answered up to and including the problem on the level of the submitted one were counted, and the percentage of correct responses was calculated. For addition, the mean was .86, with a standard deviation of .21 and a variance of .04. The subtraction data had a mean of .84, with a standard deviation of .18 and a variance of .03.
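The counting procedure just described, and the band of one standard deviation around each reported mean, can be sketched as follows. This is a minimal illustration and not the study's own scoring procedure: the function names, the data layout (a dictionary mapping each level to a list of right/wrong outcomes), and the sample child are all hypothetical, and reading the band as covering about two-thirds of the children assumes roughly normal scores, which the study does not verify.

```python
def percent_correct_through_level(results, submitted_level):
    """Fraction of problems answered correctly on levels 1..submitted_level.

    `results` maps a test level (int) to a list of booleans, one per
    problem on that level (True = answered correctly). Hypothetical layout.
    """
    answers = [
        correct
        for level, outcomes in results.items()
        if level <= submitted_level
        for correct in outcomes
    ]
    if not answers:
        raise ValueError("no problems at or below the submitted level")
    return sum(answers) / len(answers)


def one_sd_band(mean, sd):
    """Mean plus or minus one standard deviation, clipped to [0, 1]."""
    return max(0.0, mean - sd), min(1.0, mean + sd)


# Hypothetical child who submitted a level-3 problem (three problems per level).
child = {
    1: [True, True, True],
    2: [True, True, False],
    3: [True, False, True],
    4: [False, False, False],
}
print(percent_correct_through_level(child, 3))  # 7 of the 9 problems counted

# Reported summary statistics (mean, standard deviation).
print(one_sd_band(0.86, 0.21))  # addition
print(one_sd_band(0.84, 0.18))  # subtraction
```

The upper end of each band is clipped at 1.00 because a child cannot answer more than all of the simpler problems correctly.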
The data show that about two-thirds of the children who submit a problem as the "hardest" one that they can do (those within one standard deviation of the mean) have mastered at least 65 percent of those problems sequenced as simpler and may have 100 percent of the simpler problems mastered.

For the problems five levels above (more difficult than) the submitted problem, the mean, variance, and standard deviation of the percentage answered correctly for addition were .21, .08, and .28, respectively. For subtraction, a mean of .20 with a variance of .10 and a standard deviation of .31 was found. It appears from the data that 68 percent of a group of children, when submitting a "hardest-they-can-do" problem, are able to work about one in five of a series of problems sequenced as more difficult.

A Review of the Stated Purpose of This Study

The purposes of this study were stated in Chapter I. How well these purposes were met will now be discussed.

1. The validation of the testing technique of this study has been, to a large measure, accomplished. Both the correlation and additional analyses concerning how well children individually and in groups can use this technique have yielded encouraging results.

2. The time required to prepare, administer, and correct the test in this technique is, indeed, minimized. The time required to think of the areas which need to be assessed and, possibly, to list them, is all the preparation time required to use this technique. The administration and correcting time is also shortened, because the test itself is very short (one problem per area).

3. The shortness of the testing procedure directly affects the time that the student must spend in having his achievement evaluated.

4. The technique of this study indeed offers, on a daily basis, a collection of individual evaluations which will show the changes in what a child perceives as "hard" in his daily learning environment. If his environment has manipulatives or models, he can offer a problem which he can solve using these.
Either he or his teacher can note on his paper what was used to help solve the problem.

5. The testing technique in this study places an emphasis on the examinee's ability to recognize what he can do. Through the repeated use of this technique a child may well be able to improve his ability to recognize self-growth; then, with guidance, he might be able to recognize what fosters self-growth and what deters it. With the emphasis on assessing what an individual knows instead of what he does not know, a testing situation will pose less threat to feelings of self-worth. With evaluation being done in terms of individual growth, the threat of having to meet group goals is also minimized. Both of these factors enhance the development of a good self-concept.

Implications for Future Research

The research proposed falls into two categories. The first is research on the usability of this technique in areas other than the symbolic representation of addition and subtraction. The second is research on areas in mathematics education which might be investigated using the technique of this study.

Usability of the Technique in This Study in Other Areas

Does an individual have the ability to recognize the knowledge and skills which he possesses? Can he relate what they are? These questions were answered in the affirmative with respect to the skill areas researched in this study. Studies to determine the effectiveness of evaluating learning with other operations, such as multiplication, and with different realms of numbers are also needed.

This research dealt primarily with measuring the level of skill development in computation. Can this technique measure concept learning? If college students were asked to note for themselves all the concepts that they felt had been presented to them in a given lecture, textbook chapter, laboratory manual, and so forth, could they then write the "hardest" question that they could think of which would test the understanding of each concept?
By so doing, could a professor discern the degree of learning which has taken place for the student?

The greatest need for evaluative instruments and techniques is at the concrete and pictorial-diagrammatic representation of concepts and skills. With the encouraging results of this study using symbolic representation, additional research is now called for using the technique in the evaluation of concept learning using other representations.

If a child cannot assess his knowledge initially, can he learn to do this? If he can assess himself and communicate his knowledge fairly well, can this skill be developed to a high degree of accuracy and broadened to include most of his learning experiences? Does the skill in self-assessment increase with the number of times that it is done? If a child cannot assess himself, can he be taught to do this? These are many of the questions which must be answered if the technique researched here is to be used with maximum understanding of its effects upon the examinee.

Areas of Mathematics Education to Be Researched Using the Technique in This Study

The technique in this study may prove fruitful in researching (1) the sequencing of mathematical models for the development of an understanding of a concept, (2) the carefully ordered presentation of concepts in learning a general area of mathematics, and (3) the effective ordering of the attributes of a concept for maximum clarification. Research will also have to determine whether there is a general sequencing of models, concepts, and attributes, or whether the orderings must take into account the background of each learner who will use them.

The effectiveness of different mathematical models for teaching concepts might also be explored with the technique in this study.
In the pilot studies, children appeared to select a "hard" problem on the basis of the mathematical model that they were using at the time; that is, multiple addend problems were frequently submitted by children using Chip Trading to learn addition. Large numbers and problems with regrouping are very simply added using Chip Trading, but addition with several addends causes some problems. Studies to verify or negate the relationship between "hard" problems and models may prove valuable. When the most effective model is used to teach a particular concept to a child who finds the model readily understandable, learning should be greatly facilitated.

In general, the technique in this study offers a researcher the opportunity to collect evaluation data on a daily basis because of the simplicity of administration and the small amount of time required to complete the testing task. The daily evaluations make available information on the order in which skills and concepts are learned.

The examination of nonassessors' test papers indicated that some of these children found the sequencing of the test written for the study incorrect for them. They learned how to correctly answer levels on the test which were considered more difficult than the ones that they had missed. Another group of children seemed to agree with the sequencing by missing all the problems beyond a particular level. These data raised the issue of whether a sequence of learning tasks could be written whereby all children would find the sequence correct for them, or whether the sequencing of learning tasks for individuals requires that the learner's background be taken into account. Since the testing technique in this study pointed out this discrepancy, it may be a useful tool to help answer the sequencing questions.
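One way to examine whether a child's responses agree with a proposed sequence is to check whether any level answered correctly lies above a level that was missed. The sketch below is a hypothetical illustration only: the function name and the pass/fail encoding are assumptions, and the study itself does not describe such a procedure.

```python
def agrees_with_sequence(level_passed):
    """Return True if no passed level follows a failed level.

    `level_passed` is a list of booleans ordered from the easiest level
    to the hardest; True means the child answered that level correctly.
    """
    seen_failure = False
    for passed in level_passed:
        if not passed:
            seen_failure = True
        elif seen_failure:
            # A harder level was passed after an easier one was missed.
            return False
    return True


# Hypothetical response patterns over six levels.
print(agrees_with_sequence([True, True, True, False, False, False]))  # True
print(agrees_with_sequence([True, True, False, True, True, False]))   # False
```

The first pattern corresponds to a child who misses everything beyond a particular level; the second corresponds to a child for whom the written sequence appears incorrect.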
If the question used in the testing technique were altered to read: "Write a 'hard' problem in ______ that you cannot answer" (the area to be evaluated would be read in the blank), the child would have to know enough about the area being evaluated to write a question, but not enough to answer it. This may well prove to be a way of ascertaining an appropriate "next" learning experience which would enable a child to solve his posed problem.

APPENDIX A

QUESTIONS TESTED IN PILOTS

These questions are listed in order of greatest number of positive responses. If a child could not think of a response to the question, this was noted by the examiner. The question with the fewest "no responses" was selected to be used for the study.

1. Show me the hardest problem that you have learned to do in addition (subtraction, multiplication, or any other realm of study about which the examiner wishes to gain information) and write the answer.

2. Make up the hardest problem that you can in addition (subtraction, multiplication, and so forth). Solve it and write the answer.

3. Write the two hard problems in addition (subtraction, multiplication, and so forth) that we can put on ditto for the class to solve. Please include the answer.

4. Write down a problem that you can do in addition (subtraction, multiplication, and so forth), but maybe no one else can, and solve it.

5. Write a hard, tricky problem that only you can find the answer to.

Question 1 was amended to meet different assessment needs. If the question was used to measure a daily growth learning situation, it was worded: "Show me the hardest problem that you learned to do today and write your answer." If a concrete or diagrammatic mode was being assessed, the question became: "Show me the hardest problem that you learned to do today and use the aid that you were working with to check your answer."
APPENDIX B

PROCEDURE HANDOUT

I wish to thank you for helping to collect data which will be used to determine the effectiveness of this testing technique.

Select the child or group of children that you wish to test.

Read to the child or group the following question, substituting the correct operation or area of mathematics that you would like them to consider when answering the question. I have used addition in the wording of this sample question.

"Show me the hardest problem that you have learned to do in addition and write the answer."

When testing the area of addition, only, do the following: See if a child submits a problem with all addends using zeros except for the first digit. If he does, request that he write another problem with no zeros except for a possible zero in the answer.

When the child indicates that the task is completed, collect the problem. There is no time limit.

Pass out the diagnostic test appropriate to the area you are testing. Ask the child to complete as many of the problems as he can, letting him know that there is no time limit.

Collect the diagnostic test.

Fill out the accompanying data sheet on the child.

Data Sheet

Child's Name          Age          Sex

Achievement level as measured by the last Iowa Test the child has taken:
Circle one: high   average   low

Economic level:
Circle one: high (Over $25,000)   average ($4,681-$24,999)   low (Below $4,681)

Race:
Circle one: Negroid   Caucasian   Other

Achievement level as measured by the child's classroom teacher:
high   average   low

APPENDIX C

TESTS

Prod. No. 77856

PUPIL'S WORK SHEET

Diagnostic Chart for Fundamental Processes in Arithmetic

Printed in U.S.A.
ADD:

[Items (1) through (23) of the printed work sheet; problem layout not recoverable from the source scan.]

SUBTRACT:

[Items (1) through (22) of the printed work sheet; problem layout not recoverable from the source scan.]

Addition Test

Levels 1-10: [layout not recoverable from the source scan; legible problems include 12 + 2, 19 + 6, and 36 + 82]

Addition Test (continued)

Level 11:  537 + 122;  603 + 115;  232 + 145
Level 12:  35 + 343;  67 + 112;  231 + 64
Level 13:  33 + [illegible];  42 + [illegible];  75 + 26
Level 14:  532 + [illegible];  94 + 937;  643 + [illegible]
Level 15:  17 + 6 + 3 + 8;  4 + 2 + 27 + 12;  [third problem not recoverable]
Level 16:  349 + 868;  914 + 879;  406 + 798
Level 17:  64 + 38 + 96 + [illegible];  17 + 33 + 14 + [illegible];  21 + 16 + 38 + [illegible]
Level 18:  12 + 466 + 83 + 106;  343 + 8 + 14 + 173;  684 + 16 + 9 + 352
Level 19:  9416772 + 6541334;  7634215 + 4556148;  3716482 + [illegible]

Addition Test (continued)

Levels 20-22: [problems not recoverable from the source scan]
Subtraction Test

Levels 1-12: [level assignment not recoverable from the source scan; legible problems, in scan order: 16 - 5, 48 - 2, 28 - 11, 13 - 6, 346 - 215, 364 - 3, 61 - 2, 36 - 27, 49 - 23, 18 - 9, 836 - 302, 75 - 9, 97 - 35, 15 - 7, 666 - 422, 574 - 133, 91 - 8]

Subtraction Test (continued)

Level 13:  37 - [illegible];  48 - 29;  93 - [illegible]
Level 14:  528 - [illegible];  292 - [illegible];  325 - [illegible]
Level 15:  1067 - 237;  4498 - 825;  9147 - 735
Level 16:  173 - 89;  237 - 189;  576 - 398
Level 17:  700 - [illegible];  900 - [illegible];  600 - [illegible]
Level 18:  9546 - 7325;  8132 - 6021;  9758 - 8543
Level 19:  8535 - 7986;  9542 - 8786;  6543 - 5754
Level 20:  5941 - 968;  6805 - 978;  9762 - 986
Level 21:  132428 - 38679;  823533 - 245835;  173461 - 96748
Level 22:  10000 - 8192;  80030 - 46759;  60011 - 8965