THE DEVELOPMENT AND EVALUATION OF A DIAGNOSTIC SYSTEM OF REMEDIATION FOR AN AUTO-TUTORIAL COURSE IN GENERAL COLLEGE CHEMISTRY

A Dissertation for the Degree of Ph. D.

MICHIGAN STATE UNIVERSITY

Gary William VanKempen

1977

This is to certify that the thesis entitled THE DEVELOPMENT AND EVALUATION OF A DIAGNOSTIC SYSTEM OF REMEDIATION FOR AN AUTO-TUTORIAL COURSE IN GENERAL COLLEGE CHEMISTRY presented by GARY WILLIAM VANKEMPEN has been accepted towards fulfillment of the requirements for the Ph. D. degree in Chemistry and Administration & Higher Education.

Major professor
Date: December 10, 1976

ABSTRACT

THE DEVELOPMENT AND EVALUATION OF A DIAGNOSTIC SYSTEM OF REMEDIATION FOR AN AUTO-TUTORIAL COURSE IN GENERAL COLLEGE CHEMISTRY

By Gary William VanKempen

A unique system of diagnosis and remediation has been developed for an auto-tutorial, computer-managed course in general college chemistry. The system is based upon an analysis of examination questions used in the course to identify the kind or kinds of thinking required by each question. A task analysis was used to identify six important kinds of thinking: memorization, translation, classification, visualization, reasoning, and reasoning with math.

The validity of these categories was tested by examining the agreement obtained when several content experts classified the questions independently. The highest interclassifier agreement (over 90 percent) was obtained for the memorization and reasoning categories. For each of the other categories, the agreement was approximately 75 percent. A further test of validity compared the inter-item correlation coefficients between pairs of questions requiring the same kind of thinking and pairs of questions in which the kinds of thinking were not matched. The correlations from questions of the same kind of thinking were significantly higher (α = .05) than correlations for different kinds of thinking for the memorization and reasoning with math categories.

An experimental remedial system was designed in which students who scored below 60 percent on previous examination questions received remediation based on the kind of thinking in which they were most deficient. The two categories on which the remediation was based were memorization and reasoning with math. A distinction was made between scores obtained from tests involving content discussed in remediation (the initial learning score) and scores obtained when the student was being introduced to new material (the transfer score). The latter represents the transfer of training in a particular kind of thinking to a new topic in the course. Supplementary instructional materials and classes were made available to students for the first four weeks of a ten-week term. For both the memorization and the reasoning with math categories, the experimental group scored significantly higher than the control group on the initial learning score, but there was no significant difference between the groups on the transfer score. Thus remediation seems to improve performance on material
THE DEVELOPMENT AND EVALUATION OF A DIAGNOSTIC SYSTEM OF REMEDIATION FOR AN AUTO-TUTORIAL COURSE IN GENERAL COLLEGE CHEMISTRY

By Gary William VanKempen

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Departments of Chemistry and Higher Education and Administration

1977

TO DORINDA

ACKNOWLEDGMENTS

I would like to express my sincere thanks to Dr. Robert N. Hammer for his guidance throughout my graduate study. A special thanks is also extended to Dr. Ed Smith for his friendship and for his assistance with this project. Appreciation is also extended to Dr. Jack B. Kinsinger, who provided a much needed inspiration.

This work was supported in part by a grant from the Educational Development Program of Michigan State University and a grant from the Alfred P. Sloan Foundation administered through the College of Engineering at Michigan State. Appreciation is extended to the Chemistry Department of Michigan State University for supporting me during the initial stage of my graduate study and providing me with my first significant opportunity to teach.

Finally, I would like to thank my parents, Peter and Manual VanKempen, and my wife, Dorinda, for their continued support and love.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION: AN OVERVIEW OF THE PROJECT
    The Problem
    Goals of the Project
    Research Questions
    Generalizability of the Results
HISTORICAL: A REVIEW OF THE LITERATURE
    Classifying the Outcomes of Education
        Gagne's Classification of Learning
        Science Processes
        Empirical Support of Taxonomies
    Diagnosis and Remediation
        Diagnosis Based on Piagetian Theory
        The Effects of Diagnostics
EXPERIMENTAL METHODS AND PROCEDURES
    The Instructional Setting
    The Computer Management System
    The Classification Scheme
        Developing the Categories: A Task Analysis
        Kinds of Thinking
        Criteria and Procedures for Characterizing Questions
    Validation of the Categories
        Reliability
        Agreement Among Classifiers
        An Analysis of Correlation Coefficients
    The Remedial System: Project CLIC
        Selecting the Sample
        The Design
        The Treatment
    Evaluation of the Remedial System
SUMMARY AND DISCUSSION
    Overview of the Project
    The Validity of the Classification System
        Reliability
        Interclassifier Agreement
        Analysis of Correlation Coefficients
    Evaluation of Project CLIC
IMPLICATIONS FOR FUTURE RESEARCH AND DEVELOPMENT
    A Chemical Education Laboratory
    The Difficulty Index
    Alternative Validity Studies
    Selecting the Sample
    A Piagetian Classification
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D

LIST OF APPENDICES

A. Fortran Programs
B. Characterization of Questions
C. Calculating Grouped Average Correlation Coefficients
D. Outline of CLIC Material

LIST OF TABLES

I.   Interclassifier Agreement
II.  Averages and Standard Deviations for Grouped Sets of Correlation Coefficients
III. Planned Comparisons of Grouped Correlation Coefficients
IV.  Two-way Analysis of Variance of Grouped Correlation Coefficients
V.   T-test Evaluation of Project CLIC
VI.  A Summary of Interclassifier Agreement

LIST OF FIGURES

1. A Two-way Analysis of Variance

INTRODUCTION: AN OVERVIEW OF THE PROJECT

The Problem

It is obvious to anyone teaching chemistry at the introductory college level that many students find the subject difficult to comprehend. Unfortunately, the sources of this difficulty are much less obvious, as is the remedy. The problem is compounded by the trend toward less stringent admission requirements, the increasing number of special programs for students who are seeking post-secondary education but do not meet standard admission requirements, and the large enrollments in courses for non-majors which are prerequisite for courses in other academic areas.

College science teachers are being asked to deal effectively with students who traditionally would have been considered incapable of doing science, either because of poor backgrounds in high school science or a lack of sufficient basic abilities. These students, when placed into courses with more capable students, present a difficult problem for instructors. The more heterogeneous a group of students becomes, the more difficult it is to present materials and tasks which challenge students but represent reasonable expectations. There have been many attempts to solve this problem through programmed instruction, personalized systems of instruction, or traditional remediation.

While there are advantages and disadvantages accompanying each of these methods, one important advantage of remedial instruction is its supplementary nature: any procedure for remediation can be developed as an addition to, rather than a modification of, an existing course. Remedial instruction, however, can only be effective to the extent that it focuses accurately on student learning problems. The identification of these problems is a difficult task which usually requires a substantial amount of student-teacher contact. Often, high enrollment and the emphasis on decreased cost prevent the kind of contact between students and instructors that is necessary for any systematic look at student learning problems.

Goals of the Project

This project begins with the assumption that a significant number of students in introductory freshman chemistry courses do not achieve at the level they could because they have not developed some of the skills and abilities necessary for success. We realize that there are other factors, such as motivation, attitude, and prior experience in studying chemistry, which also affect a student's achievement. We have, however, chosen to focus our attention on a specific set of abilities to be identified through a study of student performance on course examinations.
This set of abilities will be divided into categories which are referred to as "kinds of thinking". These kinds of thinking are identified through a task analysis of the questions typically asked on freshman chemistry exams. From this task analysis, generalized task descriptions are developed, which are then converted into descriptions of specific kinds of thinking. Each exam question can then be characterized according to the kinds of thinking it requires.

The specific goals of this project can be considered in three parts:

a. To develop and validate a series of criteria by which one can classify questions commonly asked in general chemistry. This classification will be based upon the kinds of thinking required to answer each question.

b. To develop a system of diagnosis which identifies students who are scoring poorly on a group of questions which require the same kind of thinking.

c. To develop and evaluate remedial materials which focus on particular kinds of thinking as they apply to general chemistry.

The purpose of the criteria described in part "a" above is to guide the characterization of test items in terms of the kinds of thinking required to answer them. Once questions are characterized in this manner, it should be possible to group them into separate classes depending upon whether or not a question requires specific kinds of thinking. Examination of student scores on a particular class of questions may reveal a pattern of misses which could be traced to a lack of ability to perform the kind of thinking required. Remediation which focuses on these thought processes as they relate to chemistry might then improve scores on this particular class of questions.

Research Questions

In general, the project can be viewed as a study of student performance on examination questions in general chemistry as a function of the kinds of thinking required by the questions. To facilitate the discussion of the research questions, the following definitions have been established.

a. Class of questions: All questions which are classified as having a particular kind of thinking or series of kinds of thinking in common.

b. Topic: Any specified content which is discussed sequentially during a course.

c. Subtest score: The percentage of correct responses to a specified group of questions which are all members of the same class of questions.

The project is designed to answer the following research questions.

a. Does student performance on examination questions support the categories proposed in this classification scheme?

b. Do remedial materials which focus on the kind of thinking required by a specific class of questions covering a particular topic in chemistry improve student performance on that class of questions for that particular topic?

c. Once students receive remediation on a particular kind of thinking as it applies to general chemistry in one topic, will there be any improvement in their performance on questions requiring the same kind of thinking but covering a different topic? That is, is there transfer of training of a particular kind of thinking to new content within the course?

Generalizability of the Results

Since the content of general chemistry is fairly standard throughout the majority of freshman college courses and many high school courses, we expect that the classification scheme will be useful to anyone teaching general chemistry.
Since the students involved are representative college freshmen, the identified learning problems are likely to be present in varying degrees in any general chemistry course. The system of diagnosis is not necessarily restricted to an individualized course, although frequent examinations and careful record keeping are likely to be essential. The success of the diagnostic system and the supplementary instructional materials in improving students' examination scores in general chemistry at Michigan State will be a clear indication of their probable success at other schools.

The classification of a question may be dependent on the kind of instruction given in a course. That is, the kind of thinking used by a student to answer a question may depend upon the instruction received by the student.

HISTORICAL: A REVIEW OF THE LITERATURE

Classifying the Outcomes of Education

Interest in a classification of learning in terms other than the specific content of a discipline has been evident for decades. One of the earliest and most widely used classifications of this type is Bloom's Taxonomy of Educational Objectives: Cognitive Domain,1 in which Bloom organizes learning into categories labelled knowledge, comprehension, application, analysis, synthesis, and evaluation. One very popular use of this taxonomy has been to describe the content of achievement tests. Fast2 reported an analysis of the American Chemical Society-National Science Teachers Association High School Chemistry Achievement Test in which he found 40 percent of the questions to be in the knowledge category, 25 percent each in comprehension and application, and 10 percent in the analysis category. He also discovered a trend toward a higher percentage of application and analysis questions during the period from 1957 to 1971. Airaisian3 used Bloom's taxonomy to classify the objectives of two chapters of Chemistry: An Experimental Science according to the taxonomic level of each objective. He found that high school chemistry teachers could classify the objectives with a 90 percent level of agreement. Airaisian also determined that a majority of the objectives fell into the knowledge class and required only recall of information.

Scott4 used Bloom's taxonomy to analyze the cognitive levels of activities and exercises in a particular set of instructional materials. He examined an early edition of Science--A Process Approach5 and found that many activities required application behavior and some required analysis and synthesis.

As the use of the taxonomy became widespread, a consensus developed that many textbooks and standardized tests fail to provide enough tasks at the higher levels of synthesis, evaluation, and application. This has led, in some cases, to a decreased emphasis on tasks which require the student simply to recall information. This situation illustrates the effect that a classification of the objectives of education can have on the direction of curricular change.

One weakness of a thought process approach to question classification is that these processes are only inferential constructs; they cannot be observed directly. One cannot assume that all students answer the same questions using the same cognitive processes. To help deal with this problem, one should keep in mind the instructional material on which the questions being classified are based.
The chances may be greater that two students will answer the same questions using the same processes if they have been exposed to the same instructional material.

The classification of questions which is used in the present project is not based upon a strict cognitive process approach. That is, we do not intend to identify in detail the information processing routines used by students to answer questions. The procedure used in this research is similar to a task analysis described by Smith,6 in which tasks are described according to the characteristics of the given information and the information which the student is attempting to find. An assumption must be made concerning the information which students bring to a given task. For this research, the assumption will be based upon a knowledge of the instructional material presented to the student and the knowledge gained from many years of experience observing how students perform the required tasks of this freshman chemistry course.

Gagne's Classification of Learning

Another widely accepted classification of learning is that described by Gagne. In his article on the domains of learning,7 he categorizes learning processes into classes described as motor skills, verbal information, intellectual skills, cognitive strategies, and attitudes. He emphasizes that these kinds of learning can be found to varying extents in all disciplines. He suggests that statements concerning optimum learning conditions and methods of testing which are appropriate for one of the domains of learning may not be appropriate for another. In particular, the instructional procedures which maximize learning within one domain are different from those which best encourage learning in a different domain. For example, the nature of instruction designed to teach the learner to perform an acid-base titration should certainly be different from instruction on the nomenclature of simple molecules. In the former case one is teaching a motor skill, while the latter involves intellectual skills and verbal information.

Gagne also has emphasized that different assessment techniques are required for objectives in different domains. One does not test possession of verbal knowledge in the same manner as one would test possession of intellectual skills. If this latter idea is correct, it should be possible to describe the domain or domains of learning of specific test questions and, from scores on these questions, to evaluate student learning within these specific domains. Although Gagne's categories are still best described as inferential constructs, they have provided significant guidelines for the design and evaluation of instruction. They have also greatly influenced the nature of the classifications proposed in this study. Since the categories that have been developed for the present diagnostic system can best be described as verbal information and intellectual skills, these domains will be discussed in more detail.

The learning of verbal information means that the learner is able to state in declarative form what he has learned. Verbal information comprises the facts, principles, and generalizations which make up a large part of school learning in any discipline. In introductory chemistry, the student is asked to learn the chemical symbols of the elements. This verbal information can be presented by examining the relationships between the names of elements and their symbols and giving the appropriate Latin or German names.
To test whether a student has learned these symbols, one usually asks that the name be declared from the symbol, or vice versa. The learning of verbal information does not necessarily give the learner the ability to apply that information to a novel situation.

Gagne describes intellectual skills as "knowing how as contrasted with knowing that".8 A student learns to convert fractions to decimals, or to represent the electronic configurations of the elements or the Lewis dot structures of simple molecules. An intellectual skill is a learned capability which enables the learner to perform a particular group of tasks if he possesses the appropriate knowledge. The lack of an intellectual skill may prevent a student from performing a class of tasks in spite of his knowledge of verbal information. Studies have shown that the ability to perform a specific task is greatly increased if the student is taught the prerequisite skills.9

Gagne and Brown10,11 investigated the learning of a task of constructing formulas for the sums of number series. They identified the "subordinate skills" which were necessary to perform the task and related these skills to one another in a hierarchy. The hierarchy therefore represents the sequence of abilities upon which the learner would rely in order to perform the superordinate task. Seven students who were unable to construct the formulas were tested on each of the subordinate skills and given instruction on the skills which they could not perform. The final task was again presented with verbal directions about how to do it but no additional practice. Six of the seven students were then able to perform the final task. Thus, through an analysis of the task and through appropriate instruction on subordinate tasks, the experimenters were able to significantly increase students' ability to perform the desired task.

Science Processes

An important classification of the objectives of science education has been in terms of "science processes". A science process can be described as a class of similar tasks which scientists perform. These tasks include observing, comparing, classifying, quantifying, measuring, experimenting, inferring, and predicting. The idea of science processes is included in this review because the philosophy behind the characterization of these processes has influenced the conception of the present project. A science process is a class of similar tasks which scientists perform, while each category developed in this study is a class of similar tasks which students are required to perform. Exploring the nature of these classes of tasks is an important step in determining the skills which are necessary to perform them.

Guided mostly by the work of Schwab, efforts were made to characterize what it is that scientists do with the information they obtain and how they go about obtaining it. Despite Schwab's warnings, the science teaching establishment accepted the notion that these processes, once identified, would prove to be common throughout the various scientific disciplines. Although this did not prove to be completely true, the notion that there are certain important abilities which are essential components of many science disciplines has remained. Research in this area has focused on the measurement of student performance of these process objectives, relating this performance to overall course achievement, and studying the nature of the processes themselves.
Empirical Support of Taxonomies

A serious criticism raised against the taxonomies which have been described is that the abilities suggested by their categories are not reflected in measurements of student performance. For example, Tannenbaum12 studied science processes by means of an empirical test which he designed. The science processes which he studied include observing, comparing, classifying, quantifying, measuring, experimenting, inferring, and predicting. His test included 96 items chosen from several of the natural sciences. The author suggests (but does not substantiate) that student performance on the test should not depend upon the distribution of questions among the various disciplines. Tannenbaum used textbooks and various research reports to develop a list of behaviors which science students are expected to exhibit. These behaviors were then classified according to the eight categories listed above. The test was administered and several statistical procedures were applied to the results. The author reports overall test reliability as well as subtest reliability, which is the reliability of a group of questions from only one of the categories mentioned. For four of the eight processes studied, the subtest reliabilities were not significantly greater than that expected of a random sample of the corresponding number of questions from the entire test. A factor analysis identified only one general factor, which accounted for about 50% of the variance in the scores.

In an empirical study of the hierarchical nature of Bloom's taxonomy, Stedman13 found no significant difference in scores between questions identified as knowledge and comprehension, or between those identified as application and analysis. There was, however, a significant difference between scores on questions from the comprehension and application categories.

The failure of many of these taxonomic categories of objectives to be validated in empirical studies may in part be due to their general nature. In this project, the categories are based upon an analysis of the questions typically asked in a general chemistry sequence. We hope that these categories--which are designed not only for a particular discipline but for a particular group of courses within that discipline--may be more easily validated and consequently may provide more useful information about student learning than general categories.

Diagnosis and Remediation

There have been a number of projects reported which are in some ways similar to the present project. This review is limited to diagnostic and remedial systems used in high school chemistry and physics courses as well as in college science courses, because the information from these areas is the most generalizable to the freshman college chemistry course used as the laboratory for the present research. For the purposes of this study, remediation is defined as an attempt to supply the thinking skills prerequisite to a particular learning task when those prerequisites are not part of the subject matter of the course. Furthermore, the prerequisites are defined as a set of intellectual skills which are applied to the tasks of college level general chemistry. Since a vast literature exists on the effects of various remedial treatments, it seems appropriate to consider only the effect of that part of remediation concerned with thinking skills assumed to be possessed by most college freshmen and applied throughout most introductory science courses.
Diagnosis Based on Piagetian Theory

Recent attempts to apply Piagetian theory to college students have produced some unexpected results. According to Piaget,14 we each pass through four distinct periods of intellectual development as we mature. These are:

1. Sensori-motor (0-2 years);
2. Preoperational thought (2-7 years);
3. Concrete operations (7-11 years);
4. Formal operations (11-15 years).

Research in this area has focused on (1) the development of tests of an individual's stage of intellectual development and (2) the mechanism by which one advances to a higher stage and the effect of instructional experiences on that advancement.15 For example, in a study reported by Bredderman,16 the effect of training fifth and sixth grade students to control variables was measured by administering pre- and post-tests involving the control of several different variables. According to Piagetian theory, individuals who are not yet in the formal stage of intellectual development will be unable to perform this task. Bredderman demonstrated a significant but small improvement in students' ability to perform these tasks as the result of training. There were some students for whom the training had no effect, as demonstrated by their pretest and posttest scores. These results are typical of the Piagetian training studies reviewed by Beilin.15

The application of Piagetian tasks to college students has revealed that many college freshmen do not demonstrate an ability to think at the formal level in some situations. McKinnon17 states that in a study at Oklahoma City University, 50 percent of 143 college freshmen failed to perform tasks requiring thinking at the formal level. In a study by Renner and Lawson,18 only 22 percent of a sample of college freshmen were judged to be in the formal stage of development. Studies by Griffith19 and Renner20 have produced similar results.

The question which remains unanswered is whether the results with college students indicate a general developmental retardation on the part of a vast majority of college freshmen or an inability of these students to think at the formal level in specific situations. An interpretation of these results which is not in conflict with Piagetian theory is that many students who have advanced to the formal stage of intellectual development do not always demonstrate their ability to think at this level in all situations. This may be due to the particular content involved in the Piagetian task, the effect of test anxiety, the student's habit of reverting to concrete thinking in certain situations, or the lack of validity of the particular test. Support for this interpretation comes from a study by Danner and Day20 with children from grades five to twelve. Initially, 50% of the older subjects and none of the younger subjects were able to perform tasks which require formal operations. After a few prompts, nearly all of the older subjects and a few of the younger subjects were able to perform at the formal level on a different task.

Renner and Lawson also found that high school chemistry, biology, and physics students scored significantly higher than the general population of high school students. This may be the result of the selection of science courses by formal thinkers, or of the practice one can get in thinking at the formal level in a typical science class, or both. The evidence does indicate, however, that many freshman college students do not demonstrate an ability to think at the formal level.
Few would deny that college chemistry and physics are taught at the formal level. Students are required to deal with relationships between variables and to have an understanding of how these relationships are developed and tested. If students taking these courses are not in the habit of thinking at the formal level, they will experience a great deal of difficulty mastering the materials in these courses. Assuming that these students could be identified, they could be provided with instruction designed to increase their tendency to use formal operations in their study of science. The optimum nature of this instruction is, at this time, unknown and is a question which deserves serious attention.

The Effects of Diagnostics

Lawler21 has investigated the effects of a diagnostic system which identifies for each student the objectives which he has failed to master in a health sciences course at the freshman college level. Exams were taken on an IBM 1500 terminal designed for computer assisted instruction. Students who received this diagnostic information showed greater achievement, as measured by the course final exam, than students who did not. Apparently the diagnostic information helped students focus on objectives which they were unable to perform.

In a freshman chemistry course at the University of Wisconsin,22 a diagnostic system called CHEM TIPS has been established in which students take a once-a-week survey which requires them to demonstrate knowledge of recent course material. The responses are computer analyzed according to a set of predetermined criteria. Those students who miss particular questions or groups of questions receive computer generated messages indicating topics they should work on, textbook page numbers where this material can be found, and times during the week when help sessions dealing with this material will be held. Survey results were given to the teaching assistants so they could work on this material during recitation sections. In an attempt to evaluate the effectiveness of this system, one of two lecture sections taught by the same instructor was given the option of taking the CHEM TIPS survey while the other was not. Enrollments were 163 in the experimental group and 167 in the control group. There was in this case no significant difference between the average scores on three course examinations for the two groups. There was, however, a difference in the attitudes of students toward their teaching assistants. The experimental group responded more favorably to questions concerning the teaching assistant's interest in students' progress, effectiveness as a teacher, and ability to answer questions. This could be the result of the teaching assistants benefiting from the information provided by the CHEM TIPS survey.

Riban23 has studied a diagnostic system which identifies deficiencies in mathematical abilities based upon patterns of correct and incorrect solutions of physics problems by high school physics students. The mathematical abilities were established through an analysis of the skills required for the solution of each problem. One hundred sixty-three separate mathematical abilities were identified. This list was decreased to 42 by rejection of those abilities believed to be present in all students as well as those required by only a few problems. The decision to enter remediation for a specific ability was based upon the percentage of missed questions which required that ability.
Students diagnosed as deficient in a particular mathematical ability were randomly divided into control and experimental groups. The experimental group received the appropriate programmed remediation. There was no significant difference between these two groups as measured by scores on two subsequent Physical Science Study Committee achievement tests. The author did not present a breakdown of student scores on groups of questions requiring the remediated ability. Such a breakdown would reveal whether the remediation actually improved achievement in some areas while decreasing achievement in others. Also, the effect on the total test score might be so small that it is masked by other sources of variance. A test given just before remediation revealed that about half the students diagnosed to be deficient in a particular mathematical ability could perform physics tasks which require that ability. Thus, assuming the validity of this test, many students apparently were misdiagnosed. With 42 separate abilities being tested, it seems apparent that the decision to enter remediation must have been based on a small number of questions. Frequently, a particular ability occurred in only one small segment of the course content; it was therefore impossible to check performance in that ability throughout the course. In the present study, we hope to avoid these two problems by defining the kinds of thinking in a manner which will yield a small number of more general abilities which occur throughout the topics of the course.

Larkin24 has reported the effect of remediation in a college physics course. The remedial instruction involved how to apply relationships in physics. In the first part of the course, a randomly selected group of students received instruction in how to identify relationships, the important characteristics of a relationship, and how to demonstrate knowledge of a relationship. Her study showed that students who received this instruction were better able to acquire an understanding of the relationships which they encountered throughout the course.

As discussed earlier, Gagne10,11 has demonstrated the effect of remediation when that remediation has been linked to a task analysis of the desired learning. The present project also attempts to apply a task analysis, but the application is made to the objectives of an entire course rather than to one specific task. For this reason, the subtasks identified are stated in general terms according to the criteria mentioned earlier. We hope that providing students with some general intellectual skills will improve their performance on tasks which require those skills.

EXPERIMENTAL METHODS AND PROCEDURES

The Instructional Setting

Before describing the design and procedures which were used to answer the research questions raised in the first chapter, it is necessary to outline the format of the courses to which the project was applied.

The Chemistry Department at Michigan State University has transformed the first two courses of one of its introductory chemistry sequences from the traditional lecture-recitation format to a modular self-instructional mode.25 Most students who take these two courses (CEM 130 and CEM 131) are not chemistry majors but are required by their major area to take an introductory chemistry sequence. The primary instruction in these courses is contained in a series of audio cassettes and accompanying workbooks. The workbooks contain diagrams and examples to which the students refer as they listen to the tape.
Although the students' pace through the course is somewhat flexible, they are required to finish specified amounts of material within two-week periods. The course can be described as a modified mastery approach,26-28 since students are permitted (within a designated time frame) to take examinations as often as once a day until they are satisfied with the grade that they have earned. A criterion referenced grading scale is used, so that students at any point during the course can predict their course grade from performance on course examinations. Thus, a student could make a contract (with himself) for a particular course grade and repeat each examination until it is passed at a level which would translate into the grade desired.

The alternate forms of examinations are generated by computer from a bank of over 4000 questions. Each question has a library number which indicates the associated unit of the course and the concept or idea which it is testing. Each fifteen item test contains a specified number of questions from each of the units covered by that exam.

Supplementary instruction is provided by graduate (or occasionally undergraduate) student instructors who staff a "Help Room" which is open to students during daytime and evening hours. While this system allows students to get their questions answered at any time, it does not foster the kind of student-instructor contact which is necessary to identify and remedy any systematic problems that students may have. Thus, one of the goals of this project was to produce a system of diagnosis and remediation which would effectively deal with student learning problems.

The Computer Management System

An important part of this remedial system is the computer management system which identifies students who are scoring poorly on a particular class of questions. This section will provide a detailed description of the technology that has been developed.

A subtest score is defined as the percentage of correct responses to a specified group of questions which are all members of the same class of questions. Each fifteen item examination (an exam form) in Chemistry 130 and 131 has associated with it an exam composition index (ECI), which is a list of the master file library numbers of the questions chosen for that exam form. In order to create a subtest score, the classification of each of the questions is first added to the ECI and stored on a disk file. The EDITOR subroutine (Appendix A) is then used to pick out all of the questions of a particular class. For example, to select all of the questions requiring the memorization of a property, one would scan the appropriate columns of the file for the Mp designation and create a new file containing only these questions. Additional file modification can be done by deleting questions having undesirable characteristics. In this study, we used those questions which required only memorization for the memorization class and deleted all questions with additional or alternative classifications. A file of questions for the reasoning with math group was created in a similar fashion.

Students mark their answers to exam questions on machine scorable answer sheets. Fill-in questions are graded by hand, and the graders mark the appropriate boxes for correct and incorrect answers. A special information sheet, which is prepared for each exam form, indicates the exam form number, the course number, and the time and date the exam was administered. In the scoring process, the information is transferred to magnetic tape and eventually to a disk file. Each record on the disk file contains the student number, the exam form number, the number of questions out of fifteen which the student has answered correctly, and the student's performance on each question. If the student answered a question correctly, a "1" is recorded in the column representing that question. If the question was answered incorrectly, the letter corresponding to the distractor chosen is recorded. Finally, a subtest score is calculated by determining the percentage of correct responses for questions of a particular class.
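As a compact illustration of this last step, the following minimal sketch computes a subtest score from such records. It is written in modern free-form Fortran rather than the FORTRAN of the actual programs, and the record layout, file name, and flagged question positions are hypothetical stand-ins for the real file formats:

```fortran
! Sketch of the subtest-score step.  Each record gives the student
! number, the exam form number, the total correct out of fifteen, and
! one character per question ("1" = correct, otherwise the letter of
! the distractor chosen).  The mask INCLASS flags the positions on
! this form that belong to the class of questions being scored; the
! record layout, file name, and flagged positions are hypothetical.
program subtest
  implicit none
  character(len=9) :: student
  character(len=4) :: form
  character(len=1) :: result(15)
  logical          :: inclass(15)
  integer          :: total, ios
  real             :: score

  inclass = .false.
  inclass((/2, 5, 9, 14/)) = .true.   ! e.g., the pure-memorization items

  open (unit=10, file='scores.dat', status='old')
  do
     read (10, '(a9,1x,a4,1x,i2,1x,15a1)', iostat=ios) &
          student, form, total, result
     if (ios /= 0) exit
     score = 100.0 * real(count(inclass .and. result == '1')) &
                   / real(count(inclass))
     print '(a,a,f6.1)', student, '  subtest score = ', score
  end do
  close (10)
end program subtest
```

With the mask shown, a record whose 2nd, 5th, 9th, and 14th result characters were "1", "C", "1", and "B" would receive a subtest score of 50.0 percent.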
The FORTRAN programs which have been written to perform this analysis are listed in Appendix A. The ANOVA and FACTOR programs were written for use with the Statistical Package for the Social Sciences (SPSS)27 subroutines and are also listed in Appendix A.

The Classification Scheme

The diagnostic system which has been developed is based upon a classification of examination questions in terms of the kinds of thinking required to answer them. The first part of this section will describe the process by which the categories were established. The characteristics of each of the categories will then be described, and finally the method of creating classes of questions based on the classification scheme will be discussed.

Developing the Categories: A Task Analysis

In their book on learning system design, Alexander, Yelon, and Davis29 describe a task analysis as a detailed description of how a particular task is to be performed. It takes into account the entry skills of the learner, the type of learning involved, and the particular conditions or constraints in the instructional environment which influence the learning process. The process of developing categories for the present classification scheme began with this type of task analysis. From each of the questions used in Chemistry 130-131, a generalized description of the task to be performed was developed. In each case the task was described without reference to any of the chemistry content involved. As the analysis proceeded, it became apparent that a small number of these generalized task descriptions were emerging as the most important types of tasks required of students in these courses. Furthermore, the same general task would appear many times throughout the various topics of each course. From each of these task descriptions was developed a statement of a "kind of thinking" which is required of students taking introductory chemistry. Six major kinds of thinking were identified, some having several subcategories.

Once these kinds of thinking were identified, each question was characterized according to whether it does or does not require a particular kind of thinking. The criteria for this characterization and a description of these kinds of thinking are given in the next sections.

Kinds of Thinking

The various kinds of thinking required by questions typically asked in a general chemistry course are listed below.

I. M - Recalling Memorized Information
   1. Mp - recalling memorized properties
   2. Mr - recalling memorized relationships

II. C - Classification, Discrimination, Pattern Recognition
   Identifying an entity as a member or a nonmember of a class without being given the criteria of that identification.

III. V - Visualization
   Forming an image of an object or set of objects which are static, do not require the recognition of color, and have been seen by the student. The following designations are added to the V in the situations indicated:
   d - if the object(s) is dynamic
   c - if the color of the object(s) is required
   n - if the object has not been seen but has been described

IV. T - Translation
   1. Twm - between words and math
   2. Tcw - between words and chemical symbols
   3. Tcm - between math and chemical symbols

V. R - Reasoning
   The sequencing or combining of ideas to derive or evaluate new ideas.
   Rs - The sequencing or combining of any of the processes which have been listed in this classification.
   R1 - A one step reasoning process in which a single relationship is applied to known properties to determine an unknown property.
   R2 - A more than one step reasoning process in which a number of relationships are applied in sequence.
   R3 - A process which involves a series of reasoning steps which have been described to students in an algorithm.

VI. Mt - Math
   Me - Working with numbers expressed in exponential notation or as logarithms.
   M1 - Manipulating linear algebraic equations.
   M2 - Manipulating nonlinear equations.

A more detailed explanation of the criteria by which one can identify the kinds of thinking required by a question is given in the following section.

Criteria and Procedures for Characterizing Questions

The following is a description of the criteria which are used to decide whether or not a question requires a particular kind of thinking.

I. Memorization: This category includes questions which require students to recall memorized information. This information can be classified into two types.

   1. Properties: A property is one specific characteristic of an entity. This characteristic is specified through variables (mass, height, color) and their corresponding values (3 grams, 3 feet, red).

      Sample question: "What is the precision of an analytical balance?" The property of the balance which is to be specified is its precision. The value required is 0.0001 gram.

   2. Relationships: Any statement which defines the dependence of one property on other properties is called a relationship.

      Sample question: "When a block of solid is dropped into an insulated beaker of liquid, the heat lost by the substance originally at the higher temperature is:"
      Answer: equal to the heat gained by the substance at the lower temperature.

      This question requires the student to recall the relationship between the heat lost by an object and the heat gained by its surroundings. This interpretation of the question assumes that the entity being considered is the block of solid and the important properties of that entity are heat lost and heat gained. An alternative interpretation assumes the entity under discussion is the heat lost by that object, and a property of the heat lost is that it is equal to the heat gained. In this and similar cases, the interpretation which leads to the classification of the information as a relationship will take precedence.

II. Classification, Discrimination, and Pattern Recognition: The process of evaluating information for the purpose of grouping. The question must require the student to identify an entity as a member or nonmember of a class without being given, in the question, the criteria of class membership.

      Sample question: "Which of the following series of elements contains only nonmetals?"

      Most students in introductory chemistry would use the position of the element on the periodic chart to determine whether each element was a member of the class of nonmetals. At an advanced stage, one is able to classify by recognizing patterns of stimuli.
For instance, one can usually tell that an object is a chair without examining its properties one by one. Classification also requires the consideration of the properties associated with a particular class and will usually require an Mp process. The Mp designation will be assumed to be a part of the C designation and therefore is not listed with it.

III. Visualization: Questions will be characterized as involving visualization if they require the student to form an image of a static object which has been seen and for which the recognition of color is not important. If the student must move the object in his mind, a d will be added to indicate a dynamic object. If the question requires the student to remember color, a c will be added. If the question requires the student to form an image of an object which has been described but not seen by the student, an n will be added.

      Sample questions:
      1. If in a particular course students are shown samples of one mole of various substances, the question "At room temperature the volume occupied by one mole of water is about . . ." would be classified as V, since the sample was static and its color was unimportant.
      2. "A cube has how many four-fold axes of symmetry?" This question would be classified as involving Vd, since it requires the student to rotate the object in his mind.

IV. Translation: Questions which require the student to interpret from one language to another will be characterized as involving translation. There are three possible subcategories.

      1. Twm - translation between words and math. For example: given a = b + c, produce "a is equal to b plus c."
      2. Tcw - translation between chemical symbols and words. For example: given K + O₂ = KO₂, write "one mole of potassium reacts with one mole of . . ." etc.
      3. Tcm - translation between chemical symbols and math. For example: given N₂O₃, state the ratio of oxygen atoms to nitrogen atoms in the molecule.

V. Reasoning: The combining or sequencing of ideas to derive or evaluate new ideas.

      1. Rs - The combining or sequencing of the kinds of thinking which have been characterized in this classification scheme.

         Sample question: "A tetrahedron has how many three-fold axes?" The complete characterization of this question would be Rs(V, Mp, Vd). The letters in parentheses indicate the kinds of thinking being sequenced. In this question the student must first visualize a tetrahedron, then recall the definition of a three-fold axis, and finally rotate the image to determine how many three-fold axes are present.

      2. R1 - A one step reasoning process in which a single relationship is applied to known properties to determine an unknown property.

         Sample questions:
         1. "If the density of a 4 cubic centimeter block of metal is 4 g/cc, what is its mass?" This question requires the use of the relationship between density, mass, and volume to determine the mass of an object, given its density and volume.
         2. "Emission spectra from two metal samples show lines at exactly the same wavelengths. One can conclude:"
            Answer: Each of the two samples contains exactly the same chemical elements.
            This question requires the use of the fact that each element has a unique emission spectrum to determine that two samples which produce exactly the same emission spectra must be composed of exactly the same elements.

      3. R2 - A multistep process which requires the application of two or more relationships without simply applying a learned algorithm.

         Sample questions:
"What is the density of a 4 9 cube of metal with 3 centi- meter edges?" The question requires the combination of two relationships; d = m/v and v = e3. 2. "If an atom and an ion contain the same number of electrons, then they: VII. 1. 33 Answer: must be of different elements. The question involves realiz- ing that an atom and an ion have a different charge and that the charge is equal to the number of protons minus the number of electrons. Therefore the number of protons must be different. Since nuclei with different numbers of protons are of different elements, the atom and ion in question must be of different elements. In this sequence, a number of principles have been applied to the given information to determine which of the given statements is correct. R3 - A process which involves a series of reason- ing steps which have been described to students in an algorithm. Specific tasks are common and complex enough that they are often taught through the presenta- tion of an algorithm or step by step procedure. For example, one usually outlines the procedure for finding the percent composition from a molecular formula. Students are expected to solve these problems by applying the proper procedure to them. Obviously, any question to which an algorithm has been applied could be answered in the ab- sence of the algorithm by combining the necessary ideas. Thus an appropriate alternative classi- fication for these types of question is Rs. It is also important to note the characteristics of the steps in the algorithm. This will be done by including in parenthesis after the R3 designation the symbols for kinds of thinking involved. ' Math The mathematics which is required is divided into three categories. Me - scientific notation, exponents, and logarithms 34 This category will include any question which requires the manipulation of exponents or logarithms, or numbers written in scientific notation. 2. M1 - algebra with linear equations All questions requiring the manipulation of linear algebraic equations will be classified as M1. 3. M2 - algebra with quadratic or higher power equations. Sample question: "What is the wavelength of an electromagnetic wave having a frequency of 104 cycles/sec?" This question requires the solving of the equation relating wavelength to frequency (Ml)i and also the manipulation of 104 and 3 x 10 0 (Me). Using this set of criteria, one can characterize examina— tion questions according to the kinds of thinking involved. In this project, the characterization was performed by asking, "What method would be used by the majority of stu- dents in CEM 130 and CEM 131 to answer this question?" There are two important considerations. First of all, an estimation of student performance was used rather than an analysis of how a trained chemist might solve a particular problem. Second, the instructional setting was carefully considered. It was felt that the methods students use to work problems and answer questions will depend upon the information students bring to the problem. This obviously will be a function of the instruction which the students have received. The characterization of the questions asked in CEM 130- 131 would often produce important combinations or sequences 35 of kinds of thinking which could then be considered as a unique class of questions. For example, a very common se- quence is Rl with M1 (a one step application of a mathematical relationship). 
This combination appeared often enough that it was given a special designation (R1(Ml)) and was considered as a distinct kind of thinking. There were also questions for which there were two alternate methods of solution commonly used by students. For these questions, a slash (/) was used to indicate that a question required one kind of thinking or another kind of thinking depending upon the method a student chose to work the problem. For example, the designation Mr/Rl would be used for questions which some students answer by recalling a memorized relationship and other students answer by applying a relationship in a one step reasoning process. To help illustrate the characterization procedure, some sample questions and their classification are given in Appendix B . Validation of the Categories As discussed earlier, the process of characterizing ques- tions produces a class of questions which is used as an instrument to measure a student's ability to perform a particular kind of thinking. Whenever a new instrument is developed, it should be accompanied by evidence concerning its ability to measure performance accurately and precisely. The usual procedure is to supply data concerning the 36 reliability and validity of the test. Reliability One measure of the reliability of a test is the correla- tion between scores on two tests which attempt to measure the same thing. Often the two tests are obtained by ar- bitrarily dividing a test into two parallel forms of equal length and measuring scores on each half of the test. A very popular measure of reliability is the Kuder-Richardson Formula 21 (K.R.21)30 which essentially creates all possible parallel forms of an examination and averages the correla- tions between them. Since the examinations used in CEM 130-131 are very carefully designed to include an even distribution of ques- tions from a rather wide range of tOpics, the correlation between arbitrary split halves of the test would not be expected to be very high. The correlation between scores on two equivalent forms of an examination was therefore used as a measure of the reproducibility of the test scores and hence the reliability of the questions. In a previous study of CEM 130-131,31 thirty item examina- tions were prepared by combining two alternate forms of the usual fifteen item exams. The items were mixed thoroughly to avoid fatigue or time limit factors. Students were told that the score on each form would be computed individually and they would receive the highest of the two scores. The 37 Pearson product-moment correlation coefficient between these two scores was calculated as follows. n zxi'zyi r = Z N (1) i=1 where x.-§£ 2xi = 1 0x and Yi-Y zyi = 0 Y x and y are the scores obtained on alternate forms. When this analysis was performed on approximately thirty sets of examinations, the average correlation coefficient obtained was 0.69. In order to obtain some measure of the reliability inde- pendent of test length, the Spearman Brown formula (Equation 2) was used to estimate the reliability of these tests if they were composed of more items nr _ s rn _ (n-l)rs+l (2) This relation is used to calculate the reliability (rn) of a test with n times as many items as a shorter test of known reliability (rs). 38 Many nationally used tests of educational achievement containing over 100 items report reliabilities between 0.90 and 0.95. 
If the tests used in CEM 130-131 were lengthened to 75 items, the calculated reliability would be 0.92, which compares favorably with standard educational achievement tests.

Agreement Among Classifiers

Another important measure of the validity of the classification scheme is the agreement obtained when a number of individuals who are familiar with the content of the course attempt to classify questions. In one test of this interclassifier agreement, three chemistry faculty and the author classified a group of questions independently. In another test, an undergraduate teaching assistant and the author compared their classifications of questions. In both cases, the percentage agreement was calculated as the number of classifiers who agreed on the classification of a question divided by the number of classifiers and then multiplied by 100%. For example, if three of four individuals classify a question as memorization and the other as reasoning, a tally of 3 out of 4 is assigned to the memorization class and no tally is made for the reasoning class. If two of four classify the question as reasoning and the other two as memorization, a tally of 2 out of 4 is added to each group. If, for a different question, two classifiers identify the question as reasoning and math and two identify it as memorization and math, the math category receives a tally of 4 out of 4 and the other two categories 2 out of 4 each. The results of this analysis are shown in Table I.

Table I. Interclassifier Agreement

Kind of           Number of Identical    Total
Thinking          Classifications        Possible    Percentage

Memorization      176                    190         92.6%
Reasoning          78                     82         95.1%
Translation        50                     66         75.8%
Classification     30                     40         75.0%
Visualization      44                     58         75.9%
Math               42                     54         77.8%

An Analysis of Correlation Coefficients

Campbell and Fiske32 have described a set of procedures for the validation of tests of individual differences. The validation is based upon an analysis of the correlations between scores on tests which are supposed to measure the same trait, compared to the correlations between scores on tests which measure different traits. A "multitrait-multimethod" matrix is created which groups correlations according to the trait being measured and the method used to measure that trait. If the tests are valid, one would expect the correlations between tests measuring the same trait to be greater than the correlations between tests measuring different traits.

In the present study, an item which is characterized as requiring a particular kind of thinking can be thought of as a one-item test of the student's ability to perform that kind of thinking. One would expect the correlation between items of the same kind of thinking to be greater than that for items requiring different kinds of thinking. Obviously, there are many other factors controlling student performance on a particular examination question; one of the most important of these is the topic from which the item was chosen. In this study, the Pearson product-moment correlations from selected fifteen-item tests were separated into four groups, as shown below; a sketch of the grouping follows the list.

Group A: Coefficients between items from a given class of question which are related to the same topic.

Group B: Coefficients between items from different classes taken from the same topic.

Group C: Coefficients between items from the same class but from different topics.

Group D: Coefficients between items from different classes and from different topics.
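The following minimal sketch (modern free-form Fortran; the topic codes, kind-of-thinking codes, and correlation values are hypothetical placeholders, not data from the study) shows how each unordered pair of items on one test is tallied into groups A through D by matching topic and class.

    program group_correlations
      implicit none
      integer, parameter :: nitem = 5
      integer :: itopic(nitem) = [ 1, 1, 1, 2, 2 ]  ! hypothetical topic codes
      integer :: iclass(nitem) = [ 1, 1, 2, 1, 2 ]  ! hypothetical kind-of-thinking codes
      real    :: r(nitem, nitem), s(4)
      integer :: c(4), i, j, g
      do i = 1, nitem                               ! placeholder correlation matrix
        do j = 1, nitem
          r(i, j) = 0.05 + 0.01*mod(i*j, 7)
        end do
      end do
      s = 0.0
      c = 0
      do i = 1, nitem - 1
        do j = i + 1, nitem                         ! each unordered pair once
          if (itopic(i) == itopic(j)) then
            g = merge(1, 2, iclass(i) == iclass(j)) ! A: both match;  B: topic only
          else
            g = merge(3, 4, iclass(i) == iclass(j)) ! C: class only;  D: neither
          end if
          s(g) = s(g) + r(i, j)
          c(g) = c(g) + 1
        end do
      end do
      do g = 1, 4
        if (c(g) > 0) print '(a,i1,a,f7.4)', 'group ', g, ' average r = ', s(g)/c(g)
      end do
    end program group_correlations

For each test, the four group averages produced this way are the values analyzed below.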
The individual fifteen-item tests were selected for this analysis if they produced an approximately equal number of Pearson product-moment correlation coefficients in each of the groups listed above. For every test, an average of the correlation coefficients in each of the four groups was obtained. Thus, every fifteen-item test yielded four values, each being an average of the correlation coefficients from one of Groups A through D. A description of the procedure used in calculating the coefficients, and a list of the topics used, are given in Appendix C.

These values were then used as the data for an analysis of variance. For each of the six major classes of questions, the following hypotheses were tested at the α = .05 level.

Hypothesis 1. The average of correlation coefficients for group A will be higher than the average for group B.

Hypothesis 2. The average of correlation coefficients for group C will be greater than the average for group D.

Hypothesis 3. The average correlation between items from the same topic will be higher than the average correlation between items from different topics.

Hypothesis 4. The average correlation between items of the same class will be greater than the average correlation between items from different classes.

Hypotheses 1 and 2 were tested with a planned comparison analysis of variance contrasting group A vs. group B and group C vs. group D, using the average correlation coefficients as the input data. The analysis was performed on each of the six kinds of thinking except the category classification, since there were not enough questions in this category to produce any useful information concerning its validity. The averages, standard deviations, and the number of tests used for the four groups of correlation coefficients for each kind of thinking tested are shown in Table II. The results of the planned comparison analysis are shown in Table III. Hypothesis 1 was supported for the reasoning with math category and was not supported for the other four categories. Hypothesis 2 was supported for the reasoning with math and the memorization categories but was not supported for the other three categories.

Table II. Averages and Standard Deviations for Grouped Sets of Correlation Coefficients.

Kind of Thinking      Number of Tests    Group    Average    Standard Deviation

Memorization          6                  A        .1213      .0255
                                         B        .0970      .0268
                                         C        .0970      .0266
                                         D        .0648      .0149

Visualization         6                  A        .1217      .0569
                                         B        .1158      .0503
                                         C        .1163      .0479
                                         D        .0763      .0420

Translation           10                 A        .1411      .0281
                                         B        .1472      .0245
                                         C        .1234      .0256
                                         D        .1063      .0263

Reasoning             6                  A        .1358      .0281
                                         B        .1151      .0339
                                         C        .1007      .0272
                                         D        .0824      .0254

Reasoning with Math   9                  A        .1729      .0366
                                         B        .1040      .0432
                                         C        .1206      .0241
                                         D        .0586      .0160

Table III. Planned Comparisons of Grouped Correlation Coefficients.

Kind of Thinking      Contrast    T Value    P Less Than

Memorization          A vs B       2.03      0.084
                      C vs D       2.69      0.012
Visualization         A vs B        .206     0.84
                      C vs D       1.40      0.15
Translation           A vs B       -.521     0.61
                      C vs D       1.46      0.15
Reasoning             A vs B       1.24      0.23
                      C vs D       1.10      0.29
Reasoning with Math   A vs B       4.60       .001
                      C vs D       4.14       .001

To test hypotheses 3 and 4, a two-way analysis of variance was used. This analysis is illustrated in Figure 1.

                           Topic
                    Same         Not Same

    Class Same      Group A      Group C
      Not Same      Group B      Group D

Figure 1. A Two-Way Analysis of Variance.

By combining groups A and B into one group, and groups C and D into another, one can compare directly the correlations between items from the same topic and items from different topics; the results of this comparison serve as the test of hypothesis 3. Similarly, by combining groups A and C into one group and groups B and D into another, one can compare correlations between items from the same class to correlations between items of different classes and thus perform the test of hypothesis 4. A sketch of this collapsing of groups is given below.
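The sketch (modern Fortran) uses the memorization row of Table II; the unweighted averaging of the group means is a simplification for illustration only.

    program main_effects
      implicit none
      real :: a = 0.1213, b = 0.0970, c = 0.0970, d = 0.0648  ! groups A-D, memorization
      ! hypothesis 3 contrast: same topic (A and B) vs. different topics (C and D)
      print '(a,f7.4,a,f7.4)', 'topic matched ', (a + b)/2.0, '  not matched ', (c + d)/2.0
      ! hypothesis 4 contrast: same class (A and C) vs. different classes (B and D)
      print '(a,f7.4,a,f7.4)', 'class matched ', (a + c)/2.0, '  not matched ', (b + d)/2.0
    end program main_effects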
The results of this analysis are shown in Table IV. The values listed in the column labelled "Matched" are the means of correlation coefficients between items of either the same topic or the same class. In the "Not Matched" column are means of correlation coefficients between items from different topics or classes.

Table IV. Two-Way Analysis of Variance of Grouped Correlation Coefficients.

[For each kind of thinking, Table IV lists the Topic and Class main effects with the mean correlation coefficients for the Matched and Not Matched groups, the F ratio, and the significance level. *Significant at α = .05.]

As shown in this table, there is a class main effect for the categories Memorization, Reasoning, and Reasoning with Math. There were no significant two-way interactions found.

The Remedial System: Project CLIC

Selecting the Sample

During Spring Term 1976, an experimental remedial class called CLIC (Comprehensive Learning in Chemistry) was provided for selected students in Chemistry 131. The students were chosen on the basis of their performance in Chemistry 130 during the previous term. Students who scored below 60 percent on tests in CEM 130 were placed in the group corresponding to the class of questions for which they received the lowest score. Students who failed Chemistry 130 were not invited since they would not be taking Chemistry 131. Of the 120 students invited, 30 indicated that they were not going to enroll in Chemistry 131. Of the remaining 90, 70 participated in the project to some extent, 45 completed all segments of the remedial class, and 54 missed no more than one of the three classes and one of the three tapes. This group of 54 students formed the sample for this study. In order to keep the sample size as large as possible, three bonus points (out of a possible 90) were offered to students who participated in the project. This created an additional incentive which most likely would not be available to students in the ongoing operation of the remedial system.

The Design

Students in each group were placed into control and experimental groups. The control group in each case was given remediation corresponding to the kind of thinking for which these students were least deficient. That is, a control group student who was diagnosed more deficient in memorization skills would be placed in the reasoning with math class. The dependent variable used to measure the effect of remediation was the score obtained by the student on the class of questions for which he was diagnosed as being most in need of remediation. For example, of the students who scored lowest on questions requiring memorization, half would be placed in the reasoning with math class (the control group) and half would be placed in the memorization class (the experimental group). In the subsequent analysis, only their scores on memorization questions would be examined. This design was chosen to control for the effect of students receiving additional help and individual attention. If the type of remediation is important, one would expect the experimental group to perform better on the remediated kind of thinking than the control group.
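A minimal sketch of the diagnosis and placement rule (modern Fortran; the category scores and names are hypothetical placeholders, not the actual selection program):

    program diagnose
      implicit none
      character(19), parameter :: cat(2) = [ 'memorization       ', 'reasoning with math' ]
      real    :: score(2) = [ 0.52, 0.44 ]    ! a student's CEM 130 category scores
      integer :: weak
      if (minval(score) < 0.60) then          ! selected only if below 60 percent
        weak = minloc(score, 1)               ! most deficient kind of thinking
        print *, 'experimental placement: ', cat(weak)      ! remediate the weakest kind
        print *, 'control placement:      ', cat(3 - weak)  ! remediate the least deficient kind
      end if
    end program diagnose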
The Treatment

In Chemistry 131, the students take five examinations and a final. As described earlier, students may repeat examinations within a specified time period until they are satisfied with the grade they have obtained. The final exam may not be repeated. If, for example, a student is satisfied with the grade he has obtained for exam 1, he will begin studying the topics covered by exam 2. When he feels prepared, he takes his first try of exam 2 and can then repeat exam 2 until he has received a satisfactory grade or until the deadline for taking exam 2 has passed.

Participants in Project CLIC were asked to follow the procedures outlined below in the order given.

1. To study the materials for each exam in their usual fashion.
2. To take one attempt at an exam.
3. To listen to a CLIC tape.
4. To attend a CLIC class.
5. To retake the exam until satisfied with the grade obtained.

This set of procedures was to be followed for each of the first three examinations. There were no CLIC tapes or classes provided for exams four and five.

Each CLIC tape begins with a discussion of the methods by which a student could improve his skill in performing a particular kind of thinking. This is followed by a discussion of how the kind of thinking can be applied to the chemistry topic being discussed. A detailed outline of the material presented on each tape is given in Appendix D.

The CLIC classes followed a similar outline, but more time was spent applying the kind of thinking skill to examples from the context of the course. Students were permitted to ask questions and to request that the instructor work problems from the study guide or from previous tests. Whenever possible, the instructor would attempt to relate the answer to a student's question to the kind of thinking being remediated. Often questions were asked which related to the wrong kind of thinking; that is, a student in the memorization class would ask a reasoning with math type of question. When this occurred, the instructor would simply work the problem without relating it to a particular kind of thinking.

Evaluation of the Remedial System

In the evaluation of the effectiveness of the CLIC Project, two distinct factors were considered. They can best be described by restating two of the research questions posed in the first chapter.

Research Question b. Initial Learning

"Do the remedial materials which focus on the kind of thinking required in a specific class of questions covering a particular topic in chemistry improve student performance on that class of questions for that particular topic?"

Research Question c. Transfer of Training

"Once students receive remediation on a particular kind of thinking as it applies to general chemistry in one topic, will there be any improvement in their performance on questions requiring the same kind of thinking but covering a different topic? That is, is there transfer of training of a particular kind of thinking to new content in the course?"
The initial learning score, which is related to research question b, is defined as the percent of correct responses to questions which require the kind of thinking for which the student was diagnosed in need of remediation and which were answered during the student's second and subsequent tries of exams 1, 2, and 3. This dependent variable measures the effect of remediation focused on the kind of thinking required in a specific class of questions covering a particular topic, on student performance on that class of questions for that particular topic. Therefore, a comparison of the values of the initial learning score for the experimental and control groups provides the data needed to answer research question b. The specific hypotheses developed for research question b are as follows:

Hypothesis b1. For students diagnosed in need of memorization remediation, the experimental group will score higher than the control group on the initial learning score.

Hypothesis b2. For students diagnosed in need of reasoning with math remediation, the experimental group will score higher than the control group on the initial learning score.

To answer research question c, we define the transfer score as the percent of correct responses to questions which require the kind of thinking in which the student is deficient, and which were answered during the student's first try of exams 2 and 3 and all tries of exams 4 and 5 and the final. This dependent variable is a measure of a student's performance on a particular class of questions when new chemistry content is introduced. Therefore, an examination of the values of the transfer score for the experimental and control groups provides the data needed to answer research question c. The specific hypotheses developed from research question c are as follows:

Hypothesis c1. For students diagnosed in need of memorization remediation, the experimental group will score higher than the control group on the transfer score.

Hypothesis c2. For students diagnosed in need of reasoning with math remediation, the experimental group will score higher than the control group on the transfer score.

Each of the hypotheses was tested using a simple t-test with α = .05. The results of this analysis are shown in Table V. For both the memorization and the reasoning with math classes there was a significant difference between the experimental and control groups for the initial learning score but not for the transfer score. Thus, the remedial classes appear to be effective at the time of remediation, but the learning does not appear to transfer to new content.

Table V. T-test Evaluation of Project CLIC

Memorization Class - Initial Learning Score

                  N    Mean Initial      Standard     T-Value    P Less Than
                       Learning Score    Deviation
Experimental     15    .741              .076          5.39       .001
Control          10    .477              .167

Memorization Class - Transfer Score

                  N    Mean              Standard     T-Value    P Less Than
                       Transfer Score    Deviation
Experimental     15    .350              .269           .04       .969
Control          10    .346              .203

Reasoning With Math Class - Initial Learning Score

                  N    Mean Initial      Standard     T-Value    P Less Than
                       Learning Score    Deviation
Experimental     14    .667              .091          4.52       .001
Control          15    .522              .082

Reasoning With Math Class - Transfer Score

                  N    Mean              Standard     T-Value    P Less Than
                       Transfer Score    Deviation
Experimental     14    .418              .124           .16       .872
Control          15    .426              .144
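The t values in Table V can be reproduced from the tabled means and standard deviations. The following minimal sketch (modern Fortran) assumes the pooled-variance form of the t statistic, an assumption consistent with the reported values:

    program ttest
      implicit none
      print '(a,f5.2)', 'memorization, initial learning:        t = ', &
            pooled_t(0.741, 0.076, 15, 0.477, 0.167, 10)             ! prints 5.39
      print '(a,f5.2)', 'reasoning with math, initial learning: t = ', &
            pooled_t(0.667, 0.091, 14, 0.522, 0.082, 15)             ! prints 4.52
    contains
      real function pooled_t(m1, s1, n1, m2, s2, n2)
        real, intent(in)    :: m1, s1, m2, s2
        integer, intent(in) :: n1, n2
        real :: sp2
        ! pooled variance with n1 + n2 - 2 degrees of freedom
        sp2 = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / real(n1 + n2 - 2)
        pooled_t = (m1 - m2) / sqrt(sp2*(1.0/n1 + 1.0/n2))
      end function pooled_t
    end program ttest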
SUMMARY AND DISCUSSION

In this chapter the results of the study are summarized and interpreted. The results of a test of each of the hypotheses are also presented.

Overview of the Project

This project has produced and tested a unique diagnostic remedial system which is based upon a classification of the tasks which students are asked to perform in a general chemistry course. The classification scheme is based on the kind of thinking required by the test questions used in the course. The scheme has been evaluated, first, by examining the extent of agreement obtained when content experts classify questions and, second, by calculating inter-item correlation coefficients for questions grouped according to the categories proposed.

Two classes of questions (reasoning with math and memorization) were chosen as the basis of remediation. Remedial materials and classes were made available to students during the first half of Chemistry 131, Spring Term, 1976. The effectiveness of this remediation was measured by monitoring student performance on a specific class of questions during the term. A distinction was made between scores obtained from tests involving content discussed in remediation and scores obtained when the student was being introduced to new material. The latter represents the effect of transfer of learning to think in a particular way to new material in the course. This transfer of training represents the ultimate goal of this type of remediation.

The Validity of the Classification Scheme

Reliability

The reliability of the questions used in this study compares favorably with the reliability reported for standard achievement tests when adjusted for length. Since the same bank of questions is used each term to create the individual tests, questions that are misinterpreted by students and questions which tend to mislead students have been systematically removed from the file. This process has created a set of questions which have withstood a great deal of scrutiny by both students and faculty and are therefore considered to be good tests of student achievement. It is important to remember that this test of reliability refers to the measurement of overall achievement in the course, and not to the measurement of a student's ability to perform a particular kind of thinking. The latter would be obtained by comparing scores on arbitrary halves of a group of questions of the same kind of thinking. This type of analysis was reported by Tannenbaum12 in his evaluation of science processes. In the present study, the individual fifteen-item tests did not provide a large enough sample of questions to permit this type of analysis to yield any useful results.

Interclassifier Agreement

Table VI summarizes the results obtained when content experts who are familiar with the specific nature of the course attempt to classify questions according to the proposed scheme. The agreement for the memorization and reasoning categories, which is much higher than for the other categories, may well be due to the large number of questions which require these kinds of thinking. The discrepancies which did occur were usually caused by a classifier omitting a kind of thinking because it was trivial compared to another kind of thinking required by the question. This problem also accounted for most of the discrepancies which occurred for the translation, visualization, and math categories.

There was considerable discussion among the chemistry faculty concerning the distinction between reasoning and memorization. Questions which required a very simple application of a principle were considered by some to be memorization and by others to be reasoning.
For example, a question which requires the determination of the density of an object given its mass and volume would be classified according to the scheme as a reasoning question, because it requires the student to apply the relationship between density, mass, and volume. It has been argued that this question requires only the memorization of the relationship and that the reasoning is trivial. To resolve this problem, one might measure the correlation between these kinds of questions and questions which are definitely in the memory category, and compare the result with the correlations between these kinds of questions and questions which are definitely in the reasoning category. That is, let an analysis of student performance determine the category into which these types of questions should be placed.

Table VI. A Summary of Interclassifier Agreement.

Kind of Thinking    Percentage Agreement

Memorization        92.6%
Reasoning           95.1
Translation         75.8
Classification      75.0
Visualization       75.9
Math                77.8

The category called "classification" proved to be extremely difficult to use because it is actually a subset of the reasoning category. The differences between these two categories are too subtle to yield reliable classifications.

It has been suggested that the ease and reliability of question classification might be improved by asking the classifier to simply choose the one most important kind of thinking in a particular question. The "most important" kind of thinking would be defined as the kind of thinking most likely to cause the student to miss the question. This kind of analysis would eliminate deciding whether a kind of thinking is too trivial to include in the characterization of a question, but it adds a decision concerning the relative importance of more than one kind of thinking when several are required by a question. The data presented in Table II underscore the need for some type of modification of the classification scheme. It would seem appropriate to deal specifically with the translation, visualization, and math categories and to consider the elimination of the category of classification. Restricting each question to only one category may also improve the usefulness of the classification scheme. We recognize, however, that it may be unreasonable to classify a question into only one category when the question clearly requires two different kinds of thinking. A further refinement of the classification scheme may be to assign weighting factors to indicate the relative importance of the contributing categories.

Analysis of Correlation Coefficients

A comparison of inter-item correlation coefficients has been made for questions grouped according to content and kind of thinking. The first hypotheses tested by this analysis are as follows:

Hypothesis 1. The average of correlation coefficients for group A will be higher than the average for group B.

Hypothesis 2. The average of correlation coefficients for group C will be higher than the average for group D.

A planned comparison between the correlations among group A questions and the correlations among group B questions revealed a significant difference for only the reasoning with math category (see Table III). Thus, for questions of the same content, the fact that they are also of the same kind of thinking will significantly raise the correlation only for the reasoning with math category.
For the memorization category, the average correlation for group A was substantially greater than that for group B, but the difference was not significant at the α = 0.05 level. The data indicate that a subsequent analysis using a larger sample of tests would probably produce a significant difference between groups A and B for the memorization category.

A test of the differences in correlations for groups C and D revealed significant differences for only the memorization and reasoning with math categories. That is, when one compares correlations between pairs of items each from a different topic, the fact that the items are of the same kind of thinking seems to increase the correlation for the reasoning with math and memorization categories but not for the visualization, translation, or reasoning categories.

A comparison which was not tested statistically, but can be made informally, is that between the average correlation coefficients for groups B and C. An examination of Table II reveals that for most categories the differences between these two groups are relatively small. That is, the correlations between questions of the same content but different kinds of thinking differ little from the correlations between questions of different content but the same kind of thinking. It was initially thought that the content which a question is testing would be the most important factor controlling student performance on that question. In this study, the constraint of using fifteen-item tests as the basis of the analysis forced a rather broad definition of each content category. If a stricter delimiting of content categories were used, the content correlation would probably increase significantly.

The third hypothesis of this part of the study is:

Hypothesis 3. The average correlation between items from the same topic will be higher than the average correlation between items from different topics.

This hypothesis was supported for all of the categories tested except visualization. The distribution of visualization questions was such that relatively few correlation coefficients could be obtained for most tests. Thus the averages tended to fluctuate more than they did for the other categories. The data obtained for visualization are probably a result of this fluctuation.

The hypothesis most directly related to the validation of the categories is hypothesis 4, which states:

Hypothesis 4. For each of the six major classes of questions, the average correlation between items of the same class will be greater than the average correlation between items from different classes.

The data shown in Table IV support this hypothesis for the categories memorization, reasoning, and reasoning with math. The hypothesis is not supported for visualization and translation and was not tested for classification. Unlike the visualization category, the translation category showed a significant topic main effect but did not show a kind of thinking main effect. Thus the category translation is not validated by student performance. It may be that translation, while identifiable, is too closely related to the more prevalent categories memorization and reasoning to be distinguishable by an examination of student performance. To summarize this part of the study, we can say that questions requiring memorization tend to correlate higher with one another than they do with questions requiring some other kind of thinking.
This is also true for reasoning and reasoning with math questions.

Evaluation of Project CLIC

The categories of memorization and reasoning with math were chosen to be the basis for the experimental remediation. The evaluation hypotheses are divided into two parts: the first comprises those related to student progress with material being learned during remediation, and the second those related to achievement when new chemistry content is introduced. The latter have been identified as the transfer hypotheses because they relate to the student's ability to transfer what he has learned about a particular kind of thinking to new content in the course.

The initial learning score has been defined as the proportion of correct responses to questions from topics which have been discussed in the context of the kind of thinking remediation. The transfer score is the proportion of correct responses to questions from topics not yet discussed during remediation. The hypotheses concerning the initial learning score are as follows.

Hypothesis b1: For students diagnosed in need of memorization remediation, the experimental group will score higher than the control group on the initial learning score.

Hypothesis b2: For students diagnosed in need of reasoning with math remediation, the experimental group will score higher than the control group on the initial learning score.

For the memorization group, the average value of the initial learning score was 0.741 for the experimental group and 0.477 for the control group. The probability that these values differ only by chance is less than 1 in 1000. For the reasoning with math group, the values were 0.667 for the experimental group and 0.522 for the control group. This difference is also highly significant. These data indicate that students who received remediation on a particular kind of thinking did better on questions requiring that kind of thinking than did students who received remediation on some other kind of thinking.

Unfortunately, these results do not prove that the kind of thinking addressed by the remediation was the determining factor. In attempting to teach the student to think in a particular way, we used specific examples from the current content of the course. It may be that the students simply learned from these examples and from the additional presentation of the related content. This would not have been as significant a problem if the two kinds of thinking were evenly distributed throughout the various topics and subtopics of the course. This, however, was not the case. Thus, the tests of initial learning for the control group probably contained less of the content discussed in remediation than those for the experimental group.

The hypotheses related to the question of transfer are as follows.

Hypothesis c1: For students diagnosed in need of memorization remediation, the experimental group will score higher than the control group on the transfer score.

Hypothesis c2: For students diagnosed in need of reasoning with math remediation, the experimental group will score higher than the control group on the transfer score.

Neither of these hypotheses was supported by the data. For the memorization class, the average value of the transfer score was 0.350 for the experimental group and 0.346 for the control group. The difference is not significant at the α = .05 level. For the reasoning with math class, the value for the experimental group was 0.418 and the value for the control group was 0.426.
The difference between these two values is also not significant.

With this type of data, there are many alternative hypotheses which can be put forth. I will mention two and discuss each briefly. The data indicate that the remediation had some positive effect, but they do not support the statement that this positive effect was the result of improving students' ability to think in a particular way. It may be that the remediation failed to teach the students very much about the kind of thinking involved but instead simply presented, once again, a certain segment of the material. Since the remedial material suggested specific methods and strategies for the students to use, it was possible to determine whether these methods were being used successfully. An informal check of the notebooks of five students indicated that three of the five were using most of the techniques and two were not using them at all. During the reasoning with math remediation, the instructor would often ask students to work problems using the techniques that had been discussed. Many of the students did not demonstrate that they had mastered the technique of dimensional analysis, which was one of the topics discussed in the class. In general, one could conclude that some of the students simply did not master the key techniques and did not learn the key concepts of the remedial material. This suggests that the remedial classes could be more effective if based on a mastery model designed so that all students mastered the basic skills and ideas of the remedial class. There would, however, be practical problems involved in requiring mastery of material which would be viewed by students as supplementary to the material of the course.

Another alternative hypothesis is that students who mastered the skills presented in the remedial class were unable to apply those skills when dealing with new material in the course. Learning to apply one's knowledge to new situations has always been an important goal of education. Many remedial programs in science education have been based on the idea that the mastery of certain basic skills and ideas which are applied in a discipline will improve one's learning in that discipline. This is very appealing because once the student has learned the skill, he will supposedly be able to use that skill throughout his study. What is sometimes forgotten is that knowing how to do something is not exactly the same as knowing when to do something. A student who has learned the technique of dimensional analysis, for example, may be able to apply it to a problem if dimensional analysis is fresh in his mind or if he is instructed to use the technique, but may not think to apply it to a new problem encountered a week later. Successful attempts to teach a general learning skill and to demonstrate the transfer of that skill have been reported.24 Hopefully, more of this type of research will be forthcoming.

The study presented here represents an initial step in the application of current ideas in the field of educational psychology to the teaching of freshman chemistry. We obviously have a great deal to learn about the teaching and learning of chemistry at this level.

IMPLICATIONS FOR FUTURE RESEARCH AND DEVELOPMENT

Like many research projects, this study has created more questions than it has answered. It has also initiated the development of a unique learning laboratory for the study of the learning of chemistry at the college level.
This chapter will outline the characteristics and potential of this learning laboratory and will also present several proposals for a continuing research program in chemical education.

A Chemical Education Laboratory

The sequence of courses to which this study was applied has an average enrollment of approximately 1500 students per term. The instruction which is delivered to the students through the taped cassettes can be thought of as a very controllable and well defined experimental treatment. One knows exactly what information has been communicated through the tapes to the student. This information does not vary uncontrollably from term to term, as does the information delivered via a lecture mode. The instruments used to measure achievement are created from a bank of questions which can also be easily controlled. Although each question is used only once during a term, it is in most cases used again in subsequent terms. By studying student performance on individual questions or groups of related questions, one can gain valuable information about which concepts or ideas are being communicated effectively and which parts of the course need improvement. Also, any time modifications of instructional materials are made, they can easily be tested by establishing an experimental group which studies the new material and comparing this group's achievement with that of a control group which studies the old material. In this manner, objective evidence concerning the effectiveness of instruction can be routinely obtained.

The computer management system needed to compile and analyze the data has been established and is presently working well, although many improvements have been proposed. As described in Chapter 3, the students mark their answers on machine-scorable answer sheets which are then processed by an optical scanner linked to a CDC 6500 computer. Student performance data are stored in disc files which in this study were analyzed by the Statistical Package for the Social Sciences subroutines. This computerized record keeping and data analysis system permits the processing of thousands of pieces of data with a relatively small expenditure of resources. Students take an average of two attempts at each exam. Including the final, a total of 13 exams is administered during the two-term sequence. Obviously, this amount of testing creates a large quantity of data in a relatively short time. Computerized record keeping also makes it easier to store, from term to term, statistical data on each item in the question bank. Presently, only an index of difficulty, defined as the proportion of students who get the question wrong, is being stored, along with the number of students who have answered the question. There is theoretically no limit to the information concerning each question which could be stored on the question file.

Considering the number of students who take introductory chemistry, and considering the increasing numbers of students who find the kinds of thinking required in introductory chemistry difficult, it would appear that the information to be gained from this learning laboratory would be an important contribution to chemistry instruction. Specific suggestions for research and development studies are outlined in the next section.

The Difficulty Index

As mentioned earlier, the routine processing of information includes calculating a difficulty index for each question. Preliminary studies indicate that these indices vary dramatically: there is a significant number of questions which over 90 percent of the students get wrong, and there is also a significant number which fewer than 10 percent get wrong.
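A minimal sketch of the stored index and its term-to-term accumulation (modern Fortran; the tallies are hypothetical placeholders for one item on the question file):

    program difficulty_index
      implicit none
      integer :: nanswered = 480, nwrong = 264    ! accumulated from earlier terms
      integer :: termans   = 112, termwrong = 83  ! tallies from the current term
      nanswered = nanswered + termans             ! fold the new term into the file
      nwrong    = nwrong + termwrong
      print '(a,f5.2)', 'difficulty index = ', real(nwrong)/real(nanswered)  ! proportion wrong
    end program difficulty_index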
To this author's knowledge, most instructors using the mastery approach, which requires repeated examinations, assume that their alternate exam forms are of approximately the same difficulty. The preliminary data obtained in this study indicate that this is probably not the case. It is therefore recommended that a careful study of exam form difficulty be undertaken and, if necessary, a method be established to keep the difficulty of exams within an acceptable limit.

A related area of interest is the relationship between the difficulty of a question and the kind of thinking (as defined by this study) which the question requires. An informal inquiry into this question indicates no difference in the average difficulty of the various classes of questions. The memory questions, for example, do not appear to be any easier or any more difficult than the reasoning with math questions. This needs to be studied more carefully on a long-term basis.

Throughout the present research project, concern has been expressed about the effect of not taking into account the inherent difficulty of a question when attempting to validate the classification scheme. It seems logical that a student who answers three difficult memory questions correctly should receive a higher score in "memory ability" than a student who answers three easy memory questions. The details of assigning some type of weighting factor for the purposes of this analysis need to be considered carefully. An alternative to a weighting factor would be to control the difficulty of the questions used so that the scores obtained are the result of questions of approximately the same difficulty.

Alternative Validity Studies

There have been many suggestions pertaining to the validity test of the proposed classification categories. This section will discuss two of these plans.

The analysis of correlation coefficients was performed using fifteen-item examinations as the basic instrument. This produced a relatively small number of correlation coefficients in each of the groups A, B, C, and D, which in turn caused the averages to be calculated from as few as three coefficients. This has in some cases produced an unstable statistic and may account for the inability to obtain significant differences. If a substantially longer exam were given, the number of inter-item correlation coefficients would increase, as would the stability of the averages. The number of useful coefficients obtained from a test of n items is n(n - 1)/2.

It has also been pointed out that the present classification scheme permits the assignment of a question to more than one class of questions. For example, a question characterized as Rs(R1, V) would be included in both the visualization and the reasoning categories even though one of the two may have no impact on performance on the question. The alternative is to classify each question into only one category, that category being the kind of thinking which is most likely to cause the student to miss the question. This strategy of classification is more likely to group similar questions together and hence should improve the average correlation between questions of the same class, and may even result in a significant difference for the visualization and translation categories.
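A minimal sketch of that single-category rule (modern Fortran; the contributing kinds and the judged miss likelihoods are hypothetical placeholders for one Rs(R1, V) question):

    program reclassify
      implicit none
      character(13) :: kinds(2) = [ 'reasoning    ', 'visualization' ]
      real :: miss(2) = [ 0.30, 0.55 ]   ! judged chance each kind causes a miss
      ! keep only the kind of thinking most likely to cause the student to miss
      print *, 'single classification: ', kinds(maxloc(miss, 1))
    end program reclassify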
Selecting the Sample

A student's ability to think in a particular way is obviously only one of the factors that influence performance in Chemistry 130-131. Basic interest in chemistry and motivation to study are also very important factors. Most of the students enrolled in this sequence of courses are not chemistry majors but are taking the courses because they are required to do so by their major area. In selecting those students who scored below 60 percent on a particular kind of thinking, we chose students who were for the most part in the bottom 40 percent of the class when ranked by grade point in CEM 130. It is quite probable that this group on the average has a lower level of interest in and motivation to study chemistry than a group ranking from 40 percent to 75 percent in class average. It has been suggested that the CLIC project might have been more successful if applied to this latter group, since these students are likely to be more motivated and interested in the study of chemistry. If the CLIC experiment could be run with both of these groups simultaneously, some interesting comparisons of achievement, improvement, and participation could be made.

A Piagetian Classification

A theory of intellectual development which has had an impact on the teaching of chemistry at the college level is that advanced by the Swiss psychologist Jean Piaget. An important aspect of Piaget's theory is his "stages of intellectual development," which are listed below.

1. Sensori-motor stage (0-2 years)
2. Preoperational stage (2-7 years)
3. Concrete operations (7-11 years)
4. Formal operations

Each stage represents a distinct set of abilities which are usually developed during the ages indicated. Piaget has designed many different tests of intellectual development which are supposed to identify in which of the four stages a person is operating. As mentioned earlier, recent applications of the tests to college students indicate that many of these students do not demonstrate formal thinking in specific situations. In simple terms, this means that the student does not deal successfully with problems requiring the formation and testing of hypotheses involving relationships between variables, or problems requiring the control of one variable by systematically testing all possibilities.

Chemists have begun to look critically at the information concerning the intellectual development of college students and to ask what effect this situation might have on the teaching of college chemistry.36,37 It is generally agreed that much of the thinking required in introductory chemistry is at the formal level. We are only beginning to sort out specifically which of the various tasks required of an introductory chemistry student would not be done by someone not demonstrating formal thinking. Some initial work on this question has been reported by Herron,38 who identifies tasks which he believes require formal thought but does not present any empirical evidence to support the identification.

The kind of analysis employed in the present study would be ideally suited to the development and testing of a classification scheme based on Piaget's concrete and formal operations levels of intellectual development. Questions used in Chemistry 130-131 could be classified according to the Piagetian level required, and a correlation analysis could be run. In addition, the score on a set of items from a chemistry test could be compared to scores on a traditional Piagetian test.
If a reliable classification of general chemistry questions can be made, then a diagnosis of the Piagetian level demonstrated by students taking chemistry could be routinely obtained. Furthermore, procedures for increasing the students' tendency to think at the level of formal operations could be developed in the context of a freshman chemistry course. This last development would be of extreme importance to instruction, since so little is presently known about how, or whether, a person who is not in the practice of thinking at the formal level in a particular situation can be taught to do so.

REFERENCES

1. B. S. Bloom, Taxonomy of Educational Objectives: Cognitive Domain, David McKay, New York (1956).
2. K. V. Fast, Dissertation Abstracts, 32, 2194A (1972).
3. P. W. Airasian, Science Education, 54, 91-95 (1970).
4. H. V. Scott, Science Education, 57, 291-296 (1973).
5. American Association for the Advancement of Science, Science A Process Approach, Washington, D.C. (1964).
6. E. L. Smith, American Educational Research Association Annual Meeting, Chicago, Illinois (1974).
7. R. M. Gagne, Interchange, 3, 1-8 (1972).
8. R. M. Gagne, The Essentials of Learning for Instruction, Dryden, Hinsdale, Illinois (1974).
9. R. M. Gagne, Educational Psychologist, 6, 1-9 (1968).
10. R. M. Gagne, Psychological Review, 69, 355-365 (1962).
11. R. M. Gagne and L. T. Brown, Journal of Experimental Psychology, 62, 313-321 (1961).
12. R. S. Tannenbaum, Journal of Research in Science Teaching, 8, 123-136 (1971).
13. C. H. Stedman, Journal of Research in Science Teaching, 10, 235-241 (1973).
14. Jean Piaget, Journal of Research in Science Teaching, 2, 176-186 (1964).
15. H. Beilin, Piagetian Research and Mathematical Education, National Council of Teachers of Mathematics, Washington, D.C. (1970).
16. T. A. Bredderman, Journal of Research in Science Teaching, 10, 189-200 (1973).
17. J. W. McKinnon, American Journal of Physics, 39, 1047-52 (1971).
18. A. E. Lawson and J. W. Renner, Science Education, 58, 545-559 (1974).
19. D. Griffiths, Unpublished Ed.D. Dissertation, Rutgers University, New Brunswick, NJ (1973).
20. F. W. Danner and M. C. Day, American Educational Research Association Annual Meeting, San Francisco, CA, April (1976).
21. M. R. Lawler and M. Riser, The Journal of Experimental Education, 43, 45-52 (1974).
22. B. Z. Shakhashiri, Journal of Chemical Education, 52, 588-592 (1975).
23. D. M. Riban, Journal of Research in Science Teaching, 3, 72-82 (1965).
24. J. H. Larkin, American Association of Physics Teachers Meeting, Chicago, Illinois (1975).
25. R. N. Hammer, 167th National Meeting, American Chemical Society, Los Angeles, California, March 1974.
26. Benjamin S. Bloom, Evaluation Comment, 1 (2) (1968).
27. Fred S. Keller, Journal of Applied Behavior Analysis, 1, 79-89 (1968).
28. James H. Block (ed.), Mastery Learning: Theory and Practice, Holt, Rinehart and Winston, Inc., New York (1971).
29. R. H. Davis, L. T. Alexander, and S. L. Yelon, Learning System Design, McGraw-Hill Book Company, New York (1974).
30. R. L. Ebel, Essentials of Educational Measurement, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1972).
31. E. Kales, personal communication, March 1976.
32. D. T. Campbell and D. W. Fiske, Psychological Bulletin, 56, 81-105 (1959).
33. G. V. Glass and J. C. Stanley, Statistical Methods in Education and Psychology, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1970).
34. D. G. Boyle, A Students' Guide to Piaget, Pergamon Press, New York (1969).
35. J. Piaget and B. Inhelder, The Early Growth of Logic in the Child, W. W. Norton and Company, Inc., New York (1969).
36. B. S. Craig, Journal of Chemical Education, 49, 807 (1972).
37. D. W. Beistel, Journal of Chemical Education, 52, 151 (1975).
38. J. D. Herron, Journal of Chemical Education, 52, 147 (1975).

APPENDICES

APPENDIX A

FORTRAN PROGRAMS

The following are listings of the FORTRAN programs used to perform the data analysis for this study. Each listing begins with a brief description of the function of the program.

Program READ

Program READ requests information from a nine-track tape prepared at the scoring center. A disc file (the student record file) is created in which each record contains the student number, the exam form number, and an indication of the student's responses to each question.

FTN.
MAP(OFF)
ATTACH,TAPE12,WAITSP76131,PW=OLIVIA.
LGO.
UNLOAD,TAPE12.
REWIND,TAPE14.
REWIND,TAPE15.
SORTMRG.CHEM131
CATALOG,TAPE69,SAVETHIS,RP=999,ID=BOB,TK=OLIVIA.
REWIND,TAPE69.
LISTTY,I=TAPE69,B,NS,1-20.
      PROGRAM A(INPUT=64,OUTPUT=512,TAPE12=512,TAPE14=512,
     XTAPE6=512,TAPE5=512,TAPE69=64,TAPE50=512,TAPE51=512,TAPE36=512,
     XTAPE37=512)
      DIMENSION JK(15),IANS(15),LD(3),IQCP(4),IACP(4,12),HOLD(15),
     XIMZT(3),LTE(6),LF(3)
      IEND=0
      LS=9
      ACOUNT=0.
      DO 232 NMR=1,15
  232 HOLD(NMR)=0.
      IQZ=0
      ICHM=0
      ICP=0
      DO 2 JL=1,20000
C READS THE 9 TRACK TAPE FOR NEW INFORMATION.
      READ(12,1)IT,ID,(JK(NA),NA=1,15)
    1 FORMAT(I6,I3,15R1)
      IF(EOF(12).NE.0)GO TO 98
      IF(IT.NE.1)GO TO 68
C STUDENT NO. 1 READ AS SPECIAL INFORMATION
      ILQ=0
      DO 78 IXD=1,4
   78 IQCP(IXD)=0
      DO 66 NJ=1,3
   66 LD(NJ)=JK(NJ)
      REWIND 89
      WRITE(89,160)IT,ID,(JK(NA),NA=1,15)
  160 FORMAT(I6,I3,15R1)
      REWIND 89
      READ(89,164)IT,ID,(JK(NA),NA=1,15)
  164 FORMAT(I6,I3,3R1,12I1)
      LS=JK(9)
      IF(IQZ.EQ.0)GO TO 853
      GO TO 100
   98 IEND=1
      DO 2009 NMR=1,3
 2009 LF(NMR)=LD(NMR)
      LN=LS
      IF(LN.EQ.1)GO TO 427
  100 IECIS=0
      REWIND 50
  530 IF(IECIS.EQ.15)GO TO 952
  283 READ(50,291)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6)
  291 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10)
      IF(EOF(50).NE.0)GO TO 506
      GO TO 730
  506 PRINT 544,(LF(NMR),NMR=1,3)
  544 FORMAT(* HELP130*,5X,I1,I1,I1)
      IF(IEND.EQ.1)GO TO 38
      GO TO 854
  730 IF(IMZT(1).EQ.LF(1).AND.IMZT(2).EQ.LF(2).AND.IMZT(3).EQ.LF(3))804,
     X283
  804 IECIS=IECIS+1
C ABZT IS THE VALUE OF THE RATIO OF THE NUMBER WRONG TO THE TOTAL (ACOUNT)
      ABZT=HOLD(IECIS)/ACOUNT
      IZZ=ACOUNT
      WRITE(36,299)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6),IZZ,ABZT
  299 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10,I4,F8.6)
      GO TO 530
  427 IECIS=0
      REWIND 51
  430 IF(IECIS.EQ.15)GO TO 952
  429 READ(51,391)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6)
  391 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10)
      IF(EOF(51).NE.0)GO TO 510
      GO TO 731
  510 PRINT 509,(LF(NMR),NMR=1,3)
  509 FORMAT(* HELP131*,5X,I1,I1,I1)
      IF(IEND.EQ.1)GO TO 38
      GO TO 854
  731 IF(IMZT(1).EQ.LF(1).AND.IMZT(2).EQ.LF(2).AND.IMZT(3).EQ.LF(3))805,
     X429
  805 IECIS=IECIS+1
C ABZT IS THE VALUE OF THE RATIO OF THE NUMBER WRONG TO THE TOTAL (ACOUNT)
      ABZT=HOLD(IECIS)/ACOUNT
      IZZ=ACOUNT
      WRITE(37,296)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6),IZZ,ABZT
  296 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10,I4,F8.6)
      GO TO 430
  952 DO 230 NMR=1,15
  230 HOLD(NMR)=0.
      IF(IEND.EQ.1)GO TO 38
      ACOUNT=0.
      GO TO 854
   68 IF(IT.NE.2)GO TO 81
C STUDENT NO. 2 READ AS KEY
   82 DO 84 NJ=1,15
   84 IANS(NJ)=JK(NJ)
      GO TO 2
C STUDENT NO.
C 3 READ FOR CORRECTIONS TO THE KEY
   81 IF(IT.NE.3)GO TO 67
      ILQ=1
      REWIND 89
      WRITE(89,1201)IT,ID,(JK(NA),NA=1,15)
 1201 FORMAT(I6,I3,15R1)
      REWIND 89
      READ(89,1202)IT,ID,(JK(NA),NA=1,15)
 1202 FORMAT(I6,I3,2I1,13R1)
      ICP=ICP+1
      IQCP(ICP)=JK(1)*10+JK(2)
      DO 77 ICD=3,12
   77 IACP(ICP,ICD)=JK(ICD)
      GO TO 2
C CONVERTS RESPONSES (R FORMAT). A 1 IF CORRECT, A LETTER REPRESENTING
C THE RESPONSE CHOSEN IF INCORRECT.
   67 DO 17 NJ=1,15
      IF(IANS(NJ)-JK(NJ))14,18,14
   18 JK(NJ)=1R1
      GO TO 17
   14 IF(ILQ.EQ.0)GO TO 71
      DO 79 NA=1,4
      IF(NJ.EQ.IQCP(NA))GO TO 39
   79 CONTINUE
      GO TO 71
   39 DO 75 III=3,12
      IF(IACP(NA,III).EQ.1R9)GO TO 71
   75 IF(JK(NJ).EQ.IACP(NA,III))GO TO 18
   71 JK(NJ)=JK(NJ)-32B
   17 CONTINUE
      ACOUNT=ACOUNT+1.
      DO 210 NMR=1,15
      IF(JK(NMR).EQ.1R1)GO TO 210
C HOLD KEEPS TRACK OF THE NUMBER WRONG PER EXAM FORM NUMBER
      HOLD(NMR)=HOLD(NMR)+1.
  210 CONTINUE
      IF(LS-1)94,27,27
   94 WRITE(5,2047)IT,ID
 2047 FORMAT(I6,I3.3)
C WRITES ON TAPE 14 FOR CHEM 130 STUDENTS
      WRITE(14,90)IT,(LD(NA),NA=1,3),ID,(JK(NA),NA=1,15)
   90 FORMAT(I6,3R1,I3,15R1)
      GO TO 2
   27 WRITE(6,1749)IT,ID
 1749 FORMAT(I6,I3.3)
C WRITES ON TAPE 15 FOR CHEM 131 STUDENTS
      WRITE(15,91)IT,(LD(NA),NA=1,3),ID,(JK(NA),NA=1,15)
   91 FORMAT(I6,3R1,I3,15R1)
      GO TO 2
  853 IQZ=1
  854 DO 609 NMR=1,3
  609 LF(NMR)=LD(NMR)
      LN=LS
      IF(ICHM.NE.0)GO TO 126
C WRITES ON TAPE5 FOR ACCESS BY TELETEST FOR CHEM 130 SCORES.
      WRITE(5,70)LD(1)
   70 FORMAT(*SCORE*/*SCORE*/R1,*,0*)
C WRITES ON TAPE6 FOR ACCESS BY TELETEST FOR CHEM 131 SCORES.
      WRITE(6,72)LD(1)
   72 FORMAT(*SCORE*/*SCORE*/R1,*,0*)
      ICHM=1
  126 IF(LS-1)1524,1523,1523
 1524 WRITE(5,85)(JK(NA),NA=4,8),LD(1)
   85 FORMAT(*CD*/2I1,*.*,2I1/R1,*.0*/*AUTO*/R1)
      GO TO 2
 1523 WRITE(6,97)(JK(NA),NA=4,8),LD(1)
   97 FORMAT(*CD*/2I1,*.*,2I1/R1,*.0*/*AUTO*/R1)
    2 CONTINUE
   38 WRITE(5,74)
   74 FORMAT(*E*)
      WRITE(6,105)
  105 FORMAT(*E*)
      PRINT 1005,JL
 1005 FORMAT(* THE LOOP IS NOW AT*,I5)
      ENDFILE 14
      ENDFILE 15
      END
SORT(1,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE69,O,D,,O,N)
KEY(A,C,7,9)
RECORD(I,U,90)
END
SORT(1,1,90)
FILE(TAPE37,S,D,,O,N)
FILE(TAPE69,M,D,,O,N)
FILE(TAPE47,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
END
SORT(1,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE68,M,D,,O,N)
FILE(TAPE60,O,D,,O,N)
KEY(A,C,7,9)
RECORD(I,U,90)
END

Program PERCENTAGES

Program PERCENTAGES processes the ECI file and the student record file and calculates the proportion of questions, from a specified class of questions, which the student has answered correctly.

ATTACH,TAPE51,SPMP,PW=OLIVIA.
ATTACH,TAPE52,SPMR,PW=OLIVIA.
ATTACH,TAPE53,SPM1,PW=OLIVIA.
ATTACH,TAPE54,SPM2,PW=OLIVIA.
SORTMRG.
SORTMRG.
REWIND,TAPE2.
REWIND,TAPE8.
ATTACH,TAPE10,FINAL131,PW=OLIVIA.
FTN.
LGO.
CATALOG,TAPE15,SP76FINAL131,ID=BOB,RP=999,TK=OLIVIA.
REWIND,TAPE15.
COPYSBF,TAPE15,OUTPUT.
SORT(2,1,90)
FILE(TAPE51,S,D,,O,N)
FILE(TAPE52,S,D,,O,N)
FILE(TAPE2,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
END
SORT(2,1,90)
FILE(TAPE53,S,D,,O,N)
FILE(TAPE54,S,D,,O,N)
FILE(TAPE8,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
END
      PROGRAM B(INPUT=64,OUTPUT=512,TAPE2=512,TAPE8=512,TAPE10=512,TAPE
     X14=512,TAPE15=512,TAPE20=512,TAPE80=512)
      DIMENSION JK(15),IBMA(15),ICMA(15),IHAD(15),IBT(15),AN(15)
  400 ILTC=0
      KN=0
      KQ=0
      KL=0
      ILT=0
   19 READ(10,3)ISN,K,NS,(JK(NA),NA=1,15)
    3 FORMAT(I6,I3,I3,15A1)
      IF(EOF(10).NE.0)GO TO 121
 4634 BP=0.
      BR=0.
      BC=0.
      IF(ILT.EQ.1)GO TO 139
   98 IF(KL-K)62,36,139
   36 DO 5 NA=1,15
      IF(IBMA(NA).EQ.0)GO TO 139
      NL=IBMA(NA)
      BP=BP+1.
      IF(JK(NL).NE.1H1)GO TO 97
  101 BR=BR+1.
   97 BC=BR/BP*100.+.5
    5 CONTINUE
  139 IBP=BP
      IBR=BR
      IBC=BC
      GO TO 39
   17 ILT=1
      GO TO 139
   39 WRITE(14,8)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC
    8 FORMAT(I6,I3,I2,15A1,I2,I2,I3)
      GO TO 19
   62 NB=1
      DO 72 NA=1,15
   72 IBMA(NA)=0
      IF(KL.EQ.0)GO TO 78
      IBMA(1)=KQ
      NB=2
   78 DO 52 NZ=NB,16
      KL=KN
      IF(ILTC.EQ.1)GO TO 17
   83 READ(2,1)KN,IBMA(NZ)
    1 FORMAT(I3,I2)
      IF(EOF(2).NE.0)GO TO 57
      IF(KL.EQ.0)GO TO 52
      IF(KN-KL)52,52,61
   52 CONTINUE
      GO TO 61
   57 ILTC=1
   61 KQ=IBMA(NZ)
      IBMA(NZ)=0
      GO TO 98
  121 REWIND 14
      ILTC=0
      ILT=0
      KN=0
      KQ=0
      KL=0
   16 READ(14,99)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC
   99 FORMAT(I6,I3,I2,15A1,I2,I2,I3)
      IF(EOF(14).NE.0)GO TO 2027
      CP=0.
      CR=0.
      CC=0.
      IF(ILT.EQ.1)GO TO 149
   88 IF(KL-K)9,46,149
   46 DO 26 NA=1,15
      IF(ICMA(NA).EQ.0)GO TO 149
      NL=ICMA(NA)
      CP=CP+1.
      IF(JK(NL).NE.1H1)GO TO 87
  111 CR=CR+1.
   87 CC=CR/CP*100.+.5
   26 CONTINUE
  149 ICP=CP
      ICR=CR
      ICC=CC
      GO TO 49
   27 ILT=1
      GO TO 149
   49 WRITE(15,18)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC,ICR,ICP,ICC
   18 FORMAT(I6,I3,I2,15A1,I2,I2,I3,I2,I2,I3)
      GO TO 16
    9 NB=1
      DO 71 NA=1,15
   71 ICMA(NA)=0
      IF(KL.EQ.0)GO TO 68
      ICMA(1)=KQ
      NB=2
   68 DO 51 NZ=NB,16
      KL=KN
      IF(ILTC.EQ.1)GO TO 27
   84 READ(8,10)KN,ICMA(NZ)
   10 FORMAT(I3,I2)
      IF(EOF(8).NE.0)GO TO 56
      IF(KL.EQ.0)GO TO 51
      IF(KN-KL)51,51,41
   51 CONTINUE
      GO TO 41
   56 ILTC=1
   41 KQ=ICMA(NZ)
      ICMA(NZ)=0
      GO TO 88
 2027 CONTINUE
      END

Program PRINTOUT

Program PRINTOUT calculates average subtest scores for all tries of each exam for each student and then a grand average subtest score for the entire term.

ATTACH,TAPE15,SAVEWAIT76131,PW=OLIVIA.
PNPURGE,PPN=IWAIT131.
ATTACH,TAPE72,1IWAIT131,PW=OLIVIA.
SORTMRG.
CATALOG,TAPE2,SP76STUDENTRECORDS,ID=BOB,RP=999,TK=OLIVIA.
REWIND,TAPE2.
MAP(OFF)
FTN.
LGO.
REWIND,TAPE20.
COPYBF,TAPE20,OUTPUT.
SORT(2,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE72,S,D,,O,N)
FILE(TAPE2,O,D,,O,N)
KEY(A,C,1,9)
RECORD(I,U,90)
END
      PROGRAM PR(INPUT=512,OUTPUT=512,TAPE2=512,TAPE20=512,TAPE42=512,
     XTAPE43)
      DATA J,N,I,IT,IWS/5*0/,ZA,ZB,ZC,ZD,AB,AR,BR,DR,ER,BB,DB,EB,SCORE,
     XCOUNT,WRMA,WRMB,WRRA,WRRB/18*0./
      WRITE(42,1000)
 1000 FORMAT(*STUDENT NUMBER*,5X,*M(1)*,5X,*M(1-3)*,5X,*M(4-6)*,5X,*R(1
     X)*,5X,*R(1-3)*,5X,*R(4-6)*,5X,*TOTAL PERCENT*)
    2 JK=J
      IF(JK.EQ.0)GO TO 1002
      IF(IWS.NE.0)GO TO 99
  906 WRMA=WRMA+A
      WRMB=WRMB+B
      WRRA=WRRA+D
      WRRB=WRRB+E
      IF(J.NE.1)GO TO 400
      IF(WRMB.EQ.0)GO TO 201
      MPER=WRMA/WRMB*100.+.5
  203 IF(WRRB.EQ.0)GO TO 205
      IRPER=WRRA/WRRB*100.+.5
      GO TO 400
  201 MPER=0
      GO TO 203
  205 IRPER=0
  400 IF(J.GT.3)GO TO 800
      IF(WRMB.EQ.0)GO TO 401
      MMPER=WRMA/WRMB*100.+.5
  403 IF(WRRB.EQ.0)GO TO 405
      IRRPER=WRRA/WRRB*100.+.5
      GO TO 1002
  401 MMPER=0
      GO TO 403
  405 IRRPER=0
      GO TO 1002
  800 AR=AR+A
      BR=BR+B
      DR=DR+D
      ER=ER+E
      IF(BR.EQ.0)GO TO 901
      LAPERM=AR/BR*100.+.5
  903 IF(ER.EQ.0)GO TO 905
      LAPERR=DR/ER*100.+.5
      GO TO 1002
  901 LAPERM=0
      GO TO 903
  905 LAPERR=0
      GO TO 1002
   99 IF(COUNT.NE.1)GO TO 980
      IPER=0
      GO TO 1005
  980 IPER=((SCORE-K)/((COUNT-1.)*15.))*100.+.5
 1005 WRITE(42,98)IA,MPER,MMPER,LAPERM,IRPER,IRRPER,LAPERR,IPER
   98 FORMAT(4X,I6,8X,I4,6X,I4,7X,I4,6X,I4,6X,I4,7X,I4,10X,I4)
      WRITE(43,67)IA,MPER,MMPER,LAPERM,IRPER,IRRPER,LAPERR,IPER
   67 FORMAT(I6,I4,I4,I4,I4,I4,I4,I4)
      IF(IT.EQ.1)GO TO 100
      IWS=0
   89 SCORE=K
      COUNT=1.
      WRMA=0.
      WRMB=0.
      WRRA=0.
      WRRB=0.
      AR=0.
      BR=0.
      DR=0.
      ER=0.
Program FACTOR

Program FACTOR performs a factor analysis on specified exams and produces a matrix of inter-item correlation coefficients.

HAL,SPSS,D=X.
REWIND,BCDOUT.
MAP(OFF)
FTN.
LGO.
REWIND,TAPE6.
COPYSBF,TAPE6,OUTPUT.
RUN NAME        FACTOR ANALYSIS FOR CEM 130 EXAM 217
DATA LIST       FIXED /1 STUNUM 1-6,EFN 7-9,S 10-11,Q1 TO Q15 12-26 (A)/
SELECT IF       (EFN EQ 217)
N OF CASES      UNKNOWN
RECODE          Q1 TO Q15 ('1'=1)('A','B','C','D','E','F','G','H','I','J'=0)
                (CONVERT)
FREQUENCIES     INTEGER=Q1 TO Q15 (0,1)
OPTIONS         8
STATISTICS      1,5
READ INPUT DATA
FACTOR          VARIABLES=Q1 TO Q15/TYPE=PA2/FACSCORE/NFACTORS=3/
OPTIONS         5
STATISTICS      ALL
FINISH
      PROGRAM A (INPUT=64,OUTPUT=112,BCDOUT,TAPE5=BCDOUT,TAPE6=112)
      DIMENSION B(15,15),C(15,15)
      SD=0
      XS=0
      WRITE(6,3)
    3 FORMAT(52X,*X VALUES*,17X,*Z VALUES*,//)
      DO 50 I=1,15
   50 READ(5,6) (B(I,J),J=1,15)
    6 FORMAT(8F10.7)
      DO 5 NL=1,15
      DO 5 NA=1,15
      IF(NA.EQ.NL)GO TO 5
      XS=XS+B(NL,NA)
    5 CONTINUE
      XA=XS/225.
      DO 8 NL=1,15
      DO 8 NA=1,15
      IF(NA.EQ.NL)GO TO 8
      C(NL,NA)=.5*ALOG((1+B(NL,NA))/(1-B(NL,NA)))
    8 CONTINUE
      DO 9 NL=1,15
      DO 9 NA=1,15
      IF(NA.EQ.NL)GO TO 9
      WRITE(6,2)B(NL,NA),C(NL,NA)
    2 FORMAT(50X,F10.7,15X,F10.7)
    9 CONTINUE
      WRITE(6,10)XA
   10 FORMAT(//,* THE AVERAGE IS*,F10.7)
      END
READY  00.17.42
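The X and Z columns printed by PROGRAM A pair each inter-item correlation with its transform; the expression .5*ALOG((1+B)/(1-B)) in the listing is Fisher's z transformation, the standard device for averaging correlation coefficients, and apparently the quantity behind the ZRBAR (average z-transformed r) variable in Program ANOVA below. In conventional notation, with its inverse:

\[
z \;=\; \tfrac{1}{2}\,\ln\frac{1+r}{1-r},
\qquad
r \;=\; \tanh z \;=\; \frac{e^{2z}-1}{e^{2z}+1}.
\]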
"An empty aluminum Coke can weighs 50 grams. How many moles of aluminum does one Coke can contain? (Atomic weight Al=27)" Ans. 1.85 moles Classification: R1(M1) The relationship being applied is 27 grams A1 = 1 mole. The student must find the number of moles in 50 grams by setting up and solving a linear equation. 2. "Elements which are most metallic are found in what general area of the periodic chart?" Ans. Lower left. Classification: Mr This question is included because it can easily be interpreted in two different ways. We could say that a property of the most metallic elements is that they are located in the lower left on the periodic chart. An alternative interpretation is that there is a relationship between the metallic character of an element and its position on the periodic chart. By convention, the mem- orized information is interpreted as a rela- tionship and the question is classified as Mr, 3. "Fifteen grams of nitric oxide (NO) contain how many molecules?" Ans. 3.0 x 1023 C1a331f1cation Rs (TmC,R2(M )) 1e 93 94 The R2 implies two relationships being applied to the problem. The student must realize that for N0, 1 mole = 20 grams. This step is Tmc, since the chemical symbol N0 is translated to a mathematical relationship. At this point the relationship is used to find the number of moles in 15 gram R1(M1) and finally the memorized relationship 1 mole = 6.02 x 1023 particles is used to find the number of molecules. (R1(Mle)). The e designates the use of a number written in scientific notation. Whenever the results of an R process are used in a subsequent R process, the two can be combined into one R2 process. The RS designation is used because the question requires the sequencing of the translation and reasoning steps. 4. "What is the percentage by weight of fluorine in phosphorus (III) fluoride" (PF3)? Ans. 65 Classification R3 (Tmc, R2(M1)) The question is given the classification R3 because in its associated instructional setting the step by step procedure for solving these types of problems is given to the student. The solution then becomes a matter of following the direc- tions given. Within the algorithm, the student is in- structed to translate from the chemical symbol to the mathematical relationship between the number of constituent atoms in a molecule. The number of atoms is then converted to the weight of the atoms which is then expressed as a percentage (R2(M1)). APPENDIX C CALCULATING GROUPED AVERAGE CORRELATION COEFFICIENTS The following example is included to help clarify the procedure for calculating the average correlation co- efficients for the groups of questions A, B, C, D. In this example,a 15 item test is analyzed for the reasoning category. The content of the test is divided into two topics, T1 and T2. The first step is to identify questions containing a reasoning process. In this case they are: l. Reasoning Questions: 2, 4, 6, 8, 12, 13, 14, 15 Questions which do not require reasoning are therefore: Nonreasoning Questions: 1, 3, 5, 7, 9, 10, 11 2. The questions are also classified by topic. Topic 1 Questions: 1, 2, 4, 5, 10, ll, 14, 15 Topic 2 Questions: 3, 6, 7, 8, 9, 12, 13 3. Groups of correlation coefficients are formed as follows: Group A is all coefficients between questions from the same content which require reasoning. That is, Group A = {r(K1Tx, KlTx)} where Kl = kind of thinking 1, which in this case is reasoning; and r = the correlation coefficient operator. 
APPENDIX C

CALCULATING GROUPED AVERAGE CORRELATION COEFFICIENTS

The following example is included to help clarify the procedure for calculating the average correlation coefficients for the groups of questions A, B, C, D. In this example, a 15-item test is analyzed for the reasoning category. The content of the test is divided into two topics, T1 and T2.

1. The first step is to identify questions containing a reasoning process. In this case they are:

      Reasoning Questions: 2, 4, 6, 8, 12, 13, 14, 15

   Questions which do not require reasoning are therefore:

      Nonreasoning Questions: 1, 3, 5, 7, 9, 10, 11

2. The questions are also classified by topic.

      Topic 1 Questions: 1, 2, 4, 5, 10, 11, 14, 15
      Topic 2 Questions: 3, 6, 7, 8, 9, 12, 13

3. Groups of correlation coefficients are formed as follows:

   Group A is all coefficients between questions from the same content which require reasoning. That is,

      Group A = {r(K1Tx, K1Tx)}

   where K1 = kind of thinking 1, which in this case is reasoning; and r = the correlation coefficient operator. Group A coefficients are:

      r2,4   r2,14  r2,15  r4,14  r4,15  r14,15
      r6,8   r6,12  r6,13  r8,12  r8,13  r12,13

   The values of the above correlation coefficients are then averaged to obtain an average correlation for Group A.

   Group B is all coefficients between pairs of questions from the same topic with only one of the pair being a reasoning question. That is,

      Group B = {r(K1Tx, KyTx)}   y ≠ 1

   Group B coefficients are:

      r1,2   r1,4   r1,14  r1,15  r2,5   r4,5   r5,14  r5,15
      r2,10  r4,10  r10,14 r10,15 r2,11  r4,11  r11,14 r11,15
      r3,6   r3,8   r3,12  r3,13  r6,7   r7,8   r7,12  r7,13
      r6,9   r8,9   r9,12  r9,13

   Group C is all coefficients between pairs of questions both of which require reasoning but each of the pair being from a different topic. That is,

      Group C = {r(K1Tx, K1Ty)}   x ≠ y

   Group C coefficients are:

      r2,6   r2,8   r2,12  r2,13  r4,6   r4,8   r4,12  r4,13
      r6,14  r6,15  r8,14  r8,15  r12,14 r12,15 r13,14 r13,15

   Group D is all coefficients between pairs of questions, only one of which requires reasoning, with each of the pair being from a different topic. That is,

      Group D = {r(K1Tx, KyTz)}   x ≠ z, y ≠ 1

   Group D coefficients are:

      r2,3   r2,7   r2,9   r4,3   r4,7   r4,9   r14,3  r14,7
      r14,9  r15,3  r15,7  r15,9  r1,6   r1,8   r1,12  r1,13
      r5,6   r5,8   r5,12  r5,13  r10,6  r10,8  r10,12 r10,13
      r11,6  r11,8  r11,12 r11,13

In this manner, one test would yield four average correlation coefficients, one from each of the groups A, B, C and D. A set of tests analyzed for the reasoning category would yield a set of average correlation coefficients for each group. These sets of coefficients are then used as the data for the analysis of variance. (The pair enumeration is sketched in code at the end of this appendix.)

The specific topics which were chosen for this analysis are listed below.

1. CEM 130 Exam 2
      Topic 1 - Crystal Structure
      Topic 2 - Electromagnetic Radiation
      Topic 3 - Structure Determination in Crystals
2. CEM 130 Exam 3
      Topic 1 - Particles and Waves
      Topic 2 - Emission Spectroscopy
      Topic 3 - Quantum Numbers
3. CEM 131 Exam 1
      Topic 1 - Ideal Gases
      Topic 2 - Phase Transformations
4. CEM 131 Exam 2
      Topic 1 - The Equilibrium Constant
      Topic 2 - Calculations Based upon the Equilibrium Law
5. CEM 131 Exam 3
      Topic 1 - Solutions
      Topic 2 - Concentration and Colligative Properties
      Topic 3 - Ionic Equilibria
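The pair enumeration defined in step 3 is mechanical enough to state as a short program. The sketch below is a hypothetical illustration, not part of the dissertation's software: it hard-codes the 15-item example above (the reasoning questions and the two topic sets) and counts the pairs falling in each of groups A through D.

      ! Hypothetical sketch: classify every pair of items (i,j) from the
      ! 15-item example into groups A-D of this appendix.
      ! reas(i) is true if item i is a reasoning question;
      ! topic(i) is 1 or 2, the topic of item i.
      program group_pairs
        implicit none
        logical :: reas(15)
        integer :: topic(15), i, j, na, nb, nc, nd
        integer, parameter :: rq(8) = (/ 2, 4, 6, 8, 12, 13, 14, 15 /)
        integer, parameter :: t2(7) = (/ 3, 6, 7, 8, 9, 12, 13 /)

        reas = .false.
        reas(rq) = .true.
        topic = 1
        topic(t2) = 2

        na = 0; nb = 0; nc = 0; nd = 0
        do i = 1, 14
           do j = i + 1, 15
              if (reas(i) .and. reas(j)) then
                 if (topic(i) == topic(j)) then
                    na = na + 1        ! group A: both reasoning, same topic
                 else
                    nc = nc + 1        ! group C: both reasoning, topics differ
                 end if
              else if (reas(i) .or. reas(j)) then
                 if (topic(i) == topic(j)) then
                    nb = nb + 1        ! group B: one reasoning, same topic
                 else
                    nd = nd + 1        ! group D: one reasoning, topics differ
                 end if
              end if
           end do
        end do
        print '(A,4I4)', ' pairs in groups A-D:', na, nb, nc, nd
      end program group_pairs

Run as written, it reports 12, 28, 16, and 28 pairs, in agreement with the four lists above.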
APPENDIX D

CLIC TAPE OUTLINES

CLIC Tape A-1

I.  Memory Skills
    A. The importance of memorization
    B. Note taking
       1. Noting definitions and examples
       2. Previewing study guide questions
       3. The 2-5-1 format
    C. Reviewing your notes
       1. Cueing
       2. Establishing memory traces
       3. Association and understanding
       4. Repression - developing a good attitude
       5. Self confidence
       6. Timing your review
II. Application to Chemistry Concepts
    A. Vapor pressure
       1. The "gas can" example
       2. Factors affecting vapor pressure
    B. Cooling curves
       1. A related experiment
       2. Heat capacity

CLIC Tape A-2

I.  Review of Memory Skills
    A. Lecture cueing
    B. Examples
    C. Note taking and studying
II. Application to Concepts of Chemistry
    A. Irreversible Processes
       1. Definitions: reversible, irreversible
       2. Examples
    B. Equilibria
       1. Reaction rates
       2. The equilibrium law
       3. LeChatelier's Principle
          a. Changing concentration
          b. Changing pressure
          c. Changing temperature

CLIC Tape A-3

I.  Review of Memory Skills
    A. Using the study guide
    B. Lecture cues - examples
    C. Studying and repression
II. Chemistry Concepts
    A. Solutions and Mixtures
    B. Concentration terms
       1. Normality
       2. Molarity
       3. Molality
       4. Weight percent
       5. Saturated
       6. Supersaturated
    C. Factors which affect solubility
       1. Charge density - charge to radius ratio
       2. Temperature
       3. Pressure
    D. Colligative properties
    E. Electrolytes

CLIC Tape B-1

I.  Developing Reasoning With Math Skills
    A. Symbolic equations
       1. Properties represented
       2. Units of the variables - units conversion
       3. Manipulating symbolic equations
       4. Using two symbolic equations in sequence
       5. Checking your mathematics
    B. A general approach to problem solving
       1. Reading the problem, noting givens and unknowns
       2. Applying relationships to the problem
       3. Setting up the solution
       4. Checking the units and the math
II. Applications
    A. Ideal Gas Law calculations
       1. PV = nRT
       2. Boyle's and Charles' Law problems
    B. Dimensional analysis applied to specific heat problems

CLIC Tape B-2

I.  Problem Solving Principles
    A. Using symbolic equations in problem solving
       1. Symbol-variables
       2. Units-unit conversion
       3. Manipulating the equation
    B. Developing a problem solving approach
    C. Using dimensional analysis
II. Applications
    A. The Equilibrium Law
       1. The symbolic equation
       2. Variables and units
       3. Working with initial concentrations

CLIC Tape B-3

I.  Review of Symbolic Equations
    A. Variables and units
    B. Deriving new equations
    C. Unit conversions
    D. Dimensional analysis
II. Applications
    A. Colligative Properties
       1. Freezing point depression
       2. Boiling point elevation
    B. Weight percent problems
    C. Concentration problems and dimensional analysis
       1. Molarity
       2. Normality