DOCTORAL DISSERTATION SERIES

UNIVERSITY MICROFILMS, ANN ARBOR, MICHIGAN

COPYRIGHTED BY
Mary Alice Horswill Burmester
1953

THE CONSTRUCTION AND VALIDATION OF A TEST TO MEASURE SOME OF THE INDUCTIVE ASPECTS OF SCIENTIFIC THINKING

BY
Mary Alice Burmester

A THESIS

Submitted to the Graduate School of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

DOCTOR OF EDUCATION

Department of Education

1951

ACKNOWLEDGMENTS

The writer wishes to express appreciation for the assistance given by Dr. Victor H. Noll, thesis adviser, and to the other members of the advisory committee. She also wishes to thank Dr. Clarence H. Nelson of the Board of Examiners of Michigan State College for valuable suggestions on the construction of test items and to acknowledge the cooperation of all of the members of the Department of Biological Science of Michigan State College for their aid in the study.

THE CONSTRUCTION AND VALIDATION OF A TEST TO MEASURE SOME OF THE INDUCTIVE ASPECTS OF SCIENTIFIC THINKING

Mary Alice Burmester

AN ABSTRACT

Submitted to the Graduate School of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

DOCTOR OF EDUCATION

Department of Education

1951

Approved

The purpose of this study was to devise a valid test to measure some of the inductive aspects of the ability to think scientifically, in the area of biological science. The educational objectives related to scientific thinking were formulated and were defined in terms of desired behaviors involved.
In all, 98 behaviors were recognized as attending the critical, as opposed to the creative, aspects of scientific thinking. Nine tryout tests, consisting of a total of 637 items, were constructed to evaluate these behaviors. These tests were administered during the spring term of 1950 to 168 students taking the third term of the three-term sequence of Biological Science at Michigan State College. Item validity and item difficulty were calculated for each item of the tryout tests.

Test I, The Ability to Think Scientifically, constructed from discriminating items of the tryout tests, consisted of 150 items. Test I was administered in the spring of 1950 to 500 students at the end of the three-term sequence of Biological Science, and in the fall of 1950 to another group of 240 students who had had no college biology. The reliabilities of the test for the two groups were .89 and .91 respectively.

Because Test I proved too long, 25 of the poorer items, as identified by item analysis, were eliminated. The remainder constituted Test IA, The Ability to Think Scientifically. This test was administered in the fall of 1950 to 330 students who had had no college biology, and to 136 of these same students after completion of one term of Biological Science. The reliabilities for the two groups were .91 and .90 respectively.

The curricular validity of the test was established by:
1. Designing the test items to measure the behaviors involved in scientific thinking.
2. Submission of the tryout tests to competent judges for criticism.
3. Using free responses of students as items wherever feasible.
4. Careful selection of materials utilized in the construction of the test items.

Three general methods were used in the statistical validation of the test, namely:
1. Scores made on the test of the ability to think scientifically were correlated with measures of intelligence, of reading ability and of knowledge of biological facts. These correlations ranged from .33 to .51.
2. Mean scores made by students who had had no college biology were compared with mean scores made by students who had had Biological Science. The means of those having had Biological Science were significantly higher.
3. Scores made on Test IA by 143 students were compared with ratings of these students by their instructors on their ability to think scientifically. The chi-square test, a comparison of means of students receiving superior, average and inferior ratings, and a correlation of scores on the test with the ratings all gave evidence of the statistical validity of the test. The correlation between scores on the test and the ratings of the instructors was .77 for the test when administered as a pretest, and .72 when administered as a post-test.

Mary Alice Burmester
candidate for the degree of
Doctor of Education

Final examination, May 10, 1951, 3:00 P.M.

Dissertation: The Construction and Validation of a Test to Measure Some of the Inductive Aspects of Scientific Thinking

Outline of Studies
    Major subject: Education
    Cognate area: Physiology

Biographical Items
    Born, September 1, 1909, Oakland, California
    Undergraduate Studies, University of California, 1926-1930
    Graduate Studies, University of California, 1930-1933; Michigan State College, 1946-1951
    Experience: Teaching Assistant in Physiology, University of California, 1931-1934; Instructor in Biological Science, Michigan State College, 1945-1948; Assistant Professor in Biological Science, Michigan State College, 1948-1951
    Member of Kappa Delta Pi, Phi Sigma, Sigma Xi

TABLE OF CONTENTS

CHAPTER
I. THE BACKGROUND OF THE PROBLEM
    Introduction
    The problem
        Statement of the problem
        Delimitation of the problem
        Basic assumptions of the study
        Importance of the study
    Organization of the remainder of the thesis
II. REVIEW OF RESEARCH RELATED TO THE PROBLEM
    Steps and skills of scientific thinking
    The measurement of problem-solving abilities
        Summary concerning tests on abilities involved in problem-solving
    Relationship between problem-solving and other abilities
        Relation of intelligence to abilities involved in problem-solving
        Summary of studies concerning the relation of intelligence to problem-solving
        Educability in problem-solving
        Summary of studies on educability in problem-solving
        Relation of reading to abilities involved in problem-solving
        Summary of studies concerning the relation of reading ability to problem-solving
        Relation of factual information to the abilities involved in problem-solving
        Summary of studies concerning the relation of knowledge of facts to problem-solving abilities
    Summary of research related to the problem
III. GENERAL PROCEDURES INVOLVED IN THE DEVELOPMENT OF THE TEST
IV. THE DEVELOPMENT OF THE TEST ITEMS
    The formulation of the educational objectives
    The definition of the behaviors
        Methods used to determine the behaviors
        An outline of the behaviors
    The location of the source materials from which the items could be constructed
    The construction of the evaluation instruments
    Analysis of the tryout tests in terms of the behaviors involved
V. THE STATISTICAL ANALYSES OF THE TESTS AND THE TEST ITEMS
    Methods used in item-analysis
    Analysis of tryout tests
        Analysis of Test A - Some Steps in Scientific Thinking
        Analysis of Test B - The Delimitation of Problems
        Analysis of Test C - Experimental Procedures
        Analysis of Test D - Organization of Data
        Analysis of Test E - Evaluation of Hypothesis
        Analysis of Test F - Experimentation and the Interpretation of Data
        Analysis of Test G - Drawing of Conclusions
        Analysis of Test H - Interpretation of Data
        Analysis of Test J - Generalizations and Assumptions
    Analysis of tryout tests considered as a single test
        Intercorrelations of tryout test scores
        Correlations of scores on tryout tests with scores on intelligence and reading tests
    The preparation of Test I - The Ability to Think Scientifically
    Analyses of Test I and Test IA
        Analysis of Test I - The Ability to Think Scientifically
        Analysis of Test IA - The Ability to Think Scientifically
VI. THE VALIDATION OF THE TEST
    The curricular validation of the test
    The statistical validation of the test
        Validation by correlation with measures of intelligence, reading ability, and factual information
        Validation by comparison of scores of various groups
        Validation by comparison of scores with ratings of students by competent judges
VII. SUMMARY AND CONCLUSIONS
    Summary
    Conclusions
    Educational implications
        Educational implications for Biological Science at Michigan State College
        Educational implications for science courses in general education
        Other educational implications
    Problems suggested by the study
LITERATURE CITED
APPENDIX I
APPENDIX II
APPENDIX III
APPENDIX IV

LIST OF TABLES

TABLE
I. Behaviors Measured by the Tryout Tests
II. Pertinent Data for Test A
III. Item Analysis Data on the Seven Items of Test B which Measured Ability to Recognize Assumptions Underlying Problems
IV. Pertinent Data for Test B
V. Pertinent Data for Test C
VI. Pertinent Data for Test D
VII. Pertinent Data for Test E
VIII. Pertinent Data for Test F
IX. Pertinent Data for Test G
X. Pertinent Data for Test H
XI. Pertinent Data for Test J
XII. Comparison of Means, Standard Deviations, and Reliabilities of the Tryout Tests
XIII. Comparison of Mean Item Validities and Mean Item Difficulties of the Tryout Tests
XIV. Pertinent Data for the Tryout Test Battery
XV. Intercorrelations of Tryout Test Scores
XVI. Intercorrelations of Tryout Test Scores Corrected for Attenuation
XVII. Coefficients of Determination of Tryout Tests
XVIII. Correlation of Total Scores on Tryout Test Battery with Each of the Tryout Tests
XIX. Multiple Correlation of Tryout Total with Two of the Tryout Tests
XX. Multiple Correlation of Tryout Tests with the Criterion - Obtained by the Wherry-Doolittle Method
XXI. Correlation of Tryout Test Scores with Intelligence Test and Reading Test Scores
XXII. Pertinent Data for Test I
XXIII. Comparison of Discrimination Indices and of Difficulty Indices of Identical Items as Obtained from Item Analysis of Tryout Tests and as Obtained from Item Analysis
XXIV. Summary of Item Analysis Data for Tryout Test Items Used in Construction of Test I, Items of Test I, and Items of Test I Used in Construction of Test IA
XXV. Pertinent Data for Test IA
XXVI. Correlation of Tryout Test Scores and Scores on Test I with Psychological Examination Scores and Reading Test Scores
XXVII. Intercorrelations of Tryout Test, Psychological Examination, and Reading Test
XXVIII. Intercorrelation of Test I, Psychological Examination and Reading Test
XXIX. Intercorrelation of Total Tryout Test Scores and Scores on Other Tests
XXX. Comparison of Means and Standard Deviation of Test I for a Group Before Taking Biological Science with Another Group After Taking Three Terms of Biological Science
XXXI. Comparison of Means and Standard Deviations of Test IA on Pre-Test and Post-Test
XXXII. Expectancy Chart Showing the Comparison of Scores on Test IA Pre-Test and Ratings
XXXIII. Expectancy Chart Showing the Comparison of Scores on Test IA Post-Test and Ratings
XXXIV. Mean Gains of Students Rated as Superior, Inferior, and Average on Test IA
XXXV. Differences in Means and Critical Ratios of Differences between Students Rated Superior and Students Rated Average and Students Rated Average and Students Rated Inferior
XXXVI. Item Analysis Data for Test A
XXXVII. Item Analysis Data for Test B
XXXVIII. Item Analysis Data for Test C
XXXIX. Item Analysis Data for Test D
XL. Item Analysis Data for Test E
XLI. Item Analysis Data for Test F
XLII. Item Analysis Data for Test G
XLIII. Item Analysis Data for Test H
XLIV. Item Analysis Data for Test J
XLV. Item Analysis Data for Test I

CHAPTER I

THE BACKGROUND OF THE PROBLEM

INTRODUCTION

With the growth of a general education program in the secondary schools and the lower college years there has been an increased emphasis upon the acquisition of knowledge, skills, and attitudes which are required for participation in a democratic society.¹ One of these skills, which has become a major objective of education, is the ability to solve problems. This objective has been stated variously by different educators. They refer to it as reflective thinking, critical thinking, clear thinking, or as scientific thinking. Although different terms are used they all refer to the kind of thinking involved in the solution of a problem.

As early as 1909, Dewey² advocated the teaching of scientific habits of mind. He asserted then and has continued to contend³ that the problem of problems in our education is to discover how to teach scientific habits of thought.

Almost every major educational committee in the last twenty-five years has emphasized the importance of this instructional objective, not alone as an objective of science courses, but as an objective for general education. Evidence for this is presented in the paragraphs that follow.

¹ American Council on Education, Executive Committee of the Cooperative Study in General Education, Cooperation in General Education. Washington: American Council on Education. 1947. p. 12.
² John Dewey, How We Think. Boston: D. C. Heath and Company. 1909. (preface).
³ John Dewey, "Method in science teaching." Science Education. 29:119-23, April, 1945.
Eurich,⁴ in a report in the Thirty-eighth Yearbook of the National Society for the Study of Education, said that there should be a "deepened desire to do something that will make education more effective than it has been in the past, largely, perhaps, in the hope that future generations will be able to solve better such social problems as those that baffle present-day society."

The Educational Policies Commission⁵ in 1944 made a plea for the reorganization of the secondary schools of America. A plan was presented for the education of all American youth. The following quotation gives the broad outline of this plan:

⁴ Alvin C. Eurich, "A renewed emphasis upon general education," in General Education in the American College. Thirty-eighth Yearbook of the National Society for the Study of Education, Part II, p. 6-7. Bloomington, Illinois: Public School Publishing Company, 1939.
⁵ Educational Policies Commission, Education for All American Youth. Washington: National Education Association. 1944. p. 21.

    Schools should be dedicated to the proposition that every youth in these United States - regardless of sex, economic status, geographic location, or race - should experience a broad and balanced education which will (1) equip him to enter an occupation suited to his abilities and offering reasonable opportunity for personal growth and social usefulness; (2) prepare him to assume the full responsibilities of American citizenship; (3) give him a fair chance to exercise his right to the pursuit of happiness; (4) stimulate intellectual curiosity, engender satisfaction in intellectual achievement, and cultivate the ability to think rationally; and (5) help him to develop an appreciation of the ethical values which should undergird all life in a democratic society.
    It is the duty of a democratic society to provide opportunities for such education through its schools.⁶

Further evidence that the ability to think critically is a major objective of education is supplied by the following statement of a committee which evaluated educational objectives: "The committee believes that the ability to think reflectively and the disposition to do so in all the problem situations of life is an especially important educational objective."⁷ This same committee stated that this ability is "peculiarly necessary in a democracy, where each is expected to take part in policy-making."⁸

⁶ Loc. cit.
⁷ Progressive Education Association, Science in General Education. New York: D. Appleton-Century Company. 1938. p. 306.
⁸ Ibid., p. 46.

The importance of this objective is also emphasized in the following quotation:

    The responsibility of secondary schools for training citizens who can think clearly has been so long and so frequently acknowledged that it is now almost taken for granted. The educational objectives classifiable under the generic heading "clear thinking" are numerous and varied as to statement, but there can be little doubt concerning their fundamental importance. Although in recent years there has been increasing recognition of other responsibilities and purposes, there has been little accompanying tendency to demote clear thinking to a minor role as an educational objective. It was therefore not surprising to find considerable emphasis upon this objective in the statements of purposes submitted to the Evaluation Staff by the schools participating in the Eight-Year Study.⁹

The Harvard Committee¹⁰ and the President's Commission on Higher Education¹¹ both recognized reflective thinking as a major objective of education. The much quoted report of the Harvard Committee on General Education stressed the values of reflective thinking.
According to this report abilities which should be sought above all others in the general education program are the ability to think effectively, to communicate thought, to make relevant judgments, and to discriminate among values. The President's Commission on Higher Education included the ability "to acquire and use the skills and habits involved in critical and constructive thinking" as one of the eleven basic objectives of general education.

⁹ Eugene R. Smith, Ralph W. Tyler and the Evaluation Staff, Appraising and Recording Student Progress. New York: Harper and Brothers. 1942. p. 35.
¹⁰ Harvard University, General Education in a Free Society. Cambridge: Harvard University Press. 1945. p. 65.
¹¹ President's Commission on Higher Education, Higher Education for American Democracy. Volume I. Establishing the Goals. New York: Harper and Brothers. 1947. pp. 57-58.

As may be seen from the above discussion the ability to solve problems is a stated objective of general education for all subject-matter courses. For science courses it is stated as a major objective. Problem-solving was mentioned as a specific objective of science teaching as early as 1920, when the report "Reorganization of Science in Secondary Schools"¹² suggested ways in which science instruction could contribute to the "Cardinal Principles of Secondary Education." In this report it was stated that useful methods of solving problems were specific values of the study of science. The development of scientific attitudes was mentioned as one of the major objectives of science teaching in the Thirty-first Yearbook of the National Society for the Study of Education.¹³ The Progressive Education Association lists the ability to think reflectively as one of the five broad areas of needs of adolescents.¹⁴

In "Science Education in American Schools," certain criteria were established for the formulation of objectives.
The recommendations were made that objectives should be practicable for the classroom teacher. They also should be psychologically sound, possible of attainment, universal in a democratic society and should indicate the relationship of classroom activity to the desired changes in behavior. On the basis of these criteria the committee suggested eight categories of objectives; one of these was problem-solving skills.¹⁵

¹² National Education Association, Reorganization of Science in Secondary Schools. U. S. Bureau of Education Bulletin, 1920, No. 26, Washington: Government Printing Office, pp. 12-15.
¹³ A Program for Teaching Science. Thirty-first Yearbook of the National Society for the Study of Education, Part I, p. 44. Bloomington, Illinois: Public School Publishing Company, 1932.
¹⁴ Progressive Education Association, op. cit., p. 46.
¹⁵ Science Education in American Schools. Forty-sixth Yearbook of the National Society for the Study of Education, Part I, pp. 19-40. Chicago: University of Chicago Press. 1947.

That problem-solving skills are still one of the major objectives of the teaching of science is attested to by the fact that the Committee on Research in Secondary School Science of the National Association for Research in Science Teaching has set as one of its major tasks the identification of some of the important problems dealing with the teaching of problem-solving.¹⁶

Not only is the ability to solve problems a major objective of the secondary and elementary schools, but, as shown by the following examples, it is also stated as a major objective of science teaching at the college level. The Harvard report¹⁷ recommended that a part of the general education program in colleges be the teaching of an understanding of the means by which science has progressed. Gray¹⁸ in 1931 listed "facility in application of the scientific method" as one of the objectives in the teaching of biology at the University of Chicago. In 1937, Greulack¹⁹ in a committee report gave as one of the desired outcomes of biology teaching the development of scientific methods of thinking. To impart knowledge of the scientific method and encourage its use in thinking were listed as major objectives for the biology course at the University of Minnesota.²⁰

¹⁶ Committee on Research in Secondary-School Science, "Problems related to the teaching of problem-solving that need to be investigated." Science Education. 34:180-184, April, 1950.
¹⁷ Harvard University, op. cit., pp. 220-230.
¹⁸ William S. Gray, editor, Recent Trends in American College Education. Chicago: University of Chicago Press. 1931. pp. 61-67.
¹⁹ Muskingum College, A College Looks at its Program. Columbus: The Spahr and Glen Company. 1937. pp. 139-146.
²⁰ Ivol Spafford, editor, Building a Curriculum for General Education. Minneapolis: The University of Minnesota Press. 1943. pp. 243-261.

Although the ability to think scientifically has been stated as a major objective of science by almost all educators there are still many unsolved problems in regard to this objective. In fact, as one considers the list of problems presented by the Committee on Research in Secondary-School Science²¹ one wonders if anything at all is known about the teaching of the scientific method. The major problem areas considered by the committee were:
1. What is the nature of problem-solving in science?
2. How should problem-solving be taught?
3. How should ability in problem-solving be measured?

²¹ Committee on Research in Secondary-School Science, op. cit., pp. 180-184.

Approximately 150 problems were suggested by 53 of the members of the National Association for Research in Science Teaching who replied to a questionnaire concerning problems needing solving in the above areas.
Some of the questions concerning the nature of problem-solving objectives in science teaching-learning situations which need to be answered and which are related directly or indirectly to the present investigation are:

A. What are the specific skills and abilities necessary for successful problem-solving?
    1. Is problem-solving one ability or a composite of many different abilities?
    2. What are the fundamental components of the problem-solving ability?
    3. What is the relationship of problem-solving ability to general intelligence?
    4. Does the development of ability to solve problems depend chiefly upon the subject matter material or upon the manner in which it is presented?
B. What is the relationship of individual differences in the following factors to the teaching of problem-solving?
    1. Ability to reason.
    2. Ability to read.
C. What techniques can be used to measure a person's problem-solving ability?
    1. Can the several kinds of problem-solving ability be expressed in any common measure?
    2. Can the several components of problem-solving ability be appraised individually?
    3. How can the validity of techniques for measuring problem-solving ability be established? Reliability?²²

Almost all of the questions presented above are based on the assumption that there will be improvement in the ability to think scientifically if the teaching is directed toward that objective. But is this true? Some educators believe that the ability is an inherent one and that it does not yield to educative efforts. This point of view will be discussed more fully in Chapter II.

Answers to most of the questions concerning methods of teaching scientific thinking, and the nature of scientific thinking, depend upon a valid instrument to measure the ability to think scientifically.
Although some tests have been devised to test certain abilities involved in scientific thinking, there are few if any tests now available which attempt to measure all of the inductive aspects of scientific thinking; nor are there any tests especially designed to measure these aspects of thinking for a course in first year college biology.

²² Loc. cit.

The present study is an outgrowth of an interest in writing laboratory studies for the laboratory guide used in Biological Science at Michigan State College which purports, among other things, to teach the student to think scientifically. Early in the evaluation of the effectiveness of the laboratory studies it became evident that until some measuring device for the ability to think scientifically was available no evaluation of the methods used in this laboratory guide was possible.

THE PROBLEM

Statement of the problem. The purpose of this study was to devise a valid test to measure some of the inductive aspects of the ability to think scientifically.

The construction of test items required the identification of the skills and steps involved in scientific thinking, and the definition of behaviors which would give evidence of the ability to perform these skills. The validation of the test required the investigation of the relationship of whatever was measured by the test to (a) intelligence, (b) reading ability, (c) knowledge of biology, and (d) other measures of the ability to think scientifically, as evidenced by laboratory situations. In addition, it would require investigation to determine whether there was an increase in proficiency on the test after the completion of a course in biology which had as one of its major objectives the teaching of the ability to think scientifically.

Delimitation of the problem. The problem was limited to the construction of a test to measure the critical aspects of the inductive phases of scientific thinking.
In this study the creative activities of scientific thinking, such as the sensing of a problem and the actual formulation of hypotheses, were set aside; the remaining, non-creative aspects have been considered the critical aspects of thinking. A more detailed definition of these critical aspects of thinking and the reasons for limiting the test to the critical aspects will be discussed in Chapter IV. The reason for also limiting the test to the inductive phases was the fact that these phases of thinking were emphasized in the writing of laboratory studies for the course in Biological Science at Michigan State College. The items of the test were chosen from biological areas because the test was specifically devised for a course in first year biological science at the college level. No attempt has been made in this study to devise items to test the ability to apply principles of biology to new situations, nor has any attempt been made to construct items to test the attitudes which are assumed to attend the ability to think scientifically, namely, the scientific attitudes.

Basic assumptions of this study. The following are the major assumptions which underlie this research.
1. Individuals differ in their ability to think scientifically.
2. These differences can be measured by direct observation of the behavior of the individuals, and by indirect methods such as paper and pencil tests.
3. There are a number of skills involved in scientific thinking.
4. The behaviors which attend these skills can be described with sufficient objectivity to permit the devising of valid test items.
5. A sampling of an individual's reactions will give a measure of his reactions to a much larger range of situations.
6. The investigation of the ability to think scientifically is an important area of educational research.

Importance of the study.
If the ability to think scientifically is an innate ability or if it is in reality general intelligence, as some educators believe,²³ ²⁴ it is useless to attempt to attain it through the teaching of science. If, on the other hand, the ability is not innate or identical with general intelligence, as most educators believe, it should be teachable and it should be possible to determine which methods of teaching are most effective.

²³ Marion L. Billings, "Problem-solving in different fields of endeavor." American Journal of Psychology. 46:259-272, April.
²⁴ Ben D. Wood and F. S. Beers, "Knowledge versus thinking." Teachers College Record. 37:487-499, March, 1936.

In order to determine which of the above contrary opinions is correct, a test for the ability to think scientifically should be available.

ORGANIZATION OF THE REMAINDER OF THE THESIS

In Chapter II is presented a review of the research literature related to the problem. The first area of research reported is concerned with the identification of the steps involved in scientific thinking. The second portion of the review of literature is devoted to a discussion of tests which have been devised to measure various aspects of scientific or critical thinking. This discussion is followed by a review of research on the relationship of various aspects of critical thinking to such factors as intelligence, reading ability and knowledge of facts.

Chapter III is a discussion of the procedures involved in the development of a test designed to measure the ability to think scientifically.

Chapter IV is concerned with the steps involved in the development of the test items. The objectives, their definition in terms of desired behaviors, and illustrations of test items are included in this chapter.

Chapter V is concerned with the statistical analysis of the test and the test items.
Item analysis data on the items of the preliminary tests and the statistical treatment of the preliminary and final forms of the test are presented.

Methods used to validate the test are presented in Chapter VI.

Chapter VII brings together the findings of this study with the conclusions to be drawn from them. This is followed by a discussion of the problems which the study has suggested and by the educational implications of the research.

CHAPTER II

REVIEW OF RESEARCH RELATED TO THE PROBLEM

In order to devise a test to measure the ability to think scientifically, it was necessary to determine the steps and skills involved in the use of the scientific method. Literature on this aspect of the problem is presented. This is followed by a review of tests which have been devised to measure various phases of scientific thinking. Previous work on the relation of the ability to think scientifically to various other characteristics such as intelligence, reading ability, and factual information is presented. A few studies on educability in ability to think scientifically are discussed.

STEPS AND SKILLS OF SCIENTIFIC THINKING

Although much of a philosophic nature has been written on scientific method and individual scientists have described their methods of solving such problems, a review of these works has not been attempted here. Instead, the emphasis was placed on research aimed at determining the nature of this method. One exception was made in the case of Dewey, since he has been quoted frequently as an authority on problem-solving. The steps of problem-solving as conceived by Dewey¹ are:

1. A felt difficulty.
2. Its location and definition.
3. Suggestion of possible solution.
4. Development by reasoning of the bearings of the suggestion.
5. Further observation and experimentation leading to its acceptance or rejection.

¹ John Dewey, How We Think. Boston: D. C. Heath and Company. 1909. p. 72.
Until fairly recently little research had been done to determine the nature of the scientific method, although much has been written in the past 30 years on the desirability of teaching this method. Keeslar² surmised that the reluctance on the part of educators to investigate the steps of the method was due (1) to the fact that problem-solving depends to some extent on the nature of the problem and (2) to the tendency among researchers and writers to confuse the elements of the scientific method with scientific attitudes.

One of the earliest analyses of the elements of the scientific method was made by Downing³ in 1928. For his steps in scientific thinking he drew upon illustrations from the history of science. In his list he included elements and safeguards of the scientific method. His safeguards were, in some instances, skills involved, such as: inferences must be tested experimentally; and, in other cases, they were attitudes, such as: judgment must be unprejudiced. It was Keeslar's⁴ opinion that this failure to distinguish attitudes from elements has led to confusion of later workers and may have prevented a clear-cut definition of scientific method.

² Oreon Keeslar, "A survey of research studies dealing with the elements of scientific method as objectives of investigation in science." Science Education. 29:212-216, October, 1945.
³ Elliot R. Downing, "The elements and safeguards of scientific thinking." Scientific Monthly. 26:231-243, March, 1928.
⁴ Keeslar, op. cit., p. 212.

Tyler⁵ discussed phases of scientific thinking in relation to the construction of tests to measure this ability. Davis,⁶ LeSourd,⁷ Downing,⁸ and Beauchamp⁹ described classroom techniques for the teaching of phases of scientific thinking. Curtis¹⁰ analyzed the foregoing discussions and also incidents in the history of science. On the basis of these analyses he presented the following characteristics of scientific method as distinct from scientific attitudes.
1. Locating problems.
2. Making hypotheses, or generalizations from given facts or observations.
3. Recognizing errors and defects in conditions or experiments described.
4. Evaluating data or procedures.
5. Evaluating conclusions in the light of facts or observations upon which they are based.
6. Planning and making new observations to find out whether certain conclusions are sound.
7. Making inferences from facts and observations.
8. Inventing check experiments.
9. Using controls.
10. Isolating the experimental factors.

⁵ Ralph W. Tyler, Constructing Achievement Tests. Columbus, Ohio: Ohio State University. 1934. pp. 24-30.
⁶ Ira C. Davis, "Is this the scientific method?" School Science and Mathematics. 34:83-86, January, 1934.
⁷ Homer W. LeSourd, "Teaching scientific method." School Science and Mathematics. 34:234-235, March, 1934.
⁸ Elliot R. Downing, "Teaching scientific method." School Science and Mathematics. 34:400-405, April, 1934.
⁹ Wilber L. Beauchamp, "Teaching scientific method." School Science and Mathematics. 34:508-510, May, 1934.
¹⁰ Francis D. Curtis, "Teaching scientific methods." School Science and Mathematics. 34:816-819, November, 1934.

In 1937, Crowell¹¹ prepared a list of 29 attitudes and 25 skills involved in scientific thinking. This list was derived from books and articles on philosophy, logic, science education, and science measurement, and was presented to 64 science educators for evaluation. The skills rated as important by 80 percent of the judges are listed below in the order of their importance.¹²

1. Skill in observing accurately.
2. Skill in recording observations accurately and orderly.
3. Skill in forming independent judgments based on facts.
4. Skill in distinguishing between a fact and a theory.
5. Skill in picking out pertinent elements from a complex situation.
6. Skill in recognizing errors and defects in conditions and processes.
7. Evaluating conclusions in the light of facts or observations on which they are based.
8. Isolating the experimental factor.
9. Forming sound judgments concerning adequacy of data.
10. Synthesizing or putting together separate facts to form a conclusion.
11. Gathering data systematically.
12. Planning an experiment to determine whether or not a proposed hypothesis is true.
13. Evaluating data and procedures.
14. Recognizing omissions or deficiencies in set ups.
15. Profiting from worthwhile criticism (an attitude?).
16. Forming a reasonable generalization.
17. Arranging and classifying data in sequence and making conclusions obvious.
18. Applying general principles to a new situation.
19. Recalling selectively items essential to a problem.
20. Locating problems.
21. Disregarding irrelevant facts.
22. Directing imagination into new and worthwhile channels.
23. Using the scientific instruments common in the laboratory.

¹¹ Victor L. Crowell, Jr., "The scientific method." School Science and Mathematics. 37:525-531, May, 1937.
¹² Loc. cit.

Although 23 skills were rated as important by 80 percent of the respondents, no attempt was made to organize these skills into a plan for over-all problem-solving techniques.

Until 1945, when Keeslar¹³ reported his study on the elements of scientific method, no adequately validated list of these elements was available. His original list of elements of scientific method was prepared on the basis of a survey of 43 books and articles on the scientific method. This list was then presented for validation to 22 research scientists at the University of Michigan. Elements considered to be of minor importance by the judges were eliminated from the list. The 42 remaining items were considered and combined, and were reorganized to form a final list of 10 major and 17 minor elements set forth in the order in which they might logically be expected to occur in the solution of a problem.

¹³ Keeslar, op. cit., pp. 212-216.
This list was then checked by three specialists in the teaching of science. The following is Keeslar's list of major and minor elements of scientific thinking:

I. Sensing a problem and deciding to try to find the answer to it. (italics in the original)
II. Defining the problem. (italics in the original)
    Stating the problem in words.
    Analyzing the problem into its essential factors.
III. Studying the situation for all facts and clues bearing upon the problem. (italics in the original)
    Drawing upon past experience, both personal and those reported in literature, for possible explanations or generalizations to account for the phenomena observed.
IV. Making the best tentative explanations or hypotheses as to the possible solution of the problem. (italics in the original)
    Recognizing the assumptions which must be made if one goes beyond the known facts in formulating a hypothesis.
V. Selecting the most likely hypothesis. (italics in the original)
VI. Inventing and carefully planning one or more experiments to test the hypothesis, isolating the experimental factor wherever possible by using a control. (italics in the original)
    Deciding upon the kinds of evidence which should be collected.
    Choosing reliable methods of collecting the evidence.
    Refining measuring instrument to the degree warranted by the nature of the problem.
    Practicing to gain skill in manipulation in order to secure accurate results.
VII. Testing the hypothesis by carrying out the experiment with great care and accuracy. (italics in the original)
    Preventing, as far as possible, all uncontrolled variations in the conditions which might affect the results.
    Making quantitative measurement of experimental results and estimating the probable error of such measurements.
    Recording the results, adhering strictly to standard definitions and usage of scientific terms.
    Organizing the pertinent data so that they may be studied and summarized.
VIII. Running check experiments involving the same experimental factor to verify the results secured in the original experiment. (italics in the original)
    Studying the condition of the experiment in order to detect any omissions, defects, or errors, particularly those errors which might have been introduced in the experimental results by coincidence or chance.
    Recognizing and, if possible, checking further the validity of the assumptions involved in setting up the experiment.
IX. Drawing a conclusion. (italics in the original)
    Arriving at a solution to the problem based on an honest, unbiased appraisal of the data.
    Suspending judgment when results are not conclusive.
    Calling attention in the conclusion to those basic assumptions which it has been necessary to maintain throughout the procedure.
X. Making inferences based on this conclusion when facing new situations in which the same factors are operating.¹⁴ (italics in the original)

¹⁴ Keeslar, loc. cit.

Keeslar¹⁵ concluded that the elements of the scientific method are definite, are distinct from attitudes, and are known and used by scientists. There was a high degree of agreement among the research scientists concerning the nature of these elements, thereby indicating that the scientific method has developed beyond the introspection stage and that teaching and testing can be based upon these skills.

The 46th Yearbook¹⁶ presented a somewhat more comprehensive list of skills than Keeslar's. Apparently it was based on Keeslar's list plus additions from various other sources.

The foregoing discussion has presented a brief survey of the research which has led to a definition of scientific method. It is interesting to note that the steps conceived by Dewey¹⁷ in 1909 were basically the same as those derived from research in this area.

THE MEASUREMENT OF PROBLEM-SOLVING ABILITIES

In the last three decades a number of tests have been devised to measure various phases of scientific thinking.
Some of these tests purported to measure numerous behaviors while others were designed to measure very specific behaviors, such as the ability to interpret data or the ability to plan experiments. The following discussion presents the historical sequence of the tests which have been devised and the techniques which have been used to appraise the abilities involved.

¹⁵ Loc. cit.
¹⁶ Science Education in American Schools. Forty-sixth Yearbook of the National Society for the Study of Education, Part I, pp. 145-147. Chicago: The University of Chicago Press, 1947.
¹⁷ Dewey, op. cit., p. 72.

As Glaser¹⁸ has pointed out, several of the abilities included under the concept of the ability to think critically are, to some extent, measured by intelligence tests. Although such tests may be related in general to tests of scientific thinking, no attempt will be made in this review to include tests or parts of tests which purport to measure general intelligence or any of its aspects.

Tests and scales have been devised to measure both the skills involved in problem-solving and the attitudes which attend these abilities. Some purported to measure both skills and attitudes while others, which were called attitude tests, contained some of the skills involved in scientific thinking. This review of tests will be limited to those which seem to measure skills involved in scientific thinking, and will not include tests and scales that measure attitudes only.

¹⁸ Edward M. Glaser, An Experiment in the Development of Critical Thinking. Contributions to Education, No. 843. New York: Bureau of Publications, Teachers College, Columbia University. 1941. p. 73.

One of the earliest tests devised to measure the ability to think scientifically was published in 1918 by Herring.¹⁹
On the basis of an analysis of the work of such men as Francis Bacon, John Stuart Mill, and Karl Pearson, Herring selected eleven processes which he believed could be evaluated by a test. Herring stated that all of his eleven processes together did not constitute the whole of the scientific method, but he did believe that they all fell within the concept. His eleven processes, expressed in terms of the abilities involved, were (1) value, (2) feasibility, (3) definition, (4) clarity, (5) statistics, (6) relevancy, (7) recording, (8) comparison, (9) classification, (10) arrangement, and (11) sufficiency.

The test was devised for elementary and high school classes in geography. It contained thirty-three items of the multiple choice type. A direction was given which was followed by twelve choices. A thirteenth choice was available to indicate that none of the twelve choices was satisfactory.

The test was validated by being submitted to six judges. The judges indicated the answers they considered to be the correct ones and judged the fitness of the items as measures of the abilities which they were supposed to measure. Estimates of the reliability of the test were not given. An interesting point about the test was that the processes described were expressed in terms of the abilities to be measured.

¹⁹ John P. Herring, "Measurement of some abilities in scientific thinking." Journal of Educational Psychology. 9:535-558, December, 1918.

In 1924, Curtis²⁰ devised a test to measure the values derived from extensive reading in general science. It was designated as an attitude test and purported to measure (1) a conviction of the universality of cause and effect relations, (2) the habit of delayed response, (3) the habit of weighing evidence with respect to pertinence, soundness, and adequacy, and (4) respect for another's point of view. The test comprised 34 items, some of the short answer type and some of the multiple choice type.
No reliabilities were given for the test.

²⁰ Francis D. Curtis, Some Values Derived from an Extensive Reading of General Science. Contributions to Education, No. 163. New York: Bureau of Publications, Teachers College, Columbia University. 1924. pp. 57-67.

Watson,²¹ in 1925, published a test of fair-mindedness which purported to measure prejudice. In reality, this test probably measured much more than prejudice. The test was made up of six different types of sub-tests, some of which seemed to be measures of prejudice while others appeared to be measures of ability to think critically. A description of his six sub-tests follows:

²¹ Goodwin B. Watson, The Measurement of Fairmindedness. Contributions to Education, No. 176. New York: Bureau of Publications, Teachers College, Columbia University. 1925. pp. 9-35.

1. Form A was a list of 51 words. Instructions were given to cross out annoying or distasteful words.

2. Form B presented 53 statements about religious or economic matters upon which authorities differ. Instructions were given to mark each statement as true, probably true, uncertain or doubtful, probably false, or false. This type of key has probably been used more frequently in tests devised to measure abilities involved in critical thinking than any other single type of answer key. Watson's²² test seems to be the first one in which it was used.

3. Form C, entitled the Inference Test, presented statements of fact followed by conclusions which might be drawn from the facts. Instructions were given to check only inferences which were certain and not to check those which were merely probable. One of the alternative answers was that no such conclusion could fairly be drawn. In each case one of the conclusions was a restatement of the data. In each case the only answers considered correct were the restatement of the data or the response that no conclusion could be drawn.

4. Form D was a moral judgments test.
Fifteen instances of behavior were presented to be judged.

5. Form E was an arguments test based on the assumption that a person will tend to feel that all arguments on the other side are weak. Twelve issues were presented followed by arguments.

6. Form F, the Generalization Test, contained unwarranted generalizations about groups as a whole. Subjects were asked to indicate whether the statement was true for all, most, many, few, or no individuals of the group.

²² Watson, loc. cit.

This test was scored on a negative basis, that is, a high score indicated that a person was not fairminded, a low score indicated that he was fairminded. The estimate of the reliability, determined by the split-half method, was .96. The test was validated by:

1. Examination of the tests with reference to what they seemed to be measuring.
2. A study of the scores obtained by persons who were considered by their groups to be fairminded. This group actually had a lower average score than an unselected group (indicating fairmindedness).
3. A study of individuals who were supposed to be prejudiced by persons who knew them well.
4. A study of groups who would be suspected of certain lines of prejudice.
5. A correlation of test scores with other test scores. Results showed almost zero correlation both with reading test scores and with intelligence test scores.²³

²³ Watson, loc. cit.

In the same year in which Watson described his test, Daily²⁴ described a test to measure the ability of high school pupils to select essential data in solving problems. The test was not an objective test, but has been included here because it seems to be one of the first tests devised to measure a student's ability to recognize insufficiency of data and ability to select pertinent data. Eighteen short paragraphs containing data were presented.
In some cases the data were insufficient; in other cases there were superfluous data. The student was asked to answer questions; the answers were, in reality, conclusions based on the data. Daily²⁵ reported the reliability of the test to be .73. The reliability was estimated by presentation of the same test seven weeks after the first administration of the test.

²⁴ Benjamin W. Daily, The Ability of High School Pupils to Select Essential Data in Solving Problems. Contributions to Education, No. 190. New York: Bureau of Publications, Teachers College, Columbia University. 1925. pp. 59-60, 90-96.
²⁵ Loc. cit.

The Stanford Scientific Aptitude Test was devised in 1927 by Zyve²⁶ to satisfy a need for more accurate guidance of incoming college students. It has been called an aptitude test because Zyve claimed that it tested inherent ability of the individual and not his achievement. The test included eleven elements of scientific aptitude; namely, (1) experimental bent, (2) clarity of definition, (3) suspended versus snap judgment, (4) ability to reason, (5) ability to detect inconsistencies, (6) ability to detect fallacies, (7) induction, deduction and generalization, (8) caution and thoroughness, (9) discrimination of values in selecting and arranging experimental data, (10) accuracy of interpretation, and (11) accuracy of observation.

²⁶ D. L. Zyve, "A test of scientific aptitude." Journal of Educational Psychology. 18:525-546, November, 1927.

The estimated reliability of the test was .93. The test was validated by having two judges rank students according to their aptitude for science.
These rankings were compared with the rank of the students in their test performance. The coefficient of correlation between the scores on the Stanford Scientific Aptitude Test and the ratings of the judges was .74. The means of the test for science and engineering students and for a science faculty group were considerably higher than the means of a group of entering freshmen and non-science faculty.²⁷

Zyve's test appears to be one of the first tests to make a successful attempt to measure scientific ability. Whether the test measures the innate aptitudes which it purports to measure, or whether it measures an ability which can be learned, does not seem to have been investigated despite the fact that the test has been rather widely used.

Hoff²⁸ devised a scientific attitude test in 1930 which included the habit of weighing evidence as one of the attitudes measured. The test was validated by fifteen expert judges and by correlation with intelligence test scores and reading scores. These correlations were positive but low. The reliability given was .76, calculated by the split-half method.

A test of scientific thinking was published by Downing²⁹ in 1936, but had been used as early as 1931 by Strauss.³⁰ The test was designed to measure skill in the use of fifteen elements and safeguards involved in scientific thinking. The items were designed to test:

1. Accuracy of observation.
2. Ability to pick out pertinent elements from a complex situation.
3. Ability to synthesize.
4. Selective recall.
5. Fertility of hypotheses.
6. Ability to define a problem before trying to solve it.
7. Ability to hold in mind a complex of relations.
8. Problem-solving ability.
9. Judgment on adequacy of data.
10. Tendency to try to solve a problem scientifically rather than by trial and error.
11. Tendency to suspend judgment on moot questions.
12. Ability to apply a rule or law.
13. Tendency to test an hypothesis by collecting facts.
14. Awareness of the danger of reasoning by analogy.
15. Ability to arrange data in sequence to make the conclusion evident.³¹

²⁸ Alfred G. Hoff, "A Test for Scientific Attitude." Unpublished Master's thesis, Department of Education, University of Iowa, 1930. pp. 1-42.
²⁹ Elliot R. Downing, "Some results of a test on scientific thinking." Science Education. 20:121-128, October, 1936.
³⁰ Sam Strauss, "Some results of the test of scientific thinking." Science Education. 16:89-93, December, 1931.
³¹ Downing, op. cit., pp. 121-128.

As determined by the split-half method, the reliability of the test was .99 for a group of eighth through twelfth grade students. In general, each of the abilities tested was measured by a single question. Glaser³² has criticized this test from the point of view of sound test construction and raises serious questions concerning its reliability and validity.

In 1933, Weller³³ constructed a test of 21 items which was designed to measure the effectiveness of teaching of scientific thinking in the elementary schools. Seven sets of items were used. The first item of each set attempted to measure observation, the second item asked the student to draw a conclusion from simple data, and the third item asked for a proof or possible verification of the conclusion drawn. She found the reliability of this portion of her test to be .54.

³² Glaser, op. cit., p. 76.
³³ Florence Weller, "Attitudes and skills in elementary science." Science Education. 17:90-97, April, 1933.

Noll,³⁴ in 1933, described a test of scientific thinking entitled "What Do You Think?" The test was constructed to satisfy a need in the schools for a test to evaluate the teaching of scientific thinking.

³⁴ Victor H. Noll, The Habit of Scientific Thinking: A Handbook for Teachers. New York: Bureau of Publications, Teachers College, Columbia University. 1935. pp. 18-25.
Six habits of thinking were selected as a basis for constructing the preliminary forms of the test. Each question was intended to express a situation which was familiar to most persons, and which afforded an opportunity for scientific thinking. The preliminary form of the test included 134 items, most of which were of the true-false type. Approximately 25 items were designed to measure each of the six habits of thinking, namely: accuracy of observation, intellectual honesty, openmindedness, suspended judgment, a conviction of universal operation of the law of cause and effect, and criticism.

The reliabilities of the two final forms of the test were determined in two ways. The method of split-halves corrected by the Spearman-Brown formula gave a reliability of .82 for Form I and a reliability of .92 for Form II. A correlation between the two forms of the test gave a reliability of .69. Noll³⁵ believed that the true reliability coefficient was probably somewhere between the highest and the lowest figures obtained.

The test was validated by correlation with I.Q.'s and by the determination of item validity. The correlation of the test with I.Q.'s ranged from .30 to .41, indicating that native ability was not being tested to a large extent. Norms for grades eight through twelve were presented.

³⁵ Noll, loc. cit.

In 1936 Frutchey, Tyler and Hendricks³⁶ reported a test to measure the ability to interpret experimental data. This report is of interest, not because it presents the construction of a complete test, but because it reports an investigation of the validity of a particular type of item. In Test I an experiment was described and the student was asked to write a conclusion. Although this is, in some ways, a very satisfactory method of evaluating a student's ability to draw conclusions, it is difficult to grade; therefore, Test II was prepared. The same experiments were used and five conclusions were selected from the free responses of students in Test I. The students were instructed to select the best conclusion. This method did not give a valid measure of a student's ability to formulate conclusions since the correlation of the scores on Test I and Test II was only .38. This same test was rendered more valid when the student was asked to check the best conclusion and the one contradicted by the data. This was designated as Test III. Its correlation with Test I was .85. In the final form of the test, which proved to be the most valid, the same test items were used but students were instructed to mark each item according to the following key:

Mark with a 1 every statement which is a reasonable interpretation of the data.
Mark with a 2 every statement which might possibly be true but for which insufficient facts are given to justify the interpretation.
Mark with a 3 every statement which cannot be true because it is contradicted by the results obtained in the experiment.

³⁶ Fred P. Frutchey, Ralph W. Tyler and B. Clifford Hendricks, "Measuring the ability to interpret experimental data." Journal of Chemical Education. 13:62-64, February, 1936.

Love,³⁷ in 1937, devised a test of scientific attitudes and scientific thinking. The test was in three parts and contained 24 items. Part I dealt with the criticizing and planning of experiments; Parts II and III tested the ability to recognize assumptions upon which conclusions were based.

Raths,³⁸ in 1938, described a test designed to evaluate thinking ability. The first portion of the test dealt with interpretation of data. The student was required to determine the probable truth or falsity of a series of statements concerning the data. The second portion of the test contained a description of a situation followed by three conclusions. The students were instructed to choose the best conclusion. The conclusions were followed by a series of reasons which could be used to explain why the conclusion was chosen. The students were instructed to indicate the reasons they had chosen a particular conclusion. The third portion of the test presented a situation and a conclusion based upon the situation. These were followed by a series of statements some of which were assumptions. The student was instructed to check the assumptions and to indicate those upon which the conclusion was based. He was then required to organize a proof for the conclusion using the assumptions and data. No reliabilities for the test were given.

³⁷ Kenneth G. Love, "Scientific Attitude-Thinking." Every Pupil Test. Columbus, Ohio: The State Department of Education. April, 1937.
³⁸ Louis E. Raths, "Evaluating the program of a school." Educational Research Bulletin. 17:37-84, March, 1938.

The tests devised by the evaluation staff of the Eight-Year Study were published in 1938, and were described in detail by Smith and Tyler³⁹ in 1942. The Eight-Year Study⁴⁰ was planned to implement broad objectives of education in the secondary schools without regard to college entrance requirements. The experiment was confined to thirty selected secondary schools throughout the United States.
Students from these schools were admitted to colleges on the basis of recommendation by the principal of the school and not on the basis of college entrance requirements or examinations. Extensive studies of objectives and means of evaluation of objectives were, among other things, a part of this project. The behaviors which were to be measured by the tests were defined by committees composed of the members of the evaluation staff of the Eight-Year Study⁴¹ and representatives from each school interested in the objectives being measured. Two of the objectives related to the present study were the ability to interpret data and the ability to understand the nature of proof.

³⁹ Eugene R. Smith, Ralph W. Tyler, and the Evaluation Staff, Appraising and Recording Student Progress. New York: Harper & Brothers. 1942. pp. 3-156.
⁴⁰ Wilford M. Aikin, The Story of the Eight-Year Study. New York: Harper and Brothers. 1942. pp. 12-24.

The earlier forms of the interpretation of data tests were intended primarily for use in the senior high school. Ten sets of data, presented in various forms including prose, graphs, tables, and charts, were each followed by 15 statements. The students were instructed to evaluate each of these on the basis of the following key:

(1) are sufficient to make the statement true.
(2) are sufficient to indicate that the statement is probably true.
(3) are not sufficient to indicate whether there is any degree of truth or falsity in the statement.
(4) are sufficient to indicate that the statement is probably false.
(5) are sufficient to make the statement false.⁴²

⁴¹ Smith and Tyler, op. cit., pp. 3-156.
⁴² Ibid., p. 52.

In the early history of the development of tests to measure the ability to interpret data, tests were devised for specific subject matter fields. However, the evaluation staff believed that the behaviors involved in these tests were not essentially different, so a single measuring instrument was constructed.
In all, nine forms have been used; the last two, Interpretation of Data Test Form 2.51 and 2.52, have been prepared as alternate forms. Forms 2.71 and 2.72 were prepared for use in the junior high schools.

The answers to the test items were validated by the judgment of a group of experts in the field and by preliminary tryouts on groups of students. The method of scoring these tests is of considerable interest. The tests were scored four separate times to give the following scores:

    1. General accuracy score was the total number of answers which agreed with the answers of the jury of experts. This score was expressed as the percent of the maximum possible number of correct responses.
    2. The "going beyond data" score was calculated by determining the number of times a student considered a statement to be true which the jury had considered only probably true, or probably true when the jury had considered it as insufficient data, etc.
    3. The "caution" score indicated the extent to which a student marked statements keyed true as probably true; keyed probably true as insufficient data, etc.
    4. The "crude error" score was obtained by determining the extent to which students marked items in contradiction to the data.⁴³

43 Ibid., pp. 54-55.

The tests on interpretation of data were validated: (a) by comparing the behaviors demanded of students in the test with the behaviors defined in the statement of objectives to be measured, (b) by selecting data which were of the type which students encounter in textbooks, and (c) by studying the distribution and means of scores made by students in various grades of school. The means increased with grade levels. Another method used in the validation of the tests was the comparison of test scores with essay responses on the same data.

The reliabilities of the various types of scores on Form 2.52 of the test computed by use of the Kuder-Richardson formula ranged from .81 to .95. The general accuracy score was the most reliable. The split-halves method of estimating reliability was used for Form 2.51. Reliabilities ranged from .86 to .92. Comparisons of the two forms yielded reliability coefficients of from .65 to .85.⁴⁴

Another of the tests devised by the Evaluation Staff of the Eight-Year Study⁴⁵ was the "Nature of Proof." This test was devised to measure the ability of students to locate and appraise the basic assumptions upon which the proof of a statement depended. A paragraph containing data was followed by a conclusion. Following this were 14 statements, some of which were assumptions underlying the argument. In the first part of the test, the student was asked to decide which statements were relevant to the conclusion and to mark them as either supporting or contradicting the conclusion. In the second part of the test, the student was asked to indicate which of the statements marked as supporting the conclusion he would challenge. In the third part of the test, the student was instructed to choose one of three stated conclusions. In the fourth part, the student was asked to select activities which might be useful in the solution of a problem related to the previous conclusions. In part five the student was directed to indicate which of these activities could be carried out in a school situation. Reliabilities of the various part scores on the test ranged from .20 to .82.

44 Ibid., pp. 65-76.
45 Ibid., pp. 128-154.

Two interesting types of items devised to measure critical thinking in a science course have been described by Hart.⁴⁶ A statement of a situation was presented. This was followed by a numbered series of observations. Five hypotheses were then presented and the student was instructed to list by numbers all of the observations which supported each of the hypotheses. He was also instructed to list by number all of the facts which weakened each hypothesis. Then the most valid hypothesis was to be checked. The second type of item described was similar to the above but an hypothesis was chosen by the student before the data were presented. The data were used to support, weaken, or eliminate the hypothesis. No data on validity or reliability of tests composed of such items were given.

46 E. H. Hart, "Measuring critical thinking in a science course." California Journal of Secondary Education. 14:334-338, October, 1939.

In 1940, Gans⁴⁷ described a test used in a study of critical reading comprehension. The test was devised to measure ability to recognize problems and to solve problems by critical selection and rejection of data. Paragraphs containing problems were presented. An item followed which had as its foils three problems. The student was asked to determine which problem had been presented in the paragraph. The problem item was followed by a series of paragraphs containing facts which were directly related, indirectly related, or unrelated to the problem. The student was instructed to mark each paragraph according to whether it did or did not aid in the solution of the problem. These paragraphs were followed by a three-choice item asking for the major problem under consideration. This was followed by single statements of facts taken from the paragraphs previously presented.
Again, the student was requested to indicate whether the fact helped or did not help in the solution of a problem. In addition, he was asked to judge the truth or falsity of each of these statements. The test was scored as five subtests. The reliabilities of the subtests ranged from .6? to .90. The total test reliability was not given.

47 Roma Gans, A Study of Critical Reading Comprehension in the Intermediate Grades. Contributions to Education, No. 811. New York: Bureau of Publications, Teachers College, Columbia University. 1940. pp. 59-89.

Engelhart and Lewis,⁴⁸ in 1941, described a 23 item portion of a pretest for a physical science survey course at Chicago City Junior College. These 23 items were designed to measure scientific thinking. In an introductory paragraph the terms hypothesis and conclusion were defined. An experimental situation was described, a problem was stated, and the following key was presented:

    "Below are given a series of hypotheses, each of which is followed by numbered items which represent data. After each item number on the answer sheet blacken space
    A. if the item directly helps to prove the hypothesis true.
    B. if the item indirectly helps to prove the hypothesis true.
    C. if the item directly helps to prove the hypothesis false.
    D. if the item indirectly helps to prove the hypothesis false.
    E. if the item neither directly nor indirectly helps to prove the hypothesis true or false."⁴⁹

48 Max D. Engelhart and Hugh B. Lewis, "An attempt to measure scientific thinking." Educational and Psychological Measurement. 1:289-294, Third Quarter, 1941.
49 Loc. cit.

Three hypotheses were presented; each hypothesis was followed by five statements of fact. These statements constituted the items, which were marked by the above key. These items constituted 15 items of the test. The student was directed to judge each hypothesis as to its truth or falsity. These judgments constituted three items of the test. Following these 18 items five conclusions were given. Each conclusion was to be judged either the best, the worst, or neither best nor worst.

The items of this test proved to be quite discriminating, the range of correlations of items with the total score on the 23 items being from .17 to .61. The reliability of the test was estimated to be .72 by means of the Kuder-Richardson formula.

The Watson-Glaser Tests of Critical Thinking were described by Glaser⁵⁰ in 1941. These tests were designed to appraise some of the abilities involved in critical thinking. They were, in effect, an extensive revision of Watson's tests of fair-mindedness. All of the tests were validated by 15 judges.

Test A, A Survey of Opinions, was devised primarily to show the extent of a person's consistency of opinion. The test-retest reliability was .88; the correlation between scores on Section I of the test and Section II of the test was .85.

50 Edward M. Glaser, op. cit., pp. 87-92.

Test B, the General Logical Reasoning Test, was designed to measure the ability to think in accord with the rules of logic. The test-retest coefficient of reliability was given as .82.

Test C, the Inference Test, was designed to measure ability to judge the probable truth or falsity and the relevance of inferences drawn from given facts. The persons taking the test were instructed to determine whether the conclusions drawn were true, probably true, false, probably false, or questionable. The test was validated by the fact that it significantly distinguished between two groups of students judged by their teachers to be either superior or inferior in ability to think logically. Test-retest reliability was found to be .86.

Test D, the Generalization Test, was substantially the same as the one of the same name devised by Watson and discussed earlier in this review. The reliability of this test was reported as .88.
Test E, the Discrimination of Arguments, was also substantially the same as the arguments test of Watson's earlier edition. The reliability given for this test was .76.

Test F, the Evaluation of Arguments Test, was a new test in the series. Each test item consisted of a paragraph followed by three alternative conclusions, only one of which was logical on the basis of the data presented in the paragraph. Following the conclusions six reasons were listed, one of which explained why the correct conclusion was the logical one. The testee was instructed to check the reason explaining his conclusion. The test-retest coefficient of reliability for this test was .83.⁵¹

Fleming,⁵² in 1942, described a test used in his analysis of outcomes of a course in biological science. A portion of the test was devoted to the measurement of the ability to think scientifically. The items for the test had been chosen from examinations given previously in the course. This portion of the test was divided into four parts: Part A was designed to measure the recognition of steps in problem solving, Part B was an evaluation of statements with reference to a problem, Part C was designed to measure the ability to evaluate inferences, and Part D was the selection of data pertinent to the solution of a problem situation. The tests were described but no test items were included in Fleming's dissertation. The reliability of this portion of the test was not given.

51 Glaser, loc. cit.
52 Maurice G. Fleming, "An Analytical Study of Certain Outcomes of a Course for Orientation in Biological Science at College Level." Unpublished Doctor's thesis, Department of Education, New York University, 1942. Appendix.

A test designed to measure a student's ability to judge conclusions was constructed by Higgins⁵³ in 1942 to evaluate educability in inductive ability. Twelve experiments were described; each experiment was followed by a series of conclusions which constituted the items. There were a total of 97 items which had been selected from free responses of students. The testees were instructed to determine whether the conclusions were complete, incomplete, based on insufficient data, or false. The test was validated by agreement of four judges as to the correct answers to the items. The estimate of reliability, as determined by the split-half method, was .90.

53 Conwell D. Higgins, "Educability of Adolescents in Inductive Ability." Unpublished Doctor's thesis, Department of Education, New York University, 1942. pp. 36-40, 133-137.

Ter Keurst and Bugbee,⁵⁴ in 1943, published a test by which the authors claim "teachers or students can check themselves on the understanding of the methodology of science." The test consists of a series of four-choice items which purport to measure knowledge of skills, attitude, and terminology of scientific method. The test seemed to test knowledge but not behavior. However, it is of interest to note that it apparently had a certain degree of validity. The test was administered to a group of students who had been named as the five best and the five worst students in science classes in respect to their ability to think scientifically. The critical ratio of the difference of the means of these two groups was 5.01. The test was also validated by opinion of experts. The estimate of the reliability by means of the split-half method was .82.

54 Arthur J. Ter Keurst and Robert E. Bugbee, "A test on scientific method." Journal of Educational Research. 36:489-501, March, 1943.

A very interesting test in two forms entitled "Do You Think Straight?" was described by Johnson⁵⁵ in 1943. The test was designed to measure the relation of reflective thinking to ability in debating and discussion.
Because her test was an attempt to overcome some of the inadequacies of earlier tests, her criticisms of existing tests are presented here:

    These tests, though useful, appear to be inadequate for the diagnosis and measurement of the process (italics in the original) of reflective thinking. Each test is deficient on two or more of the following counts:
    1. It breaks the process of reflective thinking into what may be superficially (italics in the original) distinct and uncoordinated units.
    2. Even in measuring such units, the following factors or steps are not considered:
        a. The formulation of a problem.
        b. The analysis into major variables.
        c. The determination of criteria and application of them to the evaluation of possible solutions.
        d. The construction and comparison of hypotheses.
    3. It deals with a great variety of problems - each item relating to a different problem, in most tests - whereas the need in actual life situations (and the need in discussion and other forms of public speaking) is to think through (italics in the original) a particular problem.
    4. It emphasizes the logic of intentional (italics in the original) reasoning - the discrimination among formally valid and invalid conclusions and "reasons" for conclusions - rather than the logic of constructive (italics in the original) reasoning or scientific discovery. In fact, those tests which require the subject to check a conclusion and then to check reasons for his choice appear to be measuring little except expertness in "rationalizing."⁵⁶

55 Alma Johnson, "An experimental study in the analysis and measurement of reflective thinking." Speech Monographs. 10:83-96, Annual, 1943.

Johnson's tests were constructed on the assumptions that: (1) Dewey's steps were a correct description of the thought process, and (2) there were discoverable and observable obstacles to reflective thinking. Forms A and B⁵⁷ were each designed around a single problem.
Section I of each test was an attempt to measure attitudes about the problem. In Section II, ten subsidiary problems were presented. The student was instructed to number these in order of their usefulness as starting points in the solution of the overall problem. Section III presented four groups of questions, each composed of three subordinate questions; the most important one was to be checked. In Section IV, data were presented which might aid in the solution of the four major questions posed in Section III. These data were followed by statements which the student was instructed to mark as being true, probably true, insufficient data, probably false, or false. In Section V, ten syllogisms or pseudo-syllogisms were presented, followed by conclusions which the students were instructed to mark as sound or unsound. The students were instructed to rank the six solutions to the overall problem. This constituted Section VI of the test. Section VII required the matching of advantages and disadvantages, which were summaries of statements of information given throughout the test, with the three best solutions of Section VI. In the final section of the test, Section VIII, the student was instructed to classify each of ten conclusions as critical, uncritical, hypercritical, or dogmatic.

56 Ibid., p. 85.
57 Ibid., pp. 83-96.

Johnson stated that there was inherent validity in the test since it was patterned after Dewey's steps of thinking and since the syllogism test followed the rules of logic. In addition, however, the test was validated by 15 experts in the fields of logic and scientific method. She also found that scores of students judged superior in the abilities involved in reflective thinking were higher than those judged as average, and those judged average scored higher than those judged as inferior. She also cited an increase in scores in college grade levels as evidence for the validity of the test.
These increases in scores with college grade levels were at the 5 percent level of significance or better.

The estimate of reliability, determined by correlating the scores made on the two forms of the test, was .82 ± .02. The scores on the attitude portion of the test were not included in the total test scores.

A portion of the test, which was used to appraise methods of teaching scientific method, was designed by Thelen⁵⁸ to measure an understanding of experimental design. The purpose of an experiment was given; this was followed by conditions of the experiment and statements about the experimental material. The student was instructed to indicate which factor or factors were to be varied, which were to be fixed, which might be assumed to be negligible, which were irrelevant, and which factors the student did not understand. In all, there were 60 such items. No reliability was given nor was any evidence concerning the validity of the test presented.

58 Herbert A. Thelen, "An Appraisal of Two Methods for Teaching Scientific Method in General Chemistry." Unpublished Doctor's thesis, Department of Education, University of Chicago, 1944. pp. 365-369.
59 Louis E. Raths, "A thinking test." Educational Research Bulletin. 23:72-75, March, 1944.

In 1944, Raths⁵⁹ devised the "Ohio Thinking Checkup," a thinking test for students in the third, fourth, and fifth grades. Twelve problem situations were presented. Each problem was followed by eight statements which the students were instructed to mark as true, false, or questionable. Items were devised to reveal nine types of errors in thinking; namely,

    1. Interpretation through personal judgment.
    2. Evading of issue by name-calling or ridicule.
    3. Leaning on authority.
    4. Believing in superstition.
    5. Generalizing from insufficient evidence.
    6. Rationalizing or misinterpreting data.
    7. Calling either-or statements true.
    8. Calling if-then statements true.
    9. Leaning on school loyalty.
The reported reliabilities of the tests as determined by the method of matched halves were .89 for the fourth grade, .91 for the fifth grade, and .93 for the sixth grade.

Grant and Meder,⁶⁰ in 1944, suggested a type of item to evaluate reasoning ability. A statement was presented followed by six reasons for agreeing with the statement and six reasons for disagreeing. The student was instructed to check valid reasons from either or both lists and then to decide whether he agreed or disagreed with the statement.

In 1944, reports of the high-school and the college chemistry tests for the armed forces were published. In each of these tests one section was devoted to items designed to measure abilities involved in scientific thinking. Ashford,⁶¹ in reporting on the college test, listed six of these abilities which were to be measured. Items were devised to test the ability to (1) distinguish between observed phenomena and their theoretical explanation, (2) explain phenomena in terms of theory, (3) give the experimental evidence for a theory, (4) identify the assumptions necessary for a given conclusion, (5) identify the factor that must be controlled in an experiment, and (6) identify statements which are true merely by definition. The test was prepared in two forms: one for the armed forces, one for civilian use.

60 Charlotte L. Grant and Elsa M. Meder, "Some evaluation instruments for biology students." Science Education. 28:106-110, March, 1944.
61 Theodore A. Ashford, "The college chemistry test in the Armed Forces Institute." Journal of Chemical Education. 21:386-392, August, 1944.

Hered and Thelen⁶² devised a similar test for use at the high-school level. Single items were devised to measure each of the abilities which they had considered to be important in scientific thinking. The reliability coefficients of the tests were not given; however, Hered and Thelen reported that the reliability of the high-school test was satisfactory.
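
The split-half (matched-halves) and Kuder-Richardson estimates quoted throughout this review can be illustrated with a short computation. What follows is only a minimal sketch in Python and not a reconstruction of any author's procedure. It assumes items scored 1 for the keyed answer and 0 otherwise, an odd-even division of the items, the Spearman-Brown step-up of the half-test correlation, and Kuder-Richardson formula 20 computed with the population variance of total scores. The studies reviewed do not state which of these choices they made, and the score matrix is hypothetical, invented solely for illustration.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def split_half_reliability(responses):
    """Odd-even split-half estimate, stepped up by Spearman-Brown (assumed)."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    # Spearman-Brown: reliability of the full test from the half-test correlation.
    return 2 * r_half / (1 + r_half)


def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    n_students = len(responses)
    n_items = len(responses[0])
    total_variance = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students  # proportion passing item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: five students (rows) by four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(split_half_reliability(scores), 2))  # 0.88
print(round(kr20(scores), 2))                    # 0.8
```

The two estimates differ even on the same data, which is one reason the coefficients reported by different investigators in this review are not strictly comparable. Both functions would fail with a division by zero if every student earned the same score, a case the sketch does not handle.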
The ability of ninth grade students to make conclusions was investigated by Teichman.⁶³ For this investigation he designed three tests. Test A, which was not objective, presented 16 paragraphs from which the students were to draw conclusions. In Test B, 29 experiments were described; each was followed by four conclusions. The students were instructed to choose the best one. In Test C, 15 problems were presented, followed by data. A conclusion, which was faulty, was stated. These 15 faulty conclusions constituted the items of Test C. Students were instructed to evaluate the faulty conclusions according to the following key:

    (a) It does not answer the problem or question.
    (b) It does not agree with the facts of the experiment.
    (c) There are not enough facts to make the conclusion valid (correct).
    (d) The facts have not been obtained by proper control (comparison) in the experiment.

The test was validated by unanimous agreement of three prominent educators in the field of science, by item analysis, and by intercorrelations of the three tests. The reliabilities were estimated by the split-half method. The reliability of Test A was .88, of Test B was .88, and of Test C was .68. The total test reliability was given as .91.

62 William Hered and Herbert A. Thelen, "The high-school chemistry test of the Armed Forces Institute." Journal of Chemical Education. 21:507-515, October, 1944.
63 Louis Teichman, "The ability of science students to make conclusions." Science Education. 28:268-279, December, 1944.

Alpern,⁶⁴ in 1946, devised a test for high-school students to measure the ability to suggest procedures to test hypotheses. From the responses of this non-objective test he constructed an objective test to measure the ability to select methods of testing hypotheses. Each of the test items consisted of (1) a situation, (2) a statement of the problem, (3) an hypothesis offered as an explanation, and (4) four suggested procedures.
These last constituted the foils of each item; the student was instructed to choose the best experiment to test the hypothesis given. The preliminary forms of this test were revised on the basis of criticism of experts and on the basis of item analysis. Twenty items constituted the test which had an estimated reliability coefficient of .75. The test was validated by the judgment of 41 educators in science, by item analysis, by a consideration of the range of difficulty of the items, and by the fact that average scores increased through successive grades, from ninth through twelfth.

64 Morris L. Alpern, "The ability to test hypotheses." Science Education. 30:220-229, October, 1946.
65 Gordon M. Dunning, "The Construction and Validation of A Test to Measure Certain Aspects of Scientific Thinking in the Area of First Year College Physics." Unpublished Doctor's thesis, Department of Education, Syracuse University, 1948.
66 Smith and Tyler, op. cit., pp. 15-28.

A test to measure certain aspects of scientific thinking in the area of college physics was devised by Dunning.⁶⁵ The test was constructed to measure ability to interpret data and ability to apply principles. The method of evaluation used by Dunning to test the ability to interpret data was substantially that reported by Smith and Tyler.⁶⁶ Dunning's unique contribution to the measurement of this objective was his use of four methods of scoring the papers in order to determine the effects of variously weighted scorings on the reliability. He found the method of giving a single point for the keyed answer gave the highest estimate of reliability by the split-half method. The reliability was given as .83. In addition, he found that this method also gave the highest validity coefficient when he correlated scores on the test with teacher ratings of the students. The validity coefficient obtained was .56.
A second method of validation of the test was the correlation of scores made on the objective test with scores on the same material on an essay test. This correlation was .66.

Ullsvik⁶⁷ constructed a test which was designed to measure critical judgment in geometry classes. The test, however, was on non-geometric subjects. The test was in three parts: Part I was called "Judging of Conclusions" and instructed the students to mark the conclusions given as acceptable, not acceptable, or insufficient evidence; Part II was an evaluation of definitions; Part III presented a paragraph followed by 15 statements. The student was instructed to select the two statements which were the most crucial in leading one to accept the conclusion, and the two which were the most crucial in leading one to reject the conclusion. The reliability of the test was not given.

In 1949, Read⁶⁸ published a description of a non-verbal test of the ability to use the scientific method. An analysis of Keeslar's⁶⁹ major elements of scientific method led Read⁷⁰ to the inference that many of these steps involved discriminatory choices. The inventing and planning of experiments could only be measured by physical methods but the other elements, he claimed, all involve discriminatory choices. These were summarized as follows:

    1. Observation is only valuable when it is discriminating.
    2. The defining of a problem means a choice among possible problems.
    3. Classification of data is discrimination between items.
    4. Setting up hypotheses is the choosing of one or more possible explanations of the data.
    5. Selecting the most likely hypothesis is critical discrimination.
    6. Drawing conclusions is selecting and fitting of data, again critical discrimination.
    7. Validation of the conclusion is again a matter of discrimination and choice.

67 Bjarne R. Ullsvik, "An attempt to measure critical judgment." School Science and Mathematics. 49:445-452, June, 1949.
68 John G. Read, "A non-verbal test of the ability to use the scientific method as a pattern for thinking." Science Education. 33:361-366, December, 1949.

On the basis of his contention that scientific thinking is primarily the making of discriminatory choices, he devised a picture-test to appraise the ability to make these choices. He described his test as follows:

    The picture-test is a series of sub-tests, related in that they are all aspects of the environment, and that they all pose problems which can be solved through the association of two sets of pictures. There are seven categories; each delineated by four pictures, each of which represents a particular sub-division of the category. (Three more categories of a biological nature have been added). The categories have to do with electricity, with air pressure, with one phase of chemistry, with mechanics; they are samples of common environmental science.

    The four pictures are mounted on a card, ... the card is placed in a box. Under each of the four pictures is a small bin. From six to eighteen separate loose pictures may be picked up by the testee, closely examined, sorted, compared, and finally dropped into one of the bins. The only directions are to "place each picture in the bin where it fits best."

    High scores are obtained by those who discover what the four pictures on the card represent. As each card is on a single topic, the task is to discover the more or less fine shades of dissimilarity (italics in the original) among the four pictures. The loose pictures serve as clues, and as they can be moved around without penalty, once the pattern exhibited by the four pictures on the card is discovered, the way is open for careful comparison and critical discrimination.⁷¹

69 Keeslar, op. cit., pp. 212-216.
70 Read, op. cit., pp. 361-366.

Read originally used 133 pictures which he presented to eleven science specialists for sorting.
Of these 133, seventy were placed by all of the judges in the same bins. Item analysis showed that 27 of these were non-discriminatory; the remaining 43 pictures made up the items of the test. The test was designed for grades seven through twelve. By means of the Kuder-Richardson formula, Read found the reliability of the test to be .78. The test was validated by administering it to 18 members of the group who won high honors in a state science contest. The scores made by these students were significantly higher than scores made by students who had had no science.

71 Ibid., pp. 362-363.

Bingham⁷² devised a series of tests for general science, biology, chemistry, and physics which were used primarily as teaching devices. The instructor performed an experiment and then a twelve-item test was given. Item 1 was concerned with the results of the experiment. Item 2 described experiments; the student was directed to select the one actually performed. Item 3 presented five hypotheses to account for what happened; the student was instructed to choose the best one. In Items 4-8 additional facts were given and the student was directed to choose the fact which showed the untenable hypotheses presented in Item 3 to be unsound. The choice, "none of these," could be used for the hypothesis which was sound. Item 9 tested an understanding of the assumptions underlying the conclusion drawn; Item 10 was concerned with new problems arising out of the experiment, while Item 11 presented assumptions underlying the application of the conclusion to new situations. Item 12 tested the ability to apply the conclusion to new situations. No data on the reliability or validity of the test were presented.

72 Eldred N. Bingham, "A direct approach to the teaching of the scientific method." Science Education. 33:241-249, April, 1949.

Edwards,⁷³ in 1950, reported on two tests, Test A and Test C, which he devised to measure certain aspects of critical thinking. Test A was devised to measure induction. Four principles were stated; each principle was followed by five facts. The pupil was instructed to choose the fact which supported the principle. The estimate of reliability of the test was .88 as determined by the method of split-halves, .80 as measured by a correlation of the two forms of the test. Edwards claimed that the validity was built into the test by using an accepted theory of critical thinking and by using facts familiar to students. Additional evidence for validity was found in an increase in scores from grade ten through grade fourteen (college sophomore) and in a correlation of only .17 with intelligence.

73 Thomas B. Edwards, "Measurement of Some Aspects of Critical Thinking." Journal of Experimental Education. 18:263-279, March, 1950.

Test C was called a Judgment Test. Four opinions were stated; these were labeled A, B, C, and D. One opinion was sound, one fairly adequate, one irrelevant, and one totally incorrect. The opinions were then presented in pairs, AB, AC, etc., giving six items for each set of four opinions. The student was instructed to choose the better of each pair. This test was prepared in two forms. Reliability coefficients ranged from .49 to .75 when determined by the split-half method. The correlation between the two forms was .32. The methods of validation were the same as for Test A. The correlation of Test C with intelligence was .15.

Tests A and C were two tests of a battery of tests devised by Edwards,⁷⁴ who originally set out to measure seven aspects of critical thinking. Seven tests were devised. Test I aimed to test the ability to judge the reliability of sources of information. A series of statements concerning measurements were presented.
The student was instructed to underline the letter R if he felt that the accuracy mentioned was possible by means of the device used, but to underline the letter N if the device could not measure as accurately as was indicated in the statement. Edwards states that this test showed some promise, but that it was not developed beyond the preliminary stages because the reliability was low.

⁷⁴ Thomas B. Edwards, "Measurement of Some Aspects of Critical Thinking." Unpublished Doctor's thesis, Department of Education, University of California, 1949. pp. 23-50.

Test II was a test of relevance. Each question consisted of two statements. The student was instructed to underline the letter R if the two statements were related, to underline the letter N if they were not related. This test was not revised after the first tryout because of the difficulty of obtaining facts which the test constructor was sure all of the students would know. Test III was the induction test discussed as Test A above. Test IV was a deduction test devised to measure the student's ability to judge good and poor arguments. This test was revised and called Test B. The reliabilities were not stable; they ranged from .20 to .86. Test V was the Judgment test discussed as Test C above. Test VI presented ten paragraphs, each of which was followed by three conclusions: one sound, one irrelevant, and one contradicted by the data. These were labeled A, B, and C and were presented in pairs. The student was instructed to choose the better of the pair. Test VII was similar to Test VI, but the conclusions were all based upon the data. The student was instructed to choose the better of a pair of the conclusions.
This test, upon revision, became Test D. The estimated reliabilities were .82 and .84. The correlation of this test with intelligence was .22.

Summary concerning tests on abilities involved in problem-solving. Considerable progress has been made in the testing of abilities involved in problem-solving in the three decades since Herring⁷⁵ published his test of scientific thinking. His pioneer work was of considerable interest because it was the first test of such a nature to be published and because he defined the kinds of behaviors which he associated with scientific thinking. Watson's Test of Fairmindedness,⁷⁶ though designed to measure prejudice, was a forerunner of most of the tests which have been devised to measure the ability to interpret data. In addition, it was later modified by Watson and Glaser and became the highly successful Test of Critical Thinking. Watson's contribution was also significant in that he validated the test by curricular and statistical methods.

⁷⁵ Herring, op. cit., pp. 535-558.
⁷⁶ Watson, op. cit., pp. 9-35.

Another significant test of the mid-twenties was Zyve's⁷⁷ Stanford Scientific Aptitude Test, which purported to measure eleven scientific aptitudes. This test appears to have been the first test of this type and has been widely used. This test, also, was quite well validated. Downing's⁷⁸ test of scientific thinking was a distinct contribution because it was designed to measure many of the skills and safeguards of scientific thinking. The primary contribution of Weller⁷⁹ was the recognition of the distinction between the skills of scientific thinking and the scientific attitudes. One of the best of the attitudes tests was "What Do You Think?", constructed by Noll,⁸⁰ who defined attitudes as habits of thinking. This test also has been widely used.

⁷⁷ Zyve, op. cit., pp. 525-546.
⁷⁸ Downing, op. cit., pp. 121-128.
⁷⁹ Weller, op. cit., pp. 90-97.
⁸⁰ Noll, op. cit., pp. 18-25.
⁸¹ Smith and Tyler, op. cit., pp. 3-156.

The tests devised for the Eight-Year Study⁸¹ were noteworthy contributions to test construction because in the development of these tests the behaviors attending the major objectives were considered in detail, and because the abilities involved in critical thinking were recognized as major outcomes of secondary education. The Interpretation of Data tests devised for the Eight-Year Study have been used very extensively.

In the last decade the trend toward increased emphasis on the teaching of critical thinking has culminated in the production of a number of tests devised to test phases of this major objective. The Watson-Glaser Test of Critical Thinking,⁸² previously referred to, was reported. Johnson⁸³ made a significant contribution in devising a test revolving around a single major problem. Teichman⁸⁴ and Alpern⁸⁵ devised interesting tests to appraise the abilities to draw conclusions from data and the ability to devise experiments, respectively.

An entirely new approach to the problem of measuring the ability to think scientifically was presented by Read⁸⁶ in his Non-verbal Test of Scientific Thinking. This test was designed on the assumption that critical discrimination is the keynote of scientific thinking, and presents an interesting method of isolating this factor.

⁸² Glaser, op. cit., pp. 87-92.
⁸³ Johnson, op. cit., pp. 83-96.
⁸⁴ Teichman, op. cit., pp. 268-279.
⁸⁵ Alpern, op. cit., pp. 220-229.
⁸⁶ Read, op. cit., pp. 361-366.

No attempt has been made in this summary to include mention of all of the tests and testing techniques which have been developed. Only the highlights in the measurement of problem-solving have been treated. It is, however, of interest to note that tests have been devised for almost all educational levels from fourth grade through college, and that some tests have been devised without regard to subject-matter areas, whereas others have been designed for specific subjects.
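Several of the reliability estimates quoted in this review of tests rest on the Kuder-Richardson and split-half methods. The following is a minimal sketch of both computations, offered only as an illustration. It assumes Kuder-Richardson formula 20 (the review does not say which Kuder-Richardson formula Read used) and the Spearman-Brown step-up for split halves; the response matrix is hypothetical and is not drawn from any study cited here.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores (one row per examinee)."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)  # population variance of total scores
    # Sum of p*q over items, where p is the proportion answering the item correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def spearman_brown(half_r):
    """Step a correlation between two half-tests up to a full-length reliability estimate."""
    return 2 * half_r / (1 + half_r)


# Hypothetical data: five examinees, four items (illustration only).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 2))
print(round(spearman_brown(0.6), 2))
```

The sketch does not guard against a total-score variance of zero, which would make the formula undefined.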
RELATIONSHIP BETWEEN PROBLEM-SOLVING AND OTHER ABILITIES

Relation of intelligence to abilities involved in problem-solving. It is the opinion of a few investigators that the abilities involved in problem-solving are identical with intelligence. The majority of investigators seem to believe that there is a moderate to substantial relationship between intelligence and the abilities involved in problem-solving. A few, however, contend that the two abilities are almost completely unrelated.

Billings⁸⁷ has cited some evidence to support the viewpoint that problem-solving is a general intelligence factor. In an attempt to ascertain the nature of problem-solving, he presented his subjects with problems in eight different subject-matter areas. The subject matter necessary to the solution of the problems was taught prior to the administration of the tests. He obtained correlations ranging from .53 to .78 between the tests of reasoning in the various subject-matter areas. The average correlation was .67. Correlations between the tests of reasoning in the various fields and intelligence, as measured by the Army Alpha test, ranged from .42 to .59. Since he found a higher average correlation between the scores on reasoning in various fields than between reasoning in a particular field and information in that field, he inferred that problem-solving was an important part of Spearman's general factor of intelligence, if not intelligence itself.⁸⁸

It is interesting to note that Billings attributed problem-solving to intelligence with correlations of from .42 to .59 between his test and an intelligence test, while other investigators obtaining similar correlations have not interpreted their data as indicating particularly high relationships between problem-solving ability and intelligence.

⁸⁷ Marion L. Billings, "Problem-solving in different fields of endeavor." American Journal of Psychology, 46:259-272, April, 1934.
⁸⁸ Billings, loc. cit.
Zyve,⁸⁹ Sinclair and Tolman,⁹⁰ and Downing⁹¹ seem to believe, however, that critical or scientific thinking is an innate characteristic. On the other hand, many investigators have shown that the ability to think scientifically can be taught. If this is true, problem-solving could not be identical with intelligence nor could it be an innate ability. A discussion of these alternate viewpoints follows.

Zyve,⁹² who considered his test to be a measure of scientific aptitude, did not claim that the aptitude was intelligence itself. His data gave evidence that it was not intelligence, since he found a correlation of .44 to .51 between his test and intelligence as measured by the Thorndike intelligence test.

⁸⁹ Zyve, op. cit., pp. 525-546.
⁹⁰ James H. Sinclair and Ruth S. Tolman, "An attempt to study the effect of scientific training upon prejudice and illogicality of thought." Journal of Educational Psychology, 24:362-370, May, 1933.
⁹¹ Downing, op. cit., p. 128.
⁹² Zyve, op. cit., pp. 525-546.
⁹³ Sinclair and Tolman, op. cit., pp. 362-370.

A study by Sinclair and Tolman⁹³ on the effect of scientific training on logical thinking showed that students in the science and engineering fields in college were superior to students in other fields in their ability to make inferences, as evidenced by the Inference test of the Watson Test of Fairmindedness. The authors⁹⁴ suggested that this might mean that students who elect science and engineering show a tendency to superiority in this ability. This suggestion would lead one to believe that Sinclair and Tolman consider the ability to infer to be an innate ability. They report a correlation of .49 between scores on the Thorndike Intelligence test and scores on Watson's Inference test.
Downing⁹⁵ reported a correlation of .66 between his Test on Scientific Thinking and intelligence for students in the senior high school, and a correlation of .47 between these traits for students in the junior high school. He concluded that intelligence, as expressed by IQ, was different from the elements or safeguards of scientific thinking. It was his opinion that the elements of scientific thinking were due to inherited ability while the safeguards were the result of instruction. However, he does not present convincing evidence in support of this viewpoint.

Strauss⁹⁶ found a correlation of .64 between scores on Downing's test and scores on the Otis Intelligence test. The 90 students used in this study were between the ages of 10 and 18.

⁹⁴ Sinclair and Tolman, loc. cit.
⁹⁵ Downing, op. cit., pp. 121-128.
⁹⁶ Strauss, op. cit., pp. 89-93.

Ter Keurst and Bugbee⁹⁷ administered their test on the scientific method to college freshmen and sophomores. They found correlations of .51 and .66, respectively, between the scores made by these groups on their test and the scores on the American Council on Education Psychological Examination. Since their test measured knowledge of the method of science rather than ability to use the scientific method, these correlations cannot justifiably be compared with the other correlations reported here.

Glaser⁹⁸ reported correlations ranging from .03 to .52 between intelligence, as measured by the Otis Mental Ability test, and the six tests which make up the Watson-Glaser Test of Critical Thinking.
The correlation of scores on the entire critical thinking test with scores on the Otis Mental Ability test was .46 for the initial administration of the test and .48 for the final administration of the test.

Howell⁹⁹ attempted to discover the effect of debating on critical thinking. As a part of his study he correlated the composite scores on five of the six Watson-Glaser tests with intelligence quotients. He obtained a correlation of .63.

⁹⁷ Ter Keurst and Bugbee, op. cit., pp. 489-501.
⁹⁸ Glaser, op. cit., pp. 142-147.
⁹⁹ William S. Howell, "The effect of high school debating on critical thinking." Speech Monographs, 10:96-102, Annual, 1943.

In a study of the ability of ninth grade students to make conclusions, Teichman¹⁰⁰ found a correlation of .65 between the scores on his test and scores on a measure of mental ability. He found no significant relationship between intelligence and growth in the ability to make conclusions.

Higgins,¹⁰¹ as a part of his study on the educability of adolescents in inductive ability, devised a test entitled Judge Conclusions. He found that the correlation between the scores on this test and scores on the Henmon-Nelson Test of Mental Ability was .54. Of particular interest, however, was his finding of a correlation of only .36 between his test and Thurstone's Induction Test. One would expect that his test, which he believed measured abilities involved in inductive reasoning, would have had a higher correlation with a test which purported to measure the inductive factor of intelligence than with a general intelligence test, such as the Henmon-Nelson Test of Mental Ability.

Weisman,¹⁰² in her study of factors related to the ability to interpret data, reported correlations of .64 to .69 between intelligence as measured by the Henmon-Nelson Test of Mental Ability and ability to interpret data as measured by the Progressive Education Association Interpretation of Data test.

¹⁰⁰ Teichman, op. cit., pp. 268-279.
¹⁰¹ Higgins, op. cit., p. 40.
¹⁰² Leah L. Weisman, "Some Factors Related to the Ability to Interpret Data in Biological Science." Unpublished Doctor's thesis, Department of Education, University of Chicago, 1946. p. 91.

The studies considered thus far have all given evidence of a moderate to substantial relationship between intelligence and problem-solving abilities. Two studies, utilizing the technique of partial correlations, have shown that the true relationship between intelligence and problem-solving is probably not shown by simple correlations. In a study devised to investigate the relationship between ability to recall and ability to reason, Smith¹⁰³ found a correlation of .58 between ability to reason and IQ. When ability to recall was held constant, by means of a partial correlation, this coefficient of correlation between ability to reason and IQ was reduced to .23. Alpern,¹⁰⁴ in his study on the ability of students to test hypotheses, found a correlation of .53 between intelligence and ability to test hypotheses. However, by holding reading grade and chronological age constant by the use of a partial correlation, he found the correlation was reduced to .11.

¹⁰³ Victor C. Smith, "A study of the degree of relationship existing between ability to recall and two measures of ability to reason." Science Education, 30:88-90, March, 1946.
¹⁰⁴ Alpern, op. cit.

Somewhat lower correlations between intelligence and abilities involved in critical thinking have been reported in a number of studies. Hoff¹⁰⁵ reported a correlation of .36 between intelligence as measured by the American Council on Education Psychological examination and his test for scientific attitudes. Noll¹⁰⁶ found moderate positive correlations, ranging from .30 to .41, between IQs and scores on preliminary forms of his test, "What Do You Think." These correlations, he believed, indicated that his test measured factors other than intelligence or native ability of the eighth to twelfth grade students to whom he administered the tests.

Bedell,¹⁰⁷ in a study on the relation between the ability to infer and the ability to recall, found low positive correlations between intelligence of junior and senior high school students and their ability to infer. However, his data revealed that the lowest quarter of the group, in terms of scores on the intelligence test, scored scarcely better than chance on the inference test. He concluded, tentatively, that a certain degree of intelligence is essential to problem-solving ability.

¹⁰⁵ Hoff, op. cit., pp. 28-35.
¹⁰⁶ Noll, op. cit., p. 24.
¹⁰⁷ Ralph C. Bedell, "The Relationship Between the Ability to Recall and the Ability to Infer in Specific Learning Situations." Unpublished Doctor's thesis, Department of Education, University of Missouri, 1934. pp. 36-37.

Johnson¹⁰⁸ correlated scores made on her test devised to measure reflective thinking with mental alertness, as measured by the Ohio Psychological examination. She reported a coefficient of correlation of .40 for a group of 84 college students. She believed that the data revealed that those aspects of reflective thinking measured by her test may depend on college level intelligence, but that other variables were more significant.
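As an aside on how much weight a single coefficient of this size can bear, the following sketch computes an approximate 95 percent confidence interval for a correlation of .40 obtained from 84 students, the figures reported for Johnson's study. The Fisher z transformation used here is a standard technique supplied for illustration; it is not a method attributed to Johnson or to any investigator cited in this review, and it assumes a random sample from a roughly bivariate normal population.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via the Fisher z transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    low_z, high_z = z - z_crit * se, z + z_crit * se
    return math.tanh(low_z), math.tanh(high_z)  # back-transform to the r scale


# r and n as reported for Johnson's group of 84 college students.
low, high = fisher_ci(0.40, 84)
print(round(low, 2), round(high, 2))
```

Under these assumptions the interval runs from roughly .20 to roughly .57, which is wide enough to overlap many of the other coefficients reported in this section.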
Furst,¹⁰⁹ in a study of changes evoked in two years of general education, gave a series of tests to measure, among other things, changes in the ability to think critically. As a part of his study, he correlated the scores made on the portions of his test which measured critical thinking with intelligence as measured by the American Council on Education Psychological examination. He found that 80 percent of these correlations were below .40. He asserted that his data indicated that the various tests of critical thinking measured characteristics of students' behavior which were not highly related to measures of scholastic aptitude. He believed that, at the secondary school level and the lower college level, students with relatively low scholastic aptitude may be able to perform as well as those with high scholastic aptitude on tests of critical thinking.

¹⁰⁸ Johnson, op. cit., pp. 83-96.
¹⁰⁹ Seward J. Furst, "Changes in Organization of Various Abilities and Skills after Two Years of General Education at the Secondary-School Level." Unpublished Doctor's thesis, Department of Education, University of Chicago, 1948. p. 155.

Dunning¹¹⁰ studied the relationship of the ability to interpret data, as measured by his test, to factors of intelligence. As a measure of the factors of intelligence he used a battery of Thurstone's Primary Mental Abilities tests. He found correlations of from .04 to .24 between the various factors of intelligence as measured by this test and the scores on the interpretation of data portion of his Test of Scientific Thinking. He concluded that the ability to interpret data was a different ability than any of the factors of intelligence.

Read¹¹¹ reported a correlation of .39 between intelligence and his non-verbal test of the ability to use the scientific method.
Edwards¹¹² found correlations ranging from .00 to .22 between measures of intelligence and his four tests which were designed to measure (1) induction, (2) deduction, (3) judging opinions, and (4) judging conclusions.

¹¹⁰ Gordon M. Dunning, "The construction and validation of a test to measure certain aspects of scientific thinking in the area of first year college physics." Science Education, 33:221-235, April, 1949.
¹¹¹ Read, op. cit., pp. 361-366.
¹¹² Edwards, op. cit., pp. 80-85.

Fleming¹¹³ studied the outcomes of a course in biology at the college level. One of the purposes of his investigation was to measure growth in understanding of the elements of the scientific method. As a part of this study he correlated the scores made on his test of scientific thinking with intelligence. He reported a coefficient of correlation of .34.

Summary of studies concerning the relation of intelligence to problem-solving. There is no substantial agreement among investigators concerning the relationship of problem-solving to intelligence. A number of investigators reported correlations ranging from .40 to .69, indicating a fairly substantial relationship between intelligence and problem-solving abilities. Billings¹¹⁴ interpreted such correlations as indicating that problem-solving ability is a general factor, if not intelligence itself, whereas other investigators made no such claim. On the other hand, some investigators have found correlations ranging from .00 to .40, indicating no relationship to moderate relationship between these characteristics. Evidence obtained by the use of partial correlations indicated that other factors, such as memory and reading ability, may account for some of the relatively high correlations.

¹¹³ Fleming, op. cit., p. 185.
¹¹⁴ Billings, op. cit., pp. 259-272.
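The effect of holding a third variable constant, referred to in the summary above, can be sketched with the standard first-order partial correlation formula. In the example, .58 is the correlation between reasoning and IQ reported for Smith's study; the two correlations involving recall (.65 and .70) are hypothetical values, not taken from Smith, chosen only to show how a simple correlation of that size can shrink to the neighborhood of the reported .23.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


r_reason_iq = 0.58      # reported in the review for Smith's study
r_reason_recall = 0.65  # hypothetical, for illustration only
r_iq_recall = 0.70      # hypothetical, for illustration only

print(round(partial_corr(r_reason_iq, r_reason_recall, r_iq_recall), 2))
```

When the third variable is uncorrelated with both of the others, the formula returns the simple correlation unchanged, which is why a large reduction signals that the third variable is carrying part of the original relationship.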
Although many of the correlations show a moderate to substantial relationship between intelligence and the abilities involved in problem-solving, these correlations are not as high as correlations between scores on intelligence tests and achievement tests over information previously learned. Stroud¹¹⁵ has stated that correlations between scores on achievement batteries and intelligence tests are of the magnitude of .8, and Kelley¹¹⁶ claimed that there was a 90 percent overlapping between a general intelligence test and a general achievement test. These findings seem to indicate that there is somewhat less relationship between intelligence and ability to think scientifically than between intelligence and general academic achievement.

Zyve,¹¹⁷ Downing,¹¹⁸ and Sinclair and Tolman¹¹⁹ support the viewpoint that the ability to think critically is an innate characteristic. If this is true, no appreciable improvement in scores on thinking tests as a result of instruction would be anticipated. Evidence to the contrary is presented in the discussion which follows.

¹¹⁵ James B. Stroud, Psychology in Education. New York: Longmans, Green and Company, 1946. pp. 338-339.
¹¹⁶ Truman L. Kelley, Interpretation of Educational Measurements. Yonkers-on-Hudson: World Book Company, 1927. p. 363.
¹¹⁷ Zyve, op. cit., pp. 525-546.
¹¹⁸ Downing, op. cit., pp. 121-123.
¹¹⁹ Sinclair and Tolman, op. cit., pp. 362-370.

Educability in problem-solving. Related to the problem of the relationship of intelligence to abilities involved in critical thinking is the problem of educability in the thinking process. If abilities involved in critical thinking were primarily due to intelligence as suggested by Billings, there should be little, if any, improvement in the ability with training. The evidence seems to indicate that these abilities can be improved if they become specific objectives of instruction.
On the other hand, there is no evidence to indicate that they are a necessary by-product of the study of science. As indicated by Noll,¹²⁰ the attainment of these objectives will come when they are taught; that is, when the emphasis of teaching is upon learning to think rather than on memorization of facts.

There is considerable evidence to show that skills of the scientific method can be taught effectively to students of all grade levels. Weller¹²¹ found a significant difference between two equated groups of sixth grade students; one group received specific instruction in both scientific attitudes and skills of scientific thinking, while the other received no special training. She concluded that growth in both attitudes and skills could be stimulated if they were specific objectives of instruction. Arnold,¹²² in a study of fifth and sixth grade students, also concluded that critical thinking can be taught in the elementary school. Grener and Raths¹²³ found significant gains in the ability to think critically in a group of third grade pupils after a five month period of teaching for critical thinking. Curtis¹²⁴ and Daily¹²⁵ both found that junior high school pupils benefited from direct instruction in critical thinking.

¹²⁰ Victor H. Noll, "Teaching the habits of scientific thinking." Teachers College Record, 35:202-212, December, 1933.
¹²¹ Weller, op. cit., pp. 90-97.
¹²² Dwight Arnold, "Testing ability to use data in the fifth and sixth grades." Educational Research Bulletin, 17:255-259, December, 1937.
¹²³ Norma Grener and Louis E. Raths, "Thinking in third grade." Educational Research Bulletin, 24:38-42, February, 1945.
¹²⁴ Curtis, op. cit., p. 78.
¹²⁵ Daily, op. cit., p. 81.

Blair and Goodson¹²⁶ conducted an experiment which showed that ninth grade students receiving instruction in scientific thinking improved more on Noll's¹²⁷ "What Do You Think" test than did the two groups which did not receive this special instruction.¹²⁸ One of the control groups received no science instruction, while the other control group received science instruction by the usual methods. The means for all three groups were higher on the post-test than on the pre-test. The comparison of means for the two control groups showed no significant difference, which seems to support Downing's¹²⁹ viewpoint that science instruction does not necessarily produce growth in ability to think scientifically.

¹²⁶ Glenn M. Blair and Max R. Goodson, "Development of scientific thought in general science." School Review, 47:696-700, November, 1939.
¹²⁷ Noll, Habits of Scientific Thinking, op. cit., p. 27.
¹²⁸ Blair and Goodson, op. cit., pp. 696-700.

Teichman¹³⁰ investigated the ability of ninth grade students to draw conclusions. Twelve groups, designated as controls, were taught the regular course in science. Eight groups were given additional training in the drawing of conclusions. He found that although both groups made gains in these abilities, the experimentals made significantly greater gains.

Higgins¹³¹ studied the educability of adolescents in inductive ability. He reported that the gains of students receiving special instruction in problem-solving in a course in high school biology were meaningfully greater than the gains of other students taking biology but not receiving special instruction in problem-solving.

¹²⁹ Downing, op. cit., pp. 121-128.
¹³⁰ Teichman, op. cit., pp. 268-279.
¹³¹ Conwell D. Higgins, "The educability of adolescents in inductive ability." Science Education, 29:82-85, March, 1945.

Neuhof¹³² found that students taking high school chemistry improved markedly in their ability to interpret data, as measured by the Progressive Education Association Interpretation of Data tests, after training in the interpretation of data. No control group was employed in this study. Gains in scores were not limited to the better students. He concluded that definitely measurable results could be achieved in the teaching of such complex mental processes as the interpretation of data.

Weisman¹³³ investigated the development of skills of scientific thinking in high school biology. Six classes taught by the investigator using problem-solving techniques were compared with six classes taught by teachers who believed that the ability to think scientifically could be taught without special instruction. Weisman found her experimental groups gained significantly more than the controls on the Progressive Education Association Interpretation of Data tests. There was also a significant gain on several of the Watson-Glaser Tests of Critical Thinking. Although these results are consistent with results of many other studies, Mallinson¹³⁴ criticized the implication of the finding because the study failed to take into account the fact that the investigator may have been a superior teacher.

¹³² Mark Neuhof, "Integrated interpretation of data tests." Science Education, 26:21-26, January, 1942.
¹³³ Weisman, op. cit., pp. 77-83.
¹³⁴ George G. Mallinson, "The implications of recent research in the teaching of science at the secondary-school level." Journal of Educational Research, 43:321-342, January, 1950.
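The comparisons of gains between experimental and control groups described in the studies above can be sketched as a test of the difference between two mean gains. The gain scores below are hypothetical and do not come from any study cited here, and Welch's unequal-variance t statistic is used for convenience; it is not claimed to be the procedure any of these investigators applied.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    va = variance(group_a) / len(group_a)  # squared standard error of mean, group A
    vb = variance(group_b) / len(group_b)  # squared standard error of mean, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical post-test minus pre-test gains (illustration only).
experimental_gains = [6, 8, 7, 9, 5, 7]
control_gains = [3, 4, 2, 5, 4, 3]

t_stat, dof = welch_t(experimental_gains, control_gains)
print(round(t_stat, 2), round(dof, 1))
```

A statistic of this kind addresses only sampling error. It does nothing to remove a confound such as the one Mallinson raised, in which the experimental classes may have differed from the controls in the quality of the teacher as well as in the method.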
Glaser¹³⁵ utilized four control and four experimental classes in twelfth grade English to measure changes in ability to think critically. The experimental classes were given instruction to stimulate critical thinking. Glaser found that the average gains on the battery of critical thinking tests of the four experimental classes, after ten weeks of instruction, were significantly greater than the average gains of the control classes. This study is especially significant in that it included a follow-up study. The students were tested again six months after the experimental period. The growth in ability to think scientifically had been retained. Glaser predicted that some aspects of the growth would probably be retained more or less permanently, and would afford a basis for further growth in the ability to think critically.

A few studies have been reported on teachability of the skills involved in scientific thinking at the college level. Teller¹³⁶ used an experimental and a control group of students taking a course in the history of education. Both groups had classes five days a week, but one class period each week was devoted to the interpretation of historical data in the experimental section. Teller found that the experimental group showed greater improvement in the ability to interpret data as measured by a test constructed to appraise the ability to interpret historical data.

¹³⁵ Glaser, op. cit., pp. 131-140.
¹³⁶ James D. Teller, "Improving ability to interpret educational data." Educational Research Bulletin, 19:363-371, September, 1940.

Tyler¹³⁷ reported a study on remedial instruction for students enrolled in a course in freshman zoology. Students who received remedial instruction in problem-solving techniques gained significantly more than those without the remedial instruction. Students in this study were matched on the basis of intelligence, pre-test scores, sex, and instructor.

Fleming¹³⁸ reported a study to measure certain outcomes of a course in biological science. One of the outcomes appraised was the ability to think scientifically. He equated two groups of students, one taking no science, the other taking biological science. He found that, although both groups made gains in the ability to think scientifically, those taking the science course made significantly greater gains.

Thelen¹³⁹ made a study of the effect of instruction planned to produce growth in the ability to think scientifically. The experiment was conducted with students in a course in freshman chemistry. The control groups were taught by traditional laboratory methods, whereas the experimental groups were given opportunities to participate in inductive thinking as often as was feasible. Thelen's test on experimental procedures and the Progressive Education Association Interpretation of Data test were used to evaluate these abilities. Using the technique of analysis of covariance, he found that the experimental groups were superior to the controls. However, the gains were not great in terms of percent gains.

¹³⁷ Ralph W. Tyler, Service Studies in Higher Education. Columbus, Ohio: The Ohio State University, 1932. pp. 119-122.
¹³⁸ Fleming, op. cit., pp. 172-179.
¹³⁹ Thelen, op. cit., pp. 234-261.
Bond,¹⁴⁰ in a study similar to Thelen's, found superiority in an experimental group. The subject-matter area of Bond's study was a unit on genetics in a course in college biology.

Barnard¹⁴¹ compared the relative effectiveness of the lecture-demonstration method with the problem-solving method in the teaching of a course in college science. The evaluation instruments included a test on the ability to solve problems. The groups used were equated on the basis of pre-tests and scores on psychological examinations. He found that the problem-solving method produced significantly greater gains on the tests designed to measure problem-solving abilities.

¹⁴⁰ Austin D. M. Bond, An Experiment in the Teaching of Genetics with Special Reference to the Objectives of General Education. Contributions to Education, No. 797. New York: Bureau of Publications, Teachers College, Columbia University. 1940. pp. 77-79.
¹⁴¹ J. Darrell Barnard, "The lecture-demonstration vs problem-solving method of teaching a college science course." Science Education, 26:121-132, October, 1942.

Summary of studies on educability in problem solving. The evidence presented in this portion of the review of literature seems to indicate that abilities involved in critical thinking are not to any considerable extent a by-product of the teaching of science.
The evidence also lends credence to the hypothesis that critical thinking can be taught providing it is a specific objective of instruction. However, the evidence is still fragmentary and the conclusion is tentative.

Relation of reading ability to the abilities involved in problem-solving. There is considerable evidence to show that there is a relationship between reading ability and the ability to think critically. An interesting point in this regard is the fact that Buros¹⁴² placed the Progressive Education Association Interpretation of Data tests among his list of reading tests in the 1940 Mental Measurement Yearbook.

Grim¹⁴³ found correlations ranging from .51 to .66 between scores on Progressive Education Association Interpretation of Data tests and scores on reading tests among junior high school students.

¹⁴² Oscar K. Buros, The Nineteen-Forty Mental Measurement Yearbook. Highland Park, N. J.: The Mental Measurement Yearbook. 1941. pp. 546-347.
¹⁴³ Paul R. Grim, "Interpretation of data and reading ability in social studies." Educational Research Bulletin, 19:372-374, September, 1940.
Weisman¹⁴⁴ also used the Progressive Education Association Interpretation of Data test in her study on factors related to the ability to interpret data among high school students. She found correlations between scores on this test and scores on the Iowa Silent Reading test to range from .57 to .65. A partial correlation between scores on the reading test and scores on the interpretation of data test with IQ held constant was .34.

Dunning¹⁴⁵ compared scores on his interpretation of data test in physics, designed for college freshmen, with scores on a reading test. He reported a correlation of .36.

Glaser¹⁴⁶ reported correlations of .32 and .36 between the composite score on the Watson-Glaser battery of tests and scores on the Nelson-Denny reading test. Correlation of scores on the reading test and scores on the six individual tests of the Watson-Glaser battery ranged from -.06 for the generalization test to .55 for the inference test. It is of interest to note that there is a relatively high correlation between reading ability and an ability to judge the degree of truth or falsity of statements. Glaser¹⁴⁷ found higher correlations between scores on his test and a test of Reading Comprehension. These correlations ranged from .36 to .77 for the individual tests and from .77 to .82 for his battery of tests.

¹⁴⁴ Weisman, op. cit., pp. 97-98.
¹⁴⁵ Dunning, op. cit., p. 232.
¹⁴⁶, ¹⁴⁷ Glaser, op. cit., pp. 142-147 and 166-167.
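Several of the studies reviewed in this section report a partial correlation with IQ held constant. As a reminder of what that statistic does, the standard first-order partial correlation is sketched below. The subscripts (1 for the thinking test, 2 for the reading test, 3 for IQ) are labels chosen here for illustration and are not taken from the studies cited.

```latex
r_{12\cdot 3} \;=\; \frac{r_{12} - r_{13}\, r_{23}}
                         {\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```

When both tests correlate appreciably with IQ, the product of those two correlations is subtracted in the numerator, so the partial correlation can fall well below the zero-order value, as in the drop from the .57 to .65 range to .34 reported by Weisman.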
Ter Keurst and Bugbee¹⁴⁸ obtained correlations of .57 and .59 between scores on their test on scientific method and scores on the Nelson-Denny reading test. As previously mentioned, this test of scientific method seems to measure knowledge of steps and attitudes rather than behaviors. On this basis one might expect rather high correlation between reading ability and scores on this test.

Teichman,¹⁴⁹ in studying the ability of ninth grade students to draw conclusions, found a correlation of .61 between this ability as measured by his test and reading ability.

Alpern¹⁵⁰ found similar correlations between his test on ability to test hypotheses and reading grade in high school pupils. He reported a correlation of .57. However, Alpern found that by holding IQ constant by means of a partial correlation, this correlation was reduced to .36.

Hoff¹⁵¹ found low correlations between his test and reading ability. He reported a correlation of .19 between scores on his test and scores on the comprehension portion of the American Council on Education Reading test. The correlation between scores on his test and speed of reading scores on the American Council on Education Reading test was .09.

¹⁴⁸ Ter Keurst and Bugbee, op. cit., pp. 489-501.
¹⁴⁹ Teichman, op. cit., pp. 268-279.
¹⁵⁰ Alpern, op. cit., pp. 220-229.
¹⁵¹ Hoff, op. cit., pp. 28-35.

Summary of studies concerning the relation of reading to problem-solving.
The evidence presented seems to indicate that reading ability and ability to think scientifically are to some degree related. Interpretation of data tests and other tests measuring abilities involved in scientific thinking are to a substantial degree dependent upon reading ability. On the other hand, scores on attitude tests did not seem to depend to any marked extent on reading ability.

Relation of factual information to the abilities involved in problem-solving. According to Wood and Beers,¹⁵² thinking and thinking ability are not under the control of teaching except as thinking is influenced by knowledge. This statement seems to imply that general intelligence and knowledge of facts should account for all of the variability in scores on thinking tests. The evidence for this point of view is somewhat contradictory as may be seen in the following discussion.

¹⁵² Ben D. Wood and F. S. Beers, "Knowledge versus thinking." Teachers College Record, 37:487-499, March, 1936.

Bedell¹⁵³ planned a study to determine the relationship between the ability to recall and the ability to infer. Thirty paragraphs containing facts from which one could infer principles were constructed. Two sets of test items were given: one to measure the ability to recall, one to measure the ability to make inferences. These tests were administered to 324 students in the junior and senior high schools. Bedell found that the ability to recall and the ability to infer were different but not completely unrelated. According to his findings the ability to infer was a more difficult process than the ability to recall.

¹⁵³ Bedell, op. cit., pp. 10-50.
¹⁵⁴ Billings, op. cit., pp. 259-272.
¹⁵⁵ Smith, op. cit., pp. 88-90.
Billings,¹⁵⁴ in studying problem-solving in various fields of knowledge, found higher correlations between the ability to solve problems in different areas than between information and the ability to solve problems in the same area. Correlations between scores in problem-solving in different fields ranged from .53 to .78. The average correlation was .67. The average correlation between information tests and problem-solving tests in the same field was .45. He concluded that those who could solve the problems knew the material, but that not all who knew the material solved the problems.

Smith¹⁵⁵ found high correlations between the ability to reason and knowledge of facts. The correlation he obtained was .77. The reduction in this correlation was slight when IQ was held constant by means of a partial correlation. The partial correlation was .65. He concluded that the ability to recall information and the ability to see relationships seemed to be two products of the same learning process.

Dunning¹⁵⁶ reported a correlation of .56 between his interpretation of data test for a physics course for college freshmen and a factual information test covering the same topics.
Since his correlation of .56 indicated a 38 percent overlapping between the interpretation of data test and the factual test, he concluded that knowledge of factual information was no guarantee of ability to use the information in the solving of problems.

Fleming,¹⁵⁷ as a part of his study on outcomes in a course in biology at the college level, reported a correlation of .57 between the test he used to measure the ability to think scientifically and the test he used to measure knowledge of facts.

Weisman,¹⁵⁸ in a study of factors related to the ability to infer, reported a correlation of .63 between scores on the Progressive Education Association Interpretation of Data test and scores made on the Cooperative Biology test. She found, however, that there was little relationship between scores on the interpretation of data test and gain in knowledge of biology, or between gain in ability to interpret data and knowledge of facts.

¹⁵⁶ Dunning, op. cit., p. 232.
¹⁵⁷ Fleming, op. cit., pp. 186-187.
¹⁵⁸ Weisman, op. cit., pp. 104-105.

Read¹⁵⁹ found a correlation of .53 between scores on his non-verbal test of the ability to use the scientific method and scores made on the Cooperative General Science test.

In a course in elementary biology, Tyler¹⁶⁰ found a correlation of .41 between scores on an information test and scores on a test measuring ability to interpret data. He reported a correlation of .46 between scores on the information test and a test designed to measure the ability to plan experiments to test hypotheses, and a correlation of .35 between knowledge of technical terms and ability to draw inferences.
In another study of college students taking various subjects, he¹⁶¹ found correlations ranging from .20 to .53 between scores on tests of recall and scores on tests requiring students to draw inferences and concluded that there was little relationship between these two abilities.

¹⁵⁹ Read, op. cit., pp. 361-366.
¹⁶⁰ Ralph W. Tyler, "Measuring the results of college instruction." Educational Research Bulletin, 11:253-260, May, 1932.
¹⁶¹ Ralph W. Tyler, in Charles H. Judd, Education as Cultivation of the Higher Mental Processes. New York: The Macmillan Company. 1936. p. 14.

Summary of studies concerning the relation of knowledge of facts to problem-solving abilities. The evidence indicates that there is a moderate positive correlation between the abilities involved in problem-solving and knowledge of facts. These findings seem, in general, to support the conclusion that facts are essential to thought, and to problem-solving, but that knowledge of the facts does not guarantee an ability to use the facts in the solution of a problem.

Summary of research related to the problem. An attempt has been made in this chapter to show how the descriptive analysis of the steps of scientific thinking is related to measurement of the ability to think scientifically, and how the development of tests has influenced educational research on the ability to think scientifically.

Early work in the descriptive analysis was done by philosophers and individual scientists, but no systematic evaluation of the steps involved in scientific thinking was attempted until about twenty-five years ago.
Since that time important contributions to an understanding of the nature of scientific thinking have been made by various workers. Such an understanding is of special importance to the measurement of the ability to think scientifically because the steps or elements of scientific thinking provide specific objectives to be tested, and because the steps offer suggestions of the types of behaviors which attend or which represent scientific thinking.

The recognition of the ability to think scientifically as a major objective of education stimulated the construction of tests to appraise various aspects of this ability. This testing movement, while slow at first, has resulted in the production of a number of tests which are quite reliable and which seem to have considerable validity. A variety of techniques have been evolved to measure the abilities involved in scientific thinking. Many of the techniques appear to be useful methods of obtaining evidence of the abilities.

The development of instruments to measure scientific thinking has led to studies of the relationship of this ability to various other traits such as intelligence and reading ability, and to the knowledge of facts. The evidence presented supports the inference that there is a direct relationship between the ability to think scientifically and the above mentioned traits.
However, most investigators are of the opinion that these factors do not account for all of the variability in scores on tests designed to measure ability involved in problem-solving.

One of the most stimulating findings of the investigators into the nature of problem-solving is that it can, apparently, be taught; particularly if it is a specific objective of instruction. The bulk of evidence supports this view.

CHAPTER III

GENERAL PROCEDURES INVOLVED IN THE DEVELOPMENT OF THE TEST

The purpose of this chapter is to describe: (1) the manner in which the test was developed, (2) the methods used in the construction of the test items, (3) the nature of the groups to which the test was administered in its various stages of development, (4) the methods used in the statistical analysis of the test, and (5) the methods used in the validation of the test.

The general procedures followed in the development of the test to measure the ability to think scientifically were similar to those used by Smith and Tyler¹ in the development of the tests used in evaluating the results of the Eight-Year Study. Several steps in the process and a detailed description of the procedure within each step as modified for its use in the present study are given below.

The first four steps were:
(1) the setting up of the objectives, (2) the definition of each of these objectives in terms of desired behavior, (3) the identification of situations in which students could be expected to display these behaviors, and (4) the writing of items to evaluate the behaviors. The fifth step was the tryout of the items constructed, the analysis of these items, and the incorporation of the best items into a test to measure the ability to think scientifically. The sixth step was the administration and analysis of this test. The seventh step was the validation of the test.

¹ Eugene R. Smith, Ralph W. Tyler and the Evaluation Staff, Appraising and Recording Student Progress. New York: Harper and Brothers. 1942. pp. 15-28.

Detailed discussions of methods used in the construction, analysis and validation of the test are reserved for the chapters which deal with these aspects of the problem because it was felt that these discussions would be more meaningful when presented with the materials with which they were used. A complete treatment of the first four steps is presented in Chapter IV. Chapter V is devoted to a detailed discussion of steps five and six. Chapter VI deals with the validation of the test, step seven.

The first step in the construction of the test designed to appraise the ability to think scientifically was the formulation of the educational objectives to be measured. The formulation of the objectives involved a consideration of the elements involved in scientific thinking as discussed in Chapter II, and a consideration of the objectives of teaching implied by each of these elements.
The second step was the definition of each of these objectives in terms of desired behavior. As Tyler² has stressed, this step is one of the crucial ones of test construction since objectives are usually stated in rather broad general terms. For example, the ability to interpret data is an oft-stated objective of science teaching. But what are the specific things that a person does when he interprets data? What are the kinds of errors made by persons who do not consistently achieve this objective? In order to determine what these behaviors are, a study must be made of the types of reactions made by persons who are competent in this objective.

² Ralph W. Tyler, Constructing Achievement Tests. Columbus, Ohio: Ohio State University. 1934. pp. 4-23.
Sources of these behaviors were (1) the major and minor elements involved in scientific thinking, (2) literature on test construction, especially on tests devised to measure various aspects of scientific thinking, (3) committee reports on behaviors involved in scientific thinking, (4) reports of research on behaviors of persons doing scientific research, and (5) interviews with teachers of science who are attempting to teach scientific thinking.

The third step was the identification of situations in which students could be expected to display the types of behaviors identified in step two. It was deemed advisable to select materials which would be of some interest to the students, which dealt with biological subject matter free of technical terms, and which would be comprehensible to students who had had no previous experience with biological subject matter. Technical journals, popular journals and textbooks were examined for situations which could be utilized in the construction of test items.

The fourth step involved the selection and trial of promising methods of measuring behaviors which would give evidence of the attainment of the objectives. This step included the writing of the items and the organization of tryout tests. It is customary to construct two to five times as many items as are used in the final form of the test so that poor items may be eliminated. For this reason a series of nine tryout tests was constructed. Each of these tests was designed to measure a limited number of the behaviors involved in scientific thinking. The tests were designed so that they could be scored on International Business Machine answer sheets. The five-choice answer sheet was selected as the most appropriate for the purpose of this test. The detailed discussion of the construction of the tryout tests and examples of items from each of them will be presented in Chapter IV.
A total of 637 items was constructed for the nine tryout tests. They were given to four members of the department of Biological Science at Michigan State College and to one expert in the field of testing in biological science for criticisms and suggestions.

The fifth step was the administration of the tryout tests, the determination of the difficulty and validity of the items, and the selection of the best items. The tryout tests were administered to a group of 168 students taking the third term of the three-term sequence of Biological Science at Michigan State College during the spring term of 1950. Only students for whom comparable psychological and reading examination scores were available were used in this testing. For this reason only students who had taken the examinations given to entering freshmen by the Board of Examiners in the fall of 1949 were admitted to the six sections which had been designated as experimental sections for tryout tests. However, six of the 168 students actually enrolled in these sections were not freshmen who had entered Michigan State College in the fall of 1949. These students had been pre-registered in one of the experimental sections by the department of Engineering. Consequently they could not be transferred to other sections. The scores of these students on the tests were used in the calculation of means, standard deviations and reliabilities.
The papers of these students were also used in the calculation of the item difficulties and item validities, but their scores were not utilized in the computation of correlations between test scores and scores on intelligence tests, reading tests, and factual tests.

Of the 168 students to whom the tryout tests were given, 83 were males and 85 were females. The age range of this group at the beginning of the spring quarter was from 17 years to 25 years; the mean age was 18.76 years.

The tryout tests were given during each alternate laboratory period during the term. The laboratory period was one hour and fifty minutes in length. Students were permitted to work at their own rate of speed on these tests and all students were allowed to finish all of the items on all of the tests. Some students finished as many as three of the tryout tests during one period while others completed only one or two per period. The students were instructed to answer all items even if it was necessary to guess. All of the tests were scored on the basis of the total number of correct answers and no correction for chance was used in the scoring.

As previously mentioned, 162 of the students completing the testing program had entered Michigan State College in the fall of 1949. At that time they had been given the 1949 edition of the American Council on Education Psychological Examination, which purports to measure the linguistic and quantitative factors of intelligence. A composite score, referred to as the total psychological score, is obtained as well as a score on the linguistic portion of the test and a score on the quantitative portion of the test.
Form Y of the American Council on Education Reading Comprehension Test was administered at the same time. This test yields a total reading score, a vocabulary score, a speed of reading comprehension score and a level of reading comprehension score.

At the completion of the year course in Biological Science a comprehensive examination covering the year's work in biology is given to the students. This examination is prepared by the Board of Examiners of Michigan State College. The score obtained by the student on this examination determines the mark which he receives for the entire year's work. The comprehensive examination scores were obtained for each of the 168 students to whom the tryout tests were administered. In addition, the comprehensive examination papers of these students were rescored on the basis of items which were purely factual and items involving the ability to think scientifically. The latter items differ from the items of the tryout tests in that they involve a knowledge of biological facts and principles. Of a total of 300 items in the comprehensive examination, 53 were purely factual while 247 required some use of skills involved in scientific thinking.

Although the student's mark in biological science is determined entirely by his performance on the comprehensive examination, his progress through the three terms of the course is dependent upon the kind of work he does during the year. The work accomplished is reflected on the term-end examinations which are constructed and directed by a committee composed of members of the department of Biological Science. The scores made by each of the 168 students on their term-end examinations for the first and second terms of the course were obtained.

The means and standard deviations were calculated for each of the tryout tests and for the entire battery of tests considered as a single test.
The reliabilities of each of the tryout tests were calculated by correlating the scores on the odd-numbered items with the scores on the even-numbered items. These correlations gave reliabilities of a test half as long as the actual tests. The corrected reliabilities of each of the tryout tests were estimated by means of the Spearman-Brown prophecy formula.

The reliability of the test battery was calculated by one of the Kuder-Richardson formulas.³ The Kuder-Richardson formulas were designed to overcome the disadvantages of test-retest, equivalent forms, and split-half methods. Adkins⁴ states that they are superior to other methods of determining the reliabilities of tests. The formula used in this study required only the number of items of the test, the mean of the test, and the standard deviation of the test. It is well to note that there are certain assumptions upon which this method rests. These assumptions are (1) that the test measures only one factor, (2) that the intercorrelations of all items are equal, and (3) that the items are equal in difficulty. If the assumptions are not met, the value obtained is an underestimate of the reliability. The value obtained represents the minimum reliability of the test for this group.

³ Dorothy C. Adkins, Construction and Analysis of Achievement Tests. Washington: U. S. Government Printing Office. 1947. pp. 153-154.
⁴ Loc. cit.

Item analysis is the analyzing of each item of a test to determine its validity and difficulty. Item analysis data were obtained for all items of all of the tryout tests. Item validity may be defined as a measure of the item's correlation with a criterion.⁵ The purpose of determining the validity of the items is to identify items which discriminate well. Item difficulty is usually expressed as the percent of persons answering the item correctly.
Since items answered correctly by almost all of the students or by almost none of the students cannot have any functional value in an achievement test, inasmuch as they do not serve to discriminate between students, it is generally considered desirable to eliminate them. A detailed discussion of the methods used in the validation of the test items and in the calculation of the item difficulties will be presented in Chapter V.

⁵ Ibid., p. 180.

The scores on each tryout test were correlated with the scores on each of the other tryout tests. This was done to determine whether a large degree of overlapping existed between the tests and to determine whether any tests might be eliminated in the construction of the single test used to measure the ability to think scientifically. The scores on each of the tryout tests were also correlated with the quantitative score and the linguistic score on the American Council on Education Psychological Examination and with the total score on the reading test. These correlations were computed to determine whether the tryout tests were in reality measures of some phase of intelligence or of reading ability.

The purpose of administering the tryout tests was to identify good items to be used in the construction of a test to measure the ability to think scientifically. The tryout tests went through two revisions. The first revision resulted in a test, referred to as Test I, consisting of 150 items. This test was too long to be administered in the hour and fifty minute laboratory period; therefore twenty-five of the poorer items were eliminated from it. This final form of the test, consisting of 125 items, is hereafter referred to as Test IA. Both Test I and Test IA have been called The Ability to Think Scientifically.
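The item-difficulty screening described earlier in this chapter can be illustrated with a small sketch. It assumes a response matrix already scored 0 or 1, and it uses cut-offs of 10 and 90 percent, which are illustrative values chosen here and not the criteria used in the study (those are left to Chapter V). The function names and the toy data are likewise hypothetical.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Percent of students answering each item correctly.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


def flag_extreme_items(difficulties: List[float],
                       low: float = 10.0,
                       high: float = 90.0) -> List[int]:
    """Indices of items answered correctly by almost none or almost all students.

    The low/high cut-offs are illustrative assumptions, not values from the study.
    """
    return [i for i, p in enumerate(difficulties) if p <= low or p >= high]


# Hypothetical toy data: 5 students by 4 items.
scores = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
difficulties = item_difficulty(scores)
extreme = flag_extreme_items(difficulties)
```

In this toy matrix the first item is answered correctly by every student and the last by none, so both would be flagged as unable to discriminate between students.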
In the construction of Test I it was necessary, in most cases, to select blocks of items from the tryout tests rather than individual items, since items were presented in blocks centering around a particular problem or experiment. The best blocks of items from each of the tryout tests, as determined by item analysis, were selected for inclusion in Test I. Poor items, as identified in the same manner, were eliminated from these blocks of items unless they were necessary to the development of the concept developed within the block of items. A total of 150 items were chosen to comprise Test I.

The sixth step involved the administration of Test I, the determination of the mean, standard deviation, and reliability of the test, and the analysis of the individual items. Test IA, the final form of the test, was constructed from Test I by the elimination of 25 of the poorer items. The sixth step also included the administration and statistical analysis of this final form of the test.

Test I was given in May, 1950, to 500 students who had completed the three-term sequence of Biological Science. This group has not previously been mentioned in this study. Of this group 291 were males and 209 were females. The age range was from 17 years to 37 years. The mean age was 20.04 years.

Two hundred and sixty-four were freshmen who had entered Michigan State College in the fall term of 1949 and who had taken the 1949 edition of the American Council on Education Psychological Examination and Form Y of the American Council on Education Reading Comprehension Test at that time. The remaining students were either freshmen who had taken entrance examinations during the summer of 1949 or they were sophomores, juniors, or seniors. These students had all been given alternate forms of the American Council on Education Psychological Examination and the American Council on Education Reading Comprehension Test.
Correlations of scores on Test I with scores on psychological examinations and with scores on the reading test were therefore based on the scores of these 264 students who had taken the forms of the latter tests given in the fall of 1949. This was done because it could not be assumed that scores on the various forms of these tests were directly comparable, and because raw scores were not available for any of the examinations given prior to the fall of 1949. Prior to this time only percentiles had been available.

The mean and the standard deviation were calculated for the group which completed Test I in the spring of 1950. An estimate of the reliability of the test for this group was determined by correlating the scores on the odd-numbered items with the scores on the even-numbered items. These correlations were adjusted for the total test by means of the Spearman-Brown formula. A second method used to determine the reliability of the test was the Kuder-Richardson formula. This calculation was done to compare the reliability obtained by the split-half method with that obtained by a method which gives a minimum reliability. (This method will be discussed in greater detail in Chapter V.)

The test papers of the 500 students taking Test I in May, 1950, were used for item analysis. These item analysis data were utilized in the construction of Test IA.

In order to determine whether there was a difference in the ability to think scientifically before and after the completion of the course in Biological Science, Test I was administered in September, 1950, to 240 students who had had no biological science at the college level. These students were beginning their first term of the three-term sequence of biological science. This group was also different from any previously mentioned. Of this group 144 were males and 86 were females. The age range was 17 years to 34 years, with a mean age of 19.18 years.
The mean and the standard deviation of the test scores were calculated for this group. The reliabilities of the test were determined by the split-half method and by the Kuder-Richardson formula.

As there was no means of predicting the exact length of a test of this nature to fulfill the time requirement of one hour and fifty minutes, the number of items used was purely arbitrary. The actual execution of the test indicated that it was too long for all students to complete. Of the 500 students taking the examination in May, 1950, 54 or 10.8 percent failed to finish in the allotted time. Of the 240 students taking the test in September, 1950, 24 or 10 percent failed to complete the test. Since the test was too long, the poorer items, as determined by item analysis, were eliminated. The remaining items constituted Test IA.

This final form of the test, consisting of 125 items, was administered to 330 students at the beginning of the three-term sequence of biological science in September, 1950. This is a different group from any previously mentioned in this study, and included 182 males and 148 females. The age range was from 16 years to 38 years with a mean of 18.62 years. Thirteen, or 3.7 percent, did not complete the test.

The mean and standard deviation of the test were calculated for this group. The reliability of the test was estimated for this group by correlating the scores made on the odd-numbered items with those made on the even-numbered items. This correlation was corrected by the Spearman-Brown formula. The minimum reliability for this group was estimated by means of the Kuder-Richardson formula.

The seventh step in the construction of the test was its validation. The most important characteristic of a test is its validity, which may be defined as the extent to which a test measures what it purports to measure.⁷ Chapter VI is devoted to a discussion of this characteristic of the test.
The curricular validity of the test was based on the following considerations: (1) designing the test to measure the specific behaviors which attend the steps of scientific thinking, (2) submitting the test to qualified judges for criticism, and (3) using free responses of students as foils wherever feasible.

⁶ Herbert E. Hawkes, E. F. Lindquist and C. R. Mann, The Construction and Use of Achievement Examinations. Cambridge, Mass.: Houghton Mifflin Company. 1936. p. 21.
⁷ Adkins, op. cit., p. 160.

The test was validated statistically by correlating total scores made on the battery of tryout tests with such traits as (1) intelligence, (2) reading ability, and (3) knowledge of biological facts. As previously mentioned, psychological examination scores and reading test scores were available for 264 of the 500 students who took Test I, The Ability to Think Scientifically, in the spring of 1950. These scores were correlated with the scores made by these students on Test I.

Another method of validating the test was the comparison of the scores made by students on Test I at the beginning of the course in Biological Science with the scores made by another group after taking three quarters of Biological Science. The assumptions underlying this comparison will be discussed in Chapter VI.

Test IA was administered to 136 students at the beginning and at the end of the first quarter of the three-term Biological Science sequence. The scores made by these students at these two times were compared.

Scores made by a group of 143 students on Test IA were compared with ratings of these students on their ability to think scientifically. The ratings were made by the instructors who taught these students in Biological Science. The rating sheet, the methods used to obtain scores from these ratings, and the statistical treatment of these data will be discussed in detail in the chapter on the validation of the test.
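The two reliability estimates used throughout this chapter can be sketched as follows. This is a minimal illustration, not the computation actually carried out in the study: it assumes that the Kuder-Richardson formula requiring only the number of items, the mean, and the standard deviation is the one commonly called formula 21, and the function names, the 0/1 score layout, and the sample figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_brown(r_half, factor=2.0):
    """Estimate the reliability of a test `factor` times as long as the half."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def split_half_reliability(responses):
    """Correlate odd-item and even-item scores, then correct to full length.

    `responses` holds one list of 0/1 item scores per student (assumed layout).
    """
    odd_scores = [sum(student[0::2]) for student in responses]   # items 1, 3, 5, ...
    even_scores = [sum(student[1::2]) for student in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_scores, even_scores))


def kr21(n_items, mean, sd):
    """Kuder-Richardson formula 21 (assumed to be the formula meant in the text).

    Uses only the number of items, the test mean, and the standard deviation.
    """
    k = float(n_items)
    return (k / (k - 1.0)) * (1.0 - mean * (k - mean) / (k * sd ** 2))


# Hypothetical data: four students, four items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
r_split = split_half_reliability(sample)

# Hypothetical summary figures for a 150-item test.
r_kr = kr21(n_items=150, mean=75.0, sd=15.0)
```

When the items are unequal in difficulty, the second estimate falls below the first, which is consistent with its description in the text as a minimum reliability.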
CHAPTER IV

THE DEVELOPMENT OF THE TEST ITEMS

This chapter is devoted to a discussion of those steps in the construction of the test which preceded and included the writing of the preliminary items which were used in the tryout tests. These steps included the formulation of the educational objectives to be tested, the definition of the behaviors which attend these objectives, the identification of situations in which the students could be expected to display the types of behaviors identified in step two, and the writing of items designed to appraise the behaviors identified.

THE FORMULATION OF THE EDUCATIONAL OBJECTIVES

The overall objective to be measured by the test was the ability to think scientifically. As was discussed in Chapter II, scientific thinking involves a number of elements. The major elements as outlined by Keeslar¹ have been reworded and are presented here as the major objectives involved in the ability to think scientifically.

1. The ability to sense a problem.
2. The ability to state a problem.
3. The ability to delimit a problem.
4. The ability to recognize facts which are related to the problem.
5. The ability to formulate hypotheses.
6. The ability to plan experiments to test hypotheses.
7. The ability to carry out experiments.
8. The ability to interpret data.
9. The ability to formulate generalizations based on data.
10. The ability to apply generalizations to new situations.

¹ Oreon Keeslar, "The elements of scientific method." Science Education. 29:273-278, December, 1945.

Some of the above abilities are creative, others are critical, while others involve both critical and creative aspects of scientific thinking. For example, the sensing of a problem is a creative activity. So also is the actual formulation of hypotheses, but the detecting of illogical hypotheses is a critical activity. The planning of experiments also has both creative and critical aspects. As Burke points out,² there is overlapping between critical and creative thinking, and the decision as to where to draw the line must be based on pragmatic considerations. Thus, he included the drawing of valid inferences from data as critical thinking since it may be measured by objective tests. The behaviors which have been considered primarily critical will be discussed in detail in a later portion of this chapter.

² Paul J. Burke, "Testing for critical thinking in physics," American Journal of Physics. 17:527-532, December, 1949.

The tests designed in this study have been limited to the appraisal of the critical aspects of scientific thinking because no method for evaluating the creative behaviors was found in the literature, nor did the writer find it possible to devise satisfactory methods for evaluating these creative aspects of thinking.

According to Burke³ critical thinking is an abstraction and can have concrete meaning only when applied to some subject matter. Therefore, the behaviors which constitute the elements of critical thinking must be thought of in relation to some specific field; in this instance, the field was biology.

THE DEFINITION OF THE BEHAVIORS

Methods used to determine the behaviors.
In order to determine the kinds of behaviors attending the steps in the scientific method several approaches were used.

The lists of steps in scientific thinking as presented by Keeslar⁴ and as presented in the 46th Yearbook,⁵ both of which were reviewed in Chapter II, offered a source for the definition of many of the behaviors involved in scientific thinking. The major steps constituted the primary objectives while the minor steps, in many cases, implied specific behaviors which could be measured.

³ Burke, loc. cit.
⁴ Keeslar, op. cit., pp. 273-278.
⁵ Science Education in American Schools. Forty-Sixth Yearbook of the National Society for the Study of Education, Part I. pp. 145-147. Chicago: University of Chicago Press, 1947.

A second source of behaviors was literature on tests and test construction, committee reports on behaviors involved in scientific thinking, and reports of research on behaviors of persons doing scientific research.

In his book on the construction of achievement tests, Tyler⁶ discussed tests to measure the ability to use the scientific method and the ability to infer. In these sections he described some of the behaviors involved. This was a rather early piece of work in the area of definitions of behaviors and was included here more for its historic interest than for its value as a source of behaviors.

Hawkes, Lindquist and Mann,⁷ in a chapter on examinations in the natural sciences, discussed some of the behaviors which give evidence of the student's ability to use reliable sources of information, to recognize unsolved problems, to draw reasonable generalizations from data, and to plan experiments.

⁶ Ralph W. Tyler, Constructing Achievement Tests. Columbus, Ohio: Ohio State University. 1934. pp. 24-30.
⁷ Herbert E. Hawkes, E. F. Lindquist and C. R. Mann, The Construction and Use of Achievement Examinations. Cambridge, Mass.: Houghton Mifflin Company. 1936. pp. 231-247.

A very useful source of behaviors involved in scientific thinking was "Science in General Education."⁸ A portion of one chapter of this book is devoted to a discussion of the nature of reflective thinking. Another chapter is devoted almost entirely to the evaluation of students' growth in reflective thinking. Situations are described which show the kinds of behaviors expected of students who are proficient in the ability to think reflectively. The objectives analyzed are: (1) the ability to discover and define problems, (2) the ability to observe accurately, (3) the ability to select facts relevant to a problem, (4) the ability to collect and organize facts, (5) the ability to draw inferences from facts, (6) the ability to recognize proof, and (7) the ability to plan experiments to test hypotheses.

In the report on the methods of evaluating student progress in the Eight-Year Study, Smith and Tyler⁹ discuss in detail the behaviors involved in the student's ability to interpret data and in some detail the behaviors involved in an understanding of the nature of proof. The behaviors involved in the ability to interpret data were derived from discussions of the committee on the interpretation of data. The committee was comprised of a representative from each school interested in this objective, and the members of the Evaluation Staff of the Eight-Year Study. The work of this committee was quite exhaustive. Most of the behaviors listed under interpretation of data in the list of behaviors presented in this thesis are either mentioned or implied in Smith and Tyler's discussion of the interpretation of data and their discussion of the nature of proof.

⁸ Progressive Education Association, Science in General Education. New York: D. Appleton-Century Company. 1938. pp. 393-412.
⁹ Eugene R. Smith, Ralph W. Tyler and the Evaluation Staff, Appraising and Recording Student Progress. New York: Harper and Brothers. 1942. pp. 38-41, 126-130.

Johnson,¹⁰ in a discussion of her test of straight thinking, presents the kinds of behaviors which her test purported to measure. The major abilities discussed are: (1) the ability to analyze a problem, (2) the ability to interpret data, (3) the ability to evaluate arguments, (4) the ability to test hypotheses through reasoning, and (5) the ability to recognize valid causal relationships.

The Committee on Research in Secondary School Science¹¹ focused its attention on the development of problem-solving as the area in which research was needed. The members of this committee considered problem-solving to be a general type of human behavior which included specific, interrelated behaviors. They analyzed these behaviors in the following areas:

1. Behaviors concerned with the identification of problems.
2. Behaviors related to the establishment of facts about the problem.
3. Behaviors related to the formulation of hypotheses.
4. Behaviors related to the testing of hypotheses.
5. Behaviors concerned with the results of testing hypotheses.

¹⁰ Alma Johnson, "An experimental study in analysis and measurement of reflective thinking." Speech Monographs. 10:83-96, (Annual) 1943.
¹¹ Committee on Research in Secondary School Science, "Problem-solving as an objective of science teaching." Science Education. 33:192-195, April, 1949.
The behaviors listed by this committee were incorporated into the list of behaviors presented in the present study.

Burke,¹² in discussing the development of test items to test the ability to think scientifically, says that before any test of critical thinking could be constructed, or before any orderly attempt could be made to teach the scientific method, the concept must be made more precise than it has been previously. He presents an operational definition consisting of a set of about 30 behaviors. He offers the list as a tentative definition. Most of the behaviors in his list have been incorporated in the outline of behaviors presented in this chapter.

¹² Burke, op. cit., pp. 527-532.

A study sponsored by the American Institute for Research and the American Council on Education and supervised by Flanagan¹³ was made of the activity of research workers on the job, to identify and define the characteristics of effective scientific personnel in terms of specific observations and records of the work behavior of these personnel. The method used to obtain these behaviors was not the opinions or beliefs of supervisors of research, but rather the actual experiences, in the form of reports of behavior which led to success or failure of individuals on various parts of their jobs. Reports of what actually happened were turned in to the committee. About 500 research workers were contacted, who were asked to describe critical incidents in which a person had been effective or ineffective in research techniques. Upon the completion of the interviews the behaviors described were classified into groups of similar behaviors.

On the basis of the classification of the behaviors a comprehensive check list was prepared for the evaluation of research workers. Each area was divided into sub-areas.
In addition to the check list, which included descriptions of effective and ineffective behavior in each of the areas, definitions of the areas were written to provide a general description of the content of the area.

¹³ John C. Flanagan, Critical Requirements for Research Personnel. Pittsburgh: American Institute for Research. 1949. pp. 24-39.

Area I was the formulation of hypotheses and problems. This area was defined as stressing creative behavior, and included the sensing and exploring of new problem areas, delimiting problems and the proposing of hypotheses to fit the available facts. Within this area 21 effective and eleven ineffective types of behaviors were described. These made up the items of the check list.

Area II dealt with the planning and designing of an investigation; Area III was concerned with the conducting of the investigation and Area IV was the interpretation of research results. Areas V, VI, VII, and VIII were not related to scientific thinking but dealt with preparing reports, administration of research, organizational responsibility and personal responsibility and were not related to the present investigation.

Although this work was outstanding in its thoroughness and although over 100 behaviors relating to research ability were presented, most of them have not been incorporated into the outline presented in this chapter because many were creative activities, and many others were manipulative activities. The critical activities, however, were incorporated into the outline of behaviors which will be presented later in this chapter.

A third source used in the identification of behaviors involved in scientific thinking was the interviewing of some of the members of the Department of Biological Science at Michigan State College.
These persons were asked to describe the behaviors they had observed in students whom they believed to show considerable ability to think scientifically, and the kinds of behaviors they had observed in students who seemed to them to be very inferior in their ability to think scientifically. The major abilities mentioned in these interviews were the ability to devise and evaluate experiments, and the ability to interpret data. Specific behaviors were described. (These will be discussed in greater detail in Chapter VI, where the rating sheet devised for the validation of the test is described.)

The final source used in the definition of behaviors was the experience of the writer as an instructor in the course of Biological Science at Michigan State College and her experience as a member of the committee responsible for the construction of departmental examinations.

An Outline of the Behaviors. Below is an analysis, in outline form, of the types of behaviors involved in scientific thinking which it was believed could be measured by objective tests. It is not assumed that this is an all-inclusive list. It is, however, a synthesis of the behaviors identified from the above mentioned sources.

1.00 Ability to recognize problems.
  1.10 Ability to recognize a problem or a perplexity in the context of a paragraph or an article.
  1.20 Ability to distinguish between a fact (observation) and a perplexity or problem.
  1.30 Ability to recognize a problem even when it is stated in expository form rather than in interrogatory form.
  1.40 Ability to distinguish a problem from a possible solution to a problem (hypothesis) even when the hypothesis is presented in interrogatory form.
  1.50 Ability to avoid becoming diverted from the major problem into side issues.
2.00 Ability to delimit a problem.
  2.10 Ability to distinguish between major and minor problems.
  2.20 Ability to isolate the single major problem or single major idea in a problem.
  2.30 Ability to see the relationship of minor problems to the major problems.
  2.40 Ability to distinguish between relevant and irrelevant problems.
  2.50 Ability to analyze the problem into its essential parts.
  2.60 Ability to concentrate on the main problem.
  2.70 Ability to recognize the basic assumptions of a problem.
3.00 Ability to recognize and accumulate facts related to the solution of a problem.
  3.10 Ability to select the kind of information needed to solve the problem.
  3.20 Ability to recognize valid evidence.
  3.30 Ability to differentiate between reliable and unreliable sources of information.
  3.40 Ability to select data pertinent to the solution of the problem.
  3.50 Ability to recognize the difference between data pertinent to the solution of the problem and that which is unrelated.
4.00 Ability to recognize an hypothesis.
  4.10 Ability to distinguish an hypothesis from a problem.
  4.20 Ability to differentiate between a statement that describes an observation and a statement which is an hypothesis about the fact.
  4.30 Ability to distinguish between an hypothesis as a possible solution to a problem and a conclusion (probable solution to a problem).
  4.40 Ability to recognize the tentativeness of an hypothesis.
5.00 Ability to plan experiments to test hypotheses.
  5.10 Ability to select the most reasonable hypothesis to test.
  5.20 Ability to differentiate between an uncontrolled observation and an experiment involving controls.
  5.30 Ability to recognize the fact that only one factor in an experiment should be variable.
    5.31 Ability to recognize what factors must be controlled.
    5.32 Ability to recognize the overall control.
    5.33 Ability to recognize the partial controls.
    5.34 Ability to recognize the variable factor.
    5.35 Ability to understand why the overall control was included in an experiment.
    5.36 Ability to recognize the factor being held constant in the overall control.
    5.37 Ability to recognize the factors being held constant in the partial controls.
  5.40 Ability to recognize experimental and technical problems inherent in the experiment.
  5.50 Ability to criticize faulty experiments when:
    5.51 The experimental design was such that it could not yield an answer to the problem.
    5.52 The experiment was not designed to test the specific hypothesis stated.
    5.53 The method of collecting the data was unreliable.
    5.54 The data were not accurate.
    5.55 The data were insufficient in number.
    5.56 Proper controls were not included.
    5.57 No controls were included.
6.00 Ability to carry out experiments.
  6.10 Ability to recognize existence of errors in measurement.
  6.20 Ability to recognize when the precision of measurement given is warranted by the nature of the problem.
  6.30 Ability to make accurate observations.
    6.31 Ability to observe differences in situations which are similar.
    6.32 Ability to observe similarities in situations which are different.
  6.40 Ability to organize facts into tables, graphs, etc. for easy interpretation.
7.00 Ability to interpret data.
  7.10 Ability to handle certain basic skills necessary to the interpretation of data.
    7.11 Ability to read tables and graphs.
    7.12 Ability to perform simple computations.
  7.20 Ability to evaluate relevancy of data.
    7.21 Ability to recognize hypotheses and conclusions contradicted by the data.
    7.22 Ability to recognize hypotheses and conclusions which are unrelated to the data.
    7.23 Ability to select the hypothesis from a group of hypotheses which most adequately explains the data.
    7.24 Ability to recognize facts which support an hypothesis or a conclusion.
    7.25 Ability to recognize facts which contradict an hypothesis or a conclusion.
  7.30 Ability to differentiate between facts and inferences.
    7.31 Ability to differentiate between an observation and a conclusion drawn from the observation.
    7.32 Ability to differentiate a conclusion from an hypothesis.
    7.33 Ability to distinguish between an assumption upon which a conclusion depends and the conclusion itself.
    7.34 Ability to distinguish a fact from an assumption.
  7.40 Ability to recognize the limitations of data.
    7.41 Ability to differentiate between what is established by the data alone and what is implied by the data.
    7.42 Ability to recognize that a statement which goes beyond the data cannot be absolutely true.
    7.43 Ability to recognize that generalizations from results of an experiment can only be extended to new situations when there is considerable similarity between the situations.
    7.44 Ability to confine definite conclusions to the evidence at hand.
  7.50 Ability to consider as possibly true or probably true inferences based on the data.
    7.51 Ability to make inferences on the basis of trends.
    7.52 Ability to extrapolate.
    7.53 Ability to interpolate.
    7.54 Ability not to be so overcautious that all statements which go beyond the data are rejected because of insufficient evidence.
  7.60 Ability to perceive relationships in data.
    7.61 Ability to make comparisons.
    7.62 Ability to see the element common to several items of data.
    7.63 Ability to recognize prevailing tendencies and trends in data.
    7.64 Ability to recognize that when two things vary together there may be a relationship between them, without assigning cause and effect judgments on the basis of this relationship.
    7.65 Ability to formulate reasonable generalizations based upon the data.
  7.70 Ability to recognize the nature of evidence.
    7.71 Ability to recognize the difference between direct and indirect evidence.
    7.72 Ability to recognize a statement which is given as evidence as not being evidence when the statement contradicts the conclusion.
    7.73 Ability to recognize a statement which is given as evidence as not being evidence when the statement is unrelated to the conclusion.
    7.74 Ability to recognize evidence for an inference and to choose such evidence from a series of statements.
    7.75 Ability to recognize the validity of the evidence used to support conclusions.
  7.80 Ability to recognize the assumptions involved in the formulation of hypotheses and conclusions.
    7.81 Ability to recognize assumptions which go beyond the data but which are essential to the formulation of an hypothesis.
    7.82 Ability to recognize assumptions which must be maintained in the drawing of a conclusion.
    7.83 Ability to recognize assumptions which can be checked experimentally.
    7.84 Ability to recognize invalid assumptions.
8.00 Ability to apply generalizations to new situations.
  8.10 Ability to refrain from applying generalizations to new situations when the new situation does not closely parallel the experimental situation.
  8.20 Ability to be aware of the tentativeness of predictions about new situations even when there is a close parallel between the two situations.
  8.30 Ability to recognize the assumptions which must be made in applying a generalization to a new situation.

THE LOCATION OF THE SOURCE MATERIALS FROM WHICH THE ITEMS COULD BE CONSTRUCTED

The third step in the development of the test was the identification of situations in which the student could be expected to display the types of behaviors implied in the steps of scientific thinking. Each major objective was considered, and situations were sought which might be utilized in the construction of items to test the abilities involved in these objectives.

There were certain requirements which should be met in the selection of the material. It was considered reasonable that in all cases the material should be (1) of some interest to the student, (2) free from technical terms, (3) comprehensible to the student who had had no training in biology, (4) on biological subjects, and (5) obtained from valid sources.
It was thought that the abilities involved in the recognition of a problem, an hypothesis, a fact, and a conclusion could be discovered by having a student actually locate them in his reading. In the development of an objective test it seemed that one way in which these behaviors could be measured was by the presentation of short essays or paragraphs which contained problems, etc. and having the student identify them. With this in mind, popular and scientific journals were inspected for descriptions of experiments or observations which contained problems, hypotheses, experiments, observations and conclusions. These were judged by the following criteria:

1. They should be of such a nature that they could be condensed into a paragraph or two.
2. They should each contain a problem or problems, hypotheses, observations and experiments, and a conclusion.

It was tentatively assumed that a student's ability to delimit a problem might be measured by giving him a comprehensive problem so stated that it could not be solved unless it were broken down into a series of minor problems. Such problems were located in textbooks and research journals, and by interviews with members of the Department of Biological Science of Michigan State College. The criteria used in the selection of the problems were:

1. Unsolved problems were chosen so that the student could not know the solution to the problem.
2. The problems should be broad major problems.

In order to measure a student's ability to plan experiments it was necessary to locate problems and hypotheses already under investigation or those which might be investigated, thus limiting the possibility of the student having had experience with the problem. Some of these were found in research journals and some were obtained by interviews with staff members of the Department of Biological Science at Michigan State College. The criteria by which they were judged were:

1. They should be of such a nature that no technical apparatus would be needed to design an experiment.
2. They should be within the experience of the student; that is, the general problem should deal with situations which could reasonably be assumed to be familiar to him.

In order to test a student's understanding of experimental design, actual experiments were located in which the student could identify controls, partial controls, etc. These experiments were located in scientific journals. It was assumed that the experiments should be:

1. Entirely new to the students.
2. On a subject with which the student was familiar.

These assumptions were met by choosing experiments from technical journals which the average student would not have read, and by choosing experiments which were about rather common subjects, such as food, plants, etc.

It was thought that the ability to organize data could be tested by giving students raw data to graph. A search of textbooks and journals produced this type of material. The criteria used to judge the usability of the data were:

1. The data must be in units familiar to the student.
2. The data must be such that only a few points would be needed to plot a curve so that a number of curves could be plotted in a minimum of time.

Scientific journals and advanced textbooks were examined for data which the student could interpret. It was assumed that these data should be entirely new to the student.

THE CONSTRUCTION OF THE EVALUATION INSTRUMENTS

The fourth step in the development of the test was the selection of promising techniques, and the inventing of new techniques to obtain evidence concerning the attainment of the objectives. Previous tests designed to test certain phases of scientific thinking were examined. No tests for biology were found which measured all of the objectives listed. There were only a few which measured any of the objectives.
New techniques for appraising the desired behaviors were devised, paragraphs from sources were rewritten, students were presented with some of the materials identified in step three for free responses which were culled and classified. On the basis of this work nine tryout tests were devised. The following discussion gives in more detail the method used in the construction of each of the instruments and the objectives and types of behavior which each was intended to evaluate.

In the development of the test items certain requirements regarding mechanics were set up. The first requirement was that the test be easily scored. A five-response machine scored answer sheet was chosen as the most appropriate for the purposes of this test. A second consideration was the test form. A five-choice key was selected as the most suitable form inasmuch as a single key for each test would enable the student to answer a rather large number of items in a fairly short time, thus increasing the reliability of the test. He would become acquainted with the key and thus reduce the reading time of the test. Each tryout test had a separate key.

After the test items had been constructed they were given to five experts for keying, criticism and suggestion. The items were revised on the basis of these judgments, and assembled into tryout tests. (See Appendix I.)

The first tryout test, hereafter referred to as Test A, was designed to evaluate the student's ability to recognize problems, hypotheses, experimental conditions and conclusions. Five paragraphs were written, each on a different subject and each based on short articles from popular magazines. Certain parts of the paragraphs were underlined; these underlined portions, preceded by a number indicating the item number, constituted the 74 items of the test.
The directions given to the student, the key for the test and a portion of one of the paragraphs follow:

TEST A
SOME STEPS IN SCIENTIFIC THINKING

This test is designed to measure your ability to differentiate phases of thinking. These steps include major problems or perplexities, possible solutions to problems, observations which are not results of experimentation but rather preliminary observations, results of experimentation, and conclusions. Certain parts of the paragraph are underlined, and each underlined item is a question. Choose the proper response from the key and blacken the appropriate space on the answer sheet.

Key
1. A major problem (either stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Observations (not experimental).
5. Conclusion (probable solution to problem).

Ever since the days of Hippocrates one of medicine's big mysteries has been (1) the bodily process that transforms disease into death. With a special type of equipment which makes blood vessels transparent and three dimensional under a microscope, one investigator began examining the blood of healthy animals. The (2) blood cells of the healthy animals are separate and move rapidly. One day while observing the blood of a monkey dying of malaria, this researcher saw that the (3) blood was flowing slowly.

Test B, designed to test the student's ability to delimit problems, was constructed from free responses of students. For example, several facts about colds were given to the students. They were asked to read the paragraph and state briefly the problem or problems presented. The major problem was: What causes colds? In constructing the test this problem was followed by other problems which the students had suggested. Four major problems were presented, each of which was followed by a series of questions. There was a total of 67 such questions in this tryout test. A portion of Test B follows:

TEST B
THE DELIMITATION OF PROBLEMS

This portion of the test is designed to test your ability to delimit a problem. A problem is presented. This is followed by a series of questions. Rate the questions according to the following key.

Key
1. This question must be answered in order to solve the problem.
2. This question if answered might be useful in the solution of the problem.
3. The answer to this question, though related to the problem, would not help in the solution to the problem.
4. This question is completely unrelated to the problem.
5. This question if answered in the affirmative is a basic assumption of the problem.

PROBLEM: What causes colds?

QUESTIONS:
1. Do all people have colds?

Test C was designed to measure the student's understanding of the experimental method. This test was also constructed on the basis of free responses from students. They were presented with a problem and hypotheses and were instructed to design an experiment to test each hypothesis presented. For example:

Problem: What are some of the requirements of sprouting seeds?
Hypothesis: Oxygen is a requirement of sprouting seeds.

The papers were cut so that the experiments designed to test a single hypothesis could be sorted and these were placed in piles according to the key which was used in Test C. Some of the responses were satisfactory experiments, others were faulty for one reason or another, some were faulty for several reasons. Those which were faulty in more than one way were discarded. Ten or eleven responses for each problem were chosen as the test items. Six series of experiments with a total of 62 items constituted Test C, a portion of which is presented here:

TEST C
EXPERIMENTAL PROCEDURES

This test is designed to measure your ability to recognize faulty experimental procedures and to test your ability to select the best of a series of experiments.
In each case a problem and a possible solution to the problem (an hypothesis) are presented. In each case the experiments were designed by students to test the hypotheses. Judge each experiment according to the following key.

Key
1. This experiment is satisfactory.
2. This experiment is unsatisfactory because it lacks a control or comparison.
3. This experiment is unsatisfactory because the control or comparison is faulty.
4. This experiment is unsatisfactory because it is unrelated to the hypothesis.
5. None of the above - the experiment or situation is unsatisfactory for reasons other than those listed in 2, 3, and 4.

PROBLEM: What are some of the requirements for the sprouting of seeds?

HYPOTHESIS: Oxygen is a requirement for the sprouting of seeds.

1. Plant one seed in a container where oxygen is available and place another seed in a container where all oxygen has been removed. Keep all other conditions the same.

Test D, designed to measure the student's ability to organize data, contained twenty items similar to the one illustrated here:

TEST D
ORGANIZATION OF DATA

This test is designed to test your ability to organize data. Select from the key below the curve which best fits the data. If none of the curves fit the data mark space five on your answer sheet.

Key
[Curves 1 through 4]
5. None of the curves.

1. The horizontal axis represents temperature. The vertical axis represents the amount of Substance A derived from Substance B.

Temperature     Amount of Substance A
10° C.          4 grams
25° C.          7 grams
35° C.          9 grams
60° C.          14 grams

Test E is similar to one described by Engelhart and Lewis.^14 It was designed to measure the student's understanding of the relation of facts to the solution of a problem. All of the 74 items of this test were related to the overall problem: What factors are involved in the transmission and development of Infantile Paralysis (Poliomyelitis)? Six hypotheses were presented.
Each hypothesis was followed by a series of facts which constituted the items. The data for the test were obtained from articles on infantile paralysis in research journals and medical journals. A portion of Test E follows:

^14 Max D. Engelhart and Hugh B. Lewis, "An attempt to measure scientific thinking." Educational and Psychological Measurement. 1:289-294, Third quarter, 1941.

TEST E
EVALUATION OF HYPOTHESES

This test is designed to measure your understanding of the relation of facts to the solution of a problem. The overall problem involved in this test is presented. This is followed by a series of possible solutions to the problem (hypotheses). After each hypothesis there are a number of items, all of which are true statements of fact. Determine how the statement is related to the hypothesis and mark each statement according to the key which follows the hypothesis.

GENERAL PROBLEM: What factors are involved in the transmission and development of Infantile Paralysis (Poliomyelitis)?

HYPOTHESIS I: In man the disease is contracted by direct contact with persons having the disease.

Key
For items 1 through 11 mark space, if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.

1. Monkeys free from the disease almost never catch infantile paralysis from infected monkeys.
2. Most strains of infantile paralysis virus can be transferred from man only to monkeys and apes and not to other animals.

12. What is the status of hypothesis I?
1. It is true.
2. It is probably true.
3. It is false.
4. It is probably false.
5. The data are contradictory, hence its truth or falsity cannot be judged.

Test F was designed to measure the student's ability to interpret data and to test his understanding of experimentation.
The directions for this tryout test and a portion of the test are given below:

TEST F
EXPERIMENTATION AND THE INTERPRETATION OF DATA

This test was designed to measure your ability to interpret data and to test your understanding of experimentation. In each case the numbers in the first column are the numbers which you will use as your answer. Thus the table presented becomes both the source of data and your key for the questions which follow it. In each case where a test tube number or group number is called for the one which gives positive evidence for the statement should be given. Below this the control or comparison is called for. This is the test tube or group number of the data which offers a comparison. For example:

1. Leaf in dark - no starch.
2. Leaf in light - starch.

Light is necessary for the production of starch.

You would mark space 2 because this is the positive evidence, but it would be meaningless if it were not compared with the leaf in the dark. Therefore, the following item, "What is the control (comparison) for item 1?", would be marked space 1.

Items 1 through 15 refer to the data presented below.

Some test tubes were set up and each contained 1 gram of fat. They were marked 1, 2, 3, 4, and 5. Mark each item according to the test tube number called for. Various substances were added to the tubes containing fat. All substances were dissolved in water before they were added to the fat. All test tubes were kept at 85° F. (Water boils at 212° F.) For test tube 5, Substance A was boiled and then allowed to cool before it was added to the fat.
Test Tube    Content of tube                            Amt. of Substance B
Number                                                  present after 24 hours
1            Fat plus Substance A                       .1 gram
2            Fat plus Substance A plus Substance C      .5 gram
3            Fat plus Water                             .0 gram
4            Fat plus Substance C                       .0 gram
5            Fat plus Substance A (boiled)              .0 gram

1. Give the number of the test tube which acts as a control (comparison) for the entire experiment.
2. Give the number of the tube which gives evidence that fat does not break down spontaneously into Substance B in 24 hours.
3. Give the number of the tube used to show that a temperature of 85° F. was not sufficient to cause fat to be broken down into Substance B.
4. Give the test tube number of the tube which gives evidence that Substance A is the active substance in the breakdown of fat to Substance B.
5. Give the test tube number of the tube which is the control (comparison) for item #4.

Five such series of items were included in Test F. The total number of items was 72.

Test G is somewhat like the test described by Teichman^15 which was constructed to evaluate conclusions in terms of reasonableness, sufficiency and pertinent data. This test was constructed from free responses of students. A problem was presented. This was followed by data. For example: A student was interested in developing a test for a certain substance. In all 100 cases his test was positive. The students were requested to state a conclusion. In some instances, as in the above, there was no control included so no conclusion was really possible. Some of the students realized this; others wrote conclusions. The answers were sorted into stacks according to the key used for Test G.
The most appropriate responses were chosen as the 100 items for the test.

^15 Louis Teichman, "The ability of science students to make conclusions." Science Education. 28:268-279, December, 1944.

TEST G
DRAWING OF CONCLUSIONS

This test was designed to measure your ability to make conclusions. When facts are analyzed and studied they sometimes yield evidence which helps in the solution of a problem. However, any conclusion must be checked before it can be accepted. The following key includes four ways in which conclusions may be faulty. Each of the items presents a question or problem, a brief description of an experiment and one or more conclusions drawn from the experiment. Each experiment was repeated many times. Read each problem, experiment and the conclusions. Where several conclusions are given evaluate each conclusion separately. Is the conclusion tentatively justified by the data? If so, mark space 1 on your answer sheet. If the conclusion is not justified determine whether 2, 3, 4, or 5 in the key is the best reason for it being faulty and mark the proper space on your answer sheet.

Key
The conclusion is:
1. Tentatively justified.
2. Unjustified because it does not answer the problem.
3. Unjustified because the experiment lacks a control (comparison).
4. Unjustified because the data are faulty or inadequate, though a control was included.
5. Unjustified because it is contradicted by the data.

PROBLEM: A student was interested in developing a test for a certain type of substance. In all 100 cases his test was positive.

1. He concluded that the test was a specific test for the substance.

The final tryout test was in reality two tests, Test H and Test J, combined into one. In all, these tests contained 168 items. Test H was devised to measure the student's ability to interpret data. Data were presented to the students. These were followed by a series of items which were possible interpretations, restatements, explanations, extensions, and comparisons of the data. These items constituted Test J.

TEST H
INTERPRETATION OF DATA

This test was designed to measure your ability to interpret data. Following the data you will find a number of statements. You are to assume that the data as presented are true. Evaluate each statement according to the following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that the statement is true.
2. Probably true: The data indicate that the statement is probably true, that is, it is logical on the basis of the data but the data are not sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement.
4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false.
5. False: The data alone are sufficient to show that the statement is false.

In freezing of vegetables the common practice for both commercial and home frozen vegetables is to scald the vegetables first by placing them in boiling water for two to three minutes. The following data were obtained in an experiment which measured the amounts of Vitamin C in fresh vegetables, scalded vegetables before freezing, and vegetables frozen for six months. One group of the frozen vegetables was frozen without first scalding, the other group was first scalded.
The Vitamin C content of the frozen vegetables was determined before and after they were cooked. All figures indicate the amount of Vitamin C in mg. per 100 cc.

Vegetable         Fresh    Scalded
Chard (greens)    60       37
Spinach           82       43
Peas              29       21
Green beans       34       29
Lima beans        21       20

Frozen, unscalded and scalded, raw and cooked (column alignment not recoverable): 14 24 2 20 16 1 10 27 16 20 14 10 17 23 13 25 14 20 18 26

1. Scalding of all vegetables causes destruction of some of the Vitamin C content of the vegetables.
2. Spinach is a good source of Vitamin C.

TEST J
GENERALIZATIONS AND ASSUMPTIONS

Items 16 through 21 are a re-evaluation of some of the items 1 through 15. Re-read items 1, 3, 9, 11, 13 and 15 and determine whether they are generalizations, extensions of data, explanations of the data or merely restatements of the data, etc. Answer each according to the following key:

Key
1. A generalization; that is, the data say it is true for this situation, a generalization says it is true for all similar situations.
2. The data indicate a trend which if continued in either direction would make the statement true.
3. An explanation of the data in terms of cause and effect.
4. A restatement of results.
5. None of the above.

16. Item 1.

This phase of the test is designed to measure your understanding of assumptions underlying conclusions. A conclusion is given. (This conclusion is not necessarily justified by the data.) The statements which follow the conclusions are the items which are to be evaluated according to the following key. These items all relate to the data presented for items 1 through 15.

Key
1. An assumption which must be made to make the conclusion valid (true).
2. An assumption which if made would make the conclusion false.
3. An assumption which has no relation to the validity (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.
Conclusion 1: The breakdown of Vitamin C proceeds spontaneously but is a relatively slow process at low temperature.

22. Vitamin C is a stable substance.
23. There is order in the universe.

ANALYSIS OF THE TRYOUT TESTS IN TERMS OF THE BEHAVIORS INVOLVED

Table I has been prepared to indicate which behaviors outlined earlier in this chapter each of the tryout tests was designed to measure. The major objectives are presented in the table. These are followed by the behaviors reworded into shorter statements. The tests have previously been described, but the descriptive titles are presented here to facilitate the reading of the table.

Test A   Some Steps in Scientific Thinking
Test B   The Delimitation of Problems
Test C   Experimental Procedures
Test D   Organization of Data
Test E   Evaluation of Hypotheses
Test F   Experimentation and the Interpretation of Data
Test G   Drawing of Conclusions
Test H   Interpretation of Data
Test J   Generalizations and Assumptions

An inspection of this table indicates that an attempt was made to cover most of the behaviors observable in persons employing the critical aspects of scientific thinking in the preliminary test battery. It will be seen that a few of the behaviors were not well covered by the tests, such as the ability to recognize valid and invalid data, valid and invalid sources of data, and the ability to carry out experiments. These were omitted chiefly because little attempt has been made to teach these two objectives in the course in Biological Science at Michigan State College.
TABLE I

BEHAVIORS MEASURED BY THE TRYOUT TESTS

[Columns: Behaviors; Tests A, B, C, D, E, F, G, H, J. The X marks showing which tests measure each behavior are not recoverable in this copy.]

Recognizes problems
1.10 Recognizes problems in context
1.20 Distinguishes fact from problem
1.30 Recognizes problem in expository form
1.40 Distinguishes problem from hypothesis
1.50 Distinguishes problem from side issues

Delimits problem
2.10 Distinguishes major problem from minor ones
2.20 Isolates major problem or major idea
2.30 Sees relation of minor problems to major one
2.40 Distinguishes relevant from irrelevant problems
2.50 Analyses problem into essential parts
2.60 Concentrates on main problem
2.70 Recognizes basic assumptions of problem

Recognizes facts related to solution of problem
3.10 Selects information needed to solve problem
3.20 Recognizes valid evidence
3.30 Recognizes reliable sources of information
3.40 Selects data pertinent to solution of problem
3.50 Distinguishes between pertinent and unrelated data

Recognizes hypotheses
4.10 Distinguishes hypothesis from problem
4.20 Differentiates observation from hypothesis
4.30 Distinguishes hypothesis from conclusion
4.40 Recognizes tentativeness of hypothesis

Plans experiments
5.10 Selects proper hypothesis to test
5.20 Differentiates observation from experiment
5.30 Uses single variable factor
5.31 Controls proper factors
5.32 Recognizes overall control
5.33 Recognizes partial control
5.34 Recognizes variable factor
5.35 Understands reason for overall control
5.36 Recognizes constant factor of overall control
5.37 Recognizes constant factor of partial control
5.40 Recognizes problems inherent in experiment
5.50 Criticizes faulty experiments when
5.51 Not designed to answer problem
5.52 Not designed to test hypothesis
5.53 Methods were not reliable
5.54 Data were not accurate
5.55 Data were insufficient in number
5.56 Proper controls were not included
5.57 No controls were included

Carries out experiments
6.10 Recognizes measurement errors
6.20 Recognizes precision of measurement necessary
6.30 Makes accurate observations
6.31 Observes differences in similar situations
6.32 Observes similarities in different situations
6.40 Organizes facts for interpretation

Interprets data
7.10 Handles skills necessary to interpretation
7.11 Can read tables and graphs
7.12 Can perform simple computations
7.20 Evaluates relevancy of data
7.21 Recognizes inferences contradicted by data
7.22 Recognizes inferences unrelated to data
7.23 Selects best hypothesis to explain data
7.24 Recognizes facts supporting inference
7.25 Recognizes facts contradicting inference
7.30 Distinguishes facts from inferences
7.31 Distinguishes observation from conclusion
7.32 Distinguishes hypothesis from conclusion
7.33 Distinguishes assumption from conclusion
7.34 Distinguishes fact from assumption
7.40 Recognizes limitations of data
7.41 Distinguishes data from what is implied by data
7.42 Recognizes inferences as not absolutely true
7.43 Recognizes limitations in applying generalizations
7.44 Confines definite conclusions to evidence
7.50 Makes inferences based on data
7.51 Makes inferences based on trends
7.52 Makes inferences based on extrapolations
7.53 Makes inferences based on interpolations
7.54 Is not too overcautious
7.60 Perceives relationships in data
7.61 Makes comparisons in data
7.62 Sees common elements in data
7.63 Recognizes tendencies and trends
7.64 Suspends cause and effect judgments
7.70 Recognizes nature of evidence
7.71 Distinguishes direct from indirect evidence
7.72 Recognizes evidence which contradicts conclusion
7.73 Recognizes evidence unrelated to conclusion
7.74 Recognizes evidence for inferences
7.75 Recognizes validity of evidence
7.80 Recognizes assumptions underlying inferences
7.81 Recognizes essential assumptions
7.82 Recognizes assumptions underlying conclusions
7.83 Recognizes testable assumptions
7.84 Recognizes invalid assumptions

Applies generalizations
8.10 Is cautious in applying generalizations
8.20 Is aware of tentativeness of applications
8.30 Recognizes assumptions underlying applications

CHAPTER V

THE STATISTICAL ANALYSES OF THE TESTS AND THE TEST ITEMS

This chapter is devoted to a presentation of the statistical analyses of the tests and the test items. The means, standard deviations, and reliabilities of each of the tryout tests are presented. Item analysis data for the items in the tryout tests have been summarized. Intercorrelations of the tryout test scores have been calculated and data concerning the degree of overlapping of the tryout tests are discussed. This discussion is followed by analyses of Tests I and IA and by the item analysis data on these tests.

METHODS USED IN ITEM-ANALYSIS

Item validity may be defined as a measure of the item's correlation with a criterion.^1 In this case the criterion used was the scores on the tryout test which included the particular item for which the validity was to be determined. The purpose of determining item validity is to identify good items to be retained and poor items to be eliminated or revised. Poor items are generally defined as those lacking in discriminative power while good items are discriminatory.

^1 Dorothy C.
Adkins, Construction and Analysis of Achievement Tests. Washington: U. S. Government Printing Office, 1947. p. 180.

Good items are those missed more often by those persons who have a low degree of the quality being measured (in this case, the ability to think scientifically), and answered correctly more often by persons having much of this same quality, whereas poor items are answered correctly by the same number of persons, irrespective of their ability.

Item validity may be estimated by any one of several methods. Test items are usually validated by comparing the proportion of persons having high scores on the test who answer the item correctly with the proportion of persons having low scores on the test who answer the item correctly. Kelley^2 has shown that the best estimates of the correlation of the item with the total test score can be obtained by using the responses of the upper 27 percent on total score and the lower 27 percent on total score of the group for the calculations.

The estimated item correlation was determined by two methods, both of which required the determination of the percent in the upper 27 percent of the group and the percent in the lower 27 percent of the group answering the items correctly. One method was devised by Flanagan.^3 By this method validity is read from a chart, the chart being entered by the percent of successes of each of the groups. The second method used for the estimates of the discrimination power of the items was that of Davis.^4 This method also involves the use of the upper and lower 27 percent of the group. A table is entered by percent of successes of each group; however, the percent successes are calculated differently by Davis' than by Flanagan's method. Straight percent successes are used in the Flanagan method whereas the method devised by Davis involves a correction for guessing. In addition, Davis' method yields a figure which he calls the discrimination index, which is a linear function of the hyperbolic arc tangent of the product-moment coefficient of correlation. He believes that this figure is truly comparable from item to item whereas the coefficient itself is not. The coefficient of correlation cannot justifiably be averaged. However, a table is included in his monograph whereby the discrimination index can be converted to a coefficient of correlation for comparison with results obtained by other methods.^5

Item difficulties, stated in terms of the percent of persons answering the items correctly, were estimated by the method proposed by Davis.^6 These were estimated from the percents of successes of the upper and lower 27 percents of the group. Davis suggests the use of a difficulty index, which like the discrimination index, is read from the table included in his monograph. Like his discrimination index, the difficulty index is corrected for chance. Because item difficulties, when expressed as percents passing the item, cannot justifiably be averaged he devised a difficulty index which is a linear scale. The actual percents can be obtained by use of a table to convert the difficulty indices to percents passing the item.

^2 Truman L. Kelley, "The selection of upper and lower groups for the validation of test items." Journal of Educational Psychology. 30:17-24, January, 1939.
^3 John C. Flanagan, "General considerations in the selection of test items and a short method of estimating the product-moment coefficient from data at the tails of the distribution." Journal of Educational Psychology. 30:674-680, December, 1939.
^4 Frederick B. Davis, Item-Analysis Data. Cambridge: Graduate School of Education, Harvard University, 1946. pp. 8-15.
^5 Ibid., pp. 14-15.

ANALYSES OF TRYOUT TESTS

The tryout tests were administered to 168 students in the spring term of 1950. The tests were scored on the basis of total number of items answered correctly.
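The grouping step common to both methods of item analysis described above can be sketched in modern terms. The following Python fragment is a minimal illustration, not the procedure as the writer carried it out: the function names, the toy answer sheets, and the rounding of the group size are assumptions made for the example, and neither Flanagan's chart nor Davis' table is reproduced. It scores each paper by number right, forms the upper and lower 27 percent on total score, and returns the percent of each group answering a given item correctly, which are the two values with which the chart or table would be entered.

```python
from typing import List, Sequence, Tuple


def number_right(sheet: Sequence[int], key: Sequence[int]) -> int:
    """Total number of items answered correctly (no correction for guessing)."""
    return sum(1 for given, correct in zip(sheet, key) if given == correct)


def upper_lower_groups(
    scores: Sequence[int], fraction: float = 0.27
) -> Tuple[List[int], List[int]]:
    """Return paper indices for the highest- and lowest-scoring groups.

    Assumption: the group size is rounded to the nearest whole paper and
    ties keep their original order; the source does not say how either
    was handled.
    """
    n_group = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[-n_group:], order[:n_group]


def percent_success(
    sheets: Sequence[Sequence[int]],
    key: Sequence[int],
    group: Sequence[int],
    item: int,
) -> float:
    """Percent of the papers in `group` that answer `item` correctly."""
    hits = sum(1 for i in group if sheets[i][item] == key[item])
    return 100.0 * hits / len(group)


# Hypothetical five-item key and six answer sheets (spaces 1 through 5).
KEY = [1, 2, 3, 4, 5]
SHEETS = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 1],
    [1, 2, 3, 1, 1],
    [1, 2, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [2, 1, 1, 1, 1],
]

SCORES = [number_right(sheet, KEY) for sheet in SHEETS]
UPPER, LOWER = upper_lower_groups(SCORES)

for item in range(len(KEY)):
    print(
        item + 1,
        percent_success(SHEETS, KEY, UPPER, item),
        percent_success(SHEETS, KEY, LOWER, item),
    )
```

Under the rounding assumed here, the 168 papers of the tryout administration would give 45 papers in each group.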
No correction was made for guessing since students were instructed to answer all items.

⁶ Ibid., pp. 2-3.

Analysis of Test A - Some Steps in Scientific Thinking. Test A, designed to measure an understanding of some of the steps of scientific thinking as described in Chapter IV (see page 126), was comprised of a total of 74 items. The scores on this test ranged from 24 to 67. The mean and its standard error were 50.60 ± 0.62 and the standard deviation and its standard error, 8.13 ± 0.44. The reliability, as estimated by the split-half method and adjusted by the Spearman-Brown prophecy formula, was .87 ± .02.

Complete item-analysis data for Test A are presented in Table XXXVI of Appendix I. The range of item discrimination, as expressed in terms of estimated coefficients of correlation with total test score, was from .00 to .77. The range in terms of Davis' discrimination indices was from 0 to 61. As previously mentioned, the coefficients of correlation cannot justifiably be averaged, whereas the discrimination indices can be averaged. The mean discrimination index was 29.45. Davis' table of conversion of indices to equivalent values of coefficients of correlation gave an estimated mean correlation of .45.

The range of difficulty of the items of Test A was from 0 to 95 percent. The range of indices of difficulty was from 0 to 85. Since the difficulty index is subject to statistical treatment, these were averaged, giving a mean of 55.51. This was equivalent to 60 percent of the group answering these items correctly.

The item analysis data and the mean of the test indicate that the test was rather easy; the item discrimination data gave evidence that as a whole the items discriminated quite adequately between those students having considerable understanding of the steps of scientific thinking and those not having such an understanding.
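Two calculations recur in the analysis of each tryout test: stepping a half-test correlation up to full-length reliability by the Spearman-Brown prophecy formula, and averaging item discriminations on the hyperbolic-arc-tangent scale rather than averaging coefficients directly. A minimal sketch of both is given here. The half-test correlation of .77 is an assumed value, chosen only because it steps up to the .87 reported for Test A, and the three item correlations are hypothetical. Davis' published tables and the scale constant of his index are not reproduced.

```python
import math


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2.0 * r_half / (1.0 + r_half)


def mean_correlation_via_arctanh(correlations):
    """Average correlations on the hyperbolic-arc-tangent scale, then convert back."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Assumed half-test correlation (not reported in the text).
print(round(spearman_brown(0.77), 2))  # 0.87

# Hypothetical item correlations: the two kinds of mean disagree,
# which is why the coefficients themselves are not averaged.
item_rs = [0.10, 0.50, 0.90]
print(round(sum(item_rs) / len(item_rs), 2))
print(round(mean_correlation_via_arctanh(item_rs), 2))
```

Because the arc-tangent scale stretches values near 1.00, a mean taken on that scale gives more weight to highly discriminating items than a simple mean of the coefficients does.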
The reliability coefficient of the test indicated that the test measured whatever quality it was measuring quite consistently. The data for this test are summarized in the following table.

TABLE II
PERTINENT DATA FOR TEST A

Number of items ........................... 74
Range of scores ........................... 24 - 67
Mean ...................................... 50.60 ± 0.62
Standard deviation ........................ 8.13 ± 0.44
Reliability coefficient ................... .87 ± .02
Range of discrimination indices ........... 0 - 61
Mean discrimination index ................. 29.45
Range of difficulty indices ............... 0 - 85
Mean difficulty index ..................... 55.51

Analysis of Test B - The Delimitation of Problems. Test B, devised to measure the ability to delimit problems (see page 127, Chapter IV), as presented originally contained 67 items. Preliminary item analysis revealed that 17 of the items were either lacking in discriminatory power or were negatively discriminating. Since negatively discriminating items reduce the reliability of a test, it was deemed advisable to eliminate these 17 items, rescore the papers, and on this basis recalculate the item difficulties and item discriminatory values. The scores on the fifty items remaining ranged from 12 to 33; the mean was 22.46 ± 0.37. The standard deviation was 4.77 ± 0.26 while the estimated reliability coefficient was .61 ± .05.

Complete item analysis data for this test are presented in Table XXXVII of Appendix I. The range of item discrimination when expressed as an estimated coefficient of correlation was from .04 to .83. The range of discrimination indices was from 4 to 72. The mean discrimination index was 27.08, which when converted to an estimated coefficient of correlation was .44.

The range of difficulty expressed in percent of successes for Test B was from 4 to 88. The range of the indices of difficulty was from 11 to 75, the mean being 39.40. When converted to percent of successes this became 30 percent. The mean of the test and the percent of successes indicated that this test was relatively difficult. The standard deviation and the range of scores also gave evidence that the items were not all functioning to discriminate between those with superior ability to delimit problems and those inferior in this ability.

An inspection of the data presented in Table XXXVII (Appendix I) and of the test items (Appendix I) shows that the most discriminating items of the test were those involving the recognition of the basic assumptions upon which the problem itself rested. This point seemed to be of sufficient interest to present these items separately. The following table gives the discrimination and the difficulty indices of the seven items of the test which purported to measure the student's ability to recognize assumptions underlying problems.

TABLE III
ITEM ANALYSIS DATA ON THE SEVEN ITEMS OF TEST B WHICH MEASURED ABILITY TO RECOGNIZE ASSUMPTIONS UNDERLYING PROBLEMS

Item Number     Discrimination Index     Difficulty Index
     5                  48                     45
     9                  48                     42
    21                  72                     44
    28                  39                     46
    38                  53                     44
    45                  52                     34
    58                  63                     40
  Mean                  53.57                  42.14

These items were no more difficult than the other items of the test; in fact, they were answered correctly slightly more often than was the average item, but they were much more discriminating. They accounted, in large part, for the rather high mean discrimination value of the items of the test. The average estimated coefficient of correlation of these items with the total test score was .71, while the mean difficulty of these items when expressed as percent of successes was 35 percent.

The pertinent data for Test B are presented in Table IV.
TABLE IV
PERTINENT DATA FOR TEST B

Number of items ........................... 50
Range of scores ........................... 12 - 33
Mean ...................................... 22.46 ± 0.37
Standard deviation ........................ 4.77 ± 0.26
Reliability coefficient ................... .61 ± .05
Range of discrimination indices ........... 4 - 72
Mean discrimination index ................. 27.08
Range of difficulty indices ............... 11 - 75
Mean difficulty index ..................... 39.40

Analysis of Test C - Experimental Procedures. Test C, designed to measure an understanding of experimental procedures (see page 128, Chapter IV), was comprised of 62 items. The scores ranged from 15 to 44; the mean of the test was 26.30 ± 0.41, and the standard deviation was 5.31 ± 0.29. The reliability, as estimated by the split-half method and adjusted by means of the Spearman-Brown prophecy formula, was .59 ± .05.

The item analysis data for Test C are presented in Table XXXVIII of Appendix I. The range of estimated correlations of the items with the total test score was from -.17 to .78; the range of discrimination indices was from -10 to 63. The mean discrimination index was 21.52, which when changed to an estimated coefficient of correlation was .34. The range of difficulty indices was from 0 to 59; the mean difficulty index was 34.37, or in terms of percent of success, 23 percent. This low percent of success and the low mean of the test both testify to the difficulty of this particular test. The large number of non-functioning items, that is, those with low discriminating power and those answered correctly by sufficiently few students to be accounted for on the basis of chance alone, plus the negatively discriminating items, may account for the rather low reliability of Test C.
However, there was a sufficiently large number of satisfactory items in the test to warrant the use of some of the items in the construction of Test I, The Ability to Think Scientifically. Table V is concerned with the pertinent data on Test C.

TABLE V
PERTINENT DATA FOR TEST C

Number of items ........................... 62
Range of scores ........................... 15 - 44
Mean ...................................... 26.30 ± 0.41
Standard deviation ........................ 5.31 ± 0.29
Reliability coefficient ................... .59 ± .05
Range of discrimination indices ........... -10 - 63
Mean discrimination index ................. 21.52
Range of difficulty indices ............... 0 - 59
Mean difficulty index ..................... 34.37

Analysis of Test D - Organization of Data. Test D, designed to measure ability to organize data (see page 129, Chapter IV), was comprised of 20 items. The scores on this test ranged from 1 to 19. The mean of the test was 10.94 ± 0.32; the standard deviation was 4.12 ± 0.23. The test had a reliability of .93 ± .01 as determined by the method of split-halves and correction by the Spearman-Brown formula.

The item analysis data for Test D are presented in Table XXXIX of Appendix I. The range of item discriminations, as expressed by an estimated coefficient of correlation with the total test score, was from .14 to .90; the range of discrimination indices was from 22 to 90. The mean discrimination index was 52.60, which has a corresponding value in terms of coefficient of correlation of .70. The range of difficulty indices was from 22 to 55, the mean being 45.90. This value corresponds to 42 percent successes.

The item analysis data and the mean of the test indicate that the test was of average difficulty. The items were unusually discriminating. As previously mentioned, the tryout test scores were used as the criteria for determining item validity. Since a test score is simply the sum of the scores on individual items, the correlation between items and test score is related to the inter-correlations of individual test items.
As pointed out by Conrad,⁷ high item validity is an indication that the items are highly consistent or homogeneous with other items of the test, and if all of the items are discriminating it means that there is internal consistency or homogeneity of the entire test. Such internal consistency results in a high split-half reliability coefficient. That Test D had considerable internal consistency is shown by the high item validity and the high reliability of the test. An inspection of the test itself also gives evidence of its internal consistency, since the items were all very similar. An inspection of Table I in Chapter IV shows that this test was designed to test a very limited range of behaviors.

From the standpoint of item analysis data and test reliability, Test D was the most successful of the tryout tests. However, the fact that it tested a very narrow range of abilities limited its usefulness as a measure of the ability to think scientifically, since this ability includes a wide range of abilities as shown by the analysis of behaviors involved in scientific thinking as presented in Chapter IV. Table VI presents a summary of the pertinent data for Test D.

⁷ Herbert S. Conrad, Characteristics and Use of Item-Analysis Data. American Psychological Association, Psychological Monographs: General and Applied. No. 295. 1948. p. 15.

TABLE VI
PERTINENT DATA FOR TEST D

Number of items ........................... 20
Range of scores ........................... 1 - 19
Mean ...................................... 10.94 ± 0.32
Standard deviation ........................ 4.12 ± 0.23
Reliability coefficient ................... .93 ± .01
Range of discrimination indices ........... 22 - 90
Mean discrimination index ................. 52.60
Range of difficulty indices ............... 22 - 55
Mean difficulty index ..................... 45.90

Analysis of Test E - Evaluation of Hypotheses.
Test E was designed to measure the ability to evaluate hypotheses (see page 130, Chapter IV) and was comprised of 74 items. The scores on this test ranged from 15 to 53. The mean of the test was 34.37 ± 0.49 and the standard deviation was 6.38 ± 0.35. The estimated reliability as calculated by the split-half method and adjusted by the Spearman-Brown formula was .71 ± .04.

The item analysis data for this test are presented in Table XXXX of Appendix I. The range of item discriminations expressed in estimated coefficients of correlation of the items with the total test score was from .00 to .71; the range of discrimination indices was from 0 to 54. The mean discrimination index was 24.60 which, when expressed in terms of estimated coefficients of correlation, was .38. The range of difficulty indices was from 0 to 77; the mean was 40.57. This gave a value of 32 percent when expressed as percent of successes.

The items were, as a whole, moderately successful as evidenced by the mean discrimination index. However, the test was somewhat difficult, as shown by the fact that the mean of the test was less than half of the total possible points and also by the relatively low mean difficulty index. This was also true of most of the tryout tests. Table VII presents a summary of the pertinent data for Test E.

TABLE VII
PERTINENT DATA FOR TEST E

Number of items ........................... 74
Range of scores ........................... 15 - 53
Mean ...................................... 34.37 ± 0.49
Standard deviation ........................ 6.38 ± 0.35
Reliability coefficient ................... .71 ± .04
Range of discrimination indices ........... 0 - 54
Mean discrimination index ................. 24.60
Range of difficulty indices ............... 0 - 77
Mean difficulty index ..................... 40.57

Analysis of Test F - Experimentation and Interpretation of Data. Test F was designed to measure the ability to recognize experimental controls and the ability to interpret data (see page 131, Chapter IV). The total number of items was 72. The scores on this test ranged from 18 to 62.
The mean of Test F was 47.85 ± 0.66; the standard deviation was 8.48 ± 0.46. The estimated reliability was .89 ± .02.

Item analysis data for Test F are presented in Table XXXXI of Appendix I. The coefficients of correlation with total test scores ranged from .00 to .75. The discrimination indices ranged from 0 to 59; the mean was 30.66. This gave an estimated mean coefficient of correlation of items with total score of .47. The item difficulties ranged from 0 to 100; the difficulty indices also ranged from 0 to 100; the mean was 55.13. This gave a mean item difficulty of 59 percent of successes.

With the exception of Test D, this test was one of the most successful tests of the tryout battery, as evidenced by a relatively high reliability and by the high item validity. The test was somewhat easier than most of the tests of the tryout battery, as shown by the mean of the test and the item difficulty. A summary of the pertinent data for Test F is presented in the following table.

TABLE VIII
PERTINENT DATA FOR TEST F

Number of items ........................... 72
Range of scores ........................... 18 - 62
Mean ...................................... 47.85 ± 0.66
Standard deviation ........................ 8.48 ± 0.46
Reliability coefficient ................... .89 ± .02
Range of discrimination indices ........... 0 - 59
Mean discrimination index ................. 30.66
Range of difficulty indices ............... 0 - 100
Mean difficulty index ..................... 55.13

Analysis of Test G - Drawing of Conclusions.
Test G, a hundred-item test, was designed to measure the ability to recognize logical conclusions (see page 133, Chapter IV). The scores on this test ranged from 6 to 64. The mean was 38.01 ± 0.92; the standard deviation was 11.95 ± 0.65. The estimated reliability of Test G was .90 ± .01.

Item analysis data for this test are presented in Table XXXXII of Appendix I. Item validities ranged from -.07 to .88. Discrimination indices ranged from -4 to 80; the mean discrimination index was 31.82. This figure represents a mean correlation of .48 of the items with the total test score. The item difficulties ranged from 0 to 89 percent of successes and the difficulty indices ranged from 0 to 76. The mean difficulty index was 32.54, or an average of 20 percent of success.

The test mean and the percent of successes indicate that this was a very difficult test. However, the test seemed to offer considerable promise, since the reliability of the test was high and the items were on the average quite discriminating. Table IX presents a summary of the pertinent data for Test G.

TABLE IX
PERTINENT DATA FOR TEST G

Number of items ........................... 100
Range of scores ........................... 6 - 64
Mean ...................................... 38.01 ± 0.92
Standard deviation ........................ 11.95 ± 0.65
Reliability coefficient ................... .90 ± .01
Range of discrimination indices ........... -4 - 80
Mean discrimination index ................. 31.82
Range of difficulty indices ............... 0 - 76
Mean difficulty index ..................... 32.54

Analysis of Test H - Interpretation of Data. Test H and Test J were presented to the students as a single test of 168 items (see page 135, Chapter IV). However, for the purposes of analysis this single test was considered as two tests: Test H, Interpretation of Data, and Test J, Generalizations and Assumptions. The 75 items of the 168-item test which were answered by the key (true, probably true, insufficient data, probably false, and false) constituted Test H.
The range of scores for this test was from 16 to 48. The mean of the test was 32.19 ± 0.49 and the standard deviation was 6.38 ± 0.35. The estimated reliability was .70 ± .04.

Complete item analysis data on Test H are presented in Table XXXXIII of Appendix I. The range of item discriminations expressed as an estimated coefficient of correlation with the total test score was from -.27 to .76. The discrimination indices ranged from -17 to 60, resulting in a mean of 24.69. This corresponds to an estimated coefficient of correlation with the total test score of .39.

The range of item difficulties was from 0 to 89 percent of successes. The range of indices of difficulty was from 0 to 76, giving a mean difficulty index of 35.69 and 25 percent success on the items. This figure and the mean of the test gave evidence that the test as a whole was quite difficult. A summary of the pertinent data for Test H is given in Table X.

TABLE X
PERTINENT DATA FOR TEST H

Number of items ........................... 75
Range of scores ........................... 16 - 48
Mean ...................................... 32.19 ± 0.49
Standard deviation ........................ 6.38 ± 0.35
Reliability coefficient ................... .70 ± .04
Range of discrimination indices ........... -17 - 60
Mean discrimination index ................. 24.69
Range of difficulty indices ............... 0 - 76
Mean difficulty index ..................... 35.69

Analysis of Test J - Generalizations and Assumptions. Test J, consisting of 93 items of the 168 items which constituted the combination Tests H and J, was designed to measure an understanding of generalizations and assumptions. The scores on this test ranged from 16 to 59. The mean of Test J was 37.37 ± 0.71 while the standard deviation was 9.31 ± 0.51 and the estimated reliability of the test was .81 ± .03.

Complete item analysis data for Test J are presented in Table XXXXIV of Appendix I.
The range of item validity values was from -.04 to .81. The discrimination indices ranged from 0 to 68. The mean discrimination index was 25.76. This is equivalent to an estimated coefficient of correlation of .40. The item difficulties ranged from 0 to 66 in terms of percents answering the item correctly. The range of difficulty indices was 0 to 59 and the mean was 34.62. This figure corresponds to a value of 23 percent when converted into percent passing the item.

The mean of the test and the mean item difficulty both testified that this test, like Test H, was quite difficult. Table XI presents a summary of the pertinent data for Test J.

TABLE XI
PERTINENT DATA FOR TEST J

Number of items ........................... 93
Range of scores ........................... 16 - 59
Mean ...................................... 37.37 ± 0.71
Standard deviation ........................ 9.31 ± 0.51
Reliability coefficient ................... .81 ± .03
Range of discrimination indices ........... 0 - 68
Mean discrimination index ................. 25.76
Range of difficulty indices ............... 0 - 59
Mean difficulty index ..................... 34.62

The data on the means, standard deviations, and reliabilities for all of the tests of the tryout battery are summarized in Table XII. The two least reliable tests were (1) Test B, which purported to measure the ability to delimit problems, and (2) Test C, which was designed to measure an understanding of experimental design. Test D was the most reliable. This test, designed to measure ability to organize data, contained items which probably tested a very narrow range of ability and items which were all very similar.
Test A, purporting to measure knowledge of steps of scientific thinking, Test F, designed to measure ability to interpret data and an understanding of controls, and Test G, designed to measure ability to draw conclusions, were all fairly reliable.

TABLE XII
COMPARISON OF MEANS, STANDARD DEVIATIONS, AND RELIABILITIES OF THE TRYOUT TESTS

Test    No. of Items    Mean             Standard Deviation    Reliability
A            74         50.60 ± .62        8.13 ± .44          .87 ± .02
B            50         22.46 ± .37        4.77 ± .26          .61 ± .05
C            62         26.30 ± .41        5.31 ± .29          .59 ± .05
D            20         10.94 ± .32        4.12 ± .23          .93 ± .01
E            74         34.37 ± .49        6.38 ± .35          .71 ± .04
F            72         47.85 ± .66        8.48 ± .46          .89 ± .02
G           100         38.01 ± .92       11.95 ± .65          .90 ± .01
H            75         32.19 ± .49        6.38 ± .35          .70 ± .04
J            93         37.37 ± .71        9.31 ± .51          .81 ± .03

A summary of item analysis data for all of the tests of the tryout battery is presented in Table XIII. Inspection of this table reveals that the mean item discrimination indices were all above the criterion value of 20 suggested by Davis.⁸ Test D, the test to measure ability to organize data, had the highest mean discrimination index of any of the tests. Tests A, F, and G, judged on the basis of mean discrimination indices, were the next most successful tests. Test C, judged on the same basis, was the poorest. It is of interest to note that the rank order of the mean discrimination indices is very similar to the rank order of the reliabilities of the tests.

TABLE XIII
COMPARISON OF MEAN ITEM VALIDITIES AND MEAN ITEM DIFFICULTIES OF THE TRYOUT TESTS

Test    Mean Discrimination    Mean Discrimination    Mean Percent    Mean Difficulty
        Coefficient            Index                  Success         Index
A            .45                   29.45                  60              55.51
B            .44                   27.08                  30              39.40
C            .34                   21.52                  23              34.37
D            .70                   52.60                  42              45.90
E            .38                   24.60                  32              40.57
F            .47                   30.66                  59              55.13
G            .48                   31.82                  20              32.54
H            .39                   24.69                  25              35.69
J            .40                   25.76                  23              34.62

⁸ Davis, op. cit., p. 15.

The mean difficulty indices ranged from 32.54 to 55.51, indicating that the tests were all relatively difficult. A criticism of the tests as a whole might be that they were a little too difficult for the group for which they were intended.

Analysis of tryout tests considered as a single test. In all there were 620 items used in the determination of the scores on the total tryout battery. The range of scores was from 183 to 399. The mean for the entire battery of tests was 291.12 ± 3.48, while the standard deviation was 44.22 ± 2.26. The minimum reliability of the test, as estimated by the Kuder-Richardson⁹ formula, was .92 ± .01 for this group of students. Table XIV presents a summary of the pertinent data for the tryout test battery.

TABLE XIV
PERTINENT DATA FOR THE TRYOUT TEST BATTERY

Number of items ........................... 620
Range of scores ........................... 183 - 399
Mean ...................................... 291.12 ± 3.48
Standard deviation ........................ 44.22 ± 2.26
Reliability coefficient ................... .92 ± .01

Intercorrelation of tryout test scores. In order to determine whether there was sufficient overlapping in the tests to justify the elimination of any of the types of items presented in the tryout tests in the preparation of the final form of the test, intercorrelations were calculated for all of the tryout tests. These intercorrelations are presented in Table XV. The standard errors of these correlations were small; they ranged from .05 to .08.

⁹ Adkins, op. cit., p. 154.

TABLE XV
INTERCORRELATIONS OF TRYOUT TEST SCORES

Tests     B      C      D      E      F      G      H      J
A*       .18    .34    .27    .28    .37    .34    .44    .44
B               .21    .22    .30    .32    .18    .16    .11
C                      .22    .26    .39    .32    .35    .33
D                             .26    .26    .28    .29    .14
E                                    .47    .50    .45    .41
F                                           .47    .47    .45
G                                                  .50    .31
H                                                         .59

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data.
G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.

These data show that Test D, the test devised to measure ability to organize data, had a low correlation with all of the other tests of the battery. Tests H and J, the tests devised to measure interpretation of data and the ability to recognize generalizations and assumptions respectively, which were presented to the students as a single test, had the highest intercorrelation of any of the tests. Was this due to the fact that the same subject matter was used for both tests? Or was it due to the fact that an understanding of generalizations and assumptions was necessary for correct interpretation of data? The data presented are not such that they suggest possible answers to these questions.

The correlation between two tests is considered to be lowered if the test scores are unreliable.¹⁰ In order to estimate the correlation between the true scores of two tests, a correction known as the correction for attenuation¹¹ is frequently made which takes the unreliability of both tests into account. This correction gives the maximum correlation which could be obtained between the two test scores if both measures were perfectly reliable; that is, if the reliability coefficient of each test was 1.00. It must be kept in mind, however, that this is a theoretical value. The intercorrelations corrected for attenuation are given in Table XVI.

A comparison of Tables XV and XVI reveals the fact that all of the correlations have been increased by the correction for attenuation. The comparison also shows that the correlations of tests which were quite reliable, as Test D, were increased much less than those of tests which were rather unreliable, like Tests B and C. In addition, it can be seen that the lower correlations were increased less than the higher correlations.

¹⁰ Henry E. Garrett, Statistics in Psychology and Education. New York: Longmans, Green and Company. 1947. p. 396.

¹¹ Loc. cit.

TABLE XVI
INTERCORRELATIONS OF TRYOUT TEST SCORES CORRECTED FOR ATTENUATION

Tests     B      C      D      E      F      G      H      J
A*       .25    .48    .30    .35    .42    .39    .56    .52
B               .35    .29    .45    .44    .24    .24    .17
C                      .30    .40    .53    .44    .55    .48
D                             .32    .29    .30    .36    .16
E                                    .59    .63    .63    .54
F                                           .53    .59    .53
G                                                  .63    .36
H                                                         .73

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.

Since the purpose of these correlations was to determine whether there was sufficient overlapping of factors in the tests to warrant the omission of certain of these types of items in the preparation of the final form of the test, the degree of overlapping was determined by the coefficient of determination.¹² This figure is obtained by squaring the coefficient of correlation. In order to obtain a figure representing the maximum overlap, the coefficients of correlation corrected for attenuation were used. The coefficient of determination denotes the proportion of variance in one test associated with the other test. This figure is usually expressed as a percent. For example, the coefficient of determination between Test A and Test B is .06, which means that 6 percent of the variance of Test A is associated with Test B. The coefficients of determination for the tryout tests are given in Table XVII.

¹² Ibid., p. 338.
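The two steps just described, correcting an observed correlation for attenuation and then squaring it to obtain the coefficient of determination, can be checked against the figures reported for Tests A and B. The sketch below assumes the usual form of the correction (the observed correlation divided by the square root of the product of the two reliabilities); the function names are illustrative only.

```python
import math


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed correlation."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


def coefficient_of_determination(r):
    """Proportion of variance in one test associated with the other."""
    return r * r


# Tests A and B: observed r = .18 (Table XV); reliabilities .87 and .61 (Table XII).
corrected = correct_for_attenuation(0.18, 0.87, 0.61)
print(round(corrected, 2))                                # 0.25
print(round(coefficient_of_determination(corrected), 2))  # 0.06
```

Both results agree with the entries for Tests A and B in Table XVI and in the worked example above. Because the divisor is smaller when the reliabilities are lower, the correction raises the correlations of the less reliable tests the most.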
TABLE XVII
COEFFICIENTS OF DETERMINATION OF THE TRYOUT TESTS

Tests     B      C      D      E      F      G      H      J
A*       .06    .23    .09    .12    .18    .15    .31    .27
B               .12    .08    .20    .19    .06    .06    .03
C                      .09    .16    .28    .19    .30    .23
D                             .10    .08    .09    .13    .03
E                                    .35    .40    .40    .29
F                                           .28    .35    .28
G                                                  .40    .13
H                                                         .53

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.

The coefficients indicate that the degree of overlapping in these tests is low. Since the maximum overlapping is only 53 percent, all of the types of items represented in the tryout battery were used in the construction of Test I, The Ability to Think Scientifically.

To determine whether the correlation of any one of the tests with the tryout test battery was sufficiently high for that test to be used instead of the composite of the scores on the battery, the scores on each of the tests were correlated with the total scores. These correlations are given in Table XVIII.

TABLE XVIII
CORRELATION OF TOTAL SCORES ON TRYOUT TEST BATTERY WITH EACH OF THE TRYOUT TESTS

Tests             A      B      C      D      E      F      G      H      J
Tryout total     .62    .44    .55    [?]    .73    .74    .71    .71    .69

The standard errors of these coefficients ranged from .04 to .07. It is of interest to note that Test D, The Ability to Organize Data, had the lowest correlation with the total scores on the tryout test battery. This was to be expected on the basis of the nature of the test.
Inspection of the test reveals that it was testing a much more restricted range of objectives than any of the other tests; therefore, it would not be expected to have as high a correlation with the composite score as a test measuring a wider range of behaviors. That Test F, Experimentation and the Interpretation of Data, would have the highest correlation with the scores on the total test battery was to be expected, since that test measured both understanding of experimentation and the ability to interpret data; that is, it measured a wider range of the behaviors measured by the battery of tests than did any other tryout test.

Multiple correlations of the scores on the total tryout test battery with each combination of two of the individual tests of the battery were calculated to determine which two tests would be the most satisfactory to use in appraising the ability to think scientifically. The formula given by Baten¹³ was used for the calculation of these multiple correlations.

¹³ William D. Baten, Elementary Mathematical Statistics. New York: John Wiley and Sons. 1938. p. 187.

TABLE XIX
MULTIPLE CORRELATION OF TRYOUT TOTAL WITH TWO OF THE TRYOUT TESTS

Tests     B        C      D      E      F      G      H      J
A*       .69**    .72    .67    .84    .83    .82    .79    .77
B                 .64    .54    .77    .77    [?]    [?]    [?]
C                        .63    .82    .81    [?]    [?]    [?]
D                               .76    .77    .74    [?]    [?]
E                                      [?]    [?]    [?]    [?]
F                                             [?]    [?]    .84
G                                                    [?]    .87
H                                                           .77

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
** This is to be read: Multiple correlation of tryout total with Tests A and B.

Table XIX is significant in that it shows that any two tests of the battery gave fairly substantial correlation with the criterion. Multiple correlations involving Test D were lower than any of the other correlations. The highest multiple correlation was obtained with Tests G and J. This is interesting since neither of these tests had the highest correlation with the criterion. This can probably be explained by the fact that they had a relatively low correlation with each other, as can be seen in Table XV.

In problems involving more than four variables the mechanics of calculating multiple correlations is almost prohibitive unless some systematic method of solution is used.¹⁴ The Wherry-Doolittle method,¹⁵ in addition to being a systematic method of calculating multiple correlations, corrects the correlation for chance errors. Table XX presents the results of this method of obtaining multiple correlations of the tryout tests with the criterion, which was the total score on the tryout test, and shows the correlations obtained by the addition of each successive test. In using this method the first test used, Test F, is the one with the highest simple correlation with the criterion.

TABLE XX

MULTIPLE CORRELATION OF TRYOUT TESTS WITH THE CRITERION OBTAINED BY THE WHERRY-DOOLITTLE METHOD

Tests                    Multiple correlations
F                        .740*
F, E                     .856
F, E, J                  .907
F, E, J, G               .948
F, E, J, G, B            .963
F, E, J, G, B, C         .972
F, E, J, G, B, C, H      .977

* A simple correlation

14 Garrett, op. cit., p. 435.
15 Ibid., pp. 435-448.

Tests A and D were not added because the increase in the multiple correlation by the addition of Test H had been so slight that further additions seemed unnecessary.
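The observation that two tests with a low correlation with each other can combine into a strong predictor follows from the standard formula for the multiple correlation of a criterion with two predictors, R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²). The Python sketch below assumes this textbook form, which may differ in notation from the formula the thesis cites from Baten. The correlations used are hypothetical and are not the entries of Table XV.

```python
import math


def multiple_r_two_predictors(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    if not 0.0 <= r_squared <= 1.0:
        # Inputs that could not come from one set of scores land here.
        raise ValueError("the three correlations are mutually inconsistent")
    return math.sqrt(r_squared)


# Hypothetical values: same validities, different overlap between predictors.
low_overlap = multiple_r_two_predictors(0.71, 0.69, 0.30)
high_overlap = multiple_r_two_predictors(0.71, 0.69, 0.60)
print(f"predictors correlating .30: R = {low_overlap:.3f}")
print(f"predictors correlating .60: R = {high_overlap:.3f}")
```

With the validities held fixed, the pair of predictors that overlaps less yields the larger multiple correlation, because each contributes more information that the other does not already carry.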
As shown by the data presented in Table XX, each successive test added less to the correlation. It would appear that if a few of the individual tests of the tryout battery were to be used as a measure of the ability to think scientifically, Test E, The Evaluation of Hypotheses, Test F, Experimentation and the Interpretation of Data, Test J, Generalizations and Assumptions, and Test G, Drawing of Conclusions, would yield scores sufficiently like the ones obtained from the entire battery to justify the use of only these four tests.

Correlations of scores on tryout tests with scores on intelligence tests and reading tests. In order to determine whether the tryout tests were measuring intelligence or reading ability to a considerable extent, the scores made by students on each of the tryout tests were correlated with the quantitative score and with the linguistic score on the American Council on Education Psychological Examination and with the scores on the American Council on Education Reading Comprehension Test. These correlations are presented in Table XXI.

TABLE XXI

CORRELATIONS OF TRYOUT TEST SCORES WITH INTELLIGENCE TEST AND READING TEST SCORES

Tests*    Quantitative    Linguistic    Reading
A             .17             .43          .25
B             .24             .26          .13
C             .28             .41          .39
D             .31             .11          .10
E             .33             .42          .41
F             .38             .34          .35
G             .37             .21          .29
H             .37             .18          .25
J             .37             .29          .35

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.

The abilities measured by the tryout tests, although all positively correlated with the quantitative and linguistic factors of intelligence and with reading ability, do not appear to be identical with any of these mental functions.
These data also give evidence that all of the types of items presented in the tryout battery could justifiably be included in the final form of the test, since none of the tryout tests seemed to be measuring either of the factors of intelligence or reading ability.

The preparation of Test I, The Ability to Think Scientifically. Test I, presented in Appendix II, was constructed from items of the tryout tests. Because of the nature of the items it was necessary to choose blocks of items from the tryout tests rather than individual items. Therefore, it was necessary to select the best blocks of items from each of the tryout tests. Items within blocks were eliminated if the estimated coefficient of correlation of the item with the tryout test score was low. Certain items had to be retained even if the item correlation was low because the information given in them was essential to the development of an understanding of the entire block of items. An attempt was made to eliminate all items with a discrimination index of less than 20, which corresponds to a coefficient of correlation of .33. This is in accord with the recommendation of Davis.¹⁶

Test authorities do not agree on the best form of distribution of item difficulties.¹⁷ Some recommend all items as near 50 percent difficulty as possible; others recommend equal distribution of items from 0 to 100 percent difficulty.

16 Davis, op. cit., p. 15.
17 Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, The Construction and Use of Achievement [...]

x! = 8.62 - Kj,

The range of improvement on Test IA is of interest. Of the 136 students who retook the test, three did not change their scores, seven had scores from one to ten points lower on the post-test, and the remaining 126 students improved their scores from one to 41 points.
Since in both comparisons the differences between the means were highly significant and both in the same direction, we may infer that the test had some validity, in that there was an increase in score attending instruction in the methods of science. One is obliged, however, to hold this inference as tentative until (1) the relationship between increased knowledge of the subject matter of the course and performance on the test is further investigated, (2) it is demonstrated that maturation did not produce the observed results, and (3) it is shown that other methods of instruction do not produce the same results.

Validation by comparison of scores with ratings of students by competent judges. The final method used in the statistical validation of the test was the comparison of scores made on the test with the ratings of competent judges. A rating scale for the ability to use the scientific method (Appendix IV) was prepared. Several members of the Department of Biological Science at Michigan State College were interviewed in order to determine the types of behaviors which they had observed in students whom they considered to have superior ability to think scientifically and the types of behavior which they had observed in students whom they believed to be very inferior in this ability. The two areas in which they agreed that ratings of the students could be made on the basis of observation of their performance in laboratory classes were (1) the ability to devise and evaluate experiments, and (2) the ability to interpret data, including the ability to form hypotheses and draw conclusions.

The instructions for rating the students were:

    Will you please rate the person whose name appears above on the two following characteristics? The two extremes of these characteristics are described.
    Place a cross (X) on the line indicating your judgment of the individual with respect to the qualities in question.

A person having a high degree of ability to evaluate and devise experiments was described in the following manner:

    Includes control factors, controls all but one variable, understands problem and devises experiment to test hypothesis. Can devise experiments which will yield results, recognizes problems inherent in the experiment, and has an understanding of what is happening in the experiment.

A person having a low degree of ability to evaluate and devise experiments was described:

    Experiments lack control or control is faulty, experiment unrelated to hypothesis. Student does not understand the experimental set-up, or the problems inherent in the experiment.

Proficiency in ability to interpret data could be recognized by the following description of a person very superior in this ability:

    Is able to make logical inferences from data, takes pertinent facts into consideration, applies previous knowledge to the new situation, is able to see relationships, especially cause and effect relationships. Knows what evidence for his inference is, and why it is evidence.

The person very inferior in this ability:

    Is unable to make logical inferences from data, does not differentiate between relevant and irrelevant data or between critical and non-critical data, is unable to see relationships.

The ratings were on a five point scale: very superior, superior, average, inferior, and very inferior.

One hundred and forty-three students taking the first term of Biological Science who were given Test IA at the beginning of the first term of the three-term sequence of the course were rated on their ability to think scientifically by their instructors.
Test IA was administered again to 136 of these same students at the end of the first term. A part of these students were taught by the present investigator and the remaining students were taught by another instructor. Each of these students was rated by his instructor on the rating scale described above.

Students taking Biological Science at Michigan State College do not necessarily have the same instructor for more than one term; therefore, during the second term most of these students had a different instructor. These students were scattered throughout the classes of the 16 instructors teaching the second term of the three-term sequence. Some had failed the first term's work and repeated it; hence they were in classes of one of the three instructors teaching the first term of the course. These instructors were requested to rate the students on their ability to think scientifically by using the rating sheet described above. In all, a total of 19 instructors were involved in the rating of the students. The two instructors that taught during the first term were responsible for most of the ratings. Each second term instructor rated only a few students.

In order to use these ratings in statistical computation, composite ratings were calculated for each student. A very superior rating was allotted 5 points, a superior rating 4 points, an average rating 3 points, an inferior rating 2 points, and a very inferior rating 1 point. Since each student was rated on two abilities by two judges, a maximum of 20 points and a minimum of 4 points was the range of possible composite scores.

An expectancy chart (Adkins, op. cit., pp. 163-164), which reveals the expected performance of persons receiving various test scores, was one of the methods used to describe the validity of the test in terms of the rating of students by their instructors. A double entry table was constructed with scores on the test as one axis and the scores on the ratings as the other axis. Because there were very few students rated by both raters as either very superior or very inferior, the expectancy chart was constructed on the basis of superior, average, and inferior ratings. Rating scores from 10 through 14 were considered average, scores below 10 were considered inferior, and scores above 14 were considered superior.

The expectancy chart can be treated statistically by means of the chi-square test. The hypothesis to be tested was that the scores made by the students on Test IA were essentially unrelated to the ratings of the students by their instructors on their ability to think scientifically. The expectancy charts, Tables XXXII and XXXIII, show the observed numbers of persons in each category and, in parentheses, the numbers which would be expected in each of the categories if there were no relationship between the scores on Test IA and the ratings.
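The parenthesized expected numbers in an expectancy chart are the frequencies that would arise if test score and rating were unrelated: each is the row total times the column total divided by the number of students. The Python sketch below computes those expected counts and the chi-square statistic for a small table. The counts are hypothetical and are not the entries of Tables XXXII or XXXIII, and the helper names are illustrative assumptions.

```python
def expected_counts(observed):
    """Expected cell frequencies under the hypothesis of no relationship."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical chart: rows are test-score bands (high, middle, low),
# columns are composite ratings (superior, average, inferior).
observed = [
    [12, 8, 2],
    [8, 30, 10],
    [2, 10, 18],
]
expected = expected_counts(observed)
statistic = chi_square(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for obs_row, exp_row in zip(observed, expected):
    print("  ".join(f"{o:3d} ({e:4.1f})" for o, e in zip(obs_row, exp_row)))
print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```

A large statistic relative to its degrees of freedom is what leads to rejecting the hypothesis that test scores and ratings are unrelated.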
TABLE XXXII E X P E C T A N C Y C H A R T SHOWING- T H E C O M P A R I S O N OF S C O R E S ON T H E T E S T IA P R E - T E S T A N D R A T I N G S Superior 7 5 - 100 14* (3.5)** 9 (10.6) 0 (8.9) 23 50 - 7 4 24 - 49 Totals * ** 8 (12.6) Ul - Discrimination r Index Difficulty % Success Index 48.8 36.1 40.0 25.0 .09 .14 8 30 39 53.3 41.7 37.7 22.2 .16 .23 14 31 40 33.3 16.7 13.3 0.0 .28 .50 33 08 21 35.5 19.4 33.3 16.7 .03 .05 3 17 30 71.1 63.9 22.2 2.8 .49 .71 54 33 40 17.8 0.0 22.2 2.8 -.03 -.17 -10 02 8 51.1 38.9 40.0 25.0 .12 .17 10 31 40 17.8 0.0 11.1 0.0 .11 .00 0 0 0 46.7 33.3 31.1 13.9 .17 .26 16 22 34 55.6 44.4 17.8 0.0 .43 .70 52 22 34 31.1 13.9 37.7 22.2 -.07 -.12 - 7 18 31 53.3 41.7 22.2 2.8 .34 .59 41 22 34 48.8 36.1 20.0 0.0 .32 .66 48 18 31 13.3 0.0 6.7 0.0 .10 .00 0 0 0 40.0 25.0 33.3 16.7 .08 .12 7 21 33 44.4 *84.4 30.6 **80.6 Method of Flanagan Method of Davis .44 .51 34 55 53 273 TEST D ORGANIZATION OF DATA This test is designed to test your ability to or­ ganize data. Select from the key below the curve which best fits the data. If none of the curves fit the data mark space five on your answer sheet. 1 3. 5. 4. 2 none of the curves The horizontal axis represents temperature. The vertical axis represents the amount of Substance A derived from Substance B. Temperature Amount of Substance A 4 grams 10°C. 25°G. 7 grams 35°C. 9 grams 14 grams 60°C. The horizontal axis represents the amount of oxygen in the experimental gas mixtures. The vertical axis represents the amount of oxygen taken up by red cells in these experiments. Oxygen in gas mixtures 0 10 20 3. Oxygen taken up by red cells 0 n 90 30 98 50 The horizontal axis represents the percent of carbon dioxide in gas mixtures breathed in; the vertical axis represents the percent increase in total amount of gas breathed per minute. Carbon dioxide percent 0 1 2 3 5 7 Percent Increase 0 10 25 50 100 200 274 The horizontal axis Is the concentration of salt. (Sodium chloride). 
The vertical axis is the per­ cent of red cells destroyed in these concentrations of salt. Concentration of salt Abbreviated Key 1 - ^ 3.. 2. 4. 5. none Percent red cells destroyed 98 75 .27 .36 .41 .50 10 1 The horizontal axis represents the amount of thyroprotein fed daily to cows. The vertical axis repre­ sents the percent increase in milk production. Thyroproteln fed Percent increase .15 .20 .24 .30 18 23 27 33 grams grams grams grams The horizontal axis represents age in years. The vertical axis is the percent increase in the weight of the brain from birth to twenty years of age. Percent increase 1 yr. 4 12 40 80 98 The horizontal axis represents the time in minutes to kill bacteria in a weak solution of silver nitrate. The vertical axis are the temperatures to which the bacteria in the silver nitrate solu­ tions were subjected. Temperature Time in minutes 160 80 40 0 15° C. 20°C. 30° C. 45°C. 275 8. The horizontal axis repre­ sents age in years. The vertical axis is the per­ cent increase in the weight of the ovaries and other female sex organs from birth to 20 years. *s2 4 9. Percent increase ‘ 8 12 14 18 20 80 The horizontal axis represents time in seconds; the vertical axis represents the amount of heat developed in a single contraction of a single muscle fiber. .0 .1 .2 .4 .8 Heat 0.0 10.0 15.0 14.0 4.0 The horizontal axis represents the number of days since the memorization of certain nonsense syllables; the vertical axis is the percent of the nonsense syllables forgotten. Time in days 11. 5. none 10 Time 10. Abbreviated Key Percent forgotten 1 45 2 60 6 12 80 84 The horizontal axis represents age of girls in years; the vertical axis is the strength index of these girls in pounds. Age 1 5 Strength index .5 2.0 8 5.0 15 12.0 20 12.5 276 12. The horizontal axis repre­ sents the successive number of trials In the learning of a puzzle. The vertical axis Is the time In seconds of each trial. Trial 1st 5th 10th 14th 18th 13. 1 3 6 15. i- ^ 2. X 3 / 4. t 5. 
none Time in seconds 420 419 240 60 50 The horizontal axis represents the time in hours after the injection of sugar into the blood; the vertical axis is the amount of sugar in the blood. Time after in .lection 14. Abbreviated Kev Blood suffar 35 12 8 The horizontal axis is the time in minutes after pint Jars of c o m have been put in boiling water and kept boiling; the vertical axis is the temperature in the center of the pint Jar. Time Temperature 5 10 30 60 100 20 21 55 90 99 The horizontal axis is age in years. The vertical axis is the metabolic rate of an individual expressed in calories per day. A&2 2 5 15 25 40 Calories 60 40 30 25 23 277 16. The horizontal axis represents time in days; the vertical axis is the number of yeast cells In millions (starting with 100 yeast cells). Time in days 4 8 12 20 17. none Number of yeast cells in millions 25 150 390 400 The horizontal axis is the temperature in Centigrade. The vertical axis represents the amount of enzyme activity of a certain type of bacteria in arbitrary units. Temperature 10 30 50 70 90 18 Abbreviated Key Enzyme activity 0 1 2 3 2.5 The horizontal axis represents age in weeks; the vertical axis represents the weight of an animal, in kilograms. Age Weight 1 .05 .15 .80 3 8 12 19. 1.6 16 2.4 25 2.8 The horizontal axis represents the external temper­ ature; the vertical axis represents the amount of oxygen absorbed by a frog at the various temperatures Temperature 10 14 20 25 Oxygen 104 130 160 208 mg. mg. mg. mg. 278 20. The horizontal axis represents Abbreviated Key the time in hours. The vertical j axis the temperature inside a l._-X 5.none thermos bottle containing germ­ inating pea seeds. 2. Time Temperature 0 20°C. 12 24 36 24°C. 3 0°G. 32°0. 
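The item analysis tables in this appendix pair each percent-success figure with a second, lower figure. The pairs are consistent with the usual correction for chance success on five-choice items, corrected = (R − W/(k − 1))/N, applied to groups of 45 students (27 percent of the 168 students tested) and floored at zero. That reading is an inference from the tabled values, not something the text states, and the Python sketch below uses hypothetical function names and assumes every student in a group answered the item.

```python
def percent_success(right: int, group_size: int) -> float:
    """Uncorrected percent of a group answering the item correctly."""
    return 100.0 * right / group_size


def corrected_percent_success(right: int, group_size: int, choices: int = 5) -> float:
    """Percent success corrected for chance on a multiple-choice item.

    Wrong answers are assumed to be group_size - right (no omissions),
    and negative results are reported as zero.
    """
    wrong = group_size - right
    corrected = (right - wrong / (choices - 1)) / group_size
    return max(0.0, 100.0 * corrected)


# Assumed group of 45 students, 34 of whom answer a five-choice item correctly.
raw = percent_success(34, 45)
corrected = corrected_percent_success(34, 45)
print(f"raw {raw:.1f} percent, corrected {corrected:.1f} percent")
```

Under these assumptions 34 correct answers out of 45 give the pair 75.6 and 69.4, and 19 out of 45 give 42.2 and 27.8, both of which appear as adjacent entries in the tables.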
TABLE XXXIX

ITEM ANALYSIS DATA FOR TEST D

          Percent Success              Discrimination         Difficulty
Item   Upper 27%      Lower 27%        r          Index    % Success   Index
         *     **      *     **       *    **
  1    75.6   69.4   42.2   27.8    .35   .40      26         48         49
  2    95.6   94.4   37.7   22.2    .67   .74      57         57         54
  3    80.0   75.0   24.4    5.6    .56   .71      54         40         45
  4    88.8   86.1   20.0    0.0    .68   .87      80         42         46
  5    84.4   80.6   51.1   38.9    .38   .43      28         59         55
  6    91.1   88.9   26.7    8.3    .66   .78      63         48         49
  7    77.8   72.2   11.1    0.0    .67   .81      69         37         43
  8    95.6   94.4   26.7    8.3    .73   .83      71         51         51
  9    93.3   91.7   31.1   13.9    .67   .75      59         53         52
 10    93.3   91.7   31.1   13.9    .67   .75      59         53         52
 11    75.6   69.4   35.5   19.4    .41   .51      34         44         47
 12    93.3   91.7   33.3   16.7    .65   .74      57         53         52
 13    95.6   94.4   15.6    0.0    .80   .90      90         46         48
 14    75.6   69.4   33.3   16.7    .43   .54      36         42         46
 15    97.8   97.2   33.3   16.7    .75   .81      68         57         54
 16    64.4   55.6   20.0    0.0    .46   .75      58         28         38
 17    77.8   72.2   26.7    8.3    .52   .66      48         40         45
 18    68.9   61.1   26.7    8.3    .43   .59      41         35         42
 19    33.5   16.8   22.2    2.8    .14   .35      22         09         22
 20    42.2   27.8   22.2    2.8    .23   .48      32         15         28

*  Method of Flanagan
** Method of Davis

TEST E

EVALUATION OF HYPOTHESES

This test is designed to measure your understanding of the relation of facts to the solution of a problem. The over-all problem involved in this test is presented. This is followed by a series of possible solutions to the problem (hypotheses). After each hypothesis there are a number of items, all of which are true statements of fact. Determine how the statement is related to the hypothesis and mark each statement according to the key which follows the hypothesis.

GENERAL PROBLEM: What factors are involved in the transmission and development of Infantile Paralysis (Poliomyelitis)?

HYPOTHESIS I. In man the disease is contracted by direct contact with persons having the disease.

For items 1 through 11 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2.
indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

1. Monkeys free from the disease almost never catch infantile paralysis from infected monkeys.
2. Most strains of infantile paralysis virus can be transferred from man only to monkeys and apes and not to other animals.
3. The virus has been isolated from the nasopharyngeal washings of humans and monkeys.
4. The curve of number of cases of the disease in a given area is the same shape as the curve for the fly population in that area, the infantile paralysis incidence curve lagging behind the fly population curve by about two weeks.
5. The virus has never been isolated from the blood.
6. The virus is not found in the nasal secretion, nor in the saliva.
7. The incubation period for infantile paralysis is from 4 to 21 days.
8. Most persons in contact with the diseased individual do not develop the disease.
9. The incidence of infantile paralysis is higher in rural districts than in the cities.
10. Cases of infantile paralysis have been found to follow the roads of communication of the population, that is, the disease spreads from populated areas along roads or rivers to other areas.
11. Even during epidemics cases are spotty; it is usually impossible to trace one case from another.
12. What is the status of hypothesis I?
    1. It is true.
    2. It is probably true.
    3. It is false.
    4. It is probably false.
    5. The data are contradictory, hence its truth or falsity cannot be judged.

HYPOTHESIS II. The disease is spread by the excrement (excreted material) of persons harboring the virus.

For items 13 through 23 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2. indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

13.
The virus is always found in the stools of persons who have the disease.
14. In the stools of persons not in contact with persons with the disease the virus is found in only one person in 100.
15. During an epidemic non-paralytic cases outnumber paralytic cases ten to one.
16. The curve of number of cases of the disease in a given area is the same shape as the curve for the fly population in the area, the infantile paralysis incidence curve lagging behind the fly population curve by about two weeks.
17. The incubation period for infantile paralysis is from 4 to 21 days.
18. Nine out of 14 adult contacts had virus in the stool, almost all child contacts have virus in the stools.
19. The virus has been isolated from streams carrying sewage.
20. Cases of the disease have been found to follow the roads of communication of the population, that is, the disease spreads from populated areas along roads or rivers to other areas.
21. The virus of the disease has been found in the stools and vomit of flies up to two days after eating an infected meal.
22. Even during epidemics cases are spotty.
23. It is usually impossible to trace one case from another.
24. What is the status of hypothesis II?
    1. It is true.
    2. It is probably true.
    3. The data are contradictory, so the truth or falsity cannot be judged.
    4. The hypothesis is probably false.
    5. It is definitely false.

HYPOTHESIS III. The olfactory nerve (nerve from nose to brain) is the route of entry of the virus.

For items 25 through 34 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2. indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

25. The virus has been isolated from nasopharyngeal washings of humans and monkeys.
26. A plug of cotton, saturated with virus, placed in the nose of the monkey invariably causes the monkey to contract the disease.
If the olfactory nerve is cut the monkey does not contract the disease when a plug saturated with the virus is placed in the nose.
27. If the nose of a monkey is sprayed with zinc sulphate the monkey (with virus plug inserted) does not contract the disease.
28. The virus is not found in the nasal secretion or in the saliva.
29. The virus has been isolated from the spinal cord of 71% of the cases autopsied, and from the olfactory nerve in 5% of the cases autopsied.
30. The virus has been found in the nasopharynx from several days before the onset of the disease until about 3 days after the onset of the disease.
31. Many doctors recommended the use of zinc sulphate nasal spray (administered only by the physician).
32. The virus is not affected by freezing.
33. Most strains of the virus can be transferred only to monkeys and apes.
34. The percentage of cases of infantile paralysis among persons receiving the nasal spray of zinc sulphate was the same as the percentage of cases in the total population.
35. What is the status of hypothesis III?
    1. It is true.
    2. It is probably true.
    3. The data are contradictory, hence truth or falsity of the hypothesis cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS IV. The higher the degree of sanitation the greater are the chances of epidemic forms of the disease.

For items 36 through 45 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2. indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

36. Monkeys free of the disease almost never catch infantile paralysis from infected monkeys.
37. The virus has been isolated from streams carrying sewage.
38. In India epidemics seldom occur.
39. In India children under five are about the only ones affected.
40.
During the war there was one epidemic among the European and American soldiers in India, the incidence among the soldiers was extremely high.
41. The percent of cases of infantile paralysis in whites is about four times that in colored people.
42. In the south (U.S.) there are three times as many cases under five years as over five years of age.
43. The percent of cases of infantile paralysis is higher in rural districts than in the cities.
44. In the north (U.S.) about 50% of the cases are over 5 years of age.
45. During an epidemic non-paralytic cases outnumber paralytic cases ten to one.
46. What is the status of hypothesis IV?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, hence the truth or falsity of the statement cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS V. Healthy persons having had contact with diseased individuals may carry the disease from one person to another.

For items 47 through 59 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2. indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

47. Monkeys free of the disease almost never catch infantile paralysis from infected monkeys.
48. During an epidemic non-paralytic cases outnumber paralytic cases ten to one.
49. It has been found that exertion prior to or at the time of infection increases the incidence of the disease.
50. Even during epidemics cases are spotty; it is usually impossible to trace one case from another.
51. The virus is always found in the stools of people who have the disease.
52. Most persons in contact with the diseased individual do not develop the disease.
53. Nine out of 14 adult contacts had virus in stools, almost all child contacts have virus in stools.
54. Up to two months after contact the virus is found in the stools of persons who contacted the victims, but who did not contract the disease.
55. In the stools of non-contacts the virus was found in only one person in 100.
56. Data on families each with one case of infantile paralysis in the family: 39% of other children in family from 1-4 years of age and 30% of other children in family 5-9 years of age had minor illnesses. Only 9% of children in other homes showed similar illnesses.
57. The percent of cases of infantile paralysis is higher in rural districts than in the cities.
58. Under twenty years of age the percent of cases in males is three times the percent of cases in females.
59. Flies were allowed to feed on contaminated food. The flies were then placed in contact with food which was fed to monkeys. The feces of the monkeys contained the virus.
60. What is the status of hypothesis V?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, so the truth or falsity cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS VI. An immunity to the disease may be developed.

For items 61 through 71 mark space if the item offers:
    1. direct evidence in support of the hypothesis.
    2. indirect evidence in support of the hypothesis.
    3. evidence which has no bearing on the hypothesis.
    4. indirect evidence against the hypothesis.
    5. direct evidence against the hypothesis.

61. Most strains of the infantile paralysis virus can be transferred only from man to monkeys and apes and not to other animals.
62. During an epidemic non-paralytic cases outnumber paralytic cases ten to one.
63. The incubation period of infantile paralysis is from 4 to 21 days.
64. Even during epidemics cases are spotty; it is usually impossible to trace one case from another.
65. Most persons in contact with the diseased individual do not develop the disease.
66.
Up to two months after contact the virus is found in the stools of persons who contacted the victims, but who did not contract the disease.
67. In the stools of persons not in contact with persons with the disease the virus was found in only one person in 100.
68. Data on families each with one case of infantile paralysis in the family: 39% of the other children in family from 1-4 years of age and 30% of the other children in family from 5-9 years of age had minor illnesses. Only 9% of children in other homes showed similar illnesses.
69. Epidemics seldom occur in India and the disease is almost entirely among children under 5 years of age.
70. The percent of cases in whites is about four times the percent of cases in colored people.
71. Cases of infantile paralysis may continue into the winter, but an epidemic never arises anew during the winter.
72. What is the status of hypothesis VI?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data is contradictory, so the truth or falsity cannot be judged.
    4. It is probably false.
    5. It is definitely false.
73. How many of the six hypotheses are acceptable?
    1. 1
    2. 2
    3. 3
    4. 4
    5. 5
74. How many of the hypotheses are not acceptable?
    1. 1
    2. 2
    3. 3
    4. 4
    5.
5

TABLE XXXX

ITEM ANALYSIS DATA FOR TEST E

          Percent Success          Discrimination       Difficulty
Item  Upper 27%     Lower 27%        r         Index % Success  Index
         *     **      *     **      *     **
  1   66.7   58.3   31.1   13.9    .38    .47     31        35     42
  2   57.8   47.2   40.0   25.0    .18    .24     15        35     42
  3   40.0   25.0   31.1   13.9    .10    .17     10        19     32
  4   53.3   41.7   17.0    0.0    .41    .69     51        21     33
  5   77.8   72.2   44.4   30.6    .36    .40     26        51     51
  6   53.3   41.7   22.2    2.8    .34    .60     42        22     34
  7   93.3   91.7   66.7   58.3    .40    .45     29        74     64
  8   91.1   88.9   84.4   80.6    .13    .15      9        83     70
  9   82.2   77.8   40.0   25.0    .45    .55     37        53     52
 10   57.8   47.2   44.4   30.0    .13    .18     11        38     44
 11   55.6   44.4   37.7   22.2    .20    .24     15        33     41
 12   48.8   36.1   22.2    2.8    .28    .56     38        19     32
 13   71.1   63.9   71.1   63.9    .00    .00      0        64     58
 14   53.3   41.7   17.8    0.0    .38    .69     41        21     33
 15  100.0  100.0   84.4   80.6    .50    .52     35        89     76
 16   53.3   41.7   15.6    0.0    .41    .69     51        21     33
 17   84.4   80.6   80.0   75.0    .05    .08      5        77     66
 18   28.9   11.1   20.0    0.0    .12    .39     25         5     16
 19   66.7   58.3   33.3   16.7    .36    .44     27        35     42
 20   53.3   41.7   44.4   30.6    .08    .14      8        35     42
 21   53.3   41.7   22.2    2.8    .34    .60     42        22     34
 22   60.0   50.0   48.8   36.1    .12    .15      9        42     46
 23   38.7   22.0   17.8    0.0    .27    .55     37        12     25
 24   84.4   80.6   51.1   38.9    .38    .43     28        59     55
 25   48.8   36.1   28.9   11.1    .21    .34     21        24     35
 26   95.6   94.4   84.4   80.6    .25    .29     18        87     74
 27   11.1    0.0   11.1    0.0    .00    .00      0         0      0
 28   53.3   41.7   20.0    0.0    .36    .69     51        21     33
 29   44.4   30.6   31.1   13.9    .15    .22     13        22     34
 30   57.8   47.2   33.3   16.7    .26    .38     24        31     40
 31   46.7   33.3   24.4    5.6    .24    .41     27        19     32
 32   97.8   97.2   84.4   80.6    .40    .41     27        89     76
 33   91.1   88.9   75.6   69.4    .27    .27     17        79     67
 34   55.6   44.4   37.7   22.2    .20    .24     15        33     41
 35   62.6   52.8   24.4    5.6    .40    .58     40        28     38
 36   75.6   69.4   44.4   30.6    .34    .39     25        50     50
 37   17.8    0.0   17.8    0.0    .00    .00      0         0      0
 38   55.6   44.4   17.8    0.0    .42    .70     52        22     34
 39   31.1   13.9   20.0    0.0    .14    .46     30         8     20
 40   46.7   33.3   24.4    5.6    .24    .43     28        19     32
 41   62.6   52.8   28.9   11.1    .35    .47     31        31     40
 42   26.7    8.3   26.7    8.3    .00    .00      0         8     20
 43   46.7   33.3   13.3    0.0    .40    .65     47        17     30
 44   15.6    0.0   13.3    0.0    .07    .00      0         0      0
 45  100.0  100.0   77.8   72.2    .55    .61     43        85     72
 46   33.3   16.7   13.3    0.0    .30    .48     32         8     21
 47   62.6   52.8   24.4    5.6    .40    .58     40        28     38
 48   15.6    0.0   13.3    0.0    .07    .00      0         0      0
 49   93.3   91.7   55.6   44.4    .49    .56     38        69     60
 50   53.3   41.7   17.8    0.0    .49    .56     38        69     60
 51   88.9   86.1   64.4   55.6    .33    .36     23        70     61
 52   57.8   47.2   22.2    2.8    .38    .63     45        25     36
 53   35.5   19.4   24.4    5.6    .14    .29     18        13     26
 54   57.8   47.2   20.0    0.0    .40    .71     54        24     35
 55   57.8   47.2   26.7    8.3    .32    .51     34        28     38
 56   44.4   30.6   37.7   22.2    .06    .10      6        25     36
 57   48.8   36.1   28.9   11.1    .21    .32     20        24     35
 58   95.6   94.4   88.9   86.1    .20    .18     11        90     77
 59   35.5   19.4   20.0    0.0    .18    .54     36        11     24
 60   64.4   55.6   37.7   22.2    .28    .37     23        38     44
 61   46.7   33.3   40.0   25.0    .07    .10      6        28     38
 62   60.0   50.0   46.3   33.3    .13    .17     10        42     46
 63   95.6   94.4   88.9   86.1    .20    .18     11        90     77
 64   37.7   22.2   35.5   19.4    .03    .02      2        21     33
 65   68.9   61.1   24.4    5.6    .44    .64     46        34     41
 66   48.8   36.1   35.5   19.4    .13    .17     10        28     38
 67   66.7   58.3   42.2   27.8    .27    .31     19        42     46
 68   48.8   36.1   35.5   19.4    .14    .20     12        28     38
 69   51.1   38.9   33.3   16.7    .18    .27     17        27     37
 70   46.7   33.3   24.4    5.6    .24    .43     28        20     32
 71   73.3   66.7   46.7   33.3    .28    .34     21        50     50
 72   71.1   63.9   37.7   22.2    .36    .43     28        42     46
 73   42.2   27.8   13.3    0.0    .35    .61     43        15     28
 74   40.0   25.0   26.7    8.3    .17    .31     19        17     30

*  Method of Flanagan
** Method of Davis

TEST F

EXPERIMENTATION AND THE INTERPRETATION OF DATA

This test was designed to measure your ability to interpret data and to test your understanding of experimentation. In each case the numbers in the first column are the numbers which you will use as your answer. Thus the table presented becomes both the source of data and your key for the questions which follow it. In each case where a test tube number or group number is called for, the one which gives positive evidence for the statement should be given. Below this the control or comparison is called for. This is the test tube or group number of the data which offers a comparison. For example:

1. Leaf in dark - no starch.
2. Leaf in light - starch.

Light is necessary for the production of starch.

You would mark space 2 because this is the positive evidence, but it would be meaningless if it were not compared with the leaf in the dark. Therefore, the following item, "What is the control (comparison) for item 1?" would be marked space 1.

Items 1 through 15 refer to the data presented below. Some test tubes were set up and each contained 1 gram of fat. They were marked 1, 2, 3, 4, and 5. Mark each item according to the test tube number called for. Various substances were added to the tubes containing fat. All substances were dissolved in water before they were added to the fat. All test tubes were kept at 85° F. (Water boils at 212° F.) For test tube 5, Substance A was boiled and then allowed to cool before it was added to the fat.

Test Tube                                             Amt. of Substance B
Number      Content of tube                           present after 24 hrs.
   1        Fat plus Substance A                      .1 gram
   2        Fat plus Substance A plus Substance C     .5 gram
   3        Fat plus Water                            .0 gram
   4        Fat plus Substance C                      .0 gram
   5        Fat plus Substance A (boiled)             .0 gram

1. Give the number of the test tube which acts as a control (comparison) for the entire experiment.
2. Give the number of the tube which gives evidence that fat does not break down spontaneously into Substance B in 24 hours.
3. Give the number of the tube used to show that a temperature of 85 degrees F. was not sufficient to cause fat to be broken down into Substance B.
4. Give the test tube number of the tube which gives evidence that Substance A is the active substance in the breakdown of fat to Substance B.
5. Give the test tube number of the tube which is the control (comparison) for item 4.
6. Give the number of the tube which provides evidence that Substance C alone is ineffective in the breakdown of fats.
7. What is the control for item 6?
8. Which test tube gives evidence that Substance C accelerates the rate of activity of Substance A?
9. Give the tube which is the control for item 8.
10. Which tube gives evidence that Substance A is a substance whose properties can be destroyed?
11. Give the control for the tube in item 10.
12. Which tube gives evidence that Substance C affects the fat in some way so that Substance A can more easily act upon it?
13. Which tube is the control for item 12?
14. Which tube gives evidence that Substance A is not a stable substance?
15. What is the control for item 14?

Items 16 through 28 refer to the data presented below. Mark each item according to the group called for. Each group contained 100 persons fed on the diets indicated.

Group   Diet                                           Cases of Beri Beri
  1     whole rice (i.e. rice with hulls)              none
  2     polished rice (i.e. rice with hulls removed)   60%
  3     polished rice plus Vitamin B1                  none
  4     polished rice plus Vitamin B2                  60%
  5     polished rice plus Vitamin B complex           none

16. Give the number of the group which is the control (comparison) for the entire experiment.
17.
Give the group which gives evidence that rice hulls contain a beri beri preventing substance.
18. Give the control for item 17.
19. Give the number of the group which provides evidence that Vitamin B is not a single entity.
20. Give the control for item 19.
21. Give the number of the group which indicates that rice hulls may contain Vitamin B.
22. Give the control for item 21.
23. Give the number of the group which provides evidence that rice hulls may contain Vitamin B1.
24. Give the number of the group which is the control for item 23.
25. Which group gives evidence that a deficiency of Vitamin B causes beri beri?
26. What is the control for item 25?
27. What group gives evidence that Vitamin B2 is not the active factor in the prevention of beri beri?
28. What is the control for item 27?

Items 29 through 39 refer to the data presented below. Mark each item according to the group number called for. When a person ascends to high altitudes his blood cell count increases after about 10 days. The following data were obtained from a study of altitude effects on rats. 760 mm. of mercury is atmospheric pressure at sea level. Air is composed of about 20% oxygen and 80% nitrogen.

Group   Atmospheric pressure   % O2   % N   Red cell count
  1            760              10     90   increased
  2            380              20     80   increased
  3            760              20     80   normal
  4            760              40     60   decreased
  5            380              40     60   normal

29. Give the number of the group which is the control for the entire experiment.
30. Give the number of the group that gives evidence that a decrease in atmospheric pressure causes an increase in red cell count at high altitude.
31. Which group is the control (comparison) for item 30?
32. Which group gives evidence that it is the decrease of oxygen pressure which is responsible for the increase in cell count at high altitudes?
33. Which of the groups is the best control for item 32?
34. Which of the groups gives evidence that a decrease in atmospheric pressure is not the cause of an increased red cell count at high altitudes?
35. What is the control for item 34?
36. Give the number of the group which gives evidence that a decrease in nitrogen pressure is not responsible for the increased red cell count at high altitudes.
37. Give the number of the group that is the control for item 36.
38. Which group gives evidence that an increase in oxygen pressure decreases the red cell count?
39. What is the control for item 38?

Items 40 through 57 refer to the data presented below. Mark each item according to the leaf number called for. Plant A normally stores starch in its leaves while Plant B does not normally store starch in its leaves. The following experiments were performed in a dark room at 72 degrees F. Glucose (sugar) solutions were made with 20 grams of glucose per 100 cubic centimeters of water. Leaves of Plant A taken from a plant that had been in the dark for 48 hours were floated in the 5 solutions listed below and left in the glucose solution for an hour.

Leaf   Solution                                   Analysis of leaf after 4 hours
  1    Glucose                                    Starch in leaf
  2    Water                                      No starch in leaf
  3    Glucose plus Juice from Plant B            No starch in leaf
  4    Glucose plus Juice from Plant C            No starch in leaf
  5    Glucose plus boiled Juice from Plant B     Small amount of starch in leaf

40. Give the number of the leaf which showed that starch does not develop spontaneously in the leaf in the dark.
41. This leaf indicates that a temperature of 72 degrees F. does not cause starch to form in the leaf.
42. Give the number of the leaf which is the control (comparison) for the entire experiment.
43. Give the number of the leaf which gives evidence that Plant A is capable of manufacturing starch from glucose.
44. Give the number of the leaf which is the control for item 43.
45.
Give the number of the leaf which gives evidence that the juice of Plant B is capable of preventing the manufacture of starch from glucose.
46. What is the control for item 45?
47. Give the number of the leaf which gives evidence that Plant A is normally able to store starch in its leaves.
48. What is the control for item 47?
49. Give the number of the leaf which gives evidence that Plant C does not normally form starch in its leaves.
50. Give the leaf number of the control for item 49.
51. Which leaf shows that water does not cause the production of starch in the leaf?
52. Give the number of the leaf which gives evidence that the juices of Plant B contain a substance which inhibits the production of starch in its leaves.
53. Give the leaf which is the control for item 52.
54. This leaf gives evidence that the inhibitory substance is not a stable substance.
55. What is the control for item 54?
56. Give the number of the leaf which shows that boiling destroys the activity of the juice of Plant B.
57. Give the control for item 56.

Items 58 through 72 refer to the data presented below. Five test tubes, each containing a gram of protein, were set up. Mark each item according to the test tube number called for. All substances were dissolved in water. All test tubes were kept at 37° C. (Water boils at 100° C.) For test tube 5, Substance X was boiled and then cooled before it was added to the protein.

                                                             Amt. of Substance W
Tube   Contents of tubes                                     present after 24 hours
  1    Protein plus Substance X                              .05 gram
  2    Protein plus Water                                    .00 gram
  3    Protein plus Substance X plus hydrochloric acid       .08 gram
  4    Protein plus Hydrochloric acid                        .00 gram
  5    Protein plus Substance X (boiled)                     .00 gram

58. Give the number of the test tube which acts as a control (comparison) for the entire experiment.
59. Give the number of the test tube which gives evidence that protein does not break down spontaneously into Substance W.
60. Give the number of the test tube which gives evidence that Substance X is the active substance in the breakdown of proteins.
61. Give the number of the tube which is the control for item 60.
62. Give the number of the test tube which shows that a temperature of 37° C. does not cause protein to break down into Substance W.
63. Which test tube gives evidence that Substance X is not a stable substance?
64. Which tube is the control for item 63?
65. Which tube gives evidence that acid accelerates the activity of Substance X?
66. Which tube is the control for item 65?
67. Which tube gives evidence that Substance X is a substance whose properties can be destroyed?
68. Give the test tube number of the control for item 67.
69. Which test tube gives evidence that acid affects the protein in some way so that Substance X can act upon it more easily?
70. Give the tube number which is the control for item 69.
71. Give the number of the test tube which indicates that hydrochloric acid alone is ineffective in breaking down proteins.
72. Give the control for item 71.
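The item-analysis tables in this appendix report, for each item, the percent success in the upper and lower 27% of examinees. A minimal sketch of that grouping is given here, assuming examinees are ranked by total test score and that each extreme group is 27% of the sample rounded to the nearest whole person. The function name and the sample data are hypothetical, and the Flanagan and Davis indices themselves, which are ordinarily read from published charts, are not reproduced.

```python
def upper_lower_percent_success(total_scores, item_correct, fraction=0.27):
    """Return (upper %, lower %) success on a single item.

    total_scores: total test score for each examinee
    item_correct: True/False for each examinee on the item in question
    fraction:     share of examinees placed in each extreme group
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each extreme group

    # Rank examinees from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[n - k:]

    def percent(group):
        return 100.0 * sum(item_correct[i] for i in group) / len(group)

    return percent(upper), percent(lower)


# Hypothetical data: ten examinees, extreme groups of three (fraction=0.3).
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
correct = [True, True, False, True, False, True, False, False, False, True]
up, low = upper_lower_percent_success(scores, correct, fraction=0.3)
print(round(up, 1), round(low, 1))
```

An item that separates the two groups well shows a large gap between the two percentages; an item with equal percentages in both groups does not discriminate at all.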
TABLE XXXXI

ITEM ANALYSIS DATA FOR TEST F

          Percent Success          Discrimination       Difficulty
Item  Upper 27%     Lower 27%        r         Index % Success  Index
         *     **      *     **      *     **
  1   91.1   88.9   77.8   72.2    .24    .24     15        80     68
  2   77.8   72.2   24.4    5.6    .55    .70     52        38     44
  3   64.4   56.6   22.2    2.8    .44    .68     50        28     38
  4   88.9   86.1   60.0   50.0    .37    .41     27        68     60
  5   31.1   13.9   24.4    5.6    .08    .18     11        10     23
  6  100.0  100.0   73.3   66.7    .62    .64     46        82     69
  7   86.7   83.3   51.1   38.9    .43    .48     32        61     56
  8  100.0  100.0   82.2   77.8    .52    .55     37        88     75
  9   88.9   86.1   64.4   55.6    .34    .36     23        70     61
 10   97.8   97.2   80.0   75.0    .45    .47     31        86     73
 11   93.3   91.7   73.3   66.7    .34    .38     24        79     67
 12   95.6   94.4   80.0   75.0    .35    .35     24        79     67
 13   80.0   75.0   51.1   38.9    .33    .36     23        57     54
 14   91.1   88.6   48.8   36.1    .50    .63     38        61     56
 15   88.9   86.1   53.3   41.7    .44    .48     32        63     57
 16   33.3   16.7    8.9    0.0    .38    .48     32         8     21
 17   77.8   72.2   55.6   44.4    .25    .29     18        57     54
 18   77.8   72.2   53.3   41.7    .28    .31     19        59     54
 19   71.1   63.9   35.5   19.4    .36    .46     30        42     46
 20   28.9   11.1   15.6    0.0    .18    .31     19        59     54
 21   62.6   52.8   53.3   41.7    .11    .10      6        46     48
 22   15.6    0.0    6.7    0.0    .16    .00      0         0      0
 23   42.2   27.8   42.2   27.8    .00    .00      0        28     38
 24   35.5   19.4   22.2    2.8    .16    .45     29        10     23
 25  100.0  100.0   86.7   83.3    .47    .50     33        91     78
 26   46.7   33.3   25.9   11.1    .19    .32     20        22     34
 27   95.6   94.4   53.3   41.7    .60    .63     45        69     60
 28   53.3   41.7   28.9   11.1    .25    .39     25        25     36
 29   97.8   97.2   73.3   66.7    .52    .52     35        82     69
 30   93.3   91.7   80.0   75.0    .25    .29     18        83     70
 31   71.1   63.9   33.3   16.7    .38    .51     34        40     45
 32   91.1   88.9   68.9   61.1    .33    .37     23        74     64
 33   77.8   72.2   40.0   25.0    .38    .47     31        48     49
 34   93.3   91.7   71.1   63.9    .37    .39     25        77     66
 35   53.3   41.7   35.5   19.4    .18    .23     14        31     40
 36   64.4   55.6   35.5   19.4    .29    .39     25        38     44
 37   20.0    0.0    6.7    0.0    .25    .00      0         0      0
 38  100.0  100.0   88.9   86.1    .45    .46     30        93     81
 39   71.1   63.9   35.5   19.4    .36    .43     25        38     44
 40   91.1   88.9   35.5   22.2    .61    .67     49        55     53
 41   86.7   83.3   44.4   30.6    .48    .54     36        57     54
 42   62.6   52.8   35.5   19.4    .27    .34     21        38     44
 43  100.0  100.0   62.6   52.8    .68    .71     54        74     64
 44   93.3   91.7   40.0   25.0    .62    .69     51        57     54
 45   97.8   97.2   80.0   75.0    .45    .46     30        87     72
 46   80.0   75.0   28.9   11.1    .52    .63     45        42     46
 47   91.1   88.9   73.3   66.7    .29    .31     19        77     66
 48   86.7   83.3   86.7   83.3    .00    .00      0        83     70
 49  100.0  100.0  100.0  100.0    .00    .00      0       100    100
 50    0.0    0.0    0.0    0.0    .00    .00      0         0      0
 51  100.0  100.0   86.7   83.3    .48    .50     33        91     79
 52   88.9   86.1   42.2   27.8    .53    .59     41        57     54
 53   62.6   52.8   11.1    0.0    .55    .73     56        27     37
 54   97.8   97.2   55.6   44.4    .65    .68     50        70     61
 55   91.1   88.9   53.3   41.7    .48    .54     36        64     58
 56  100.0  100.0   75.6   69.4    .58    .62     44        84     71
 57   93.3   91.7   64.4   55.6    .42    .47     31        73     63
 58   93.3   91.7   71.1   63.9    .36    .39     25        77     66
 59   97.8   97.2   40.0   25.0    .72    .76     60        61     56
 60   95.6   94.4   46.7   33.3    .64    .67     49        64     58
 61   80.0   75.0   26.7    8.3    .53    .68     50        40     45
 62   95.6   94.4   37.7   22.2    .67    .73     56        57     54
 63   93.3   91.7   35.5   19.4    .60    .72     55        55     53
 64   97.8   97.2   42.2   27.8    .72    .75     59        61     56
 65  100.0  100.0   91.1   88.9    .40    .41     27        93     82
 66   82.2   77.8   44.4   30.6    .42    .48     32        53     52
 67  100.0  100.0   80.0   75.0    .55    .58     40        86     73
 68  100.0  100.0   73.3   66.7    .62    .64     46        82     69
 69   97.8   97.2   88.9   86.1    .32    .32     20        91     79
 70   84.4   80.6   44.4   30.6    .45    .51     34        55     53
 71  100.0  100.0   86.7   83.3    .45    .50     33        91     78
 72   91.1   88.9   66.7   58.3    .35    .39     25        73     63

*  Method of Flanagan
** Method of Davis

TEST G

DRAWING OF CONCLUSIONS

This test was designed to measure your ability to make conclusions. When facts are analysed and studied they sometimes yield evidence which helps in the solution of a problem. However, any conclusion must be checked before it can be accepted. The following key includes four ways in which conclusions may be faulty.
Each of the items presents a question or problem, a brief description of an experiment and one or more conclusions drawn from the experiment. Each experiment was repeated many times. Read each problem, experiment and the conclusions. Where several conclusions are given evaluate each conclusion separately. Is the conclusion tentatively justified by the data? If so, mark space 1 on your answer sheet. If the conclusion is not justified determine whether 2, 3, 4, or 5 in the key is the best reason for it being faulty and mark the proper space on your answer sheet.

Key

The conclusion is:
1. Tentatively justified.
2. Unjustified - it does not answer the problem.
3. Unjustified - the experiment lacks a control (comparison).
4. Unjustified - the data are faulty or inadequate, though a control was included.
5. Unjustified - it is contradicted by the data.

PROBLEM: A student was interested in developing a test for a certain type of substance. In all 100 cases his test was positive.

1. He concluded that the test was a specific test for the substance.

PROBLEM: A student knew that a purple color develops when iodine is added to starch and that this is a specific test for starch. He wished to determine whether a certain food contained starch. He added iodine to the food and found that it turned purple.

2. He concluded that the food was fattening.
3. Another student concluded that iodine is a test for starch.

PROBLEM: An investigator wanted to know what causes people to breathe faster when they are running rapidly. He found that breathing more carbon dioxide increased the breathing rate, but that the breathing of air deficient in oxygen did not increase the breathing rate.

4. He concluded that people breathe faster when they are running because they need more oxygen.
5. Someone else concluded that running increases the rate of breathing.
6. Another person said that people running rapidly take in more carbon dioxide, causing them to breathe more rapidly.
7. Still another claimed that it is harder for the heart to pump faster without sufficient oxygen.
8. Another concluded that carbon dioxide affects the breathing rate.
9. Someone else concluded that people who are exercising must breathe pure carbon dioxide to cause an increase in breathing rate.

PROBLEM: An individual, wishing to determine whether oxygen is used during sleep, analyzed the expired air of a large number of sleeping persons. He found that the expired air contained oxygen.

10. He concluded that oxygen is not used during sleep.
11. Another concluded that oxygen is needed for life.
12. Someone else claimed that people breathe while they are sleeping.
13. Still another person concluded that oxygen is given off as well as taken in during sleep.
14. Another person said that this proved that oxygen is used during sleep.

PROBLEM: An investigator wished to determine whether temperature increased the rate of a certain reaction. On repeated tests he found that if he started out with a certain amount of his original substances he would obtain, after one hour, 1 gram of the substance produced by the reaction at 0° C., 2 grams at 20° C., 5 grams at 40° C. and 3 grams at 60° C.

15. He concluded that increased temperature increased the rate of the reaction.
16. Another person claimed that this shows that an increase in temperature increases the amount of the original substance.

PROBLEM: A person wanted to determine whether bile aided in the digestion of fats. He found that whenever he mixed pancreatic juice with fats a small part of the fat was digested, but whenever he mixed pancreatic juice and bile with fat, he found that the fat was completely digested. When he mixed bile alone with fat he found that there was no digestion.

17. He concluded that bile aided in the digestion of fats.
18. Another concluded that pancreatic juice was necessary for digestion of fats.
19.
One person concluded that it was necessary that the bile and pancreatic juice work together, in order that fats may be digested.
20. Someone else claimed that bile does not aid in the digestion of fat.

PROBLEM: In order to find out if all foods contained starch, ten foods were tested by the iodine test which was known to be a specific test for starch. All of the foods tested contained starch.

21. The conclusion drawn was that all foods contain starch.
22. Another conclusion was that iodine is a good reagent to determine the presence of starch.
23. Another conclusion was that the iodine test proved that starch was present.

PROBLEM: In order to determine whether corticosterone caused a certain disease, a person analyzed the blood of several hundred patients suffering from the disease. He found that in each case the blood contained cortin.

24. He concluded that the disease was caused by corticosterone.

PROBLEM: In order to determine the cause of increased red blood cell count at high altitude, experimenters subjected rats, dogs and guinea pigs at sea level to a reduced total atmospheric pressure. The red cell count was higher in these than in the same kinds of animals not subjected to reduced atmospheric pressure.

25. Conclusion: A decrease in the oxygen in the air breathed at high altitude causes the increase in red cell count.
26. Another conclusion: The red cell count varies inversely with the atmospheric pressure.

PROBLEM: Two students desired to know whether certain types of mosquitos or whether all mosquitos spread malarial fever. They captured many specimens of three kinds of wild mosquitos, types A, B, and C. They examined the digestive tracts of all three types. They found malarial parasites only in type A mosquitos.

27. Conclusion: Malarial fever is spread by type A mosquitos but not by types B and C.
28. Another conclusion: Not all mosquitos carry malaria parasites.
29. Another conclusion: Not all mosquitos have malarial parasites.

PROBLEM: A student interested in frozen food preservation wanted to determine whether extremely low temperatures killed the kind of bacteria that spoil meat. He cut a number of pieces of various types of meat into two pieces, leaving one piece of each sample at room temperature and the other of each sample in a locker at a temperature of 40 degrees below freezing. All samples were sealed in bacteria-proof containers. After thirty days he opened the packages. He found the room temperature samples badly decomposed. The frozen samples were in their original condition except for being frozen solid.

30. Conclusion: A temperature 40 degrees below freezing will kill the bacteria that are responsible for the decay of meat.
31. Another conclusion: Heat is a controlling factor in the preservation of foods.
32. Another conclusion: Meat kept in a temperature of 40 degrees below freezing does not become decomposed.
33. Another conclusion: Room temperature causes meats to spoil, whereas frozen meats are preserved.
34. Still another conclusion: Bacteria must not have been present in the frozen packages.

PROBLEM: A person wanted to know what caused a certain disease. He examined 1000 patients with the disease. All had a certain bacteria (Bacteria A) in the digestive tract.

35. He concluded that Bacteria A was the cause of the disease.
36. Another conclusion: The disease starts in the digestive tract.
37. Another conclusion: Bacteria A is necessary for digestion.
38. Another conclusion: The cause of the disease was spoilage of food.

PROBLEM: A person wanted to know why plants bend toward the light. He placed one group of plants in the light with the light source at the right. He placed another group of similar plants in the dark. The plants in the dark grew straight, the plants in the light were bent to the right.

39. He concluded that plants bend toward the light.
40.
Another concluded that plants bend toward the light because they need light to grow.
41. Someone else concluded that light influences the direction in which plants grow.
42. Another concluded that plants bend toward the sun in order to get the beneficial rays of the sun.

PROBLEM: An investigator wanted to know what caused fish to swim against the current. He placed fish in a bottle. If the bottle was moved to the right the fish moved to the left and vice versa. Blind fish did not respond to the water currents in the bottle, but fish do orient against the current in a stream at night.

43. He concluded that fish can see at night.
44. Another concluded that fish swim against the current because fish will drown if water enters the rear of the gills with force over a long period.
45. Another concluded that normal fish swim against the current.
46. Someone else concluded that blind fish do not swim against the current because they cannot see.

PROBLEM: Investigator A wanted to know what caused people to become ill if confined in large numbers to a small closed area. He found on repeated tests that the air in very crowded closed areas contained about 5% carbon dioxide, while normal air contains .03% carbon dioxide.

47. He concluded that excessive carbon dioxide caused the illness.
48. Another investigator concluded that the illness was caused by insufficient oxygen.
49. Another investigator claimed that the illness was caused by the germs exhaled by the people in the room.

PROBLEM: Investigator B in an attempt to solve the same problem repeated the experiment done by Investigator A but in addition had people in uncrowded rooms breathe air containing 3% carbon dioxide. No ill effects were noted among those in the uncrowded rooms.

50. He also concluded that excessive carbon dioxide caused the illness.
51. Another investigator claimed that this showed that the disease was caused by insufficient oxygen.
52.
The investigator who claimed the disease was due to germs was convinced by this experiment that he was correct.
53. Another conclusion was that 5% carbon dioxide will produce no ill effects.
54. Still another claimed that people live better in uncrowded areas.

PROBLEM: To find out if all foods contain sugar. Benedict's solution is a specific test for sugar. Ten foods were tested with Benedict's solution. All of the foods contained sugar.

55. Conclusion: Benedict's solution is a good test for sugar.
56. Another conclusion: All foods contain sugar.
57. Another conclusion: The Benedict test showed that sugar was present.

PROBLEM: To determine whether a certain bacteria uses oxygen. The Winkler test is an oxygen test. A broth in which bacteria were grown was tested for oxygen. The broth was shown, by the Winkler test, to contain oxygen.

58. Conclusion: This type of bacteria does not use oxygen.
59. Another conclusion: This type of bacteria gives off oxygen as a waste product.
60. Still another conclusion: The presence of oxygen does not stop the growth of bacteria.
61. Another person concluded that this proves that oxygen is needed by bacteria.

PROBLEM: To determine the cause of disease X. One thousand persons with the disease were examined. Bacteria ^ was found in the mouth of all of the persons with the disease.

62. One conclusion: Bacteria ^ causes the disease.
63. Another conclusion: This disease starts in the mouth.
64. Another conclusion: This disease is caused by bacteria introduced into the mouth from contaminated food.

PROBLEM: To determine the reaction of insects to light. Flies were placed in a jar, the upper half of which was covered with black paper. A light was placed near the jar. All of the flies flew to the lower half of the jar and toward the illuminated side.

65. Conclusion: Insects are attracted to light.
66. Another conclusion: Insects are attracted to heat.
67. Another conclusion: The flies needed light for warmth.

PROBLEM: To determine some of the requirements for the sprouting of seeds. Two groups of plants were planted in flower pots. Conditions of both were the same except that one pot was put in the greenhouse at 40 degrees; the other group was put in a greenhouse at 70 degrees. Those in the cold room did not sprout, those in the warm room sprouted. Many kinds of seeds were used in each group.

68. Conclusion: A temperature of 70 degrees is required for seeds to sprout.
69. Another conclusion: Plants need heat to live.
70. Another conclusion: Moisture is one of the requirements for the sprouting of seeds.
71. Another conclusion: For anything to grow energy is needed.
72. Another conclusion: A temperature of 40 degrees keeps seeds from sprouting.

PROBLEM: To determine some of the requirements for the sprouting of seeds. Two groups of seeds were planted. Conditions were the same for both groups except that one group was planted in stoppered bottles, the other group in open bottles. Only the seeds in the open bottles sprouted. Many different kinds of seeds were included in each group.

73. Conclusion: Seeds require oxygen to sprout.
74. Another conclusion: One of the requirements for sprouting of seeds is moisture.
75. Another conclusion: The seeds in the stoppered bottles were dormant.
76. Another conclusion: Energy from the outside is necessary for growth.
77. Another conclusion: Carbon dioxide is a requirement of sprouting seeds.

PROBLEM: What are some of the requirements for seeds to sprout?
A student put many different kinds of seeds in pots containing garden soil and many different kinds of seeds in pots containing the same type of soil with all of the potassium salts removed. The plants in the garden soil grew and developed well. The plants in the other pots were small and soon died. All other conditions were the same for both groups.
78. Conclusion: Potassium salts are required for seeds to sprout.
79. Another conclusion: Heat and moisture are necessary for seeds to sprout.
80. Another conclusion: Minerals are essential for the germination of seeds.
81. Another conclusion: Potassium salts contain some important energy for plants.
82. Another conclusion: When the plants had used up their supply of food they couldn't replace it.
83. Another conclusion: Potassium salts as well as other minerals are essential to plants and their lack will slow down growth.

PROBLEM: What are some of the requirements for seeds to sprout?
The student placed two groups of seeds in two pots and watered one pot daily. The other group he watered on alternate days. All of the seeds sprouted. Many types of seeds used, other conditions same for both groups.
84. Conclusion: Water is necessary if seeds are to sprout but it is not necessary to water them every day.
85. Another conclusion: Seeds will sprout with a limited amount of water.
86. Another conclusion: One of the requirements of seeds to sprout is moisture.
87. Another conclusion: Water is a minor factor in the sprouting of seeds.
88. Another conclusion: Both groups of plants had an adequate amount of water.

PROBLEM: What are some of the requirements for seeds to sprout?
The same student planted two groups of seeds of different types in pots and placed one group of the pots in the light, the others in the dark. Those plants in the light were green, those in the dark were yellow. Other conditions were the same for both groups.
89. Conclusion: Light is necessary for sprouting of seeds.
90. Another conclusion: Plants require light to mature properly.
91. Another conclusion: Light makes the plants green.

PROBLEM: An investigator wanted to determine whether increased light increased the rate of a certain reaction.
On repeated tests it was found that a certain amount of the original substance (X), after one hour, would produce 1 gram of substance Y with 10 photons (units of light) of illumination, 2 grams with 20 photons, 4 grams with 30 photons and 3 grams with 40 photons.
92. Conclusion: Increased amount of light increases the rate of the reaction.
93. Another conclusion: Heat increased the rate of the reaction.

PROBLEM: A student wanted to determine whether plants grow more rapidly in the light or in the dark.
Two groups of seeds were planted. After two weeks the plants were measured. Those in the light were green and a few inches long. Those in the dark were yellow and a foot long. All other conditions were the same for both groups. The experiment was repeated with several kinds of seeds. The results were the same as given above.
94. Conclusion: The plants in the dark put all their energy into height trying to reach light while the other ones put their energy into strength.
95. Another conclusion: Light is necessary for faster and better growth of plants.
96. Another conclusion: The plants grown in the light were more healthy.
97. Another conclusion: Plants grow more rapidly in the dark.
98. Another conclusion: Light is necessary for the development of the green color of plants.

PROBLEM: A student wanted to determine whether a certain beverage contained sugar.
Benedict's solution which is blue when added to sugar and heated turns the solution yellow. (It is known to be a specific test for sugar). Benedict's was added to the beverage and heated. The solution turned yellow.
99. Conclusion: The beverage is not fattening.
100. Another student concluded that Benedict's solution is a good test for sugar.
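
The item analysis table that follows gives each percent success twice, once marked * (method of Flanagan) and once marked ** (method of Davis). The paired figures appear consistent with the usual correction for chance on five-choice items, in which the corrected percent equals the percent right minus one fourth of the percent wrong, with negative results set to zero. This relationship is inferred from the tabled values and is an assumption, as are the function names below, the choice of five options, and the neglect of omitted responses. A minimal Python sketch of that arithmetic:

```python
def corrected_percent(percent_right, n_choices=5):
    """Percent success corrected for chance.

    Assumes every examinee answered the item, so percent wrong is
    100 - percent right, and that the item has n_choices options.
    """
    percent_wrong = 100.0 - percent_right
    corrected = percent_right - percent_wrong / (n_choices - 1)
    return max(0.0, corrected)  # a negative corrected score is reported as 0


def difficulty_percent(upper, lower, n_choices=5):
    """Mean of the corrected percents for the upper and lower 27% groups
    (hypothetical helper; assumed to correspond to the '% Success' column)."""
    return (corrected_percent(upper, n_choices)
            + corrected_percent(lower, n_choices)) / 2.0


# An item passed by 80.0% of the upper group and 40.0% of the lower group
print(corrected_percent(80.0))         # 75.0
print(corrected_percent(40.0))         # 25.0
print(difficulty_percent(80.0, 40.0))  # 50.0
```

Under these assumptions an item with raw percents of 80.0 and 40.0 yields corrected percents of 75.0 and 25.0 and a difficulty of 50, which agrees with one row of the table. Rows that depart slightly from the rule may reflect omitted responses, which this sketch does not model.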
317 TABLE XXXXII ITSK ANALYSIS DATA FOR TEST G- Percent Success Item Upper 27$ Discrimination Lower 27$ r Index Difficulty $ Success Index .33 .43 28 33 41 77.8 72.2 64.4 55.6 .17 .15 9 64 58 60.0 50.0 24.4 5.6 .37 .56 38 28 38 95.6 9 4 . 4 35.5 19.4 .72 .75 59 55 53 80.0 75.0 40.0 25.0 .42 .50 33 50 50 20.0 0.0 4.4 0.0 .36 .00 0 0 0 37.7 22.2 35.5 19.4 .02 .04 2 21 33 20.0 0.0 8.8 0.0 .20 .00 0 0 0 33.3 16.7 29-8 11.1 .04 .07 4 3.4 27 10 46.7 33.3 26.7 8.3 .21 ,38 24 21 33 11 91.1 88.9 60.0 50.0 .46 30 68 60 88.9 86.1 40.0 25.0 .54 .63 45 57 54 22.2 2.8 6.7 0.0 .32 .12 7 2 5 2 3 4 5 6 7 8 o «✓ 12 13 * ** Method of Flanagan Method of Davis • 31.1 13.9 CM *62.6 **52.8 1 318 TABLE XXXXII (continued) Percent Success Upper 27% Lower 27% 14 Discrimination Difficulty r Index % Success Index .14 2 .12 7 5 22.2 2.8 13.3 15 46.7 33.0 17.8 0.0 .33 .65 47 17 30 16 35.5 19.4 26.7 8.3 .13 .23 14 14 27 17 95.6 94.4 68.9 .47 .50 33 78 66 46.7 33.0 4.4 .58 .65 47 17 30 19 22.2 4.4 2.8 0.0 .37 .12 7 2 5 20 100.0 100.0 68.9 61.1 .65 .67 49 80 68 4.4 24.4 5-6 -.05 -.07 -4 5 15 22 93.3 91.7 33.3 16.7 .65 .72 56 55 53 23 77.8 8.9 72.2 0.0 .69 .81 69 37 43 .23 .31 19 35 42 30 8 20 0 0 0 18 21 20.0 0.0 61.1 0.0 60.0 37.7 50.0 22.2 25 31.1 13-9 13.3 0.0 .25 .46 26 15.6 0.0 6.7 .18 0.0 .00 37.7 28.9 .10 24 27 22.2 11.1 .18 11 17 30 26.7 8.3 .50 .35 22 5 14 2.2 0.0 319 TABLE XXXXII (continued) Item Percent Success Upper 27# Lower 27# Discrimination Difficulty r Index # Success Index 22.2 37.7 2.2 0.0 .55 37 12 25 26.7 8.3 26.7 8.3 .00 .00 0 8 20 48.8 11.1 0.0 .46 .66 48 18 31 60.0 50.0 8.9 0.0 .58 .72 55 25 36 62.5 20.0 0.0 .44 .73 56 27 37 33.3 16.7 6.7 .40 0.0 • 00 32 9 21 35 71.1 63.9 42.2 27.8 .30 .36 23 46 48 36 60.0 50.0 31.1 13.9 .30 .41 27 31 40 37 77.8 72.2 53.3 41.7 .29 .31 19 57 54 13.3 8.9 .00 .00 0 0 0 82 44 47 43 31 40 29 30 31 36.1 32 33 52.8 34 38 .61 0.0 0.0 39 91.1 88.9 15.6 0.0 40 66.7 58.3 24.4 5.6 41 55.6 44.4 6.7 .0.0 .60 .70 52 22 34 42 42.2 27.8 22.2 2.8 .23 .46 30 
16 29 43 86.7 83.3 33.3 16.7 .57 .67 49 50 50 .72 .88 .43 .61 320 TABLS XXXXII (continued) Item 44 Percent Success Upper 27% Lower 27% Discrimination Difficulty Index r % Success Index .18 14 40 .23 31 53.3 41.7 35.5 19.4 45 75.6 69.4 11.1 0.0 .68 81 35 42 46 73.3 66.7 17.8 0.0 .56 .79 65 33 41 47 6.7 0.0 0.0 0.0 .00 0 0 0 48 75.6 69.4 17.8 0.0 .58 .81 68 35 42 49 66.7 58.3 42.2 27.8 .26 .31 19 42 46 50 95.6 94.4 66.7 58.3 .49 .46 30 74 64 51 62.6 52.8 17.8 .48 .73 56 27 37 52 48.8 24.4 5.6 .43 29 21 33 36.6 0.0 .65 .35 .26 53 51.1 38.9 8.9 0.0 .51 .67 49 19 32 54 95.6 94.4 26.7 8.3 .74 .83 71 51 51 55 95.6 94.4 26.7 8.3 .74 .83 71 51 51 56 20.0 0.0 20.0 0.0 .00 .00 0 0 0 57 84.4 80.6 0.0 0.0 .87 .84 74 40 45 58 37.7 13.3 .32 .55 37 12 25 22.2 0.0 321 TABLE XXXXII 59 60 61 62 Percent Success Upper 27# Lower 27# 26.7 6.7 0.0 8.3 (continued) Discrimination Difficulty r Index # Success Index .34 22 14 5 .35 86.7 83.3 35.5 19.4 .55 .63 45 51 51 35.5 19.4 22.2 2.8 .15 .39 25 10 23 66.7 58.3 40.0 25.0 .28 .34 21 42 46 53.3 41.7 37.7 .16 22.2 .23 14 31 40 64 20.0 0.0 22.2 2.8 .00 -.12 7 2 5 65 4.4 4.4 0.0 0.0 .00 .00 0 0 0 40.0 17.8 25.0 0.0 .58 40 13 26 67 48.8 36.1 31.1 13.9 .18 .29 18 25 36 68 51.1 38.9 28.9 .23 .35 22 24 35 69 46.7 33.3 17.8 .33 .64 46 17 30 70 82.2 77.8 33-3 16.7 .61 43 46 48 71 60.0 50.0 17.8 .45 .72 55 25 36 72 33.3 16.7 4.4 0.0 .48 .48 32 8 21 22.2 2.8 13.3 .14 7 2 5 63 66 11.1 0.0 0.0 0.0 .26 .50 .12 -: 322 TABLE XXXXII (continued) Percent Success Item 27% Lower 2.1% r Index Difficulty % Success Index GO 73.3 35.5 35.5 19.4 • 74 Upper Discrimination .46 30 44 47 48.8 22.2 2.8 .30 .56 38 19 32 17.8 0.0 .31 .62 44 16 29 22.2 17.8 0.0 .23 .55 37 12 25 26.7 8.3 22.2 2.8 .06 .17 10 5 15 79 80.0 75.0 24.4 5.6 .56 .71 54 40 45 80 28.9 8.9 11.1 0.0 .32 .40 26 6 16 33.3 16.7 6.7 .42 .48 32 8 21 .34 .41 27 46 48 75 36.1 76 44.4 30.6 77 78 81 37.7 0.0 73.3 66.7 40.0 S3 11.1 0.0 2.2 0.0 .34 .00 0 0 0 84 6.7 4.4 0.0 0.0 .08 .00 0 0 0 83 20.0 0.0 8.9 0.0 .20 
.00 0 0   82 25.0

TABLE XXXXII (continued)

Item    Percent Success                   Discrimination           Difficulty
        Upper 27%       Lower 27%         r              Index     % Success   Index
        *       **      *       **        *      **
 86     20.0    0.0     4.4     0.0       .34    .00      0          0          0
 87     51.1    38.9    24.4    5.6       .29    .47     31         22         34
 88     55.6    44.4    6.7     0.0       .58    .70     52         22         34
 89     75.6    69.4    40.0    25.0      .37    .43     28         48         49
 90     35.5    19.4    6.7     0.0       .45    .52     35         10         23
 91     46.7    33.0    4.4     0.0       .57    .64     46         17         30
 92     51.1    38.9    6.7     0.0       .55    .67     49         19         32
 93     53.3    41.7    20.0    0.0       .36    .69     51         21         33
 94     44.4    30.6    4.4     0.0       .57    .62     44         17         30
 95     86.7    83.3    62.6    52.8      .32    .35     22         68         60
 96     77.8    72.2    17.8    0.0       .60    .81     69         37         43
 97     91.1    88.9    91.1    88.9      .00    .00      0         89         76
 98     86.7    83.3    11.1    0.0       .74    .86     77         40         45
 99     77.8    72.2    53.3    41.7      .28    .31     19         57         54
100     97.8    97.2    40.0    25.0      .72    .77     61         61         56

*  Method of Flanagan
** Method of Davis

TEST H
INTERPRETATION OF DATA TEST: GENERALIZATIONS AND ASSUMPTIONS

This test was designed to measure your ability to interpret data. Following the data you will find a number of statements. You are to assume that the data as presented are true. Evaluate each statement according to the following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that the statement is true.
2. Probably true: The data indicate that the statement is probably true, that it is logical on the basis of the data but the data are not sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement.
4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false.
5. False: The data alone are sufficient to show that the statement is false.

In freezing of vegetables the common practice for both commercial and home frozen vegetables is to scald the vegetables first, by placing them in boiling water for two or three minutes. The following data were obtained in an experiment which measured the amounts of Vitamin C in fresh vegetables, scalded vegetables before freezing, and vegetables frozen for six months. One group of the frozen vegetables was frozen without first scalding, the other group was first scalded. The Vitamin C content of the frozen vegetables was determined before and after they were cooked. All figures indicate the amount of Vitamin C in mg. per 100 cc.

Vegetable          Fresh   Scalded
Chard (greens)     60      37
Spinach            82      43
Peas               29      21
Green beans        34      29
Lima beans         2?      20

Frozen (Unscalded: Raw, Cooked; Scalded: Raw, Cooked): 20 2 24 14 10 1 16 27 14 10 20 16 25 13 23 17 26 18 20 14

1. Scalding of all vegetables causes destruction of some of the Vitamin C content of the vegetables.
2. Spinach is a good source of Vitamin C.
3. Leafy green vegetables are a better source of Vitamin C than the pod type vegetables.
4. Leafy green vegetables are a better source of Vitamin C than root vegetables.
5. The practice of scalding leafy vegetables before freezing should be eliminated because scalding destroys some of the Vitamin C.
6. Lima beans should be frozen without scalding provided the quality of the unscalded product is equal to the scalded in other respects.
7. A better tasting product is obtained if lima beans are scalded before freezing.
8. After commercially frozen peas have been cooked they are as good a source of Vitamin C as commercially frozen chard which has been cooked.
9. The percentage of the total Vitamin C destroyed by scalding is about the same for all vegetables.
10. Since the vitamin content of food is an important consideration in its purchase, in buying frozen green vegetables one should be careful in choosing the kind of vegetables because the Vitamin C content of different frozen vegetables varies considerably.
11. The breakdown of Vitamin C is hastened by heating.
12. Since frozen leafy vegetables are much easier to prepare, the practice of using them exclusively is justified from the dietary standpoint.
13. Frozen orange juice contains somewhat less Vitamin C than freshly extracted orange juice.
14. (Fresh spinach is usually cooked for about ten minutes). Cooked spinach (unfrozen) contains less Vitamin C than scalded spinach.
15. Heating causes some change to occur in the Vitamin C molecule.

Items 16 through 21 are a re-evaluation of some of the items 1 through 15. Re-read items 1, 3, 9, 11, 13 and 15 and determine whether they are generalizations, extensions of data, explanations of the data or merely restatements of the data, etc. Answer each according to the following key:

Key
1. A generalization, that is the data says it is true for this situation, a generalization says it is true for all similar situations.
2. The data indicates a trend which if continued in either direction would make the statement true.
3. An explanation of the data in terms of cause and effect.
4. A restatement of results.
5. None of the above.

16. Item 1.
17. Item 3.
18. Item 9.
19. Item 11.
20. Item 13.
21. Item 15.

This phase of the test is designed to measure your understanding of assumptions underlying conclusions. A conclusion is given. (This conclusion is not necessarily justified by the data). The statements which follow the conclusion are the items which are to be evaluated according to the following key. These items all relate to the data presented for items 1 through 15.

Key
1. An assumption which must be made to make the conclusion valid (true).
2. An assumption which if made would make the conclusion false.
3. An assumption which has no relation to the validity (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I: The breakdown of Vitamin C proceeds spontaneously but is a relatively slow process at low temperature.
22. Vitamin C is a stable substance.
23. There is order in the universe.
24. Vitamin C is not destroyed by the freezing process.
25. Vitamin C responds in a similar way to the environment no matter what the source of Vitamin C is.
26. The Vitamin C content of all the vegetables studied was reduced after being frozen for six months.
27. All chard is similar in its reactions to the chard studied in this experiment.
28. Vitamin C is gradually destroyed by freezing and is not suddenly destroyed.

Conclusion II: The breakdown of Vitamin C is hastened by heating.
29. All vitamins react in the same way.
30. Vitamin C evaporates when heated.
31. All beans are similar in their reaction to the ones studied in this experiment.
32. Heating causes some change to occur in the Vitamin C molecule.
33. Vitamin C reacts in the same way no matter what the source of the Vitamin C.
34. Pod type vegetables have a basic similarity.

Conclusion III: The Vitamin A content of vegetables is affected by heating.
35. Pod type vegetables have a basic similarity.
36. Vitamin C is gradually destroyed by heating.
37. All vitamins react in a similar way to heat.
38. There is a direct relationship between the amount of Vitamin C and Vitamin A in foods.
39. There is order in the universe.
40. Heating affected the amount of Vitamin C in the vegetables studied.
41. In all cases studied cooking reduced the Vitamin C content of the vegetables.

This test was designed to measure your ability to interpret data.
Following the data you will find a number of statements. You are to assume that the data as presented are true. Evaluate each statement according to the following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that the statement is true.
2. Probably true: The data indicate that the statement is probably true, that it is logical on the basis of the data but the data are not sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement.
4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false.
5. False: The data alone are sufficient to show that the statement is false.

Items 42 through 61 refer to the following graph. Use the key above to answer the items. The lizard is considered to be cold blooded, the others warm blooded.

[Graph: body temperature of each animal plotted against external temperature; vertical axis marked 10° to 40°.]

42. The body temperature of the cat varies more than the body temperature of the anteater.
43. The cat and anteater have some type of mechanism which regulates the body temperature.
44. When the external temperature is 50°C. the temperature of the lizard is also 50°C.
45. The body temperature of warm blooded animals is unaffected by the external temperature.
46. At an external temperature of 50°C. the temperature of the cat is 50°C.
47. When the external temperature is 50°C. the temperature of the anteater would be higher than the temperature of the cat.
48. The temperature of a mouse would be about half way between that of the cat and the anteater.
49. At no time during the experiment did any of the animals have the same body temperature.
50. The anteater exhibits a closer relationship to the lizard than to the opossum.
51. The sharp rise in the body temperature of the lizard indicates that the lizard uses food at a faster rate than the cat.
52. The ability of the cat to maintain its temperature is due to its coat of hair.
53. There is a close correlation between the body temperature of the lizard and that of the external environment.
54. The heart rate of the lizard would increase with temperature in the same way as the body temperature increases.
55. The body temperature of the cat showed the least variation in temperature during the experimental period.
56. The temperature of all of the warm blooded animals was always higher than the external temperature.
57. The warm blooded animals are sufficiently insulated to conserve heat.
58. Warm blooded animals can withstand cold better than cold blooded animals.
59. At 20 degrees below 0°C. the lizard would be frozen.
60. The normal body temperature of the duckbill is higher than that of the echidna.
61. If the temperature of other cold blooded animals were plotted it would resemble that of the lizard.

Items 62 through 68 are a re-evaluation of some of the items 42 through 61. Re-read items 43, 44, 47, 50, 52, 55 and 61 and determine whether they are generalizations, extensions of the data, explanations of the data or merely restatements of the data, etc. Answer each according to the following key:

Key
1. A generalization, that is the data says it is true for this situation, a generalization says it is true for all similar situations.
2. The data indicates a trend which if continued in either direction would make the statement true.
3. An explanation of the data in terms of cause and effect.
4. A restatement of results.
5. None of the above.

62. Item 43.
63. Item 44.
64. Item 47.
65. Item 50.
66. Item 52.
67. Item 55.
68. Item 61.

This phase of the test is designed to measure your understanding of assumptions underlying conclusions. A conclusion is given. (This conclusion is not necessarily justified by the data). The statements which follow the conclusion are the items which are to be evaluated according to the following key. These items all relate to the data presented for items 42 through 61.

Key
1. An assumption which must be made to make the conclusion valid (true).
2. An assumption which if made would make the conclusion false.
3. An assumption which has no relation to the validity (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I: Warm blooded animals have some type of heat regulating mechanism.
69. All cats react similarly to changes in temperature.
70. It is possible for animals to have some type of heat regulating mechanism.
71. The cat and the duckbill are very different in their reaction to the external environment.
72. A man and a cat react similarly to the external temperature.
73. The lizard has no heat regulating mechanism.
74. The opossum had a lower body temperature than the cat.

Conclusion II: Anteaters and duckbills are more closely related than anteaters and cats.
75. Similarity of reaction of living things indicates a relationship.
76. All anteaters react similarly to changes in external temperature.
77. The temperature of the anteater varied more with the external temperature than did that of the cat.
78. The degree of closeness of similarity of response of living things runs parallel with the closeness of kinship.
79. Close relationship means that two living things have a common ancestor.
80. The temperature of the cat varied less than that of the anteater and duckbill with change of temperature.

This test was designed to measure your ability to interpret data. Following the data you will find a number of statements. You are to assume that the data as presented are true. Evaluate each statement according to the following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that the statement is true.
2. Probably true: The data indicate that the statement is probably true, that it is logical on the basis of the data but the data are not sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement.
4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false.
5. False: The data alone are sufficient to show that the statement is false.

Analyses were made of the Vitamin C content of red ripe and green tomatoes as soon as they were picked. Mature green tomatoes were stored at the temperatures indicated in the following table.
Those which had ripened by the end of the first week were analyzed for their Vitamin C content; those ripened at the end of the second week were analyzed at the end of the second week, etc. In addition some mature green tomatoes were analyzed each week.

Condition when      Temp. when    No. of weeks   Stage of ripeness   Vitamin C
taken from field    stored        stored         when analyzed       mg/100 grams
mature green        not stored    0              mature green        15.0
red ripe            not stored    0              red ripe            16.2
mature green        70°F.         1              red ripe            14.4
mature green        70°F.         2              red ripe            12.9
mature green        70°F.         3              red ripe             8.2
mature green        80°F.         1              red ripe            14.0
mature green        80°F.         2              red ripe             9.8
mature green        80°F.         3              red ripe             7.1
mature green        70°F.         1              mature green        10.0
mature green        70°F.         2              mature green         7.2

81. At the time of harvest the green tomatoes were only slightly lower in Vitamin C content than the red ripe ones.
82. Tomatoes which ripened during the first week of storage were almost as high in Vitamin C as those which were ripe at the time of harvest.
83. Tomatoes ripening during the second week of storage were lower in Vitamin C content than those which ripened during the first week.
84. Tomatoes ripened at 90°F. would have less Vitamin C after three weeks than those stored at 80°F.
85. Tomatoes could not be stored at 90°F. because at this high a temperature they would rot or spoil.
86. The lower the temperature at which tomatoes are stored the less is the breakdown of Vitamin C.
87. At 75°F. there would be about 14 mg/100 grams of Vitamin C after a week of storage.
88. Heat causes a breakdown of the Vitamin C molecule.
89. If tomatoes are to be stored for a considerable length of time they should be held at as low a temperature as possible, but high enough to avoid freezing.
90. When one buys tomatoes in the winter the Vitamin C content of the tomatoes compares favorably with the Vitamin C content of those bought fresh in the summer.
91. After four weeks of storage tomatoes stored at 70°F. would contain less than 7 mg/100 grams of Vitamin C.
92. Vitamin C does not develop in the tomatoes as they change from mature green to red ripe on the vine.
93. Some mature green tomatoes ripen in storage within a week.
94. (Tomatoes are often picked green and allowed to ripen during the early fall). The Vitamin C content of these tomatoes is about the same as when they were picked.
95. The green tomatoes which did not ripen in a week had lost about the same amount of Vitamin C as those which ripened during the week.
96. Vitamin C breaks down spontaneously at room temperature.
97. The Vitamin C content of other vegetables decreases if stored at high temperatures.
98. Boiling of vegetables destroys some of the Vitamin C.
99. Vitamin C is a stable substance.
100. Vitamin C is manufactured some place else in the plant than in the fruit (tomato) and is stored in the fruit.

Items 101 through 107 are a re-evaluation of some of the items 81-100. Re-read items 82, 84, 86, 88, 91, 93, and 97 and determine whether they are generalizations, extensions of the data, explanations of the data or merely restatements of the data, etc. Each of these items is to be answered according to the following key.

Key
1. A generalization, that is the data says it is true for this situation, a generalization says it is true for all similar situations.
2. The data indicates a trend which if continued in either direction would make the statement true.
3. An explanation of the data in terms of cause and effect.
4. A restatement of results.
5. None of the above.

101. Item 82.
102. Item 84.
103. Item 86.
104. Item 88.
105. Item 91.
106. Item 93.
107. Item 97.

This phase of the test is designed to measure your understanding of assumptions underlying conclusions. A conclusion is given. (This conclusion is not necessarily justified by the data). The statements which follow the conclusion are the items which are to be evaluated according to the following key.
These items all relate to the data presented for items 81 through 100. 1. An assumption which must be made to make the conclu sion valid (true). 2. An assumption which if made would make the conclusion false. 3. An assumption which has no relation to the validity (truth) of the conclusion. 4. Not an assumption; a restatement of fact. 5. Not an assumption; a conclusion. Conclusion Is Sunlight causes an increase in the Vitamin C content of tomatoes as they ripen on the vine. 108. The test used to measure the amount of Vitamin C in this experiment was a specific test for Vitamin C. 335 109. The Increase of Vitamin C in tomatoes ripening on the vine was caused by the action of sunlight on the leaves. 110. The tomatoes which were analyzed when green ripe would have contained more Vitamin G if they had been allowed to ripen on the vine. 111. The test u s e d to measure the amount of Vitamin G accurately measures the amount. 112. The same results would not have been obtained if the plants h a d been kept in the dark for the week during which the tomatoes ripened. 113. All tomatoes would yield the same type of res\ilts as those obtained in this experiment. 114. The Vitamin C content of the tomatoes used in this e x ­ periment increased as the tomatoes ripened on the vines 115. The Vitamin C was formed in the roots and was trans­ ported to the fruits. 116. The Vitamin G content of ripe tomatoes on the vine was higher than the Vitamin C content of the green ripe tomatoes on the vine. 117. The plant is capable of manufacturing Vitamin G. 118. Some change takes place in the Vitamin C molecule at high temperatures. Conclusion II: Vitamin G breaks down spontaneously at room te m p e r a t u r e . 119. Vitamin C reacts similarly in all plants in which it is found. 120. Tomatoes are all similar in the amount of Vitamin C they contain. 121. The Vitamin G content of all tomatoes would decrease when stored at room temperature. 122. 
When the tomatoes were stored at room temperature the Vitamin G content decreased. 123. All vitamins react similarly to storage at room temp­ erature. 124. There is order in the universe. 336 125. V i t a m i n 0 evaporates at room temperature. 126. The V itamin C molecule undergoes changes which change the properties of the substance. This test was d e signed to measure your ability to interpret data. Fo l l o w i n g the data you will find a number of statements. Y o u are to assume that the data as presented are true. Evaluate each statement according to the follow­ ing key and mark the appropriate space on your answer sheet. 1. True: The data alone are sufficient to show that the statement is true. 2. Probably true: The data indicate that the statement is probably true, that it is logical on the basis of the data but the data are not sufficient to say that it is definitely true. 3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement. 4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false. 5. False: The data alone are sufficient to show that the statement is false. The following data is concerned with the temperature at which various seeds germinate (sprout). Three kinds of seeds were used, seeds from Species A, Species B and Species 0. The number of seeds germinating at various temperature in two weeks is given in the table. No seeds germinated at temperatures b e low 40°F. or above 95°F • 0 B 0 6 20 41 G 0 0 0 127. 95° 100 0 0 5 18 50 70 84 65 30 0 4 7 0 92 65 30 5 0 0 0 0 0 4 3 72 90 81 52 34 6 0 0 16 75° o o 0 V0 0 00 1 o 0 0 85° o o o o -V A 65° 45° 50° 55° VO 35° o o Temperatures in Degrees Farenhelt_________ Plant B should be planted early in the spring but not in midsummer in middle western states, such as Illinois, Iowa, etc. 337 128. 
Plant 0 is a tropical plant. 129. M o r e s e e d s of P l a n t A w i l l g e r m i n a t e at 8 2 ° t h a n at any o t h e r t e m p e r a t u r e . 130 . N o n e o f the see d s o f P l a n t A w i l l g e r m i n a t e b e l o w 65°. 131. S e eds do n o t g e r m i n a t e at f r e e z i n g temperature. 132. The h i g h e r the t e m p e r a t u r e , germinate. 133. One w o u l d n o t g e t a c r o p f r o m plants of the A type in the c l i m a t e o f the n o r t h e r n states, such as Mi chigan, M i n n e s o t a , etc. 134. The o p t i m u m t e m p e r a t u r e f o r the g r o w t h of pla n ts of the 0 type is 70°. 135. S o m e s e e d s of the C v a r i e t y w i l l g e r m i n a t e at 95°. the m o r e seeds w i l l 136 . T h e o p t i m u m t e m p e r a t u r e f o r the g e r m i n a t i o n of seeds of the B type is a b o u t 56°. 137. P l a n t s of the A t y p e are f o u n d in h o t wet climates.. 138. The r a t e at w h i c h seeds g e r m i n a t e is a f f e c t e d by the temperature. 139. A d e c r e a s e in m o i s t u r e r e d u c e s the n u m b e r of seeds g e r m i n a t i n g m o r e t h a n does a d e c r e a s e in temperature. 140. If P l a n t B t a kes a r e l a t i v e l y long time to mature, seeds s h o u l d be s t a r t e d in g r e e n h o u s e s and set out l a t e r if a c r o p of this type p l ant is d e s i r e d in n o r t h e r n states. 141. P l a n t A c o u l d be w a t e r m e l o n . 142. N o p l a n t s g e r m i n a t e a t t e m p e r a t u r e s above 100°. 143. M o r e s e e d s w o u l d h a v e g e r m i n a t e d at l o wer tempe ratures if they h a d b e e n l e f t for a longer time. 144. A n i n c r e a s e of 10° above 8 5 ° r e s u l t e d in a m u c h g r e a t e r r e d u c t i o n in the n u m b e r of type A seeds g e r m i n a t i n g t h a n d i d a r e d u c t i o n of 10°. 145. 
If one w e r e d e s i r o u s of r a i s i n g all three of these p l ants in one g r e e n h o u s e one shou l d k e e p the g r e e n h o u s e at a b o u t 72°. 338 146. A temperature of 100° will kill plants of the B and 0 t y p e s • Items 147 through 151 are a re-evaluation of some of the items 127 through 146. Re-read items 131, 1 3 8 , 139, 142 and 144 and determine whether they are generalizations, extensions of the data, interpretations of the data or merely restatements of the data, etc. Sach of these items is to he answered according to the fpllowing key: 1 . 2. 3. 4. 5. Key A generalization, that is the data says it is true for this situation, a generalization says it is true for all similar situations. The data indicates a trend which if continued in either direction would make the statement true. An explanation of the data in terms of cause and effect. A restatement of results. None of the above. 147. Item 131. 150. Item 142. 148. Item 138. 151. Item 144. 149. Item 139. This phase of the test is designed to measure your understan d i n g of assumptions underlying conclusions. A conclusion is given. (This conclusion is not necessarily Justified by the data). The statements which follow the conclusion are the items which are to be evaluated accord­ ing to the following key. These items all relate to the data presented for items 127-146. 1. 2. 3. 4. 5. A n assumption which must be made to make the conclu­ sion v a l i d (true). An assumption w h ich if made would make the conclusion false. An assumption which has no relation to the validity (truth) of the conclusion. Not an assumption; a restatement of fact. Not an assumption; a conclusion. Conclusion I: Seeds will germinate only in the range of temperature from 35°F. to 100°F. 152. The seeds u s e d in this experiment are representative of the extremes of germinating temperatures of seeds. 339 153. No seeds of Species B ever germinate below 35°F. 154. 
None of the seeds which were planted of Species A germinated above 100°F.
155. Too few seeds were used in the experiment to make it valid.
156. All seeds of Species A behave similarly in their response to temperature to the ones used in this experiment.
157. The seeds from Species C germinated at a higher temperature than the seeds of Species B.
158. Plants which do not germinate at high temperatures will not grow at high temperatures even when germinated at lower temperatures.
159. Seeds will germinate only in a limited temperature range.

Conclusion II: Some seeds of Species B will germinate at 80°F.
160. The seeds used in this experiment are completely representative of seeds of Species B.
161. A larger sample would yield a greater range of germination temperature.
162. All seeds of a species are exactly alike in their response to temperature.
163. Some seeds of C germinate at 80°F.
164. The entire range in which seeds of Species B will germinate is not represented by this experiment.
165. Species B is a cold climate plant.
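The item analysis tables that follow give two percent-success figures for each group, one marked as the method of Flanagan and one as the method of Davis. In the legible rows the second figure of each pair agrees with the first after a correction for guessing on five-choice items (for example 40.0 becomes 25.0, and 20.0 becomes 0.0). The sketch below illustrates that arithmetic only. The formula is an inference from the tabulated pairs and is not stated in the text, and the function name `chance_corrected_percent` is hypothetical.

```python
def chance_corrected_percent(raw_percent: float, n_choices: int = 5) -> float:
    """Correct a raw percent-success figure for guessing.

    Assumption (not stated in the source): every wrong answer is treated as a
    blind guess among n_choices options, so the corrected proportion is
    p - (1 - p) / (n_choices - 1), with negative results floored at zero.
    """
    p = raw_percent / 100.0
    corrected = p - (1.0 - p) / (n_choices - 1)
    return max(0.0, corrected) * 100.0


if __name__ == "__main__":
    # Raw upper-group and lower-group values tabulated for item 1 of Test H.
    for raw in (40.0, 20.0):
        print(raw, round(chance_corrected_percent(raw), 1))
```

The discrimination and difficulty indices in the tables cannot be reproduced this way; they appear to come from published conversion charts, which are not given in this section.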
TABLE XXXXIII

ITEM ANALYSIS DATA FOR TEST H

              Percent Success            Discrimination         Difficulty
Item     Upper 27%       Lower 27%          r        Index   % Success   Index
           *      **       *      **       *     **
   1    40.0    25.0    20.0     0.0     .24    .58     40        13        26
   2    37.7    22.2    24.4     5.6     .16    .31     19        14        27
   3    60.0    50.0    44.4    30.6     .17    .20     12        40        45
   4    91.1    88.9    48.8    36.1     .51    .56     38        61        56
   5    48.8    36.1    13.3     0.0     .42    .66     48        18        31
   6    64.4    55.6    33.3    16.7     .32    .42     27        35        42
   7    95.6    94.4    71.1    63.9     .45    .45     29        79        67
   8    62.6    52.8    37.7    22.2     .26    .32     20        37        43
   9    71.1    63.9    51.1    38.9     .21    .26     16        51        51
  10    17.7     0.0     2.0     0.0     .43    .00      0         0         0
  11    44.4    30.6    20.0     0.0     .28    .62     44        16        29
  12     6.7     0.0     2.2     0.0     .22    .00      0         0         0
  13    22.2     8.9     8.9     0.0     .23    .18     11         3        10
  14    11.1     0.0    15.6     0.0    -.07    .00      0         0         0
  15    64.4    55.6    27.2     8.6     .38    .55     37        31        40
  42    91.1    88.9    71.1    63.9     .33    .34     21        76        65
  43    51.1    38.9    37.7    22.2     .15    .20     12        30        39
  44    71.1    63.9    24.4     5.6     .46    .65     47        35        42
  45    60.0    50.0    31.1    13.9     .30    .41     27        31        40
  46    82.2    77.8    20.0     0.0     .62    .83     72        38        44
  47    66.7    58.3    35.5    19.4     .32    .39     25        38        44
  48   100.0   100.0    84.4    80.6     .50    .54     36        89        76
  49    88.9    86.1    46.7    33.3     .48    .55     37        59        55
  50    26.7     8.3     8.9     0.0     .28    .35     22         5        14
  51    71.1    63.9    68.9    61.1     .02    .01      1        61        56
  52    13.3     0.0    22.2     2.8    -.18   -.12      7         2         5
  53    91.1    88.9    64.4    55.6     .38    .40     26        71        62
  54    82.2    77.8    77.8    72.2     .04    .10      6        76        65
  55    95.6    94.4    77.8    72.2     .37    .38     24        83        70
  56    57.8    47.2    31.1    13.9     .28    .38     24        30        39
  57    44.4    30.6    26.7     8.3     .20    .35     22        19        32
  58    33.3    22.2    16.7     2.8     .14    .35     22         9        22
  59    66.7    58.3    28.9    11.1     .38    .52     35        35        42
  60     0.0     0.0     0.0     0.0     .00    .00      0         0         0
  61    66.7    58.3    46.7    33.3     .20    .26     16        44        47
  81    88.9    86.1    80.0    75.0     .15    .17     10        80        68
  82    75.6    69.4    66.7    58.3     .12    .12      7        63        57
  83    93.3    91.7    80.0    75.0     .25    .29     18        83        70
  84    80.0    75.0    31.1    13.9     .50    .62     44        42        46
  85    95.6    94.4    68.9    61.1     .35    .47     31        77        66
  86    57.8    47.2    24.4     5.6     .35    .54     36        27        37
  87    62.6    52.8    40.0    25.0     .23    .31     19        38        44
  88    71.1    63.9    24.4     5.6     .47    .65     47        35        42
  89    60.0    50.0    37.7    22.2     .23    .31     19        35        42
  90    42.2    27.8    46.7    33.0    -.05   -.04     -2        44        47
  91    86.7    83.3    53.3    41.7     .40    .43     28        61        56
  92    17.8     0.0     8.9     0.0     .17    .00      0         0         0
  93    97.8    97.2    73.3    66.7     .48    .52     35        82        69
  94    40.0    25.0    11.1     0.0     .38    .58     40        13        26
  95    64.4    55.6    33.3    16.7     .32    .41     27        35        42
  96    40.0    25.0    20.0     0.0     .24    .58     40        13        26
  97    28.9    11.1    11.1     0.0     .27    .40     26         5        17
  98    37.7    22.2    22.2     2.8     .19    .47     31        12        25
  99    46.7    33.0    13.3     0.0     .40    .64     46        17        30
 100    73.3    66.7    42.2    27.8     .32    .38     24        47        48
 127    64.4    55.6    15.6     0.0     .50    .75     58        28        38
 128    64.4    55.6    13.3     0.0     .54    .75     58        28        38
 129    28.9    11.1    11.1     0.0     .27    .40     26         5        17
 130     6.7     0.0     4.4     0.0     .10    .00      0         0         0
 131    37.7    22.2    15.6     0.0     .30    .55     37        12        25
 133    51.1    38.9    28.9    11.1     .23    .36     23        25        36
 134     4.4     0.0     6.7     0.0    -.10    .00      0         0         0
 135     2.2     0.0     6.7     0.0    -.20    .00      0         0         0
 136    37.7    22.2     8.9     0.0     .40    .55     37        12        25
 137    88.9    86.1    80.0    75.0     .15    .17     10        80        68
 138     8.9     0.0    11.1     0.0     .05    .00      0         0         0
 139   100.0   100.0    75.6    69.4     .55    .62     44        83        70
 140    53.3    41.7    20.0     0.0     .36    .69     51        21        33
 141    20.0     0.0    24.4     5.6    -.06   -.27    -17         4        12
 142    33.3    16.7   3-3.3     0.0     .28    .50     33         8        21
 143    13.3     0.0    13.3     0.0     .00    .00      0         0         0
 144     8.9     0.0    11.1     0.0     .05    .00      0         0         0
 145    66.7    58.3    15.6     0.0     .54    .76     60        30        39
 146    51.1    38.9     6.7     0.0     .46    .67     49        19        32

*  Method of Flanagan
** Method of Davis

TABLE XXXXIV

ITEM ANALYSIS DATA FOR TEST J

              Percent Success            Discrimination         Difficulty
Item     Upper 27%       Lower 27%          r        Index   % Success   Index
           *      **       *      **       *     **
  16    84.4    80.6    60.0    50.0     .30    .34     21        64        58
  17    44.4    30.6    20.0     0.0     .27    .62     44        16        29
  18    48.8    36.1    28.9    11.1     .22    .34     21        22        34
  19    33.3    16.7    33.3    16.7     .00    .00      0        17        30
  20    66.7    58.3    64.4    55.6     .04    .02      1        57        54
  21    46.7    33.3    24.4     5.6     .25    .41     27        19        32
  22    82.2    77.8    64.4    55.6     .23    .24     15        66        59
  23    11.1     0.0     6.7     0.0     .10    .00      0         0         0
  24    35.5    16.7    17.8     0.0     .23    .50     33         8        21
  25    48.8    36.1    17.8     0.0     .35    .66     48        18        31
  26    53.3    41.7    24.4     5.6     .32    .50     33        24        35
  27    48.8    36.1    22.2     2.8     .30    .57     39        19        32
  28    28.9    11.1     4.4     0.0     .45    .40     26         6        17
  29    80.0    75.0    60.0    50.0     .24    .26     16        61        56
  30    13.3     0.0    17.8     0.0    -.10    .00      0         0         0
  31    51.1    38.9    15.6     0.0     .39    .67     48        19        32
  32    20.0     0.0    11.1     0.0     .16    .00      0         0         0
  33    53.3    41.7    24.4     5.6     .32    .50     33        24        35
  34    15.6     0.0    20.0     0.0    -.04    .00      0         0         0
  35    26.7     8.3     8.9     0.0     .29    .35     22         5        14
  36    64.4    55.6    57.8    47.2     .07    .07      4        51        51
  37    51.1    38.9    24.4     5.6     .29    .48     32        22        34
  38    55.6    44.4    57.8    47.2    -.02   -.02     -1        45        47
  39    15.6     0.0     8.9     0.0     .09    .00      0         0         0
  40    64.4    55.6    24.4     5.6     .42    .59     39        35        42
  41    46.7    33.3     8.9     0.0     .47    .64     46        17        30
  62    53.3    41.7    28.9    11.1     .25    .39     25        28        36
  63    62.9    52.8    35.5    19.4     .28    .36     23        35        42
  64    68.9    61.1    37.7    22.2     .33    .40     26        40        45
  65    40.0    25.0    20.0     0.0     .24    .58     40        13        26
  66    71.1    63.9    42.2    27.8     .30    .36     23        46        48
  67    86.7    83.3    37.7    22.2     .54    .61     43        53        52
  68    84.4    80.6    44.4    30.6     .44    .51     34        55        53
  69    73.3    66.7    51.1    38.9     .23    .29     18        51        51
  70    77.8    72.2    53.3    41.7     .27    .31     19        57        54
  71    53.3    41.7    24.4     5.6     .32    .50     33        24        35
  72    66.7    58.3    46.7    33.3     .23    .26     16        44        47
  73    20.0     0.0     6.7     0.0     .27    .00      0         0         0
  74    71.1    63.9    20.0     0.0     .52    .78     63        31        40
  75    84.4    80.6    46.7    33.3     .43    .48     32        57        54
  76    48.8    36.1    31.1    13.9     .18    .30     18        25        36
  77    64.4    55.6    22.2     2.8     .43    .68     50        28        38
  78    57.8    47.2    20.0     0.0     .40    .70     53        24        35
  79    15.6     0.0    11.1     0.0     .10    .00      0         0         0
  80    66.7    58.3    26.7     8.3     .41    .58     40        33        41
 101    84.4    80.6    24.4     5.6     .60    .75     59        44        47
 102    82.2    77.8    35.5    19.4     .48    .59     41        48        49
 103    26.7     8.3    24.4     5.6     .03    .07      4         7        19
 104    60.0    50.0    15.6     0.0     .48    .73     56        25        36
 105    77.8    72.2    42.2    27.8     .38    .43     28        50        50
 106    86.7    83.3    35.5    19.4     .56    .63     45        51        51
 107    57.8    47.2    26.7     8.3     .33    .51     34        28        38
 108    64.4    55.6    40.0    25.0     .25    .32     20        40        45
 109    13.3     0.0    11.1     0.0     .04    .00      0         0         0
 110    73.3    66.7    35.5    19.4     .38    .47     31        42        46
 111    62.6    52.8    28.9    11.1     .34    .48     32        31        40
 112    66.7    58.3    40.0    25.0     .27    .34     21        42        46
 113    64.4    55.6    35.5    19.4     .30    .39     25        38        44
 114    37.7    22.2    35.5    19.4     .03    .05      3        21        33
 115    53.3    41.7    22.2     2.8     .33    .59     41        22        34
 116    55.6    44.4    17.8     0.0     .45    .70     52        22        34
 117    60.0    50.0    35.5    19.4     .25    .35     22        35        42
 118    62.6    52.8    33.3    16.7     .29    .40     26        33        41
 119    53.3    41.7    26.7     8.3     .28    .45     29        25        36
 120    64.4    55.5    44.4    30.6     .22    .27     17        42        46
 121    71.1    63.9    48.8    36.1     .24    .29     18        50        50
 122    60.0    50.0    17.8     0.0     .45    .72     55        25        36
 123    82.2    77.8    46.7    33.3     .40    .46     30        55        53
 124    20.0     0.0     6.7     0.0     .38    .00      0         0         0
 125    17.8     0.0    15.6     0.0     .04    .00      0         0         0
 126    13.3     0.0     6.7     0.0     .06    .00      0         0         0
 147    53.3    41.7    37.7    22.2     .16    .23     14        31        40
 148    40.0    25.0    28.9    11.1     .12    .22     13        18        31
 149    82.2    77.8    42.2    27.8     .43    .50     33        53        52
 150    55.6    44.4    31.1    13.9     .26    .36     23        28        38
 151     8.9     0.0     8.9     0.0     .00    .00      0         0         0
 152    75.6    69.4    53.3    41.7     .25    .29     18        55        53
 153    57.8    47.2    53.3    41.7     .05    .05      3        44        47
 154    53.3    41.7    37.7    22.2     .16    .23     14        31        40
 155    68.9    61.1    28.9    11.1     .41    .56     38        35        42
 156    84.4    80.6    35.5    19.4     .52    .60     42        50        50
 157    64.4    55.6    17.8     0.0     .48    .75     58        28        38
 158    80.0    75.0    51.1    38.9     .33    .36     23        57        54
 159    42.2    27.8    20.0     0.0     .26    .61     43        15        28
 160    68.9    61.1    24.4     5.6     .44    .64     46        33        41
 161    77.8    72.2    33.3    16.7     .46    .56     38        44        47
 162    64.4    55.5    28.9    11.1     .36    .61     43        33        41
 163    22.2     2.8    13.3     0.0     .14    .15      9         3        10
 164    75.6    69.4    15.6     0.0     .60    .81     68        35        42
 165    17.8     0.0    11.1     0.0     .13    .00      0         0         0

*  Method of Flanagan
** Method of Davis

APPENDIX II

TEST I

THE ABILITY TO THINK SCIENTIFICALLY

GENERAL DIRECTIONS

1. Place your name, age and sex in the spaces provided on the answer sheet.
2. Place your student number in the space provided for "date of birth".
3. On the space marked "school" place your major.
4. In the space marked "1" below "school" give courses you have had in science in high school; in the space marked "2" give any courses you have had in science in college in addition to biological science.
5. Answer all items; if you don't know - guess.
6. Do not mark on the test booklet. Use scratch paper if you wish.
7.
Be sure to mark dark on the answer sheet; the machine does not pick up light markings.
8. Each item has only one answer; do not mark more than one.

This test has been devised to measure your ability to think scientifically. It is divided into several parts; each of these parts tests a different phase of scientific thinking.

This portion of the test is designed to measure your ability to differentiate phases of thinking. These steps include major problems or perplexities, possible solutions to problems, observations which are not results of experimentation but rather preliminary observations, results of experimentation, and conclusions. The following key is to be used for the succeeding paragraph. Certain parts of the paragraph are underlined, and each underlined item is a question. Choose the proper response from the key and blacken the appropriate space in the answer sheet.

1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Result of experimentation.
4. Initial observation (not experimental).
5. Conclusion (probable solution of problem).

(1) How does a homing pigeon navigate over territory it has never seen before? (2) Do air currents stimulate the pigeon in some way? (3) Are the pigeons equipped with some sort of magnetic compasses; that is, are they sensitive to the earth's magnetism? Yeagley tested the latter by fastening small magnets to the wings of well-trained pigeons. (4) Most of these birds never got home. (5) Others, carrying equal wing weights of non-magnetic copper, made the home roost without trouble, (6) indicating that the earth's magnetism is a factor in pigeon navigation. But the pigeon's magnetic compass could not, by itself, bring him back to his roost, because many places on the earth's surface have identical magnetic conditions. Yeagley endeavored (7) to determine the other guiding factor. (8) It might be the sun or stars, but pigeons navigate under clouds.
Abbreviated Key
1. A major problem
2. Hypothesis
3. Results
4. Observations
5. Conclusions

While looking at a map which had lines representing the intensity of the earth's magnetism, he noted that the lines were crossed at varying angles by the parallels of latitude. (9) If pigeons are sensitive to some factor connected with the lines of latitude, they would have all they need to find their way home. The next step was (10) to find some physical force, something the pigeons might be able to detect, related to the lines of latitude. The effect of the earth's turning varies directly with latitude; objects near the equator are carried daily around the earth's circumference, moving at over 1,000 mi. per hr. Objects near the poles are carried around more slowly. The direction and variation of this circling can be recorded by various man-made instruments. (11) Why should not the pigeons feel it, too? (12) If they could, they would have, along with their magnetic compass, a satisfactory navigating instrument. Yeagley trained hundreds of pigeons to return to their home roosts at State College, Pa. Then he took them to a part of Nebraska where the lines representing the earth's magnetism cross the parallels of latitude at the same angle as at State College, Pa. He released the pigeons to the east of this spot. (13) The pigeons all flew west. Yeagley believes that (14) pigeons are guided by both the earth's magnetism and by its turning. (15) Just where the birds keep their instruments is still unknown; but he found that (16) birds have a mysterious organ in their eyes at the end of the optic nerve. (17) This organ may contain the nerve fibers that pick up vibrations of magnetism and the even more delicate sense that measures the earth's turning.

This portion of the test is designed to test your ability to delimit a problem. A problem is presented. This is followed by a series of questions.
Rate the questions according to the following key.

Key
1. This question must be answered in order to solve the problem.
2. This question if answered might be useful in the solution of the problem.
3. The answer to this question, though related to the problem, would not help in the solution of the problem.
4. This question is completely unrelated to the problem.
5. This question if answered in the affirmative is a basic assumption of the problem.

PROBLEM: What causes colds?
QUESTIONS:
18. Do all people have colds?
19. Is it possible to determine the cause of a cold?
20. Does aspirin help to cure a cold?
21. Can some germ be isolated which, when injected, will cause a cold?
22. Do colds have a cause?
23. Does getting one's feet wet cause a cold?
24. Does becoming chilled after being overheated cause a cold?
25. Why are colds more prevalent in the winter than in the summer?
26. Do other animals get colds?
27. Are people who are tired more susceptible to colds?

PROBLEM: What is the function of the thymus gland? (The thymus gland is located in the chest cavity just above the heart.) This gland is largest during the growing period and becomes progressively smaller after maturity.
QUESTIONS:
28. Is the gland inactive after maturity?
29. Does the gland have a function?
30. Can any substance be extracted from the gland which when injected into another animal causes growth?
31. If the gland is removed will the animal mature?
32. Can the function of the gland be determined?
33. What causes the gland to grow smaller?

This portion of the test is designed to measure your ability to recognize faulty experimental procedures. In each case a problem and a possible solution to the problem (an hypothesis) are presented. In each case the experiments were designed by students to test the hypotheses. Judge each experiment according to the following key.

Key
This experiment is:
1. Satisfactory
2. Unsatisfactory because it lacks a control or comparison.
3.
Unsatisfactory because the control or comparison is faulty.
4. Unsatisfactory because it is unrelated to the hypothesis.
5. None of the above - the experiment is unsatisfactory for reasons other than listed in 2, 3, and 4.

PROBLEM: What are some of the requirements for the sprouting of seeds?
HYPOTHESIS: Oxygen is a requirement for the sprouting of seeds.

34. If a seed lacked oxygen under a controlled experiment the seed would not function properly and would soon die.
35. Take two packages of seeds. Allow oxygen to be in contact with one package but keep the other package of seeds protected from all oxygen. Observe which sprouts.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to hypothesis
5. None of the above

36. Place growing plants in an air tight container. Pump out the oxygen. Place other growing plants in containers with oxygen. Keep temperature, light, etc., the same for each.
37. Plant seeds in a container with glass covering it so that no oxygen can enter and see if they sprout. Keep temperature, light and moisture normal.

PROBLEM: A minute insect (aphid) is suspected of spreading a virus disease of roses. How would you determine whether this is true?
HYPOTHESIS: The aphid spreads a virus disease of roses.

38. Put the insect among other kinds of plants other than roses. Leave another group of these plants free from contact with the aphids. Compare the results.
39. Since aphids travel through the air, a plot of roses must be entirely protected from them, and another exposed to aphids which in turn have been exposed to roses afflicted with the virus disease. All must be under constant conditions of soil, atmosphere, etc.
40. Take sample rose with the virus disease. Obtain same kind of rose with no disease. Use microscope to aid in detection of the disease. Use some sort of spray. Note results.
41. Use rose plants which are known not to be diseased.
In the same area place rose plants which are diseased but which have been treated to destroy the aphid. Note whether the disease still spreads after the aphids have been killed.
42. In order to determine whether the aphid spreads a virus disease in roses, a group of roses should be put in a hot house free from aphids to see whether they get such a virus disease.

PROBLEM: To determine the cause of illness which appears when large numbers of people are confined to a small space.
HYPOTHESIS: Lack of oxygen causes the people to become ill.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to hypothesis
5. None of the above

43. One might check the oxygen by placing a number of people in a confined place where there was a control amount. Other checks would have to be made also such as the purity of food, the purity of water and whether or not proper sanitation rules were followed.
44. Confine one group to a small space in which there is a limited supply of oxygen. Let the other group have unlimited supply of oxygen and a large space. Let their diets and other items be the same. If the cause of the illness is as stated the confined group will be ill from lack of oxygen.
45. Set two groups of people, one with plenty of oxygen and the other in a normal environment. Determine which group becomes ill.
46. Put a lot of rabbits in a small space for a period of time. Put a few rabbits in the same amount of space. Observe the rabbits and draw conclusions.
47. Put one group of people in a room with an excessive amount of carbon dioxide and another group in a room with a normal amount of carbon dioxide. Keep the oxygen concentration the same in both rooms.

This portion of the test is designed to test your ability to organize data. Select from the key below the curve which best fits the data. If none of the curves fit the data mark space five on your answer sheet.
The curves need not have the same amount of slope as the curves presented in the key. Use scratch paper if you wish. 1 5. None of the curves 358 48. The horizontal axis represents Abbreviated Key the time in hours after the in/ Jection of sugar into the blood; 1. J 3 the vertical axis is the amount — / 5 . none of sugar in the blood. Time after in .lection 1 3 6 49. Percent increase 8 10 12 14 18 20 80 The horizontal axis represents time in days; the vertical axis is the number of yeast cells in millions (starting with 100 yeast cells). Number of Time in days 51. 8 The horizontal axis represents age in years. The vertical axis is the percent increase in the weight of the ovaries and other female sex organs from birth to 20 years. Age 4 50. Blood sugar 35 12 yeast cells in millions 4 25 8 130 12 390 20 400 The horizontal axis represents the amount of thyroprotein fed daily to cows. The vertical axis repre­ sents the percent increase in milk production. Thyroproteln fed .15 .20 .24 .30 grams grams grams grams Percent Increase 18 23 27 33 This test is designed to measure your understanding of the relation of factB to the solution of a problem. The over-all problem involved in this test is presented. This is followed by a series of possible solutions to the prob­ lem (hypotheses). After each hypothesis there are a number of items, all of which are true statements of fact. Deter­ mine how the statement is related to the hypothesis and mark each statement according to the key which follows the hypothesis. GENERAL PROBLEM: What factors are involved in the transmission and development of Infantile Paralysis (Poliomyletis)? HYPOTHESIS I: In man the disease is contracted by direct contact with persons having the disease. For Items 52 through 60 mark space if the item offers 1. Direct evidence in support of the hypothesis. 2. Indirect evidence in support of hypothesis. 3. Evidence which has no bearing on the hypothesis 4. Indirect evidence against the hypothesis. 5. 
Direct evidence against the hypothesis.

52. Monkeys free from the disease almost never catch infantile paralysis from infected monkeys.
53. The curve of number of cases of the disease in a given area is the same shape as the curve for the fly population in that area, the infantile paralysis incidence curve lagging behind the fly population curve by about two weeks.
54. The virus has never been isolated from the blood.
55. The virus is not found in the nasal secretion, nor in the saliva.
56. The incubation period for infantile paralysis is from 4 to 21 days.
57. Most persons in contact with the diseased individual do not develop the disease.
58. The incidence of infantile paralysis is higher in rural districts than in the cities.
59. Cases of infantile paralysis have been found to follow the roads of communication of the population, that is, the disease spreads from populated areas along roads or rivers to other areas.
60. Even during epidemics cases are spotty; it is usually impossible to trace one case from another.
61. What is the status of hypothesis I?
    1. It is true.
    2. It is probably true.
    3. The data are contradictory, so the truth or falsity cannot be judged.
    4. The hypothesis is probably false.
    5. It is definitely false.

HYPOTHESIS II: Healthy persons having had contact with diseased individuals may carry the disease from one person to another.

For items 62 through 70 mark space if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.

62. Monkeys free of the disease almost never catch infantile paralysis from infected monkeys.
63. It has been found that exertion prior to or at the time of infection increases the incidence of the disease.
64. Even during epidemics cases are spotty; it is usually impossible to trace one case from another.
65.
The virus is always found in the stools of people who have the disease.
66. Most persons in contact with the diseased individual do not develop the disease.
67. Nine out of 14 adult contacts had virus in stools; almost all child contacts have virus in stools.
68. Up to two months after contact the virus is found in the stools of persons who contacted the victims, but who did not contract the disease.
69. In the stools of non-contacts the virus was found in only one person in 100.
70. The percent of cases of infantile paralysis is higher in rural districts than in the cities.
71. What is the status of hypothesis II?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, so the truth or falsity cannot be judged.
    4. It is probably false.
    5. It is definitely false.

This portion of the test was designed to measure your ability to interpret data and to test your understanding of experimentation. In each case the numbers in the first column are the numbers which you will use as your answer. Thus the table presented becomes both the source of data and your key for the questions which follow it. In each case where a test tube number or group number is called for the one which gives positive evidence for the statement should be given. Below this the control or comparison is called for. This is the test tube or group number of the data which offers a comparison. For example:

1. Leaf in dark - no starch.
2. Leaf in light - starch.

"Light is necessary for the production of starch." You would mark space 2 because this is the positive evidence, but it would be meaningless if it were not compared with the leaf in the dark. Therefore, the following item, "What is the control (comparison) for item 1?" would be marked space 1.

Items 72 through 80 refer to the data presented below. Five test tubes, each containing a gram of protein, were set up. Mark each item according to the test tube number called for. All substances were dissolved in water.
All test tubes were kept at 37° C. (water boils at 100° C.). For test tube 5, Substance X was boiled and then cooled before it was added to the protein.

Test Tube   Contents of Tubes                                   Amt. of Substance W present after 24 hours
    1       Protein plus Substance X                            .05 gram
    2       Protein plus water                                  .00 gram
    3       Protein plus Substance X plus hydrochloric acid     .08 gram
    4       Protein plus hydrochloric acid                      .00 gram
    5       Protein plus Substance X (boiled)                   .00 gram

72. Give the number of the test tube which acts as a control (comparison) for the entire experiment.
73. Give the number of the test tube which gives evidence that protein does not break down spontaneously into Substance W.
74. Give the number of the test tube which gives evidence that Substance X is the active substance in the breakdown of proteins.
75. Give the number of the tube which is the control for item 74.
76. Give the number of the test tube which shows that a temperature of 37 degrees C. does not cause protein to break down into Substance W.
77. Which test tube gives evidence that Substance X is not a stable substance?
78. Which tube is the control for item 77?
79. Give the number of the test tube which indicates that hydrochloric acid alone is ineffective in breaking down proteins.
80. Give the control for item 79.

Items 81 through 91 refer to the data presented below. Mark each item according to the leaf number called for. Plant A normally stores starch in its leaves while Plant B does not normally store starch in its leaves. The following experiments were performed in a dark room at 72° F. Glucose (sugar) solutions were made with 20 grams of glucose per 100 cubic centimeters of water. Leaves of Plant A taken from a plant that had been in the dark for 48 hours were floated in the 5 solutions listed below and left in the glucose solution for an hour.
Leaf   Solution                                    Analysis of leaf after 4 hours
  1    Glucose                                     Starch in leaf
  2    Water                                       No starch in leaf
  3    Glucose plus juice from Plant B             No starch in leaf
  4    Glucose plus juice from Plant C             No starch in leaf
  5    Glucose plus boiled juice from Plant B      Small amount of starch in leaf

81. Give the number of the leaf which showed that starch does not develop spontaneously in the leaf in the dark.
82. This leaf indicates that a temperature of 72° F. does not cause starch to form in the leaf.
83. Give the number of the leaf which is the control (comparison) for the entire experiment.
84. Give the number of the leaf which gives evidence that Plant A is capable of manufacturing starch from glucose.
85. Give the number of the leaf which is the control for item 84.
86. Give the number of the leaf which gives evidence that the juice of Plant B is capable of preventing the manufacture of starch from glucose.
87. What is the control for item 86?
88. Give the number of the leaf which gives evidence that the juices of Plant B contain a substance which inhibits the production of starch in its leaves.
89. Give the leaf which is the control for item 88.
90. This leaf gives evidence that the inhibitory substance is not a stable substance.
91. What is the control for item 90?

This portion of the test was designed to measure your ability to make conclusions. When facts are analyzed and studied they sometimes yield evidence which helps in the solution of a problem. However, any conclusion must be checked before it can be accepted. The following key includes four ways in which conclusions may be faulty.
Each of the items presents a question or problem, a brief description of an experiment and one or more conclusions drawn from the experiment. Each experiment was repeated many times. Read each problem, experiment and the conclusions. Where several conclusions are given evaluate each conclusion separately. Is the conclusion tentatively justified by the data? If so, mark space 1 on your answer sheet. If the conclusion is not justified determine whether 2, 3, 4, or 5 in the key is the best reason for it being faulty and mark the proper space on your answer sheet.

Key
The conclusion is:
1. Tentatively justified.
2. Unjustified because it does not answer the problem.
3. Unjustified because the experiment lacks a control comparison.
4. Unjustified because the data are faulty or inadequate, though a control was included.
5. Unjustified because it is contradicted by the data.

PROBLEM: A student was interested in developing a test for a certain type of substance. In all 100 cases his test was positive.
92. He concluded that the test was a specific test for the substance.

PROBLEM: An investigator wanted to know what causes people to breathe faster when they are running rapidly. He found that breathing more carbon dioxide increased the breathing rate, but that the breathing of air deficient in oxygen did not increase the breathing rate.
93. He concluded that people breathe faster when they are running because they need more oxygen.
94. Someone else concluded that running increases the rate of breathing.
PROBLEM: An investigator wished to determine whether temperature increased the rate of a certain reaction. On repeated tests he found that if he started out with a certain amount of his original substances he would obtain, after one hour, 1 gram of the substance produced by the reaction at 0° C., 2 grams at 20° C., 5 grams at 40° C., and 3 grams at 60° C.
95. He concluded that increased temperature increased the rate of the reaction.

PROBLEM: A person wanted to determine whether bile aided in the digestion of fats. He found that whenever he mixed pancreatic juice with fats a small part of the fat was digested, but whenever he mixed pancreatic juice and bile with fat, he found that the fat was completely digested. When he mixed bile alone with fat he found that there was no digestion.
96. He concluded that bile aided in the digestion of fats.
97. Another concluded that pancreatic juice was necessary for digestion of fats.
98. Someone else claimed that bile does not aid in the digestion of fat.

PROBLEM: A person wanted to know what caused a certain disease. He examined 1000 patients with the disease. All had a certain bacteria (Bacteria A) in the digestive tract.
99. He concluded that Bacteria A was the cause of the disease.

PROBLEM: A person wanted to know why plants bend toward the light. He placed one group of plants in the light with the light source at the right. He placed another group of similar plants in the dark. The plants in the dark grew straight, the plants in the light were bent to the right.
100. He concluded that plants bend toward the light.
101. Another concluded that plants bend toward the light because they need light to grow.
102. Someone else concluded that light influences the direction in which plants grow.

PROBLEM: Investigator A wanted to know what caused people to become ill if confined in large numbers to a small closed area.
He found on repeated tests that the air in very crowded closed areas contained about 5% carbon dioxide, while normal air contains .03% carbon dioxide.
103. He concluded that excessive carbon dioxide caused the illness.
104. Another investigator concluded that the illness was caused by insufficient oxygen.

PROBLEM: Investigator B, in an attempt to solve the same problem, repeated the experiment done by Investigator A but in addition had people in uncrowded rooms breathe air containing 5% carbon dioxide. No ill effects were noted among those in the uncrowded rooms.
105. He also concluded that excessive carbon dioxide caused the illness.
106. Another investigator claimed that this showed that the disease was caused by insufficient oxygen.
107. Another conclusion was that 5% carbon dioxide will produce no ill effects.
108. Still another claimed that people live better in uncrowded areas.

PROBLEM: What are some of the requirements for seeds to sprout? The same student planted two groups of seeds of different types in pots and placed one group of the pots in the light, the others in the dark. Those plants in the light were green, those in the dark were yellow. Other conditions were the same for both groups.
109. Conclusion: Light is necessary for sprouting of seeds.
110. Another conclusion: Plants require light to mature properly.

This portion of the test was designed to measure your ability to interpret data. Following the data you will find a number of statements. You are to assume that the data as presented are true. Evaluate each statement according to the following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that the statement is true.
2. Probably true: The data indicate that the statement is probably true, that is, it is logical on the basis of the data but the data are not sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to indicate whether there is any degree of truth or falsity in the statement.
4. Probably false: The data indicate that the statement is probably false, that is, it is not logical on the basis of the data but the data are not sufficient to say that it is definitely false.
5. False: The data alone are sufficient to show that the statement is false.

Items 111 through 131 refer to the following graph. Use the key above to answer the items. The lizard is considered to be cold blooded, the others warm blooded.