DOCTORAL DISSERTATION SERIES

TITLE: TEST TO MEASURE SOME OF THE INDUCTIVE ASPECTS OF SCIENTIFIC THINKING
AUTHOR: MARY ALICE HORSWILL BURMESTER
UNIVERSITY: MICHIGAN STATE COLLEGE
DEGREE: Ed.D.

UNIVERSITY MICROFILMS
ANN ARBOR, MICHIGAN

COPYRIGHTED

BY
Mary Alice Horswill Burmester

1953

THE CONSTRUCTION AND VALIDATION OF A TEST
TO MEASURE SOME OF THE INDUCTIVE
ASPECTS OF SCIENTIFIC THINKING

BY
Mary Alice Burmester

A THESIS

Submitted to the Graduate School of Michigan
State College of Agriculture and Applied
Science in partial fulfillment of the
requirements for the degree of
DOCTOR OF EDUCATION

Department of Education
1951

ACKNOWLEDGMENTS
The writer wishes to express appreciation for the
assistance given by Dr. Victor H. Noll, thesis adviser,
and to the other members of the advisory committee. She
also wishes to thank Dr. Clarence H. Nelson of the Board
of Examiners of Michigan State College for valuable
suggestions on the construction of test items and to
acknowledge the cooperation of all of the members of the
Department of Biological Science of Michigan State College
for their aid in the study.

THE CONSTRUCTION AND VALIDATION OF A TEST
TO MEASURE SOME OF THE INDUCTIVE
ASPECTS OF SCIENTIFIC THINKING

Mary Alice Burmester

AN ABSTRACT
Submitted to the Graduate School of Michigan
State College of Agriculture and Applied
Science in partial fulfillment of the
requirements for the degree of
DOCTOR OF EDUCATION

Department of Education
1951

Approved

The purpose of this study was to devise a valid test to measure
some of the inductive aspects of the ability to think scientifically,
in the area of biological science. The educational objectives related
to scientific thinking were formulated and were defined in terms of
desired behaviors involved. In all, 98 behaviors were recognized as
attending the critical, as opposed to the creative, aspects of scientific
thinking. Nine tryout tests, consisting of a total of 637 items, were
constructed to evaluate these behaviors. These tests were administered
during the spring term of 1950 to 168 students taking the third term of
the three-term sequence of Biological Science at Michigan State College.
Item validity and item difficulty were calculated for each item of the
tryout tests.
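
The two item statistics named above can be illustrated with a short sketch. The abstract does not state the formulas that were used, so the sketch assumes two common definitions: item difficulty as the proportion of students answering the item correctly, and item validity as an upper-lower discrimination index (the proportion correct in the highest-scoring group minus the proportion correct in the lowest-scoring group). The function name `item_statistics`, the 27 per cent group fraction, and the sample responses are hypothetical and are not taken from the study.

```python
def item_statistics(responses, group_fraction=0.27):
    """Return a (difficulty, discrimination) pair for every item.

    responses: one list per student, holding 1 (right) or 0 (wrong)
    for each item. The definitions used here are assumptions made
    for illustration, not the formulas reported in the thesis.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, lowest first.
    totals = [sum(student) for student in responses]
    order = sorted(range(n_students), key=lambda i: totals[i])

    # Size of the upper and lower criterion groups (at least one student).
    k = max(1, round(n_students * group_fraction))
    lower, upper = order[:k], order[-k:]

    stats = []
    for j in range(n_items):
        # Difficulty: proportion of all students answering item j correctly.
        difficulty = sum(student[j] for student in responses) / n_students
        # Discrimination: upper-group proportion minus lower-group proportion.
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        stats.append((difficulty, p_upper - p_lower))
    return stats


if __name__ == "__main__":
    # Hypothetical responses of four students to three items.
    sample = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for number, (difficulty, discrimination) in enumerate(
            item_statistics(sample, group_fraction=0.5), start=1):
        print(number, difficulty, discrimination)
```

Under these assumed definitions an item that nearly everyone passes or nearly everyone fails cannot discriminate well, which is one reason the two statistics are usually examined together when items are selected.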
Test I, The Ability to Think Scientifically, constructed from
discriminating items of the tryout tests, consisted of 150 items. Test
I was administered in the spring of 1950 to 500 students at the end of
the three-term sequence of Biological Science, and in the fall of 1950
to another group of 240 students who had had no college biology. The
reliabilities of the test for the two groups were .89 and .91
respectively. Because Test I proved too long, 25 of the poorer items, as
identified by item analysis, were eliminated. The remainder constituted
Test IA, The Ability to Think Scientifically. This test was administered
in the fall of 1950 to 330 students who had had no college biology, and
to 136 of these same students after completion of one term of Biological
Science. The reliabilities for the two groups were .91 and .90
respectively.
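
The abstract reports reliability coefficients without naming the formula by which they were obtained. As an illustration only, the sketch below assumes Kuder-Richardson formula 20, a coefficient often used for tests whose items are scored right or wrong; it is not offered as the procedure of the study. The function name `kr20` and the sample data are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong).

    responses: one list per student, one 0/1 entry per item.
    Assumed here for illustration; the thesis abstract does not say
    which reliability formula was used.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    # Variance of the total scores (population form, dividing by n).
    totals = [sum(student) for student in responses]
    mean_total = sum(totals) / n_students
    variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    if variance == 0:
        raise ValueError("total scores have zero variance")

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(student[j] for student in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / variance)


if __name__ == "__main__":
    # Hypothetical responses of four students to three items.
    print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

A coefficient computed in this way rises as the items agree with one another in ranking the students; the data shown are far too few to yield a meaningful value.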
The curricular validity of the test was established by:
1. Designing the test items to measure the behaviors involved
in scientific thinking.
2. Submission of the tryout tests to competent judges for
criticism.
3. Using free responses of students as items wherever feasible.
4. Careful selection of materials utilized in the construction
of the test items.

Three general methods were used in the statistical validation of
the test, namely,
1. Scores made on the test of the ability to think scientifically
were correlated with measures of intelligence, of reading ability
and of knowledge of biological facts. These correlations ranged from
.33 to .51.
2. Mean scores made by students who had had no college biology
were compared with mean scores made by students who had had Biological
Science. The means of those having had Biological Science were
significantly higher.
3. Scores made on Test IA by 143 students were compared with
ratings of these students by their instructors on their ability to think
scientifically. The chi-square test, a comparison of means of students
receiving superior, average and inferior ratings, and a correlation of
scores on the test with the ratings all gave evidence of the statistical
validity of the test. The correlation between scores on the test and the
ratings of the instructors was .77 for the test when administered as a
pretest, and .72 when administered as a post-test.
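
The correlations cited in the three methods above relate one set of scores to another. The abstract does not say which coefficient was computed in each case; the sketch below assumes the product-moment (Pearson) coefficient and codes the instructor ratings as 1 (inferior), 2 (average), and 3 (superior). The function name `pearson_r`, the coding of the ratings, and the sample figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of numbers.

    Assumed here for illustration; the thesis abstract does not name
    the coefficient used for the test-rating correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, two or more values")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables does not vary")

    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical test scores and instructor ratings for six students.
    scores = [52, 61, 70, 74, 88, 95]
    ratings = [1, 1, 2, 2, 3, 3]
    print(round(pearson_r(scores, ratings), 2))
```

With ratings that take only three values, a coefficient of this kind is limited by the coarseness of the rating scale, so the figures it yields are best read alongside the chi-square test and the comparison of means mentioned above.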

Mary Alice Burmester
candidate for the degree of
Doctor of Education

Final examination, May 10, 1951, 3:00 P.M.
Dissertation: The Construction and Validation of a Test to
Measure Some of the Inductive Aspects of
Scientific Thinking

Outline of Studies
Major subject: Education
Cognate area: Physiology

Biographical Items
Born, September 1, 1909, Oakland, California
Undergraduate Studies, University of California, 1926-1930
Graduate Studies, University of California, 1930-1933;
Michigan State College, 1946-1951
Experience: Teaching Assistant in Physiology, University
of California, 1931-1934; Instructor in
Biological Science, Michigan State College,
1945-1948; Assistant Professor in Biological
Science, Michigan State College, 1948-1951

Member of Kappa Delta Pi, Phi Sigma, Sigma Xi

TABLE OF CONTENTS
CHAPTER
I. THE BACKGROUND OF THE PROBLEM
  Introduction
  The problem
    Statement of the problem
    Delimitation of the problem
    Basic assumptions of the study
    Importance of the study
  Organization of the remainder of the thesis
II. REVIEW OF RESEARCH RELATED TO THE PROBLEM
  Steps and skills of scientific thinking
  The measurement of problem-solving abilities
    Summary concerning tests on abilities involved in problem-solving
  Relationship between problem-solving and other abilities
    Relation of intelligence to abilities involved in problem-solving
    Summary of studies concerning the relation of intelligence to
      problem-solving
    Educability in problem-solving
    Summary of studies on educability in problem-solving
    Relation of reading to abilities involved in problem-solving
    Summary of studies concerning the relation of reading ability to
      problem-solving
    Relation of factual information to the abilities involved in
      problem-solving
    Summary of studies concerning the relation of knowledge of facts
      to problem-solving abilities
  Summary of research related to the problem
III. GENERAL PROCEDURES INVOLVED IN THE DEVELOPMENT OF THE TEST
IV. THE DEVELOPMENT OF THE TEST ITEMS
  The formulation of the educational objectives
  The definition of the behaviors
    Methods used to determine the behaviors
    An outline of the behaviors
  The location of the source materials from which the items could be
    constructed
  The construction of the evaluation instruments
  Analysis of the tryout tests in terms of the behaviors involved
V. THE STATISTICAL ANALYSES OF THE TESTS AND THE TEST ITEMS
  Methods used in item-analysis
  Analysis of tryout tests
    Analysis of Test A - Some Steps in Scientific Thinking
    Analysis of Test B - The Delimitation of Problems
    Analysis of Test C - Experimental Procedures
    Analysis of Test D - Organization of Data
    Analysis of Test E - Evaluation of Hypothesis
    Analysis of Test F - Experimentation and the Interpretation of Data
    Analysis of Test G - Drawing of Conclusions
    Analysis of Test H - Interpretation of Data
    Analysis of Test J - Generalizations and Assumptions
  Analysis of tryout tests considered as a single test
    Intercorrelations of tryout test scores
    Correlations of scores on tryout tests with scores on intelligence
      and reading tests
  The preparation of Test I - The Ability to Think Scientifically
  Analyses of Test I and Test IA
    Analysis of Test I - The Ability to Think Scientifically
    Analysis of Test IA - The Ability to Think Scientifically
VI. THE VALIDATION OF THE TEST
  The curricular validation of the test
  The statistical validation of the test
    Validation by correlation with measures of intelligence, reading
      ability, and factual information
    Validation by comparison of scores of various groups
    Validation by comparison of scores with ratings of students by
      competent judges
VII. SUMMARY AND CONCLUSIONS
  Summary
  Conclusions
  Educational implications
    Educational implications for Biological Science at Michigan State
      College
    Educational implications for science courses in general education
    Other educational implications
  Problems suggested by the study
LITERATURE CITED
APPENDIX I
APPENDIX II
APPENDIX III
APPENDIX IV

LIST OF TABLES
TABLE
I. Behaviors Measured by the Tryout Tests
II. Pertinent Data for Test A
III. Item Analysis Data on the Seven Items of Test B which Measured
  Ability to Recognize Assumptions Underlying Problems
IV. Pertinent Data for Test B
V. Pertinent Data for Test C
VI. Pertinent Data for Test D
VII. Pertinent Data for Test E
VIII. Pertinent Data for Test F
IX. Pertinent Data for Test G
X. Pertinent Data for Test H
XI. Pertinent Data for Test J
XII. Comparison of Means, Standard Deviations, and Reliabilities of
  the Tryout Tests
XIII. Comparison of Mean Item Validities and Mean Item Difficulties
  of the Tryout Tests
XIV. Pertinent Data for the Tryout Test Battery
XV. Intercorrelations of Tryout Test Scores
XVI. Intercorrelations of Tryout Test Scores Corrected for Attenuation
XVII. Coefficients of Determination of Tryout Tests
XVIII. Correlation of Total Scores on Tryout Test Battery with Each
  of the Tryout Tests
XIX. Multiple Correlation of Tryout Total with Two of the Tryout Tests
XX. Multiple Correlation of Tryout Tests with the Criterion -
  Obtained by the Wherry-Doolittle Method
XXI. Correlation of Tryout Test Scores with Intelligence Test and
  Reading Test Scores
XXII. Pertinent Data for Test I
XXIII. Comparison of Discrimination Indices and of Difficulty Indices
  of Identical Items as Obtained from Item Analysis of Tryout Tests
  and as Obtained from Item Analysis
XXIV. Summary of Item Analysis Data for Tryout Test Items Used in
  Construction of Test I, Items of Test I, and Items of Test I Used
  in Construction of Test IA
XXV. Pertinent Data for Test IA
XXVI. Correlation of Tryout Test Scores and Scores on Test I with
  Psychological Examination Scores and Reading Test Scores
XXVII. Intercorrelations of Tryout Test, Psychological Examination,
  and Reading Test
XXVIII. Intercorrelation of Test I, Psychological Examination and
  Reading Test
XXIX. Intercorrelation of Total Tryout Test Scores and Scores on
  Other Tests
XXX. Comparison of Means and Standard Deviation of Test I for a
  Group Before Taking Biological Science with Another Group After
  Taking Three Terms of Biological Science
XXXI. Comparison of Means and Standard Deviations of Test IA on
  Pre-Test and Post-Test
XXXII. Expectancy Chart Showing the Comparison of Scores on Test IA
  Pre-Test and Ratings
XXXIII. Expectancy Chart Showing the Comparison of Scores on Test IA
  Post-Test and Ratings
XXXIV. Mean Gains of Students Rated as Superior, Inferior, and
  Average on Test IA
XXXV. Differences in Means and Critical Ratios of Differences between
  Students Rated Superior and Students Rated Average and Students
  Rated Average and Students Rated Inferior
XXXVI. Item Analysis Data for Test A
XXXVII. Item Analysis Data for Test B
XXXVIII. Item Analysis Data for Test C
XXXIX. Item Analysis Data for Test D
XL. Item Analysis Data for Test E
XLI. Item Analysis Data for Test F
XLII. Item Analysis Data for Test G
XLIII. Item Analysis Data for Test H
XLIV. Item Analysis Data for Test J
XLV. Item Analysis Data for Test I

CHAPTER I
THE BACKGROUND OF THE PROBLEM
INTRODUCTION
With the growth of a general education program in
the secondary schools and the lower college years there
has been an increased emphasis upon the acquisition of
knowledge, skills, and attitudes which are required for
participation in a democratic society.1 One of these
skills, which has become a major objective of education,
is the ability to solve problems. This objective has been
stated variously by different educators. They refer to it
as reflective thinking, critical thinking, clear thinking,
or as scientific thinking. Although different terms are
used they all refer to the kind of thinking involved in
the solution of a problem.

As early as 1909, Dewey2 advocated the teaching of
scientific habits of mind. He asserted then and has
continued to contend3 that the problem of problems in our
education is to discover how to teach scientific habits of
thought. Almost every major educational committee in the
last twenty-five years has emphasized the importance of
this instructional objective, not alone as an objective of
science courses, but as an objective for general education.
Evidence for this is presented in the paragraphs that follow.

1 American Council on Education, Executive Committee
of the Cooperative Study in General Education, Cooperation
in General Education. Washington: American Council on
Education. 1947. p. 12.
2 John Dewey, How We Think. Boston: D. C. Heath and
Company. 1909. (preface).
3 John Dewey, "Method in science teaching." Science
Education. 29:119-23, April, 1945.

Eurich,4 in a report in the Thirty-eighth Yearbook of
the National Society for the Study of Education said that
there should be a "deepened desire to do something that will
make education more effective than it has been in the past,
largely, perhaps, in the hope that future generations will be
able to solve better such social problems as those that baffle
present-day society."

The Educational Policies Commission5 in 1944 made a
plea for the reorganization of the secondary schools of
America. A plan was presented for the education of all
American youth. The following quotation gives the broad
outline of this plan:

    Schools should be dedicated to the proposition that
    every youth in these United States - regardless of sex,
    economic status, geographic location, or race - should
    experience a broad and balanced education which will
    (1) equip him to enter an occupation suited to his
    abilities and offering reasonable opportunity for
    personal growth and social usefulness; (2) prepare
    him to assume the full responsibilities of American
    citizenship; (3) give him a fair chance to exercise
    his right to the pursuit of happiness; (4) stimulate
    intellectual curiosity, engender satisfaction in
    intellectual achievement, and cultivate the ability to
    think rationally; and (5) help him to develop an
    appreciation of the ethical values which should
    undergird all life in a democratic society. It is the duty
    of a democratic society to provide opportunities for
    such education through its schools.6

4 Alvin C. Eurich, "A renewed emphasis upon general
education," in General Education in the American College.
Thirty-eighth Yearbook of the National Society for the Study
of Education, Part II, pp. 6-7. Bloomington, Illinois:
Public School Publishing Company, 1939.
5 Educational Policies Commission, Education for All
American Youth. Washington: National Education Association.
1944. p. 21.

Further evidence that the ability to think critically
is a major objective of education is supplied by the following
statement of a committee which evaluated educational
objectives: "The committee believes that the ability to
think reflectively and the disposition to do so in all the
problem situations of life is an especially important
educational objective."7 This same committee stated that this
ability is "peculiarly necessary in a democracy, where each
is expected to take part in policy-making."8

The importance of this objective is also emphasized
in the following quotation:

    The responsibility of secondary schools for training
    citizens who can think clearly has been so long and so
    frequently acknowledged that it is now almost taken for
    granted. The educational objectives classifiable under
    the generic heading "clear thinking" are numerous and
    varied as to statement, but there can be little doubt
    concerning their fundamental importance. Although in
    recent years there has been increasing recognition of
    other responsibilities and purposes, there has been
    little accompanying tendency to demote clear thinking
    to a minor role as an educational objective. It was
    therefore not surprising to find considerable emphasis
    upon this objective in the statements of purposes
    submitted to the Evaluation Staff by the schools
    participating in the Eight-Year Study.9

6 Loc. cit.
7 Progressive Education Association, Science in
General Education. New York: D. Appleton-Century Company.
1938. p. 306.
8 Ibid., p. 46.

The Harvard Committee10 and the President's Commission
on Higher Education11 both recognized reflective thinking as
a major objective of education. The much quoted report of
the Harvard Committee on General Education stressed the values
of reflective thinking. According to this report abilities
which should be sought above all others in the general
education program are the ability to think effectively, to
communicate thought, to make relevant judgments, and to
discriminate among values. The President's Commission on Higher
Education included the ability "to acquire and use the skills
and habits involved in critical and constructive thinking" as
one of the eleven basic objectives of general education.

As may be seen from the above discussion the ability to
solve problems is a stated objective of general education
for all subject-matter courses. For science courses it is
stated as a major objective. Problem-solving was mentioned
as a specific objective of science teaching as early as
1920, when the report "Reorganization of Science in Secondary
Schools"12 suggested ways in which science instruction
could contribute to the "Cardinal Principles of Secondary
Education." In this report it was stated that useful methods
of solving problems were specific values of the study of
science.

9 Eugene R. Smith, Ralph W. Tyler and the Evaluation
Staff, Appraising and Recording Student Progress. New York:
Harper and Brothers. 1942. p. 35.
10 Harvard University, General Education in a Free
Society. Cambridge: Harvard University Press. 1945. p. 65.
11 President's Commission on Higher Education, Higher
Education for American Democracy. Volume I. Establishing
the Goals. New York: Harper and Brothers. 1947. pp. 57-58.

The development of scientific attitudes was mentioned
as one of the major objectives of science teaching in the
Thirty-first Yearbook of the National Society for the Study
of Education.13 The Progressive Education Association lists
the ability to think reflectively as one of the five broad
areas of needs of adolescents.14

In "Science Education in American Schools," certain
criteria were established for the formulation of objectives.
The recommendations were made that objectives should be
practicable for the classroom teacher. They also should be
psychologically sound, possible of attainment, universal in
a democratic society and should indicate the relationship
of classroom activity to the desired changes in behavior.
On the basis of these criteria the committee suggested
eight categories of objectives; one of these was
problem-solving skills.15

12 National Education Association, Reorganization
of Science in Secondary Schools. U. S. Bureau of Education
Bulletin, 1920, No. 26, Washington: Government Printing
Office, pp. 12-15.
13 A Program for Teaching Science. Thirty-first Yearbook
of the National Society for the Study of Education,
Part I, p. 44. Bloomington, Illinois: Public School
Publishing Company, 1932.
14 Progressive Education Association, op. cit., p. 46.

That problem-solving skills are still one of the
major objectives of the teaching of science is attested to
by the fact that the Committee on Research in Secondary
School Science of the National Association for Research in
Science Teaching has set as one of its major tasks the
identification of some of the important problems dealing
with the teaching of problem-solving.16

Not only is the ability to solve problems a major
objective of the secondary and elementary schools, but as
shown by the following examples, it is also stated as a
major objective of science teaching at the college level.
The Harvard report17 recommended that a part of the general
education program in colleges be the teaching of an
understanding of the means by which science has progressed.

15 Science Education in American Schools. Forty-sixth
Yearbook of the National Society for the Study of
Education, Part I, pp. 19-40. Chicago: University of
Chicago Press. 1947.
16 Committee on Research in Secondary-School Science,
"Problems related to the teaching of problem-solving that
need to be investigated." Science Education. 34:180-184,
April, 1950.
17 Harvard University, op. cit., pp. 220-230.

Gray18 in 1931 listed "facility in application of the
scientific method" as one of the objectives in the teaching
of biology at the University of Chicago. In 1937, Greulack19
in a committee report gave as one of the desired outcomes of
biology teaching the development of scientific methods of
thinking. To impart knowledge of the scientific method and
encourage its use in thinking were listed as major objectives
for the biology course at the University of Minnesota.20

Although the ability to think scientifically has been
stated as a major objective of science by almost all
educators there are still many unsolved problems in regard to
this objective. In fact, as one considers the list of problems
presented by the Committee on Research in Secondary-School
Science21 one wonders if anything at all is known about the
teaching of the scientific method. The major problem areas
considered by the committee were:

1. What is the nature of problem-solving in science?
2. How should problem-solving be taught?
3. How should ability in problem-solving be measured?

18 William S. Gray, editor, Recent Trends in American
College Education. Chicago: University of Chicago Press.
1931. pp. 61-67.
19 Muskingum College, A College Looks at its Program.
Columbus: The Spahr and Glen Company. 1937. pp. 139-146.
20 Ivol Spafford, editor, Building a Curriculum for
General Education. Minneapolis: The University of Minnesota
Press. 1943. pp. 243-261.
21 Committee on Research in Secondary-School Science,
op. cit., pp. 180-184.

Approximately 150 problems were suggested by 53 of
the members of the National Association for Research in
Science Teaching who replied to a questionnaire concerning
problems needing solving in the above areas. Some of the
questions concerning the nature of problem-solving objectives
in science teaching-learning situations which need to
be answered and which are related directly or indirectly
to the present investigation are:

A. What are the specific skills and abilities
   necessary for successful problem-solving?
   1. Is problem-solving one ability or a composite
      of many different abilities?
   2. What are the fundamental components of the
      problem-solving ability?
   3. What is the relationship of problem-solving
      ability to general intelligence?
   4. Does the development of ability to solve
      problems depend chiefly upon the subject
      matter material or upon the manner in which
      it is presented?
B. What is the relationship of individual differences
   in the following factors to the teaching of
   problem-solving?
   1. Ability to reason.
   2. Ability to read.
C. What techniques can be used to measure a person's
   problem-solving ability?
   1. Can the several kinds of problem-solving
      ability be expressed in any common measure?
   2. Can the several components of problem-solving
      ability be appraised individually?
   3. How can the validity of techniques for
      measuring problem-solving ability be
      established? Reliability?22

Almost all of the questions presented above are
based on the assumption that there will be improvement in
the ability to think scientifically if the teaching is
directed toward that objective. But is this true? Some
educators believe that the ability is an inherent one and
that it does not yield to educative efforts. This point of
view will be discussed more fully in Chapter II. Answers
to most of the questions concerning methods of teaching
scientific thinking, and the nature of scientific thinking
depend upon a valid instrument to measure the ability to
think scientifically. Although some tests have been devised
to test certain abilities involved in scientific thinking,
there are few if any tests now available which attempt to
measure all of the inductive aspects of scientific thinking;
nor are there any tests especially designed to measure these
aspects of thinking for a course in first year college
biology.

The present study is an outgrowth of an interest in
writing laboratory studies for the laboratory guide used in
Biological Science at Michigan State College which purports,
among other things, to teach the student to think
scientifically. Early in the evaluation of the effectiveness of
the laboratory studies it became evident that until some
measuring device for the ability to think scientifically was
available no evaluation of the methods used in this laboratory
guide was possible.

22 Loc. cit.

THE PROBLEM
Statement of the problem. The purpose of this study
was to devise a valid test to measure some of the inductive
aspects of the ability to think scientifically.

The construction of test items required the identification
of skills and steps involved in scientific thinking,
and the definition of behaviors which would give evidence of
the ability to perform these skills. The validation of the
test required the investigation of the relationship of
whatever was measured by the test to (a) intelligence, (b)
reading ability, (c) knowledge of biology, and (d) other measures
of the ability to think scientifically, as evidenced by
laboratory situations. In addition, it would require
investigation to determine whether there was an increase in
proficiency on the test after the completion of a course in
biology which had as one of its major objectives the teaching
of the ability to think scientifically.

Delimitation of the problem. The problem was limited
to the construction of a test to measure the critical aspects
of the inductive phases of scientific thinking. In this study
the aspects of scientific thinking which were not creative
activities, such as the sensing of a problem and the actual
formulation of hypotheses, have been considered the critical
aspects of thinking. A more detailed definition of these
critical aspects of thinking and the reasons for limiting the
test to the critical aspects will be discussed in Chapter IV.
The reason for also limiting the test to the inductive phases
was the fact that these phases of thinking were emphasized in
the writing of laboratory studies for the course in Biological
Science at Michigan State College. The items of the test were
chosen from biological areas because the test was specifically
devised for a course in first year biological science at the
college level. No attempt has been made in this study to
devise items to test the ability to apply principles of biology
to new situations, nor has any attempt been made to construct
items to test the attitudes which are assumed to attend the
ability to think scientifically, namely, the scientific
attitudes.

Basic assumptions of this study. The following are
the major assumptions which underlie this research.
1. Individuals differ in their ability to think
scientifically.
2. These differences can be measured by direct
observation of the behavior of the individuals, and by
indirect methods such as paper and pencil tests.
3. There are a number of skills involved in
scientific thinking.
4. The behaviors which attend these skills can be
described with sufficient objectivity to permit the devising
of valid test items.
5. A sampling of an individual's reactions will give
a measure of his reactions to a much larger range of
situations.
6. The investigation of the ability to think
scientifically is an important area of educational research.
Importance of the study. If the ability to think
scientifically is an innate ability or if it is in reality
general intelligence, as some educators believe,23, 24 it
is useless to attempt to attain it through the teaching of
science. If, on the other hand, the ability is not innate
or identical with general intelligence, as most educators
believe, it should be teachable and it should be possible
to determine which methods of teaching are most effective.
In order to determine which of the above contrary opinions
is correct, a test for the ability to think scientifically
should be available.

23 Marion L. Billings, "Problem-solving in different
fields of endeavor." American Journal of Psychology.
46:259-272, April.
24 Ben D. Wood and F. S. Beers, "Knowledge versus
thinking." Teachers College Record. 37:487-499, March,
1936.

ORGANIZATION OP THE REMAINDER OP THE THESIS
In Chapter II is presented a review of the research literature
related to the problem. The first area of research reported is
concerned with the identification of the steps involved in
scientific thinking. The second portion of the review of literature
is devoted to a discussion of tests which have been devised to
measure various aspects of scientific or critical thinking. This
discussion is followed by a review of research on the relationship
of various aspects of critical thinking to such factors as
intelligence, reading ability and knowledge of facts.

Chapter III is a discussion of the procedures involved in the
development of a test designed to measure the ability to think
scientifically.

Chapter IV is concerned with the steps involved in the development
of the test items. The objectives, their definition in terms of
desired behaviors, and illustrations of test items are included in
this chapter.

Chapter V is concerned with the statistical analysis of the test
and the test items. Item analysis data on the items of the
preliminary tests and the statistical treatment of the preliminary
and final forms of the test are presented.

Methods used to validate the test are presented in Chapter VI.

Chapter VII brings together the findings of this study with the
conclusions to be drawn from them. This is followed by a discussion
of the problems which the study has suggested and by the
educational implications of the research.

CHAPTER II
REVIEW OF RESEARCH RELATED TO THE PROBLEM
In order to devise a test to measure the ability to think
scientifically, it was necessary to determine the steps and skills
involved in the use of the scientific method. Literature on this
aspect of the problem is presented. This is followed by a review of
tests which have been devised to measure various phases of
scientific thinking. Previous work on the relation of the ability
to think scientifically to various other characteristics such as
intelligence, reading ability, and factual information is
presented. A few studies on educability in ability to think
scientifically are discussed.
STEPS AND SKILLS OF SCIENTIFIC THINKING
Although much of a philosophic nature has been written on
scientific method and individual scientists have described their
methods of solving such problems, a review of these works has not
been attempted here. Instead, the emphasis was placed on research
aimed at determining the nature of this method. One exception was
made in the case of Dewey, since he has been quoted frequently as
an authority on problem-solving. The steps of problem-solving as
conceived by Dewey¹ are:

1. A felt difficulty.
2. Its location and definition.
3. Suggestion of possible solution.
4. Development by reasoning of the bearings of the suggestion.
5. Further observation and experimentation leading to its
   acceptance or rejection.

1 John Dewey, How We Think. Boston: D. C. Heath and Company. 1909.
p. 72.

Until fairly recently little research had been done to determine
the nature of the scientific method, although much has been written
in the past 30 years on the desirability of teaching this method.
Keeslar² surmised that the reluctance on the part of educators to
investigate the steps of the method was due (1) to the fact that
problem-solving depends to some extent on the nature of the problem
and (2) to the tendency among researchers and writers to confuse
the elements of the scientific method with scientific attitudes.

One of the earliest analyses of the elements of the scientific
method was made by Downing³ in 1928. For his steps in scientific
thinking he drew upon illustrations from the history of science. In
his list he included elements and safeguards of the scientific
method. His safeguards were, in some instances, skills involved,
such as: inferences must be tested experimentally; and, in other
cases, they were attitudes, such as: judgment must be unprejudiced.
It was Keeslar's⁴ opinion that this failure to distinguish
attitudes from elements has led to confusion of later workers and
may have prevented a clear-cut definition of scientific method.

2 Oreon Keeslar, "A survey of research studies dealing with the
elements of scientific method as objectives of investigation in
science." Science Education. 29:212-216, October, 1945.
3 Elliot R. Downing, "The elements and safeguards of scientific
thinking." Scientific Monthly. 26:231-243, March, 1928.
Tyler⁵ discussed phases of scientific thinking in relation to the
construction of tests to measure this ability. Davis,⁶ LeSourd,⁷
Downing,⁸ and Beauchamp⁹ described classroom techniques for the
teaching of phases of scientific thinking. Curtis¹⁰ analyzed the
foregoing discussions and also incidents in the history of science.
On the basis of these analyses he presented the following
characteristics of scientific method as distinct from scientific
attitudes.

1. Locating problems.
2. Making hypotheses, or generalizations from given facts or
   observations.
3. Recognizing errors and defects in conditions or experiments
   described.
4. Evaluating data or procedures.
5. Evaluating conclusions in the light of facts or observations
   upon which they are based.
6. Planning and making new observations to find out whether
   certain conclusions are sound.
7. Making inferences from facts and observations.
8. Inventing check experiments.
9. Using controls.
10. Isolating the experimental factors.

4 Keeslar, op. cit., p. 212.
5 Ralph W. Tyler, Constructing Achievement Tests. Columbus, Ohio:
Ohio State University. 1934. pp. 24-30.
6 Ira C. Davis, "Is this the scientific method?" School Science
and Mathematics. 34:83-86, January, 1934.
7 Homer W. LeSourd, "Teaching scientific method." School Science
and Mathematics. 34:234-235, March, 1934.
8 Elliot R. Downing, "Teaching scientific method." School Science
and Mathematics. 34:400-405, April, 1934.
9 Wilber L. Beauchamp, "Teaching scientific method." School Science
and Mathematics. 34:508-510, May, 1934.
10 Francis D. Curtis, "Teaching scientific methods." School Science
and Mathematics. 34:816-819, November, 1934.

In 1937, Crowell¹¹ prepared a list of 29 attitudes and 25 skills
involved in scientific thinking. This list was derived from books
and articles on philosophy, logic, science education, and science
measurement. This list was presented to 64 science educators for
evaluation. The skills rated as important by 80 percent of the
judges are listed below in the order of their importance.¹²

1. Skill in observing accurately.
2. Skill in recording observations accurately and orderly.
3. Skill in forming independent judgments based on facts.
4. Skill in distinguishing between a fact and a theory.
5. Skill in picking out pertinent elements from a complex
   situation.
6. Skill in recognizing errors and defects in conditions and
   processes.
7. Evaluating conclusions in the light of facts or observations on
   which they are based.
8. Isolating the experimental factor.
9. Forming sound judgments concerning adequacy of data.
10. Synthesizing or putting together separate facts to form a
    conclusion.
11. Gathering data systematically.
12. Planning an experiment to determine whether or not a proposed
    hypothesis is true.
13. Evaluating data and procedures.
14. Recognizing omissions or deficiencies in set ups.
15. Profiting from worthwhile criticism (an attitude?).
16. Forming a reasonable generalization.
17. Arranging and classifying data in sequence and making
    conclusions obvious.
18. Applying general principles to a new situation.
19. Recalling selectively items essential to a problem.
20. Locating problems.
21. Disregarding irrelevant facts.
22. Directing imagination into new and worthwhile channels.
23. Using the scientific instruments common in the laboratory.

11 Victor L. Crowell, Jr., "The scientific method." School Science
and Mathematics. 37:525-531, May, 1937.
12 Loc. cit.

Although 23 skills were rated as important by 80 percent of the
respondees, no attempt was made to organize these skills into a
plan for over-all problem-solving techniques.
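
The selection rule described above (retain the skills which at
least 80 percent of the judges rated as important, and list them in
order of importance) can be sketched in Python. This is an
illustration only: the function name select_skills and the vote
counts are hypothetical, and treating "importance" as the share of
judges endorsing a skill is an assumption, since the summary does
not state how the ordering was obtained.

```python
def select_skills(ratings, threshold=0.80):
    """Keep skills endorsed by at least `threshold` of the judges.

    `ratings` maps a skill name to a list of booleans, one per judge
    (True = rated important). Returns (skill, share) pairs sorted from
    the most to the least endorsed. The ordering rule is an assumption.
    """
    shares = {skill: sum(votes) / len(votes) for skill, votes in ratings.items()}
    kept = [(skill, share) for skill, share in shares.items() if share >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical votes from ten judges (not data from the study).
example_ratings = {
    "Locating problems": [True] * 8 + [False] * 2,
    "Skill in observing accurately": [True] * 10,
    "A skill below the cut-off": [True] * 7 + [False] * 3,
}
selected = select_skills(example_ratings)
```

With these invented votes the third entry falls below the 80
percent cut-off and is dropped, while the other two are returned
with the unanimously endorsed skill first.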
Until 1945, when Keeslar¹³ reported his study on the elements of
scientific method, no adequately validated list of these elements
was available. His original list of elements of scientific method
was prepared on the basis of a survey of 43 books and articles on
the scientific method. This list was then presented for validation
to 22 research scientists at the University of Michigan. Elements
considered to be of minor importance by the judges were eliminated
from the list. The 42 remaining items were considered and combined,
and were reorganized to form a final list of 10 major and 17 minor
elements set forth in the order in which they might logically be
expected to occur in the solution of a problem. This list was then
checked by three specialists in the teaching of science.

13 Keeslar, op. cit., pp. 212-216.

The following is Keeslar's list of major and minor elements of
scientific thinking:
I. Sensing a problem and deciding to try to find the answer to it.
   (italics in the original)

II. Defining the problem. (italics in the original)
      Stating the problem in words.
      Analyzing the problem into its essential factors.

III. Studying the situation for all facts and clues bearing upon
   the problem. (italics in the original)
      Drawing upon past experience, both personal and those
      reported in literature, for possible explanations or
      generalizations to account for the phenomena observed.

IV. Making the best tentative explanations or hypotheses as to the
   possible solution of the problem. (italics in the original)
      Recognizing the assumptions which must be made if one goes
      beyond the known facts in formulating a hypothesis.

V. Selecting the most likely hypothesis. (italics in the original)

VI. Inventing and carefully planning one or more experiments to
   test the hypothesis, isolating the experimental factor wherever
   possible by using a control. (italics in the original)
      Deciding upon the kinds of evidence which should be
      collected.
      Choosing reliable methods of collecting the evidence.
      Refining measuring instrument to the degree warranted by the
      nature of the problem.
      Practicing to gain skill in manipulation in order to secure
      accurate results.

VII. Testing the hypothesis by carrying out the experiment with
   great care and accuracy. (italics in the original)
      Preventing, as far as possible, all uncontrolled variations
      in the conditions which might affect the results.
      Making quantitative measurement of experimental results and
      estimating the probable error of such measurements.
      Recording the results, adhering strictly to standard
      definitions and usage of scientific terms.
      Organizing the pertinent data so that they may be studied
      and summarized.

VIII. Running check experiments involving the same experimental
   factor to verify the results secured in the original
   experiment. (italics in the original)
      Studying the condition of the experiment in order to detect
      any omissions, defects, or errors, particularly those errors
      which might have been introduced in the experimental results
      by coincidence or chance.
      Recognizing and, if possible, checking further the validity
      of the assumptions involved in setting up the experiment.

IX. Drawing a conclusion. (italics in the original)
      Arriving at a solution to the problem based on an honest,
      unbiased appraisal of the data.
      Suspending judgment when results are not conclusive.
      Calling attention in the conclusion to those basic
      assumptions which it has been necessary to maintain
      throughout the procedure.

X. Making inferences based on this conclusion when facing new
   situations in which the same factors are operating.¹⁴ (italics
   in the original)

14 Keeslar, loc. cit.


Keeslar¹⁵ concluded that the elements of the scientific method are
definite, are distinct from attitudes, and are known and used by
scientists. There was a high degree of agreement among the research
scientists concerning the nature of these elements, thereby
indicating that the scientific method has developed beyond the
introspection stage and that teaching and testing can be based upon
these skills. The 46th Yearbook¹⁶ presented a somewhat more
comprehensive list of skills than Keeslar's. Apparently it was
based on Keeslar's list plus additions from various other sources.

The foregoing discussion has presented a brief survey of the
research which has led to a definition of scientific method. It is
interesting to note that the steps conceived by Dewey¹⁷ in 1909
were basically the same as those derived from research in this
area.

THE MEASUREMENT OF PROBLEM-SOLVING ABILITIES

In the last three decades a number of tests have been devised to
measure various phases of scientific thinking. Some of these tests
purported to measure numerous behaviors while others were designed
to measure very specific behaviors, such as the ability to
interpret data or the ability to plan experiments. The following
discussion presents the historical sequence of the tests which have
been devised and the techniques which have been used to appraise
the abilities involved.

15 Loc. cit.
16 Science Education in American Schools. Forty-sixth Yearbook of
the National Society for the Study of Education, Part I, pp.
145-147. Chicago: The University of Chicago Press, 1947.
17 Dewey, op. cit., p. 72.
As Glaser¹⁸ has pointed out, several of the abilities included
under the concept of the ability to think critically are, to some
extent, measured by intelligence tests. Although such tests may be
related in general to tests of scientific thinking, no attempt will
be made in this review to include tests or parts of tests which
purport to measure general intelligence or any of its aspects.

Tests and scales have been devised to measure both the skills
involved in problem-solving and the attitudes which attend these
abilities. Some purported to measure both skills and attitudes
while others, which were called attitude tests, contained some of
the skills involved in scientific thinking. This review of tests
will be limited to those which seem to measure skills involved in
scientific thinking, and will not include tests and scales that
measure attitudes only.

18 Edward M. Glaser, An Experiment in the Development of Critical
Thinking. Contributions to Education, No. 843. New York: Bureau of
Publications, Teachers College, Columbia University. 1941. p. 73.

One of the earliest tests devised to measure the ability to think
scientifically was published in 1918 by Herring.¹⁹ On the basis of
an analysis of the work of such men as Francis Bacon, John Stuart
Mill, and Karl Pearson, Herring selected eleven processes which he
believed could be evaluated by a test. Herring stated that all of
his eleven processes together did not constitute the whole of the
scientific method, but he did believe that they all fell within the
concept. His eleven processes, expressed in terms of the abilities
involved, were (1) value, (2) feasibility, (3) definition,
(4) clarity, (5) statistics, (6) relevancy, (7) recording,
(8) comparison, (9) classification, (10) arrangement, and
(11) sufficiency.

The test was devised for elementary and high school classes in
geography. It contained thirty-three items of the multiple choice
type. A direction was given which was followed by twelve choices. A
thirteenth choice was available to indicate that none of the twelve
choices were satisfactory.

The test was validated by being submitted to six judges. The judges
indicated the answers they considered to be the correct ones and
judged the fitness of the items as measures of the abilities which
they were supposed to measure. Estimates of the reliability of the
test were not given. An interesting point about the test was that
the processes described were expressed in terms of the abilities to
be measured.

19 John P. Herring, "Measurement of some abilities in scientific
thinking." Journal of Educational Psychology. 9:535-558, December,
1918.

In 1924, Curtis²⁰ devised a test to measure the values derived from
extensive reading in general science. It was designated as an
attitude test and purported to measure (1) a conviction of the
universality of cause and effect relations, (2) the habit of
delayed response, (3) the habit of weighing evidence with respect
to pertinence, soundness, and adequacy, and (4) respect for
another's point of view. The test comprised 34 items, some of the
short answer type and some of the multiple choice type. No
reliabilities were given for the test.

Watson,²¹ in 1925, published a test of fair-mindedness which
purported to measure prejudice. In reality, this test probably
measured much more than prejudice. The test was made up of six
different types of sub-tests, some of which seemed to be measures
of prejudice while others appeared to be measures of ability to
think critically. A description of his six sub-tests follows:

20 Francis D. Curtis, Some Values Derived from an Extensive Reading
of General Science. Contributions to Education, No. 163. New York:
Bureau of Publications, Teachers College, Columbia University.
1924. pp. 57-67.
21 Goodwin B. Watson, The Measurement of Fair-mindedness.
Contributions to Education, No. 176. New York: Bureau of
Publications, Teachers College, Columbia University. 1925.
pp. 9-35.

1. Form A was a list of 51 words. Instructions were given to cross
out annoying or distasteful words.

2. Form B presented 53 statements about religious or economic
matters upon which authorities differ. Instructions were given to
mark each statement as true, probably true, uncertain or doubtful,
probably false, or false. This type of key has probably been used
more frequently in tests devised to measure abilities involved in
critical thinking than any other single type of answer key.
Watson's²² test seems to be the first one in which it was used.

3. Form C, entitled the Inference Test, presented statements of
fact followed by conclusions which might be drawn from the facts.
Instructions were given to check only inferences which were certain
and not to check those which were merely probable. One of the
alternative answers was that no such conclusion could fairly be
drawn. In each case one of the conclusions was a restatement of the
data. In each case the only answers considered correct were the
restatement of the data or the response that no conclusion could be
drawn.

4. Form D was a moral judgments test. Fifteen instances of behavior
were presented to be judged.

5. Form E was an arguments test based on the assumption that a
person will tend to feel that all arguments on the other side are
weak. Twelve issues were presented followed by arguments.

6. Form F, the Generalization Test, contained unwarranted
generalizations about groups as a whole. Subjects were asked to
indicate whether the statement was true for all, most, many, few,
or no individuals of the group.

22 Watson, loc. cit.

This test was scored on a negative basis; that is, a high score
indicated that a person was not fairminded, and a low score
indicated that he was fairminded. The estimate of the reliability,
determined by the split-half method, was .96.
The test was validated by:

1. Examination of the tests with reference to what they seemed to
   be measuring.

2. A study of the scores obtained by persons who were considered
   by their groups to be fairminded. This group actually had a
   lower average score than an unselected group (indicating
   fairmindedness).

3. A study of individuals who were supposed to be prejudiced by
   persons who knew them well.

4. A study of groups who would be suspected of certain lines of
   prejudice.

5. A correlation of test scores with other test scores. Results
   showed almost zero correlation both with reading test scores
   and with intelligence test scores.²³

23 Watson, loc. cit.

In the same year in which Watson described his test, Daily²⁴
described a test to measure the ability of high school pupils to
select essential data in solving problems. The test was not an
objective test, but has been included here because it seems to be
one of the first tests devised to measure a student's ability to
recognize insufficiency of data and ability to select pertinent
data. Eighteen short paragraphs containing data were presented. In
some cases the data were insufficient; in other cases there were
superfluous data. The student was asked to answer questions; the
answers were, in reality, conclusions based on the data. Daily²⁵
reported the reliability of the test to be .73. The reliability was
estimated by presentation of the same test seven weeks after the
first administration of the test.

The Stanford Scientific Aptitude Test was devised in 1927 by Zyve²⁶
to satisfy a need for more accurate guidance of incoming college
students. It has been called an aptitude test because Zyve claimed
that it tested inherent ability of the individual and not his
achievement. The test included eleven elements of scientific
aptitude; namely, (1) experimental bent, (2) clarity of definition,
(3) suspended versus snap judgment, (4) ability to reason,
(5) ability to detect inconsistencies, (6) ability to detect
fallacies, (7) induction, deduction and generalization, (8) caution
and thoroughness, (9) discrimination of values in selecting and
arranging experimental data, (10) accuracy of interpretation, and
(11) accuracy of observation.

24 Benjamin W. Daily, The Ability of High School Pupils to Select
Essential Data in Solving Problems. Contributions to Education,
No. 190. New York: Bureau of Publications, Teachers College,
Columbia University. 1925. pp. 59-60, 90-96.
25 Loc. cit.
26 D. L. Zyve, "A test of scientific aptitude." Journal of
Educational Psychology. 18:525-546, November, 1927.

The estimated reliability of the test was .93. The test was
validated by having two judges rank students according to their
aptitude for science. These rankings were compared with the rank of
the students in their test performance. The coefficient of
correlation between the scores on the Stanford Scientific Aptitude
Test and the ratings of the judges was .74. The means of the test
for science and engineering students and for a science faculty
group were considerably higher than the means of a group of
entering freshmen and non-science faculty.
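
A comparison of judges' rankings with rank in test performance, as
described above, can be illustrated with a rank correlation. The
sketch below computes Spearman's coefficient as the product-moment
correlation between two sets of ranks. It is offered under stated
assumptions: the account above does not say which coefficient was
actually computed, and the function names and the five-student data
are hypothetical.

```python
from statistics import mean


def rank_descending(scores):
    """Return ranks with 1 = highest score; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(judge_ranks, test_scores):
    """Spearman's rho between judges' ranks (1 = most apt) and test scores."""
    return pearson_r(judge_ranks, rank_descending(test_scores))


# Hypothetical data for five students (not taken from the study).
judge_ranks = [1, 2, 3, 4, 5]
test_scores = [88, 91, 70, 65, 52]
rho = rank_correlation(judge_ranks, test_scores)
```

In this invented example the judge and the test disagree only on
the order of the first two students, so the coefficient is high but
short of 1.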
Zyve's²⁷ test appears to be one of the first tests to make a
successful attempt to measure scientific ability. Whether the test
measures innate aptitudes which it purports to measure, or whether
it measures an ability which can be learned, does not seem to have
been investigated despite the fact that the test has been rather
widely used.

Hoff²⁸ devised a scientific attitude test in 1930 which included
the habit of weighing evidence as one of the attitudes measured.
The test was validated by fifteen expert judges and by correlation
with intelligence test scores and reading scores. These
correlations were positive but low. The reliability given was .76,
calculated by the split-half method.

A test of scientific thinking was published by Downing²⁹ in 1936,
but had been used as early as 1931 by Strauss.³⁰ The test was
designed to measure skill in the use of fifteen elements and
safeguards involved in scientific thinking. The items were designed
to test:

1. Accuracy of observation.
2. Ability to pick out pertinent elements from a complex
   situation.
3. Ability to synthesize.
4. Selective recall.
5. Fertility of hypotheses.
6. Ability to define a problem before trying to solve it.
7. Ability to hold in mind a complex of relations.
8. Problem-solving ability.
9. Judgment on adequacy of data.
10. Tendency to try to solve a problem scientifically rather than
    by trial and error.
11. Tendency to suspend judgment on moot questions.
12. Ability to apply a rule or law.
13. Tendency to test an hypothesis by collecting facts.
14. Awareness of the danger of reasoning by analogy.
15. Ability to arrange data in sequence to make the conclusion
    evident.³¹

28 Alfred G. Hoff, "A Test for Scientific Attitude." Unpublished
Master's thesis, Department of Education, University of Iowa, 1930.
pp. 1-42.
29 Elliot R. Downing, "Some results of a test on scientific
thinking." Science Education. 20:121-128, October, 1936.
30 Sam Strauss, "Some results of the test of scientific thinking."
Science Education. 16:89-93, December, 1931.

As determined by the split-half method, the reliability of the test
was .99 for a group of eighth through twelfth grade students. In
general, each of the abilities tested was measured by a single
question. Glaser³² has criticized this test from the point of view
of sound test construction and has raised serious questions
concerning its reliability and validity.
In 1933, Weller³³ constructed a test of 21 items which was designed
to measure the effectiveness of teaching of scientific thinking in
the elementary schools. Seven sets of items were used. The first
item of each set attempted to measure observation, the second item
asked the student to draw a conclusion from simple data, and the
third item asked for a proof or possible verification of the
conclusion drawn. She found the reliability of this portion of her
test to be .54.

Noll,³⁴ in 1933, described a test of scientific thinking entitled,
"What do You Think?" The test was constructed to satisfy a need in
the schools for a test to evaluate the teaching of scientific
thinking.

31 Downing, op. cit., pp. 121-128.
32 Glaser, op. cit., p. 76.
33 Florence Weller, "Attitudes and skills in elementary science."
Science Education. 17:90-97, April, 1933.
34 Victor H. Noll, The Habit of Scientific Thinking: A Handbook for
Teachers. New York: Bureau of Publications, Teachers College,
Columbia University. 1935. pp. 18-25.

Six habits of thinking were selected as a basis for constructing
the preliminary forms of the test. Each question was intended to
express a situation which was familiar to most persons, and which
afforded an opportunity for scientific thinking. The preliminary
form of the test included 134 items, most of which were of the
true-false type. Approximately 25 items were designed to measure
each of the six habits of thinking, namely: accuracy of
observation, intellectual honesty, openmindedness, suspended
judgment, a conviction of universal operation of the law of cause
and effect, and criticism.

The reliabilities of the two final forms of the test were
determined in two ways. The method of split-halves corrected by the
Spearman-Brown formula gave a reliability of .82 for Form I and a
reliability of .92 for Form II. A correlation between the two forms
of the test gave a reliability of .69. Noll³⁵ believed that the
true reliability coefficient was probably somewhere between the
highest and the lowest figures obtained.
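
The split-half method with the Spearman-Brown correction, mentioned
above, can be sketched as follows. The correction itself is the
standard formula r_full = 2 r_half / (1 + r_half). Everything else
is an assumption made for illustration: the odd-even division of
items, the function names, and the small table of item scores are
hypothetical and are not taken from the test described.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Correlate odd-item and even-item totals, then apply the correction.

    `item_scores` holds one row per examinee and one 0/1 entry per item.
    The odd-even split is an assumed way of halving the test.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical scores: four examinees, six items (1 = correct).
example_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(example_scores)
```

Because the correction always raises a positive half-test
correlation, a split-half figure obtained in this way is not
directly comparable with an uncorrected correlation between two
forms of a test.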
The test was validated by correlation with I.Q.'s and by the
determination of item validity. The correlation of the test with
I.Q.'s ranged from .30 to .41, indicating that native ability was
not being tested to a large extent. Norms for grades eight through
twelve were presented.

35 Noll, loc. cit.
In 1936 Frutchey, Tyler and Hendricks³⁶ reported a test to measure
the ability to interpret experimental data. This report is of
interest, not because it presents the construction of a complete
test, but because it reports an investigation of the validity of a
particular type of item. In Test I an experiment was described and
the student was asked to write a conclusion. Although this is, in
some ways, a very satisfactory method of evaluating a student's
ability to draw conclusions, it is difficult to grade; therefore,
Test II was prepared. The same experiments were used and five
conclusions were selected from the free responses of students in
Test I. The students were instructed to select the best conclusion.
This method did not give a valid measure of a student's ability to
formulate conclusions since the correlation of the scores on Test I
and Test II was only .38. This same test was rendered more valid
when the student was asked to check the best conclusion and the one
contradicted by the data. This was designated as Test III. Its
correlation with Test I was .85. In the final form of the test,
which proved to be the most valid, the same test items were used
but students were instructed to mark each item according to the
following key:

   Mark with a 1 every statement which is a reasonable
   interpretation of the data.

   Mark with a 2 every statement which might possibly be true but
   for which insufficient facts are given to justify the
   interpretation.

   Mark with a 3 every statement which cannot be true because it
   is contradicted by the results obtained in the experiment.

36 Fred P. Frutchey, Ralph W. Tyler and B. Clifford Hendricks,
"Measuring the ability to interpret experimental data." Journal of
Chemical Education. 13:62-64, February, 1936.
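
Scoring against a three-category key of this kind can be sketched
in Python. The scoring rule shown (one point for each statement
marked as the key marks it, with a tally of the kinds of
disagreement) is an assumption, since the report summarized above
is not quoted on how marks were converted to scores. The category
labels, the function name, and the sample responses are
hypothetical.

```python
from collections import Counter

# Descriptive labels standing in for the three marks of the key.
REASONABLE = "reasonable interpretation"
INSUFFICIENT = "insufficient facts"
CONTRADICTED = "contradicted by results"


def score_interpretations(answer_key, responses):
    """Count agreements with the key and tally each kind of disagreement.

    Returns (number_correct, errors), where `errors` maps a pair
    (keyed category, category the student marked) to its frequency.
    """
    if len(answer_key) != len(responses):
        raise ValueError("answer key and responses must have the same length")
    pairs = list(zip(answer_key, responses))
    number_correct = sum(1 for keyed, marked in pairs if keyed == marked)
    errors = Counter((keyed, marked) for keyed, marked in pairs if keyed != marked)
    return number_correct, errors


# Hypothetical key and one student's marks for four statements.
key = [REASONABLE, INSUFFICIENT, CONTRADICTED, INSUFFICIENT]
marks = [REASONABLE, REASONABLE, CONTRADICTED, INSUFFICIENT]
number_correct, errors = score_interpretations(key, marks)
```

Here the student agrees with the key on three statements and
accepts as reasonable one statement for which the key says the
facts are insufficient. A tally of this kind shows the direction of
a student's errors as well as their number.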
Love,³⁷ in 1937, devised a test of scientific attitudes and
scientific thinking. The test was in three parts and contained 24
items. Part I dealt with the criticizing and planning of
experiments; Parts II and III tested the ability to recognize
assumptions upon which conclusions were based.

Raths,³⁸ in 1938, described a test designed to evaluate thinking
ability. The first portion of the test dealt with interpretation of
data. The student was required to determine the probable truth or
falsity of a series of statements concerning the data. The second
portion of the test contained a description of a situation followed
by three conclusions. The students were instructed to choose the
best conclusion. The conclusions were followed by a series of
reasons which could be used to explain why the conclusion was
chosen. The students were instructed to indicate the reasons they
had chosen a particular conclusion. The third portion of the test
presented a situation and a conclusion based upon the situation.
These were followed by a series of statements, some of which were
assumptions. The student was instructed to check the assumptions
and to indicate those upon which the conclusion was based. He was
then required to organize a proof for the conclusion using the
assumptions and data. No reliabilities for the test were given.

37 Kenneth G. Love, "Scientific Attitude-Thinking." Every Pupil
Test. Columbus, Ohio: The State Department of Education. April,
1937.
38 Louis E. Raths, "Evaluating the program of a school."
Educational Research Bulletin. 17:37-84, March, 1938.
The tests devised by the evaluation staff of the
Eight-Year Study were published in 1938, and were described
in detail by Smith and Tyler39 in 1942. The Eight-Year
Study40 was planned to implement broad objectives of
education in the secondary schools without regard to college
entrance requirements. The experiment was confined to thirty
selected secondary schools throughout the United States.
Students from these schools were admitted to colleges on the
basis of recommendation by the principal of the school and
not on the basis of college entrance requirements or
examinations. Extensive studies of objectives and means of
evaluation of objectives were, among other things, a part
of this project. The behaviors which were to be measured
by the tests were defined by committees composed of the
members of the evaluation staff of the Eight-Year Study41
and representatives from each school interested in the
objectives being measured. Two of the objectives related to
the present study were the ability to interpret data and
the ability to understand the nature of proof.

39 Eugene R. Smith, Ralph W. Tyler, and the Evaluation
Staff, Appraising and Recording Student Progress. New York:
Harper & Brothers. 1942. pp. 5-15.
40 Wilford M. Aikin, The Story of the Eight-Year
Study. New York: Harper and Brothers. 1942. pp. 12-24.

The earlier forms of the interpretation of data tests
were intended primarily for use in the senior high school.
Ten sets of data, presented in various forms including prose,
graphs, tables, and charts, were each followed by 15
statements. The students were instructed to evaluate each of
these on the basis of the following key:
(1) are sufficient to make the statement true.
(2) are sufficient to indicate that the statement
is probably true.
(3) are not sufficient to indicate whether there
is any degree of truth or falsity in the
statement.
(4) are sufficient to indicate that the statement
is probably false.
(5) are sufficient to make the statement false.42

41 Smith and Tyler, op. cit., pp. 3-156.
42 Ibid., p. 52.

In the early history of the development of tests to
measure the ability to interpret data, tests were devised
for specific subject matter fields. However, the evaluation
staff believed that the behaviors involved in these
tests were not essentially different, so a single measuring
instrument was constructed. In all, nine forms have been
used; the last two, Interpretation of Data Test Forms 2.51
and 2.52, have been prepared as alternate forms. Forms 2.71
and 2.72 were prepared for use in the junior high schools.

The answers to the test items were validated by the
judgment of a group of experts in the field and by
preliminary tryouts on groups of students.
The method of scoring these tests is of considerable
interest. The tests were scored four separate times to give
the following scores:
1. General accuracy score was the total number of
answers which agreed with the answers of the jury of
experts. This score was expressed as the percent of
the maximum possible number of correct responses.
2. The "going beyond data" score was calculated by
determining the number of times a student considered a
statement to be true which the jury had considered only
probably true, or probably true when the jury had
considered it as insufficient data, etc.
3. The "caution" score indicated the extent to
which a student marked statements keyed true as
probably true; keyed probably true as insufficient data, etc.
4. The "crude error" score was obtained by
determining the extent to which students marked items in
contradiction to the data.43

43 Ibid., pp. 54-55.

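The four scores described above can be illustrated with a minimal sketch. It assumes responses are coded 1 to 5 as in the key quoted earlier (1 = true, 3 = insufficient data, 5 = false). The function name `score_responses` and the exact boundaries between categories (for example, treating a keyed-true statement marked "insufficient data" as caution) are assumptions made for illustration; the source gives only the verbal definitions.

```python
def score_responses(key, responses):
    """Tally the four score types for one student (illustrative sketch).

    key and responses are equal-length lists of integers 1..5:
    1 = true, 2 = probably true, 3 = insufficient data,
    4 = probably false, 5 = false.
    """
    if len(key) != len(responses):
        raise ValueError("key and responses must have the same length")

    counts = {"accurate": 0, "beyond_data": 0, "caution": 0, "crude_error": 0}
    for k, r in zip(key, responses):
        key_strength = abs(k - 3)       # how far the keyed answer is from "insufficient"
        resp_strength = abs(r - 3)      # how far the student's answer is from "insufficient"
        opposite = (k - 3) * (r - 3) < 0  # true side marked false side, or the reverse
        if r == k:
            counts["accurate"] += 1
        elif opposite:
            counts["crude_error"] += 1   # response contradicts the data
        elif resp_strength > key_strength:
            counts["beyond_data"] += 1   # more certainty than the jury allowed
        else:
            counts["caution"] += 1       # less certainty than the jury allowed

    # General accuracy is expressed as a percent of the maximum possible.
    counts["accuracy_percent"] = 100.0 * counts["accurate"] / len(key)
    return counts


# Hypothetical six-statement example.
result = score_responses(key=[1, 2, 3, 4, 5, 2], responses=[1, 1, 3, 3, 1, 4])
print(result)
```

In this hypothetical example two responses agree with the key, one goes beyond the data, one is overly cautious, and two are crude errors.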
The tests on interpretation of data were validated:
(a) by comparing the behaviors demanded of students in the
test with the behaviors defined in the statement of
objectives to be measured, (b) by selecting data which were of
the type which students encounter in textbooks, and (c) by
studying the distribution and means of scores made by
students in various grades of school. The means increased with
grade levels. Another method used in the validation of the
tests was the comparison of test scores with essay responses
on the same data.
The reliabilities of the various types of scores on
Form 2.52 of the test, computed by use of the
Kuder-Richardson formula, ranged from .81 to .95. The general
accuracy score was the most reliable. The split-halves method
of estimating reliability was used for Form 2.51.
Reliabilities ranged from .86 to .92. Comparisons of the two
forms yielded reliability coefficients of from .65 to .85.44
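Two of the reliability estimates named above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the Evaluation Staff: the source does not say which Kuder-Richardson formula was used (formula 20 is assumed here), nor how the halves were formed (an odd-even split with the Spearman-Brown correction is assumed). The item matrix is hypothetical, with one row per student and 1 for a correct item.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_matrix):
    """Odd-even split-half estimate, stepped up by Spearman-Brown."""
    odd_half = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd_half, even_half)
    return 2 * r_half / (1 + r_half)


def kr20(item_matrix):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)
    # p is the proportion passing each item; p * (1 - p) is the item variance.
    sum_pq = sum(p * (1 - p) for p in (mean(col) for col in zip(*item_matrix)))
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: five students, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))  # 0.88
print(round(kr20(scores), 2))                    # 0.8
```

The two estimates differ even on the same data, which is one reason coefficients reported for different tests by different methods are not strictly comparable.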
Another of the tests devised by the Evaluation Staff
of the Eight-Year Study45 was the "Nature of Proof." This
test was devised to measure the ability of students to locate
and appraise the basic assumptions upon which the proof of a
statement depended. A paragraph containing data was followed
by a conclusion. Following this were 14 statements, some of
which were assumptions underlying the argument. In the
first part of the test, the student was asked to decide
which statements were relevant to the conclusion and to
mark them as either supporting or contradicting the
conclusion. In the second part of the test, the student was
asked to indicate which of the statements marked as
supporting the conclusion he would challenge. In the third
part of the test, the student was instructed to choose one
of three stated conclusions. In the fourth part, the
student was asked to select activities which might be useful in
the solution of a problem related to the previous
conclusions. In part five the student was directed to indicate
which of these activities could be carried out in a school
situation. Reliabilities of the various part scores on the
test ranged from .20 to .82.

44 Ibid., pp. 65-76.
45 Ibid., pp. 128-154.

Two interesting types of items devised to measure
critical thinking in a science course have been described
by Hart.46 A statement of a situation was presented. This
was followed by a numbered series of observations. Five
hypotheses were then presented and the student was instructed
to list by numbers all of the observations which supported
each of the hypotheses. He was also instructed to list by
number all of the facts which weakened each hypothesis. Then
the most valid hypothesis was to be checked. The second
type of item described was similar to the above but an
hypothesis was chosen by the student before the data were
presented. The data were used to support, weaken, or
eliminate the hypothesis. No data on validity or reliability
of tests composed of such items were given.

46 E. H. Hart, "Measuring critical thinking in a
science course." California Journal of Secondary Education.
14:334-338, October, 1939.

In 1940, Gans47 described a test used in a study of
critical reading comprehension. The test was devised to
measure ability to recognize problems and to solve problems
by critical selection and rejection of data. Paragraphs
containing problems were presented. An item followed which
had as its foils three problems. The student was asked to
determine which problem had been presented in the paragraph.
The problem item was followed by a series of paragraphs
containing facts which were directly related, indirectly related,
or unrelated to the problem. The student was instructed to
mark each paragraph according to whether it did or did not
aid in the solution of the problem. These paragraphs were
followed by a three-choice item asking for the major problem
under consideration. This was followed by single statements
of facts taken from the paragraphs previously presented.
Again, the student was requested to indicate whether the fact
helped or did not help in the solution of a problem. In
addition, he was asked to judge the truth or falsity of
each of these statements.

The test was scored as five subtests. The reliabilities
of the subtests ranged from .6? to .90. The total
test reliability was not given.

47 Roma Gans, A Study of Critical Reading Comprehension
in the Intermediate Grades. Contributions to Education,
No. 811. New York: Bureau of Publications, Teachers College,
Columbia University. 1940. pp. 59-89.

Engelhart and Lewis,48 in 1941, described a 23 item
portion of a pretest for a physical science survey course
at Chicago City Junior College. These 23 items were designed
to measure scientific thinking. In an introductory paragraph
the terms hypothesis and conclusion were defined. An
experimental situation was described, a problem was stated, and the
following key was presented:
"Below are given a series of hypotheses, each of which
is followed by numbered items which represent data. After
each item number on the answer sheet blacken space
A. if the item directly helps to prove the hypothesis
true.
B. if the item indirectly helps to prove the hypothesis
true.
C. if the item directly helps to prove the hypothesis
false.
D. if the item indirectly helps to prove the hypothesis
false.
E. if the item neither directly nor indirectly helps to
prove the hypothesis true or false."49

48 Max D. Engelhart and Hugh B. Lewis, "An attempt to
measure scientific thinking." Educational and Psychological
Measurement. 1:289-294, Third Quarter, 1941.
49 Loc. cit.

Three hypotheses were presented; each hypothesis was
followed by five statements of fact. These statements
constituted the items, which were marked by the above key.
These items constituted 15 items of the test. The student
was directed to judge each hypothesis as to its truth or
falsity. These judgments constituted three items of the
test. Following these 18 items five conclusions were given.
Each conclusion was to be judged either the best, the worst,
or neither best nor worst.

The items of this test proved to be quite
discriminating, the range of correlations of items with the total
score on the 23 items being from .17 to .61. The reliability
of the test was estimated to be .72 by means of the
Kuder-Richardson formula.
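The item-total correlations mentioned above can be computed as in the following sketch. It assumes items scored 0 or 1 and correlates each item with the uncorrected total (the item itself is included in the total); the source does not say whether Engelhart and Lewis corrected for this overlap. The function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(item_matrix):
    """Correlate each item column with the total score of every student."""
    totals = [sum(row) for row in item_matrix]
    return [pearson(list(column), totals) for column in zip(*item_matrix)]


# Hypothetical data: five students (rows), four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for number, r in enumerate(item_total_correlations(scores), start=1):
    print(f"item {number}: r = {r:.2f}")
```

An item with a higher correlation separates high-scoring from low-scoring students more sharply, which is the sense in which the items are called discriminating.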
The Watson-Glaser Tests of Critical Thinking were
described by Glaser50 in 1941. These tests were designed to
appraise some of the abilities involved in critical thinking.
They were, in effect, an extensive revision of Watson's tests
of fair-mindedness. All of the tests were validated by 15
judges.

Test A, A Survey of Opinions, was devised primarily
to show the extent of a person's consistency of opinion.
The test-retest reliability was .88; the correlation between
scores on Section I of the test and Section II of the test
was .85.

50 Edward M. Glaser, op. cit., pp. 87-92.

Test B, the General Logical Reasoning Test, was
designed to measure the ability to think in accord with the
rules of logic. The test-retest coefficient of reliability
was given as .82.

Test C, the Inference Test, was designed to measure
ability to judge the probable truth or falsity and the
relevance of inferences drawn from given facts. The persons
taking the test were instructed to determine whether the
conclusions drawn were true, probably true, false, probably
false, or questionable. The test was validated by the fact
that the test significantly distinguished between two groups
of students judged by their teachers to be either superior
or inferior in ability to think logically. Test-retest
reliability was found to be .86.

Test D, the Generalization Test, was substantially the
same as the one of the same name devised by Watson and
discussed earlier in this review. The reliability of this test
was reported as .88.

Test E, the Discrimination of Arguments, was also
substantially the same as the arguments test of Watson's earlier
edition. The reliability given for this test was .76.

Test F, the Evaluation of Arguments Test, was a new
test in the series. Each test item consisted of a paragraph
followed by three alternative conclusions, only one of which
was logical on the basis of the data presented in the
paragraph. Following the conclusions six reasons were
listed, one of which explained why the correct conclusion
was the logical one. The testee was instructed to check
the reason explaining his conclusion. The test-retest
coefficient of reliability for this test was .83.51
Fleming,52 in 1942, described a test used in his
analysis of outcomes of a course in biological science. A
portion of the test was devoted to the measurement of the
ability to think scientifically. The items for the test
had been chosen from examinations given previously in the
course. This portion of the test was divided into four
parts: Part A was designed to measure the recognition of
steps in problem solving, Part B was an evaluation of
statements with reference to a problem, Part C was designed to
measure the ability to evaluate inferences, and Part D was
the selection of data pertinent to the solution of a problem
situation. The tests were described but no test items were
included in Fleming's dissertation. The reliability of this
portion of the test was not given.
A test designed to measure a student's ability to
judge conclusions was constructed by Higgins53 in 1942 to
evaluate educability in inductive ability. Twelve
experiments were described; each experiment was followed by a
series of conclusions which constituted the items. There
were a total of 97 items which had been selected from free
responses of students. The testees were instructed to
determine whether the conclusions were complete, incomplete,
based on insufficient data, or false. The test was validated
by agreement of four judges as to the correct answers to the
items. The estimate of reliability, as determined by the
split-half method, was .90.

51 Glaser, loc. cit.
52 Maurice G. Fleming, "An Analytical Study of Certain
Outcomes of a Course for Orientation in Biological Science at
College Level." Unpublished Doctor's thesis, Department of
Education, New York University, 1942. Appendix.
53 Conwell D. Higgins, "Educability of Adolescents in
Inductive Ability." Unpublished Doctor's thesis, Department
of Education, New York University, 1942. pp. 36-40, 133-137.

Ter Keurst and Bugbee,54 in 1943, published a test by
which the authors claim "teachers or students can check
themselves on the understanding of the methodology of science."
The test consists of a series of four-choice items which
purport to measure knowledge of skills, attitude, and terminology
of scientific method. The test seemed to test knowledge but
not behavior. However, it is of interest to note that it
apparently had a certain degree of validity. The test was
administered to a group of students who had been named as the
five best and the five worst students in science classes in
respect to their ability to think scientifically. The
critical ratio of the difference of the means of these two groups
was 5.01. The test was also validated by opinion of experts.
The estimate of the reliability by means of the split-half
method was .82.

54 Arthur J. Ter Keurst and Robert E. Bugbee, "A test
on scientific method." Journal of Educational Research.
36:489-501, March, 1943.

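A critical ratio of the kind reported above is the difference between two group means divided by the standard error of that difference. The sketch below is illustrative only: the scores are hypothetical, the function name is invented, and the use of the unbiased sample variance (dividing by n - 1) is an assumption, since the source reports the value 5.01 without giving the computation.

```python
from math import sqrt
from statistics import mean, variance


def critical_ratio(group_a, group_b):
    """Difference of means divided by the standard error of the difference."""
    # variance() is the sample variance (n - 1 in the denominator).
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical test scores for students rated best and worst.
best_students = [10, 12, 14]
worst_students = [4, 6, 8]
print(round(critical_ratio(best_students, worst_students), 2))  # 3.67
```

By the convention common in the period, a critical ratio of 3 or more was usually taken to indicate a dependable difference between groups.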
A very interesting test in two forms entitled "Do
You Think Straight?" was described by Johnson,55 in 1943.
The test was designed to measure the relation of reflective
thinking to ability in debating and discussion. Because her
test was an attempt to overcome some of the inadequacies of
earlier tests, her criticisms of existing tests are presented
here:

These tests, though useful, appear to be inadequate
for the diagnosis and measurement of the process (italics
in the original) of reflective thinking. Each test
is deficient on two or more of the following counts:

1. It breaks the process of reflective thinking into
what may be superficially (italics in the original)
distinct and uncoordinated units.

2. Even in measuring such units, the following factors
or steps are not considered:
a. The formulation of a problem.
b. The analysis into major variables.
c. The determination of criteria and application
of them to the evaluation of possible solutions.
d. The construction and comparison of hypotheses.

3. It deals with a great variety of problems - each
item relating to a different problem, in most tests -
whereas the need in actual life situations (and the
need in discussion and other forms of public speaking)
is to think through (italics in the original)
a particular problem.

4. It emphasizes the logic of intentional (italics in
the original) reasoning - the discrimination among
formally valid and invalid conclusions and "reasons"
for conclusions - rather than the logic of constructive
(italics in the original) reasoning or scientific
discovery. In fact, those tests which require the
subject to check a conclusion and then to check
reasons for his choice appear to be measuring little
except expertness in "rationalizing."56

55 Alma Johnson, "An experimental study in the analysis
and measurement of reflective thinking." Speech Monographs.
10:83-96, Annual, 1943.

Johnson's tests were constructed on the assumptions
that: (1) Dewey's steps were a correct description of the
thought process, and (2) there were discoverable and
observable obstacles to reflective thinking.

Forms A and B57 were each designed around a single
problem. Section I of each test was an attempt to measure
attitudes about the problem. In Section II, ten subsidiary
problems were presented. The student was instructed to
number these in order of their usefulness as starting points in
the solution of the overall problem. Section III presented
four groups of questions, each composed of three subordinate
questions; the most important one was to be checked. In
Section IV, data were presented which might aid in the solution
of the four major questions posed in Section III. These data
were followed by statements which the student was instructed
to mark as being true, probably true, insufficient data,
probably false, or false. In Section V, ten syllogisms or
pseudo-syllogisms were presented, followed by conclusions
which the students were instructed to mark as sound or
unsound. The students were instructed to rank the six solutions
to the overall problem. This constituted Section VI of the
test. Section VII required the matching of advantages and
disadvantages, which were summaries of statements of
information given throughout the test, with the three best
solutions of Section VI. In the final section of the test,
Section VIII, the student was instructed to classify each
of ten conclusions as critical, uncritical, hypercritical,
or dogmatic.

56 Ibid., p. 85.
57 Ibid., pp. 83-96.

Johnson stated that there was inherent validity in
the test since it was patterned after Dewey's steps of
thinking and since the syllogism test followed the rules of
logic. In addition, however, the test was validated by 15
experts in the fields of logic and scientific method. She
also found that scores of students judged superior in the
abilities involved in reflective thinking were higher than
those judged as average, and those judged average scored
higher than those judged as inferior. She also cited an
increase in scores in college grade levels as evidence for
the validity of the test. These increases in scores with
college grade levels were at the 5 percent level of
significance or better.

The estimate of reliability, determined by correlating
the scores made on the two forms of the test, was .82 ±
.02. The scores on the attitude portion of the test were
not included in the total test scores.
A portion of the test, which was used to appraise
methods of teaching scientific method, was designed by
Thelen58 to measure an understanding of experimental design.
The purpose of an experiment was given; this was followed
by conditions of the experiment and statements about the
experimental material. The student was instructed to
indicate which factor or factors were to be varied, which were
to be fixed, which might be assumed to be negligible, which
were irrelevant, and which factors the student did not
understand. In all, there were 60 such items. No
reliability was given nor was any evidence concerning the validity
of the test presented.
In 1944, Raths59 devised the "Ohio Thinking Checkup,"
a thinking test for students in the third, fourth, and fifth
grades. Twelve problem situations were presented. Each
problem was followed by eight statements which the students
were instructed to mark as true, false, or questionable.
Items were devised to reveal nine types of errors in
thinking; namely,
1. Interpretation through personal judgment.
2. Evading of issue by name-calling or ridicule.
3. Leaning on authority.
4. Believing in superstition.
5. Generalizing from insufficient evidence.
6. Rationalizing or misinterpreting data.
7. Calling either-or statements true.
8. Calling if-then statements true.
9. Leaning on school loyalty.

The reported reliabilities of the tests as determined
by the method of matched halves were .89 for the
fourth grade, .91 for the fifth grade, and .93 for the
sixth grade.

58 Herbert A. Thelen, "An Appraisal of Two Methods
for Teaching Scientific Method in General Chemistry."
Unpublished Doctor's thesis, Department of Education,
University of Chicago, 1944. pp. 365-369.
59 Louis E. Raths, "A thinking test." Educational
Research Bulletin. 23:72-75, March, 1944.

Grant and Meder,60 in 1944, suggested a type of item
to evaluate reasoning ability. A statement was presented
followed by six reasons for agreeing with the statement and
six reasons for disagreeing. The student was instructed to
check valid reasons from either or both lists and then to
decide whether he agreed or disagreed with the statement.
In 1944, reports of the high-school and the college
chemistry tests for the armed forces were published. In
each of these tests one section was devoted to items
designed to measure abilities involved in scientific thinking.
Ashford,61 in reporting on the college test, listed six of
these abilities which were to be measured. Items were
devised to test the ability to (1) distinguish between
observed phenomena and their theoretical explanation, (2)
explain phenomena in terms of theory, (3) give the
experimental evidence for a theory, (4) identify the assumptions
necessary for a given conclusion, (5) identify the factor
that must be controlled in an experiment, and (6) identify
statements which are true merely by definition. The test
was prepared in two forms: one for the armed forces, one
for civilian use.

60 Charlotte L. Grant and Elsa M. Meder, "Some
evaluation instruments for biology students." Science
Education. 28:106-110, March, 1944.
61 Theodore A. Ashford, "The college chemistry test
in the Armed Forces Institute." Journal of Chemical
Education. 21:386-392, August, 1944.

Hered and Thelen62 devised a similar test for use
at the high-school level. Single items were devised to
measure each of the abilities which they had considered to
be important in scientific thinking. The reliability
coefficients of the tests were not given; however, Hered and
Thelen reported that the reliability of the high-school
test was satisfactory.
The ability of ninth grade students to make
conclusions was investigated by Teichman.63 For this investigation
he designed three tests. Test A, which was not objective,
presented 16 paragraphs from which the students were to draw
conclusions. In Test B, 29 experiments were described; each
was followed by four conclusions. The students were
instructed to choose the best one. In Test C, 15 problems were
presented, followed by data. A conclusion, which was faulty,
was stated. These 15 faulty conclusions constituted the
items of Test C. Students were instructed to evaluate the
faulty conclusions according to the following key:
(a) It does not answer the problem or question.
(b) It does not agree with the facts of the
experiment.
(c) There are not enough facts to make the
conclusion valid (correct).
(d) The facts have not been obtained by proper
control (comparison) in the experiment.

The test was validated by unanimous agreement of
three prominent educators in the field of science, by item
analysis, and by intercorrelations of the three tests. The
reliabilities were estimated by the split-half method. The
reliability of Test A was .88, of Test B was .88, and of
Test C was .68. The total test reliability was given as .91.

62 William Hered and Herbert A. Thelen, "The high-school
chemistry test of the Armed Forces Institute."
Journal of Chemical Education. 21:507-515, October, 1944.
63 Louis Teichman, "The ability of science students
to make conclusions." Science Education. 28:268-279,
December, 1944.

Alpern,64 in 1946, devised a test for high-school
students to measure the ability to suggest procedures to
test hypotheses. From the responses of this non-objective
test he constructed an objective test to measure the ability
to select methods of testing hypotheses. Each of the test
items consisted of (1) a situation, (2) a statement of the
problem, (3) an hypothesis offered as an explanation, and
(4) four suggested procedures. These last constituted the
foils of each item; the student was instructed to choose
the best experiment to test the hypothesis given. The
preliminary forms of this test were revised on the basis of
criticism of experts and on the basis of item analysis.
Twenty items constituted the test, which had an estimated
reliability coefficient of .75. The test was validated by
the judgment of 41 educators in science, by item analysis,
by a consideration of the range of difficulty of the items,
and by the fact that average scores increased through
successive grades, from ninth through twelfth.

64 Morris L. Alpern, "The ability to test hypotheses."
Science Education. 30:220-229, October, 1946.

A test to measure certain aspects of scientific
thinking in the area of college physics was devised by Dunning.65
The test was constructed to measure ability to interpret
data and ability to apply principles. The method of
evaluation used by Dunning to test the ability to interpret data
was substantially that reported by Smith and Tyler.66
Dunning's unique contribution to the measurement of this
objective was his use of four methods of scoring the papers
in order to determine the effects of variously weighted
scorings on the reliability. He found the method of giving
a single point for the keyed answer gave the highest estimate
of reliability by the split-half method. The reliability was
given as .83. In addition, he found that this method also
gave the highest validity coefficient when he correlated
scores on the test with teacher ratings of the students.
The validity coefficient obtained was .56. A second method
of validation of the test was the correlation of scores
made on the objective test with scores on the same material
on an essay test. This correlation was .66.

65 Gordon M. Dunning, "The Construction and Validation
of A Test to Measure Certain Aspects of Scientific Thinking
in the Area of First Year College Physics." Unpublished
Doctor's thesis, Department of Education, Syracuse University,
1948.
66 Smith and Tyler, op. cit., pp. 15-28.

Ullsvik67 constructed a test which was designed to
measure critical judgment in geometry classes. The test,
however, was on non-geometric subjects. The test was in
three parts: Part I was called "Judging of Conclusions"
and instructed the students to mark the conclusions given
as acceptable, not acceptable, or insufficient evidence;
Part II was an evaluation of definitions; Part III presented
a paragraph followed by 15 statements. The student was
instructed to select the two statements which were the most
crucial in leading one to accept the conclusion, and the two
which were the most crucial in leading one to reject the
conclusion. The reliability of the test was not given.

In 1949, Read68 published a description of a
non-verbal test of the ability to use the scientific method. An
analysis of Keeslar's69 major elements of scientific method
led Read70 to the inference that many of these steps
involved discriminatory choices. The inventing and planning of
experiments could only be measured by physical methods but
the other elements, he claimed, all involve discriminatory
choices. These were summarized as follows:

1. Observation is only valuable when it is
discriminating.

2. The defining of a problem means a choice
among possible problems.

3. Classification of data is discrimination
between items.

4. Setting up hypotheses is the choosing of
one or more possible explanations of the
data.

5. Selecting the most likely hypothesis is
critical discrimination.

6. Drawing conclusions is selecting and fitting
of data, again critical discrimination.

7. Validation of the conclusion is again a
matter of discrimination and choice.

On the basis of his contention that scientific
thinking is primarily the making of discriminatory choices, he
devised a picture-test to appraise the ability to make these
choices. He described his test as follows:

67 Bjarne R. Ullsvik, "An attempt to measure critical
judgment." School Science and Mathematics. 49:445-452,
June, 1949.
68 John G. Read, "A non-verbal test of the ability to
use the scientific method as a pattern for thinking."
Science Education. 33:361-366, December, 1949.

The picture-test is a series of sub-tests, related
in that they are all aspects of the environment, and
that they all pose problems which can be solved through
the association of two sets of pictures. There are
seven categories; each delineated by four pictures,
each of which represents a particular sub-division
of the category. (Three more categories of a
biological nature have been added). The categories
have to do with electricity, with air pressure,
with one phase of chemistry, with mechanics; they
are samples of common environmental science.

The four pictures are mounted on a card, ......
the card is placed in a box. Under each of the four
pictures is a small bin. From six to eighteen
separate loose pictures may be picked up by the testee,
closely examined, sorted, compared, and finally
dropped into one of the bins. The only directions
are to "place each picture in the bin where it fits
best."

High scores are obtained by those who discover
what the four pictures on the card represent. As
each card is on a single topic, the task is to
discover the more or less fine shades of dis-similarity
(italics in the original) among the four pictures.
The loose pictures serve as clues, and as they can
be moved around without penalty, once the pattern
exhibited by the four pictures on the card is
discovered, the way is open for careful comparison and
critical discrimination.71

69 Keeslar, op. cit., pp. 212-216.
70 Read, op. cit., pp. 361-366.

71 Ibid., pp. 362-363.

Read originally used 133 pictures which he presented to eleven science specialists for sorting. Of these 133, seventy were placed by all of the judges in the same bins. Item-analysis showed that 27 of these were non-discriminatory; the remaining 43 pictures made up the items of the test. The test was designed for grades seven through twelve. By means of the Kuder-Richardson formula, Read found the reliability of the test to be .78. The test was validated by administering it to 18 members of the group who won high honors in a state science contest. The scores made by these students were significantly higher than scores made by students who had had no science.
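The account above says only that Read used "the Kuder-Richardson formula" and does not say which one. As a minimal sketch, assuming formula 20 (KR-20) applied to items scored right or wrong, the coefficient could be computed as follows. The function name and the score matrix are hypothetical and are not Read's data; the population variance of total scores is used here, although some texts use the sample variance.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 (wrong) or 1 (right).

    responses: one list of 0/1 item scores per examinee.
    """
    n_people = len(responses)
    n_items = len(responses[0])

    # Variance of the examinees' total scores.
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)

    # Sum over items of p * q, where p is the proportion answering
    # the item correctly and q = 1 - p.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical scores for five examinees on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(scores)
print(round(reliability, 2))
```

With a 43-item test such as Read's, the same function would take one row of 43 item scores per student.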
Bingham72 devised a series of tests for general science, biology, chemistry, and physics which were used primarily as teaching devices. The instructor performed an experiment and then a twelve-item test was given. Item 1 was concerned with the results of the experiment. Item 2 described experiments; the student was directed to select the one actually performed. Item 3 presented five hypotheses to account for what happened; the student was instructed to choose the best one. In Items 4-8 additional facts were given and the student was directed to choose the fact which showed the untenable hypotheses presented in Item 3 to be unsound. The choice, "none of these," could be used for the hypothesis which was sound. Item 9 tested an understanding of the assumptions underlying the conclusion drawn; Item 10 was concerned with new problems arising out of the experiment, while Item 11 presented assumptions underlying the application of the conclusion to new situations. Item 12 tested the ability to apply the conclusion to new situations. No data on the reliability or validity of the test were presented.
Edwards,73 in 1950, reported on two tests, Test A and Test C, which he devised to measure certain aspects of critical thinking. Test A was devised to measure induction. Four principles were stated; each principle was followed by five facts. The pupil was instructed to choose the fact which supported the principle. The estimate of reliability of the test was .88 as determined by the method of split-halves, and .80 as measured by a correlation of the two forms of the test. Edwards claimed that the validity was built into the test by using an accepted theory of critical thinking and by using facts familiar to students. Additional evidence for validity was found in an increase in scores from grade ten through grade fourteen (college sophomore) and in a correlation of only .17 with intelligence.

72 Eldred N. Bingham, "A direct approach to the teaching of the scientific method." Science Education, 33:241-249, April, 1949.

73 Thomas B. Edwards, "Measurement of Some Aspects of Critical Thinking." Journal of Experimental Education, 18:263-279, March, 1950.
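The split-half estimate reported for Test A can be illustrated with a short sketch. It assumes an odd-even split of the items and the Spearman-Brown step-up to full test length; the source does not say how Edwards split the test or whether he applied that correction, so both are assumptions. The function names and scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Correlate odd-numbered and even-numbered item halves, then step up."""
    odd_half = [sum(person[0::2]) for person in responses]
    even_half = [sum(person[1::2]) for person in responses]
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical 0/1 scores for five pupils on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))
```

The second figure reported for Test A, the correlation between two forms, would be obtained by calling pearson_r directly on the pupils' total scores for each form.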
Test C was called a Judgment Test. Four opinions were stated; these were labeled A, B, C, and D. One opinion was sound, one fairly adequate, one irrelevant, and one totally incorrect. The opinions were then presented in pairs, AB, AC, etc., giving six items for each set of four opinions. The student was instructed to choose the better of each pair. This test was prepared in two forms. Reliability coefficients ranged from .49 to .75 when determined by the split-half method. The correlation between the two forms was .32. The methods of validation were the same as for Test A. The correlation of Test C with intelligence was .15.
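The count of six items per set follows from taking the four opinions two at a time. A minimal sketch of that enumeration, with a hypothetical helper name, is:

```python
from itertools import combinations


def paired_items(labels):
    """List every unordered pair of labels, in the order AB, AC, ..."""
    return ["".join(pair) for pair in combinations(labels, 2)]


# Four opinions labeled A-D, as in the Judgment Test, yield six pairs.
judgment_pairs = paired_items("ABCD")
print(judgment_pairs)
```

The same helper applied to a set of three labeled statements gives three pairs.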
Tests A and C were two tests of a battery of tests devised by Edwards,74 who originally set out to measure seven aspects of critical thinking. Seven tests were devised. Test I aimed to test the ability to judge the reliability of sources of information. A series of statements concerning measurements was presented. The student was instructed to underline the letter R if he felt that the accuracy mentioned was possible by means of the device used, but to underline the letter N if the device could not measure as accurately as was indicated in the statement. Edwards states that this test showed some promise, but that it was not developed beyond the preliminary stages because the reliability was low.

Test II was a test of relevance. Each question consisted of two statements. The student was instructed to underline the letter R if the two statements were related, and to underline the letter N if they were not related. This test was not revised after the first tryout because of the difficulty of obtaining facts which the test constructor was sure all of the students would know. Test III was the induction test discussed as Test A above. Test IV was a deduction test devised to measure the student's ability to judge good and poor arguments. This test was revised and called Test B. The reliabilities were not stable; they ranged from .20 to .86. Test V was the judgment test discussed as Test C above. Test VI presented ten paragraphs, each of which was followed by three conclusions: one sound, one irrelevant, and one contradicted by the data. These were labeled A, B, and C and were presented in pairs. The student was instructed to choose the better of the pair. Test VII was similar to Test VI, but the conclusions were all based upon the data. The student was instructed to choose the better of a pair of the conclusions. This test, upon revision, became Test D. The estimated reliabilities were .82 and .84. The correlation of this test with intelligence was .22.

74 Thomas B. Edwards, "Measurement of Some Aspects of Critical Thinking." Unpublished Doctor's thesis, Department of Education, University of California, 1949. pp. 23-50.
Summary concerning tests on abilities involved in problem-solving. Considerable progress has been made in the testing of abilities involved in problem-solving in the three decades since Herring75 published his test of scientific thinking. His pioneer work was of considerable interest because it was the first test of such a nature to be published and because he defined the kinds of behaviors which he associated with scientific thinking. Watson's76 Test of Fairmindedness, though designed to measure prejudice, was a forerunner of most of the tests which have been devised to measure the ability to interpret data. In addition, it was later modified by Watson and Glaser and became the highly successful Test of Critical Thinking. Watson's contribution was also significant in that he validated the test by curricular and statistical methods.

75 Herring, op. cit., pp. 535-558.

76 Watson, op. cit., pp. 9-35.
Another significant test of the mid-twenties was Zyve's77 Stanford Scientific Aptitude Test, which purported to measure eleven scientific aptitudes. This test appears to have been the first test of this type and has been widely used. This test, also, was quite well validated. Downing's78 test of scientific thinking was a distinct contribution because it was designed to measure many of the skills and safeguards of scientific thinking. The primary contribution of Weller79 was the recognition of the distinction between the skills of scientific thinking and the scientific attitudes. One of the best of the attitudes tests was "What Do You Think?", constructed by Noll,80 who defined attitudes as habits of thinking. This test also has been widely used. The tests devised for the Eight-Year Study81 were noteworthy contributions to test construction because in the development of these tests the behaviors attending the major objectives were considered in detail, and because the abilities involved in critical thinking were recognized as major outcomes of secondary education. The Interpretation of Data tests devised for the Eight-Year Study have been used very extensively.

77 Zyve, op. cit., pp. 525-546.

78 Downing, op. cit., pp. 121-128.

79 Weller, op. cit., pp. 90-97.

80 Noll, op. cit., pp. 18-25.

81 Smith and Tyler, op. cit., pp. 3-156.
In the last decade the trend toward increased emphasis on the teaching of critical thinking has culminated in the production of a number of tests devised to test phases of this major objective. The Watson-Glaser Test of Critical Thinking,82 previously referred to, was reported. Johnson83 made a significant contribution in devising a test revolving around a single major problem. Teichman84 and Alpern85 devised interesting tests to appraise the ability to draw conclusions from data and the ability to devise experiments, respectively.

An entirely new approach to the problem of measuring the ability to think scientifically was presented by Read86 in his Non-verbal Test of Scientific Thinking. This test was designed on the assumption that critical discrimination is the keynote of scientific thinking, and presents an interesting method of isolating this factor.

82 Glaser, op. cit., pp. 87-92.

83 Johnson, op. cit., pp. 83-96.

84 Teichman, op. cit., pp. 268-279.

85 Alpern, op. cit., pp. 220-229.

86 Read, op. cit., pp. 361-366.
No attempt has been made in this summary to include mention of all of the tests and testing techniques which have been developed. Only the highlights in the measurement of problem-solving have been treated. It is, however, of interest to note that tests have been devised for almost all educational levels from fourth grade through college, and that some tests have been devised without regard to subject matter areas, whereas others have been designed for specific subjects.
RELATIONSHIP BETWEEN PROBLEM-SOLVING AND OTHER ABILITIES
Relation of intelligence to abilities involved in problem-solving. It is the opinion of a few investigators that the abilities involved in problem-solving are identical with intelligence. The majority of investigators seem to believe that there is a moderate to substantial relationship between intelligence and the abilities involved in problem-solving. A few, however, contend that the two abilities are almost completely unrelated.

Billings87 has cited some evidence to support the viewpoint that problem-solving is a general intelligence factor. In an attempt to ascertain the nature of problem-solving, he presented his subjects with problems in eight different subject-matter areas. The subject matter necessary to the solution of the problems was taught prior to the administration of the tests. He obtained correlations ranging from .53 to .78 between the tests of reasoning in the various subject-matter areas. The average correlation was .67. Correlations between the tests of reasoning in the various fields and intelligence, as measured by the Army Alpha test, ranged from .42 to .59. Since he found a higher average correlation between the scores on reasoning in various fields than between reasoning in a particular field and information in that field, he inferred that problem-solving was an important part of Spearman's general factor of intelligence, if not intelligence itself.

It is interesting to note that Billings88 attributed problem-solving to intelligence with correlations of from .42 to .59 between his test and an intelligence test, while other investigators obtaining similar correlations have not interpreted their data as indicating particularly high relationships between problem-solving ability and intelligence.

87 Marion L. Billings, "Problem-solving in different fields of endeavor." American Journal of Psychology, 46:259-272, April, 1934.

88 Billings, loc. cit.
Zyve,89 Sinclair and Tolman,90 and Downing91 seem to believe, however, that critical or scientific thinking is an innate characteristic. On the other hand, many investigators have shown that the ability to think scientifically can be taught. If this is true, problem-solving could not be identical with intelligence nor could it be an innate ability. A discussion of these alternate viewpoints follows.

Zyve,92 who considered his test to be a measure of scientific aptitude, did not claim that the aptitude was intelligence itself. His data gave evidence that it was not intelligence, since he found a correlation of .44 to .51 between his test and intelligence as measured by the Thorndike intelligence test.

A study by Sinclair and Tolman93 on the effect of scientific training on logical thinking showed that students in the science and engineering fields in college were superior to students in other fields in their ability to make inferences, as evidenced by the Inference test of the Watson Test of Fairmindedness. The authors94 suggested that this might mean that students who elect science and engineering show a tendency to superiority in this ability. This suggestion would lead one to believe that Sinclair and Tolman consider the ability to infer to be an innate ability. They report a correlation of .49 between scores on the Thorndike Intelligence test and scores on Watson's Inference test.

89 Zyve, op. cit., pp. 525-546.

90 James H. Sinclair and Ruth S. Tolman, "An attempt to study the effect of scientific training upon prejudice and illogicality of thought." Journal of Educational Psychology, 24:362-370, May, 1933.

91 Downing, op. cit., p. 128.

92 Zyve, op. cit., pp. 525-546.

93 Sinclair and Tolman, op. cit., pp. 362-370.
Downing95 reported a correlation of .66 between his Test on Scientific Thinking and intelligence for students in the senior high school, and a correlation of .47 between these traits for students in the junior high school. He concluded that intelligence, as expressed by IQ, was different from the elements or safeguards of scientific thinking. It was his opinion that the elements of scientific thinking were due to inherited ability while the safeguards were the result of instruction. However, he does not present convincing evidence in support of this viewpoint. Strauss96 found a correlation of .64 between scores on Downing's test and scores on the Otis Intelligence test. The 90 students used in this study were between the ages of 10 and 18.

94 Sinclair and Tolman, loc. cit.

95 Downing, op. cit., pp. 121-128.

96 Strauss, op. cit., pp. 89-93.
Ter Keurst and Bugbee97 administered their test on the scientific method to college freshmen and sophomores. They found correlations of .51 and .66, respectively, between the scores made by these groups on their test and the scores on the American Council on Education Psychological Examination. Since their test measured knowledge of the method of science rather than ability to use the scientific method, these correlations cannot justifiably be compared with the other correlations reported here.

Glaser98 reported correlations ranging from .03 to .52 between intelligence, as measured by the Otis Mental Ability test, and the six tests which make up the Watson-Glaser Test of Critical Thinking. The correlation of scores on the entire critical thinking test with scores on the Otis Mental Ability test was .46 for the initial administration of the test and .48 for the final administration of the test.

Howell99 attempted to discover the effect of debating on critical thinking. As a part of his study he correlated the composite scores on five of the six Watson-Glaser tests with intelligence quotients. He obtained a correlation of .63.

97 Ter Keurst and Bugbee, op. cit., pp. 489-501.

98 Glaser, op. cit., pp. 142-147.

99 William S. Howell, "The effect of high school debating on critical thinking." Speech Monographs, 10:96-102, Annual, 1943.

In a study of the ability of ninth grade students to make conclusions, Teichman100 found a correlation of .65 between the scores on his test and scores on a measure of mental ability. He found no significant relationship between intelligence and growth in the ability to make conclusions.

Higgins,101 as a part of his study on the educability of adolescents in inductive ability, devised a test entitled Judge Conclusions. He found that the correlation between the scores on this test and scores on the Henmon-Nelson Test of Mental Ability was .54. Of particular interest, however, was his finding of a correlation of only .36 between his test and Thurstone's Induction Test. One would expect that his test, which he believed measured abilities involved in inductive reasoning, would have had a higher correlation with a test which purported to measure the inductive factor of intelligence than with a general intelligence test, such as the Henmon-Nelson Test of Mental Ability.

Weisman,102 in her study of factors related to the ability to interpret data, reported correlations of .64 to .69 between intelligence as measured by the Henmon-Nelson Test of Mental Ability and ability to interpret data as measured by the Progressive Education Association Interpretation of Data test.

100 Teichman, op. cit., pp. 268-279.

101 Higgins, op. cit., p. 40.

102 Leah L. Weisman, "Some Factors Related to the Ability to Interpret Data in Biological Science." Unpublished Doctor's thesis, Department of Education, University of Chicago, 1946. p. 91.
The studies considered thus far have all given evidence of a moderate to substantial relationship between intelligence and problem-solving abilities. Two studies, utilizing the technique of partial correlations, have shown that the true relationship between intelligence and problem-solving is probably not shown by simple correlations. In a study devised to investigate the relationship between ability to recall and ability to reason, Smith103 found a correlation of .58 between ability to reason and IQ. When ability to recall was held constant, by means of a partial correlation, this coefficient of correlation between ability to reason and IQ was reduced to .23. Alpern, in his study on the ability of students to test hypotheses, found a correlation of .53 between intelligence and ability to test hypotheses. However, by holding reading grade and chronological age constant by the use of a partial correlation, he found the correlation was reduced to .11.

103 Victor G. Smith, "A study of the degree of relationship existing between ability to recall and two measures of ability to reason." Science Education, 30:88-90, March, 1946.
Somewhat lower correlations between intelligence and abilities involved in critical thinking have been reported in a number of studies. Hoff105 reported a correlation of .36 between intelligence as measured by the American Council on Education Psychological Examination and his test for scientific attitudes. Noll106 found moderate positive correlations, ranging from .30 to .41, between IQs and scores on preliminary forms of his test, "What Do You Think." These correlations, he believed, indicated that his test measured factors other than intelligence or native ability of the eighth to twelfth grade students to whom he administered the tests.

Bedell,107 in a study on the relation between the ability to infer and the ability to recall, found low positive correlations between intelligence of junior and senior high school students and their ability to infer. However, his data revealed that the lowest quarter of the group, in terms of scores on the intelligence test, scored scarcely better than chance on the inference test. He concluded, tentatively, that a certain degree of intelligence is essential to problem-solving ability.

105 Hoff, op. cit., pp. 28-35.

106 Noll, op. cit., p. 24.

107 Ralph C. Bedell, "The Relationship Between the Ability to Infer in Specific Learning Situations." Unpublished Doctor's thesis, Department of Education, University of Missouri, 1934. pp. 36-37.
Johnson108 correlated scores made on her test devised to measure reflective thinking with mental alertness, as measured by the Ohio Psychological Examination. She reported a coefficient of correlation of .40 for a group of 84 college students. She believed that the data revealed that those aspects of reflective thinking measured by her test may depend on college level intelligence, but that other variables were more significant.

Furst,109 in a study of changes evoked in two years of general education, gave a series of tests to measure, among other things, changes in the ability to think critically. As a part of his study, he correlated the scores made on the portions of his test which measured critical thinking with intelligence as measured by the American Council on Education Psychological Examination. He found that 80 percent of these correlations were below .40. He asserted that his data indicated that the various tests of critical thinking measured characteristics of students' behavior which were not highly related to measures of scholastic aptitude. He believed that, at the secondary school level and the lower college level, students with relatively low scholastic aptitude may be able to perform as well as those with high scholastic aptitude on tests of critical thinking.

108 Johnson, op. cit., pp. 83-96.

109 Seward J. Furst, "Changes in Organization of Various Abilities and Skills after Two Years of General Education at the Secondary-School Level." Unpublished Doctor's thesis, Department of Education, University of Chicago, 1948. p. 155.
Dunning110 studied the relationship of the ability to interpret data, as measured by his test, to factors of intelligence. As a measure of the factors of intelligence he used a battery of Thurstone's Primary Mental Abilities tests. He found correlations of from .04 to .24 between the various factors of intelligence as measured by this test and the scores on the interpretation of data portion of his Test of Scientific Thinking. He concluded that the ability to interpret data was a different ability than any of the factors of intelligence.

Read111 reported a correlation of .39 between intelligence and his non-verbal test of the ability to use the scientific method. Edwards112 found correlations ranging from .00 to .22 between measures of intelligence and his four tests which were designed to measure (1) induction, (2) deduction, (3) judging opinions, and (4) judging conclusions.

110 Gordon M. Dunning, "The construction and validation of a test to measure certain aspects of scientific thinking in the area of first year college physics." Science Education, 33:221-235, April, 1949.

111 Read, op. cit., pp. 361-366.

112 Edwards, op. cit., pp. 80-85.

Fleming113 studied the outcomes of a course in biology at the college level. One of the purposes of his investigation was to measure growth in understanding of the elements of the scientific method. As a part of this study he correlated the scores made on his test of scientific thinking with intelligence. He reported a coefficient of correlation of .34.

Summary of studies concerning the relation of intelligence to problem-solving. There is no substantial agreement among investigators concerning the relationship of problem-solving to intelligence. A number of investigators reported correlations ranging from .40 to .69, indicating a fairly substantial relationship between intelligence and problem-solving abilities. Billings114 interpreted such correlations as indicating that problem-solving ability is a general factor, if not intelligence itself, whereas other investigators made no such claim. On the other hand, some investigators have found correlations ranging from .00 to .40, indicating anything from no relationship to a moderate relationship between these characteristics. Evidence obtained by the use of partial correlations indicated that other factors, such as memory and reading ability, may account for some of the relatively high correlations.

113 Fleming, op. cit., p. 185.

114 Billings, op. cit., pp. 259-272.
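One common way to compare correlations of different sizes, not used in the studies summarized here and offered only as an illustrative assumption, is to square the coefficient to obtain the proportion of variance the two measures share. A minimal sketch with a hypothetical helper name:

```python
def shared_variance(r):
    """Proportion of variance two measures share, taken as r squared."""
    return round(r * r, 4)


# End points of the two ranges of correlations cited in the summary.
for r in (0.00, 0.40, 0.69):
    print(f"r = {r:.2f}  shared variance = {shared_variance(r):.4f}")
```

On this reading even the highest correlation cited, .69, leaves roughly half of the variance in problem-solving scores unaccounted for by the intelligence measure.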
Although many of the correlations show a moderate to substantial relationship between intelligence and the abilities involved in problem-solving, these correlations are not as high as correlations between scores on intelligence tests and achievement tests over information previously learned. Stroud115 has stated that correlations between scores on achievement batteries and intelligence tests are of the magnitude of .8, and Kelley116 claimed that there was a 90 percent overlapping between a general intelligence test and a general achievement test. These findings seem to indicate that there is somewhat less relationship between intelligence and ability to think scientifically than between intelligence and general academic achievement.

Zyve,117 Downing,118 and Sinclair and Tolman119 support the viewpoint that the ability to think critically is an innate characteristic. If this is true, no appreciable improvement in scores on thinking tests as a result of instruction would be anticipated. Evidence to the contrary is presented in the discussion which follows.

115 James B. Stroud, Psychology in Education. New York: Longmans, Green and Company, 1946. pp. 558-339.

116 Truman L. Kelley, Interpretation of Educational Measurements. Yonkers-on-Hudson: World Book Company, 1927. p. 363.

117 Zyve, op. cit., pp. 525-546.

118 Downing, op. cit., pp. 121-123.

119 Sinclair and Tolman, op. cit., pp. 362-370.
Educability in problem-solving. Related to the problem of the relationship of intelligence to abilities involved in critical thinking is the problem of educability in the thinking process. If abilities involved in critical thinking were primarily due to intelligence as suggested by Billings, there should be little, if any, improvement in the ability with training. The evidence seems to indicate that these abilities can be improved if they become specific objectives of instruction. On the other hand, there is no evidence to indicate that they are a necessary by-product of the study of science. As indicated by Noll,120 the attainment of these objectives will come when they are taught; that is, when the emphasis of teaching is upon learning to think rather than on memorization of facts.

There is considerable evidence to show that skills of the scientific method can be taught effectively to students of all grade levels. Weller121 found a significant difference between two equated groups of sixth grade students; one group received specific instruction in both scientific attitudes and skills of scientific thinking, while the other received no special training. She concluded that growth in both attitudes and skills could be stimulated if they were specific objectives of instruction. Arnold,122 in a study of fifth and sixth grade students, also concluded that critical thinking can be taught in the elementary school.

Grener and Raths123 found significant gains in the ability to think critically in a group of third grade pupils after a five month period of teaching for critical thinking.

Curtis124 and Daily125 both found that junior high school pupils benefited from direct instruction in critical thinking.

Blair and Goodson126 conducted an experiment which showed that ninth grade students receiving instruction in scientific thinking improved more on Noll's127 "What Do You Think" test than did the two groups which did not receive this special instruction.128 One of the control groups received no science instruction, while the other control group received science instruction by the usual methods. The means for all three groups were higher on the post-test than on the pre-test. The comparison of means for the two control groups showed no significant difference, which seems to support Downing's129 viewpoint that science instruction does not necessarily produce growth in ability to think scientifically.

120 Victor H. Noll, "Teaching the habits of scientific thinking." Teachers College Record, 35:202-212, December, 1933.

121 Weller, op. cit., pp. 90-97.

122 Dwight Arnold, "Testing ability to use data in the fifth and sixth grades." Educational Research Bulletin, 17:255-259, December, 1937.

123 Norma Grener and Louis E. Raths, "Thinking in third grade." Educational Research Bulletin, 24:38-42, February, 1945.

124 Curtis, op. cit., p. 78.

125 Daily, op. cit., p. 81.

126 Glenn M. Blair and Max R. Goodson, "Development of scientific thought in general science." School Review, 47:696-700, November, 1939.

127 Noll, Habits of Scientific Thinking, op. cit., p. 27.

128 Blair and Goodson, op. cit., pp. 696-700.
Teichman130 investigated the ability of ninth grade students to draw conclusions. Twelve groups, designated as controls, were taught the regular course in science. Eight groups were given additional training in the drawing of conclusions. He found that although both groups made gains in these abilities, the experimentals made significantly greater gains.

Higgins131 studied the educability of adolescents in inductive ability. He reported that the gains of students receiving special instruction in problem-solving in a course in high school biology were meaningfully greater than the gains of other students taking biology but not receiving special instruction in problem-solving.

129 Downing, op. cit., pp. 121-128.

130 Teichman, op. cit., pp. 268-279.

131 Conwell D. Higgins, "The educability of adolescents in inductive ability." Science Education, 29:82-85, March, 1945.
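Comparisons of this kind rest on testing whether the mean gain of an experimental group exceeds that of a control group. The studies reviewed here do not state which significance test was used, so the following is only a sketch under an assumed choice, Welch's t statistic for two independent groups. The function name and the gain scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(gains_a, gains_b):
    """Welch's t statistic for the difference between two mean gains."""
    standard_error = sqrt(
        variance(gains_a) / len(gains_a) + variance(gains_b) / len(gains_b)
    )
    return (mean(gains_a) - mean(gains_b)) / standard_error


# Hypothetical post-test minus pre-test gains for two small groups.
experimental_gains = [6, 8, 7, 9, 5]
control_gains = [3, 4, 2, 5, 1]

t_value = welch_t(experimental_gains, control_gains)
print(round(t_value, 2))
```

Both hypothetical groups gain on average, as in the study just described; the statistic addresses only whether the difference between the two mean gains is large relative to its standard error.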

Neuhof¹³² found that students taking high school chemistry improved markedly in their ability to interpret data, as measured by the Progressive Education Association Interpretation of Data tests, after training in the interpretation of data. No control group was employed in this study. Gains in scores were not limited to the better students. He concluded that definitely measurable results could be achieved in the teaching of such complex mental processes as the interpretation of data.

Weisman¹³³ investigated the development of skills of scientific thinking in high school biology. Six classes taught by the investigator using problem-solving techniques were compared with six classes taught by teachers who believe that the ability to think scientifically could be taught without special instruction. Weisman found her experimental groups gained significantly more than the controls on the Progressive Education Association Interpretation of Data tests. There was also a significant gain on several of the Watson-Glaser Tests of Critical Thinking. Although these results are consistent with results of many other studies, Mallison¹³⁴ criticized the implication of the finding because

132 Mark Neuhof, "Integrated interpretation of data tests." Science Education, 26:21-26, January, 1942.
133 Weisman, op. cit., pp. 77-83.
134 George G. Mallison, "The implications of recent research in the teaching of science at the secondary-school level." Journal of Educational Research, 43:321-342, January, 1950.

the study failed to take into account the fact that the
investigator may have been a superior teacher.
Glaser¹³⁵ utilized four control and four experimental classes in twelfth grade English to measure changes in ability to think critically. The experimental classes were given instruction to stimulate critical thinking. Glaser found that the average gains on the battery of critical thinking tests of the four experimental classes, after ten weeks of instruction, were significantly greater than the average gains of the control classes. This study is especially significant in that it included a follow-up study. The students were tested again six months after the experimental period. The growth in ability to think scientifically had been retained. Glaser predicted that some aspects of the growth would probably be retained more or less permanently, and would afford a basis for further growth in the ability to think critically.

A few studies have been reported on teachability of the skills involved in scientific thinking at the college level. Teller¹³⁶ used an experimental and a control group of students taking a course in the history of education. Both groups had classes five days a week, but one class

135 Glaser, op. cit., pp. 131-140.
136 James D. Teller, "Improving ability to interpret educational data." Educational Research Bulletin, 19:363-371, September, 1940.

period each week was devoted to the interpretation of historical data in the experimental section. Teller found that the experimental group showed greater improvement in the ability to interpret data as measured by a test constructed to appraise the ability to interpret historical data.

Tyler¹³⁷ reported a study on remedial instruction for students enrolled in a course in freshman zoology. Students who received remedial instruction in problem-solving techniques gained significantly more than those without the remedial instruction. Students in this study were matched on the basis of intelligence, pre-test scores, sex, and instructor.

Fleming¹³⁸ reported a study to measure certain outcomes of a course in biological science. One of the outcomes appraised was the ability to think scientifically. He equated two groups of students, one taking no science, the other taking biological science. He found that, although both groups made gains in the ability to think scientifically, those taking the science course made significantly greater gains.

Thelen¹³⁹ made a study of the effect of instruction

137 Ralph W. Tyler, Service Studies in Higher Education. Columbus, Ohio: The Ohio State University. 1932. pp. 119-122.
138 Fleming, op. cit., pp. 172-179.
139 Thelen, op. cit., pp. 234-261.

planned to produce growth in the ability to think scientifically. The experiment was conducted with students in a course in freshman chemistry. The control groups were taught by traditional laboratory methods, whereas the experimental groups were given opportunities to participate in inductive thinking as often as was feasible. Thelen's test on experimental procedures and the Progressive Education Association Interpretation of Data test were used to evaluate these abilities. Using the technique of analysis of covariance, he found that the experimental groups were superior to the controls. However, the gains were not great in terms of percent gains.

Bond,¹⁴⁰ in a study similar to Thelen's, found superiority in an experimental group. The subject-matter area of Bond's study was a unit on genetics in a course in college biology.

Barnard¹⁴¹ compared the relative effectiveness of the lecture-demonstration method with the problem-solving method in the teaching of a course in college science. The evaluation instruments included a test on the ability to solve

140 Austin D. M. Bond, An Experiment in the Teaching of Genetics with Special Reference to the Objectives of General Education. Contributions to Education, No. 797. New York: Bureau of Publications, Teachers College, Columbia University. 1940. pp. 77-79.
141 J. Darrell Barnard, "The lecture-demonstration vs problem-solving method of teaching a college science course." Science Education, 26:121-132, October, 1942.

problems. The groups used were equated on the basis of pre-tests and scores on psychological examinations. He found that the problem-solving method produced significantly greater gains on the tests designed to measure problem-solving abilities.

Summary of studies on educability in problem-solving. The evidence presented in this portion of the review of literature seems to indicate that abilities involved in critical thinking are not to any considerable extent a by-product of the teaching of science. The evidence also lends credence to the hypothesis that critical thinking can be taught providing it is a specific objective of instruction. However, the evidence is still fragmentary and the conclusion is tentative.

Relation of reading ability to the abilities involved in problem-solving. There is considerable evidence to show that there is a relationship between reading ability and the ability to think critically. An interesting point in this regard is the fact that Buros¹⁴² placed the Progressive Education Association Interpretation of Data tests among his list of reading tests in the 1940 Mental Measurement Yearbook. Grim¹⁴³ found correlations ranging from .51 to .66

142 Oscar K. Buros, The Nineteen-Forty Mental Measurement Yearbook. Highland Park, N. J.: The Mental Measurement Yearbook. 1941. pp. 546-347.
143 Paul R. Grim, "Interpretation of data and reading ability in social studies." Educational Research Bulletin, 19:372-374, September, 1940.

between scores on Progressive Education Association Interpretation of Data tests and scores on reading tests among junior high school students. Weisman¹⁴⁴ also used the Progressive Education Association Interpretation of Data test in her study on factors related to the ability to interpret data among high school students. She found correlations between scores on this test and scores on the Iowa Silent Reading test to range from .57 to .65. A partial correlation between scores on the reading test and scores on the interpretation of data test with IQ held constant was .34.

Dunning¹⁴⁵ compared scores on his interpretation of data test in physics, designed for college freshmen, with scores on a reading test. He reported a correlation of .36.

Glaser¹⁴⁶ reported correlations of .32 and .36 between the composite score on the Watson-Glaser battery of tests and scores on the Nelson-Denny reading test. Correlation of scores on the reading test and scores on the six individual tests of the Watson-Glaser battery ranged from -.06 for the generalization test to .55 for the inference test. It is of interest to note that there is a relatively high correlation between reading ability and an ability to judge the degree of truth or falsity of statements. Glaser¹⁴⁷ found

144 Weisman, op. cit., pp. 97-98.
145 Dunning, op. cit., p. 232.
146 Glaser, op. cit., pp. 166-167.
147 Ibid., pp. 142-147.

higher correlations between scores on his test and a test of Reading Comprehension. These correlations ranged from .36 to .77 for the individual tests and from .77 to .82 for his battery of tests.

Ter Keurst and Bugbee¹⁴⁸ obtained correlations of .57 and .59 between scores on their test on scientific method and scores on the Nelson-Denny reading test. As previously mentioned, this test of scientific method seems to measure knowledge of steps and attitudes rather than behaviors. On this basis one might expect rather high correlation between reading ability and scores on this test.

Teichman,¹⁴⁹ in studying the ability of ninth grade students to draw conclusions, found a correlation of .61 between this ability as measured by his test and reading ability.

Alpern¹⁵⁰ found similar correlations between his test on ability to test hypotheses and reading grade in high school pupils. He reported a correlation of .57. However, Alpern found that by holding IQ constant by means of a partial correlation, this correlation was reduced to .36.

Hoff¹⁵¹ found low correlations between his test and reading ability. He reported a correlation of .19 between

148 Ter Keurst and Bugbee, op. cit., pp. 489-501.
149 Teichman, op. cit., pp. 268-279.
150 Alpern, op. cit., pp. 220-229.
151 Hoff, op. cit., pp. 28-35.

scores on his test and scores on the comprehension portion of the American Council on Education Reading test. The correlation between scores on his test and speed of reading scores on the American Council on Education Reading test was .09.

Summary of studies concerning the relation of reading to problem-solving. The evidence presented seems to indicate that reading ability and ability to think scientifically are to some degree related. Interpretation of data tests and other tests measuring abilities involved in scientific thinking are to a substantial degree dependent upon reading ability. On the other hand, scores on attitude tests did not seem to depend to any marked extent on reading ability.

Relation of factual information to the abilities involved in problem-solving. According to Wood and Beers,¹⁵² thinking and thinking ability are not under the control of teaching except as thinking is influenced by knowledge. This statement seems to imply that general intelligence and knowledge of facts should account for all of the variability in scores on thinking tests. The evidence for this point of view is somewhat contradictory as may be seen in the following discussion.

152 Ben D. Wood and F. S. Beers, "Knowledge versus thinking." Teachers College Record, 37:487-499, March, 1936.

Bedell¹⁵³ planned a study to determine the relationship between the ability to recall and the ability to infer. Thirty paragraphs containing facts from which the student could infer principles were given. Two sets of test items were constructed; one to measure knowledge of facts, one to measure the ability to make inferences. These tests were administered to 324 students in junior and senior high schools. Bedell found that the ability to recall and the ability to infer were different but not completely unrelated. According to his findings the ability to infer was a more difficult process than the ability to recall.

Billings,¹⁵⁴ in studying problem-solving in various areas, found higher correlations between scores on tests in problem-solving in different areas than between ability to solve problems and information in the same area. Correlations between the ability to solve problems in different fields ranged from .53 to .67. The average correlation was .78. The average correlation between scores on problem-solving tests and information tests in the same field was .45. He concluded that those who know the material could solve the problems, but that not all who knew the material solved the problems.

Smith¹⁵⁵ found high correlations between the ability

153 Bedell, op. cit., pp. 10-50.
154 Billings, op. cit., pp. 259-272.
155 Smith, op. cit., pp. 88-90.

to reason and knowledge of facts. The correlation he obtained was .77. The reduction in this correlation was slight when IQ was held constant by means of a partial correlation. The partial correlation was .65. He concluded that the ability to recall information and the ability to see relationships seemed to be two products of the same learning process.
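
The partial correlation mentioned above can be illustrated with the standard first-order formula. The following is only a sketch under stated assumptions: the function name is hypothetical, the value .77 is the correlation reported in the text, and the two correlations with IQ are assumed values chosen for demonstration, since the source does not report them.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant.

    r_xy: correlation between the two variables of interest
    r_xz, r_yz: correlation of each variable with the controlled variable z
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# r_xy = .77 is the reported correlation between reasoning and knowledge of facts.
# The two correlations with IQ below are ASSUMED for illustration only.
assumed_r_reasoning_iq = 0.50
assumed_r_facts_iq = 0.50

r_partial = partial_correlation(0.77, assumed_r_reasoning_iq, assumed_r_facts_iq)
print(f"partial correlation with IQ held constant: {r_partial:.2f}")
```

With other assumed correlations the partial value changes accordingly; the sketch shows only how holding a third variable constant reduces an observed correlation.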
Dunning¹⁵⁶ reported a correlation of .56 between his interpretation of data test for a physics course for college freshmen and a factual information test covering the same topics. Since his correlation of .56 indicated a 38 percent overlapping between the interpretation of data test and the factual test, he concluded that knowledge of factual information was no guarantee of ability to use the information in the solving of problems.

Fleming,¹⁵⁷ as a part of his study on outcomes in a course in biology at the college level, reported a correlation of .57 between the test he used to measure the ability to think scientifically and the test he used to measure knowledge of facts.

Weisman,¹⁵⁸ in a study of factors related to the ability to infer, reported a correlation of .63 between scores

156 Dunning, op. cit., p. 232.
157 Fleming, op. cit., pp. 186-187.
158 Weisman, op. cit., pp. 104-105.

on the Progressive Education Association Interpretation of Data test and scores made on the Cooperative Biology test. She found, however, that there was little relationship between scores on the interpretation of data test and gain in knowledge of biology, or between gain in ability to interpret data and knowledge of facts.

Read¹⁵⁹ found a correlation of .53 between scores on his non-verbal test of the ability to use the scientific method and scores made on the Cooperative General Science test.

In a course in elementary biology, Tyler¹⁶⁰ found a correlation of .41 between scores on an information test and scores on a test measuring ability to interpret data. He reported a correlation of .46 between scores on the information test and a test designed to measure the ability to plan experiments to test hypotheses, and a correlation of .35 between knowledge of technical terms and ability to draw inferences. In another study of college students taking various subjects, he¹⁶¹ found correlations ranging from .20 to .53 between scores on tests of recall and scores on
159 Read, op. cit., pp. 361-366.
160 Ralph W. Tyler, "Measuring the results of college instruction." Educational Research Bulletin, 11:253-260, May, 1932.
161 Ralph W. Tyler, in Charles H. Judd, Education as Cultivation of the Higher Mental Processes. New York: The Macmillan Company. 1936. p. 14.

tests requiring students to draw inferences and concluded that there was little relationship between these two abilities.

Summary of studies concerning the relation of knowledge of facts to problem-solving abilities. The evidence indicates that there is a moderate positive correlation between the abilities involved in problem-solving and knowledge of facts. These findings seem, in general, to support the conclusion that facts are essential to thought, and to problem-solving, but that knowledge of the facts does not guarantee an ability to use the facts in the solution of a problem.

Summary of research related to the problem. An attempt has been made in this chapter to show how the descriptive analysis of the steps of scientific thinking is related to measurement of the ability to think scientifically, and how the development of tests has influenced educational research on the ability to think scientifically.

Early work in the descriptive analysis was done by philosophers and individual scientists, but no systematic evaluation of the steps involved in scientific thinking was attempted until about twenty-five years ago. Since that time important contributions to an understanding of the nature of scientific thinking have been made by various workers. Such an understanding is of special importance to the measurement of

ability to think scientifically because the steps or elements of scientific thinking provide specific objectives to be tested, and because the steps offer suggestions of the types of behaviors which attend or which represent scientific thinking.

The recognition of the ability to think scientifically as a major objective of education stimulated the construction of tests to appraise various aspects of this ability. This testing movement, while slow at first, has resulted in the production of a number of tests which are quite reliable and which seem to have considerable validity. A variety of techniques have been evolved to measure the abilities involved in scientific thinking. Many of the techniques appear to be useful methods of obtaining evidence of the abilities.

The development of instruments to measure scientific thinking has led to studies of the relationship of this ability to various other traits such as intelligence and reading ability, and to the knowledge of facts. The evidence presented supports the inference that there is a direct relationship between the ability to think scientifically and the above mentioned traits. However, most investigators are of the opinion that these factors do not account for all of the variability in scores on tests designed to measure ability involved in problem-solving.

One of the most stimulating findings of the investigators into the nature of problem-solving is that it can, apparently, be taught; particularly if it is a specific objective of instruction. The bulk of evidence supports this view.

CHAPTER III
GENERAL PROCEDURES INVOLVED
IN THE DEVELOPMENT OF THE TEST
The purpose of this chapter is to describe: (1) the manner in which the test was developed, (2) the methods used in the construction of the test items, (3) the nature of the groups to which the test was administered in its various stages of development, (4) the methods used in the statistical analysis of the test, and (5) the methods used in the validation of the test.

The general procedures followed in the development of the test to measure the ability to think scientifically were similar to those used by Smith and Tyler¹ in the development of the tests used in evaluating the results of the Eight-Year Study. Several steps in the process and a detailed description of the procedure within each step as modified for its use in the present study are given below.

The first four steps were: (1) the setting up of the objectives, (2) the definition of each of these objectives in terms of desired behavior, (3) the identification of situations in which students could be expected to display these behaviors, and (4) the writing of items to evaluate

1 Eugene R. Smith, Ralph W. Tyler and the Evaluation Staff, Appraising and Recording Student Progress. New York: Harper and Brothers. 1942. pp. 15-28.

the b e h a v i o r s .
c on s t r u c t e d ,

The fifth

s te p w a s

the t r y o u t o f t h e

the a n a l y s i s

of t h e s e

items,

a t i o n of the b e s t

it ems

The sixth

tion a n d a n a l y s i s

test.

of th i s

step was

of m e t h o d s

the c h a p t e r s w h i c h d e a l w i t h
it was

step was

felt that

used

in

test are

t h es e a s p e c t s

complete

p r e s e n t e d in C h a p t e r
d i s c u s s i o n of s t ep s

t h es e d i s c u s s i o n s

trea tme nt of

IV.

Chapter V

f i v e a n d six.

v a l i d a t i o n of the test,

Chapter VI

the e d u c a t i o n a l o b j e c t i v e s

objectives
o
Columbus,

deals with

the

the t e s t d e ­

to b e m e a s u r e d

involved a consideration

I n v o l v e d in s c i e n t i f i c

thinking as

discussed

a n d a c o n s i d e r a t i o n of the o b j e c t i v e s
e ac h o f

second step was
into

is

to t h i n k s c i e n t i f i c a l l y was:

The f o r m u l a t i o n of the o b j e c t i v e s

implied by

t he y

s t e p seven.

the f o r m u l a t i o n of

The

with which

is d e v o t e d to a d e t a i l e d

the a b i l i t y

in C h a p t e r II,

the p r o b l e m

the f i r s t f o u r s t e p s

s i g n e d to a p p r a i s e

of the e l e m e n t s

the c o n s t r u c ­

wou ld be more

The f i r s t s t e p in t he c o n s t r u c t i o n o f

teaching

the

reserved for

of

m e a n i n g f u l w h e n p r e s e n t e d w i t h the m a t e r i a l s
A

the a b i l i t y

the a d m i n i s t r a ­

The seventh

tion, a n a l y s i s a n d v a l i d a t i o n of the

were used.

incorpor­

the test.

Detailed discussions

because

the

into a t e s t to m e a s u r e

to th ink s c i e n t i f i c a l l y .

validation of

and

items

of

these elements.
the d e f i n i t i o n of e a c h o f

t e r m s of d e s i r e d b e h a v i o r .

As

Tyler

these
2

has

R a l p h W. Tyler, Constructing; A c h i e v e m e n t T e s t s .
Ohio:
Ohio State University.
1934.
pp. 4-23•

stressed, this step is one of the crucial ones of test construction since objectives are usually stated in rather broad general terms. For example: the ability to interpret data is an oft-stated objective of science teaching. But what are the specific things that a person does when he interprets data? What are the kinds of errors made by persons who do not consistently achieve this objective? In order to determine what these behaviors are a study must be made of the types of reactions made by persons who are competent in this objective.

Sources of these behaviors were (1) the major and minor elements involved in scientific thinking, (2) literature on test construction, especially on tests devised to measure various aspects of scientific thinking, (3) committee reports on behaviors involved in scientific thinking, (4) reports of research on behaviors of persons doing scientific research, and (5) interviews with teachers of science who are attempting to teach scientific thinking.

The third step was the identification of situations in which students could be expected to display the types of behaviors identified in step two. It was deemed advisable to select materials which would be of some interest to the students, which dealt with biological subject matter free of technical terms, and which would be comprehensible to students who had had no previous experience with biological subject matter. Technical journals, popular journals and

textbooks were examined for situations which could be utilized in the construction of test items.

The fourth step involved the selection and trial of promising methods of measuring behaviors which would give evidence of the attainment of the objectives. This step included the writing of the items and the organization of tryout tests. It is customary to construct two to five times as many items as used in the final form of the test so that poor items may be eliminated. For this reason a series of nine tryout tests was constructed. Each of these tests was designed to measure a limited number of the behaviors involved in scientific thinking.

The tests were designed so that they could be scored on International Business Machine answer sheets. The five-choice answer sheet was selected as the most appropriate for the purpose of this test. The detailed discussion of the construction of the tryout tests and examples of items from each of them will be presented in Chapter IV.

A total of 637 items was constructed for the nine tryout tests. They were given to four members of the department of Biological Science at Michigan State College and to one expert in the field of testing in biological science for criticisms and suggestions.

The fifth step was the administration of the tryout tests, the determination of the difficulty and validity of

the items, and the selection of the best items. The tryout tests were administered to a group of 168 students taking the third term of the three-term sequence of Biological Science at Michigan State College during the spring term of 1950. Only students for whom comparable psychological and reading examination scores were available were used in this testing. For this reason only students who had taken the examinations given to entering freshmen by the Board of Examiners in the fall of 1949 were admitted to the six sections which had been designated as experimental sections for tryout tests. However, six of the 168 students actually enrolled in these sections were not freshmen who had entered Michigan State College in the fall of 1949. These students had been pre-registered in one of the experimental sections by the department of Engineering. Consequently they could not be transferred to other sections. The scores of these students on the tests were used in the calculation of means, standard deviations and reliabilities. The papers of these students were also used in the calculation of item difficulties and item validities but their scores were not utilized in the computation of correlations between test scores and scores on intelligence tests, reading tests, and factual tests.

Of the 168 students to whom the tryout tests were given 83 were males and 85 were females. The age range of this group at the beginning of the spring quarter was from 17 years to 25 years; the mean age was 18.76 years.

97
The tryout tests were given during each alternate laboratory
period during the term. The laboratory period was one hour and
fifty minutes in length. Students were permitted to work at
their own rate of speed on these tests and all students were
allowed to finish all of the items on all of the tests. Some
students finished as many as three of the tryout tests during
one period while others completed only one or two per period.
The students were instructed to answer all items even if it
was necessary to guess. All of the tests were scored on the
basis of the total number of correct answers and no correction
for chance was used in the scoring.
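The scoring rule just described (number right, with no correction for chance) can be illustrated with a minimal Python sketch. The function names, the answer key and the student responses are hypothetical. The correction formula shown for contrast, rights minus wrongs divided by one less than the number of choices, is the conventional correction for guessing; it is included only to show what was not applied in this study.

```python
def rights_only_score(responses, key):
    """Count the items answered correctly (the rule used for the tryout tests)."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def corrected_for_chance(responses, key, choices=5):
    """Conventional correction for guessing: rights - wrongs / (choices - 1).

    Shown only for contrast. Omitted items (None) count as neither right
    nor wrong.
    """
    rights = rights_only_score(responses, key)
    wrongs = sum(
        1 for given, correct in zip(responses, key)
        if given is not None and given != correct
    )
    return rights - wrongs / (choices - 1)


# Hypothetical five-choice key and one student's answers.
key = [1, 3, 5, 2, 4, 1, 2, 3]
answers = [1, 3, 4, 2, 4, 5, 2, 1]

print(rights_only_score(answers, key))     # score under the rule used here
print(corrected_for_chance(answers, key))  # score under the contrasting rule
```

Because every student was told to answer every item, the two rules would rank students identically on a given test; the correction only shifts and rescales the scores.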
As previously mentioned, 162 of the students completing the
testing program had entered Michigan State College in the fall
of 1949. At that time they had been given the 1949 edition of
the American Council on Education Psychological Examination,
which purports to measure the linguistic and quantitative
factors of intelligence. A composite score, referred to as the
total psychological score, is obtained as well as a score on
the linguistic portion of the test and a score on the
quantitative portion of the test. Form Y of the American
Council on Education Reading Comprehension Test was
administered at the same time. This test yields a total
reading score, a vocabulary score, a speed of reading
comprehension score and a level of reading comprehension
score.
At the completion of the year course in Biological Science a
comprehensive examination covering the year's work in biology
is given to the students. This examination is prepared by the
Board of Examiners of Michigan State College. The score
obtained by the student on this examination determines the
mark which he receives for the entire year's work. The
comprehensive examination scores were obtained for each of the
168 students to whom the tryout tests were administered. In
addition, the comprehensive examination papers of these
students were rescored on the basis of items which were purely
factual and items involving the ability to think
scientifically. The latter items differ from the items of the
tryout tests in that they involve a knowledge of biological
facts and principles. Of a total of 300 items in the
comprehensive examination, 53 were purely factual while 247
required some use of skills involved in scientific thinking.
Although the student's mark in biological science is
determined entirely by his performance on the comprehensive
examination, his progress through the three terms of the
course is dependent upon the kind of work he does during the
year. The work accomplished is reflected on the term-end
examinations which are constructed and directed by a committee
composed of members of the department of Biological Science.
The scores made by each of the 168 students on their term-end
examinations for the first and second terms of the course were
obtained.
The means and standard deviations were calculated for each of
the tryout tests and for the entire battery of tests
considered as a single test. The reliabilities of each of the
tryout tests were calculated by correlating the scores on the
odd-numbered items with the scores on the even-numbered items.
These correlations gave reliabilities of a test half as long
as the actual tests. The corrected reliabilities of each of
the tryout tests were estimated by means of the Spearman-Brown
prophecy formula. The reliability of the test battery was
calculated by one of the Kuder-Richardson formulas.³ The
Kuder-Richardson formulas were designed to overcome the
disadvantages of test-retest, equivalent forms, and split-half
methods. Adkins⁴ states that they are superior to other
methods of determining the reliabilities of tests. The formula
used in this study required only the number of items of the
test, the mean of the test, and the standard deviation of the
test. It is well to note that there are certain assumptions
upon which this method rests. These assumptions are (1) that
the test measures only one factor, (2) that the
intercorrelations of all items are equal, and (3) that the
items are equal in difficulty. If the assumptions are not met,
the value obtained is an underestimate of the reliability. The
value obtained represents the minimum reliability of the test
for this group.

³ Dorothy C. Adkins, Construction and Analysis of Achievement
Tests. Washington: U. S. Government Printing Office. 1947.
pp. 153-154.
⁴ Loc. cit.
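The two reliability estimates described above can be sketched in Python as follows. This is a minimal illustration, not the computation actually carried out in the study. The text names only "one of the Kuder-Richardson formulas"; because that formula is said to need only the number of items, the mean and the standard deviation, the sketch assumes the formula usually numbered 21. The use of the population standard deviation, the function names and the response matrix are likewise assumptions made for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_matrix):
    """item_matrix: one row per student, one 0/1 entry per item."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


def kr21(n_items, test_mean, test_sd):
    """Kuder-Richardson formula 21 (assumed to be the formula meant)."""
    return (n_items / (n_items - 1)) * (
        1 - test_mean * (n_items - test_mean) / (n_items * test_sd ** 2)
    )


# Hypothetical responses of six students to a six-item test.
matrix = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
totals = [sum(row) for row in matrix]

print(round(split_half_reliability(matrix), 3))
print(round(kr21(len(matrix[0]), mean(totals), pstdev(totals)), 3))
```

On these invented data the Kuder-Richardson value comes out below the corrected split-half value, which is the direction the text leads one to expect when the items are not equal in difficulty.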
Item analysis is the analyzing of each item of a test to
determine its validity and difficulty. Item analysis data were
obtained for all items of all of the tryout tests. Item
validity may be defined as a measure of the item's correlation
with a criterion.⁵ The purpose of determining the validity of
the items is to identify items which discriminate well. Item
difficulty is usually expressed as the percent of persons
answering the item correctly. Since items answered correctly
by almost all of the students or by almost none of the
students cannot have any functional value in an achievement
test inasmuch as they do not serve to discriminate between
students, it is generally considered desirable to eliminate
them. A detailed discussion of the methods used in the
validation of the test items and in the calculation of the
item difficulties will be presented in Chapter V.
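The two item statistics defined in this paragraph can be sketched in Python. The sketch is illustrative only: the criterion is assumed to be the total test score, and item validity is computed as a product-moment correlation between the 0/1 item score and that criterion. The method actually used in the study is the one reserved for Chapter V and may differ. The function names and the response matrix are hypothetical.

```python
from statistics import mean


def item_difficulty(item_scores):
    """Percent of persons answering the item correctly (scores are 0 or 1)."""
    return 100.0 * sum(item_scores) / len(item_scores)


def item_validity(item_scores, criterion):
    """Correlation of a 0/1 item with a criterion score.

    Returns None when the item (or the criterion) has no variance: the
    correlation is then undefined and the item cannot discriminate.
    """
    mi, mc = mean(item_scores), mean(criterion)
    sxy = sum((i - mi) * (c - mc) for i, c in zip(item_scores, criterion))
    sxx = sum((i - mi) ** 2 for i in item_scores)
    syy = sum((c - mc) ** 2 for c in criterion)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


# Hypothetical responses: rows are students, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]
totals = [sum(row) for row in responses]  # criterion assumed: total score

for j in range(len(responses[0])):
    column = [row[j] for row in responses]
    print(j + 1, item_difficulty(column), item_validity(column, totals))
```

The first and last items in this invented matrix are answered correctly by everyone and by no one respectively, so their validity is undefined; these are the items the text describes as having no functional value.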
The scores on each tryout test were correlated with the scores
on each of the other tryout tests. This was done to determine
whether a large degree of overlapping existed between the
tests and to determine whether any tests might be eliminated
in the construction of the single test used to measure the
ability to think scientifically. The scores on each of the
tryout tests were also correlated with the quantitative score
and the linguistic score on the American Council on Education
Psychological Examination and with the total score on the
reading test. These correlations were computed to determine
whether the tryout tests were in reality measures of some
phase of intelligence or of reading ability.

⁵ Ibid., p. 180.
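The intercorrelation of the tryout tests can be sketched as a table of pairwise product-moment correlations. The following Python fragment is a minimal illustration under stated assumptions: the test names and the scores are invented, and the study does not say how the correlations were tabulated.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def intercorrelations(scores_by_test):
    """Return {(name_a, name_b): r} for every pair of distinct tests."""
    names = sorted(scores_by_test)
    return {
        (a, b): pearson_r(scores_by_test[a], scores_by_test[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical scores of five students on three tryout tests.
scores = {
    "test_1": [12, 15, 9, 20, 17],
    "test_2": [30, 34, 25, 41, 38],
    "test_3": [8, 5, 9, 6, 7],
}
table = intercorrelations(scores)
for pair, r in table.items():
    print(pair, round(r, 2))
```

A pair of tests with a correlation near 1.0 would be a candidate for the kind of overlap the text mentions; the same function applies unchanged when one of the score lists is a psychological or reading score.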
The purpose of administering the tryout tests was to identify
good items to be used in the construction of a test to measure
the ability to think scientifically. The tryout tests went
through two revisions. The first revision resulted in a test,
referred to as Test I, consisting of 150 items. This test was
too long to be administered in the hour and fifty minute
laboratory period; therefore twenty-five of the poorer items
were eliminated from it. This final form of the test,
consisting of 125 items, is hereafter referred to as Test IA.
Both Test I and Test IA have been called The Ability to Think
Scientifically. In the construction of Test I it was
necessary, in most cases, to select blocks of items from the
tryout tests rather than individual items since items were
presented in blocks centering around a particular problem or
experiment. The best blocks of items from each of the tryout
tests, as determined by item analysis, were selected for
inclusion in Test I. Poor items, as identified in the same
manner, were eliminated from these blocks of items unless they
were necessary to the development of the concept developed
within the block of items. A total of 150 items were chosen to
comprise Test I.
The sixth step involved the administration of Test I, the
determination of the mean, standard deviation, and reliability
of the test, and the analysis of the individual items. Test
IA, the final form of the test, was constructed from Test I by
the elimination of 25 of the poorer items. The sixth step also
included the administration and statistical analysis of this
final form of the test.
Test I was given in May, 1950, to 500 students who had
completed the three-term sequence of Biological Science. This
group has not previously been mentioned in this study. Of this
group 291 were males and 209 were females. The age range was
from 17 years to 37 years. The mean age was 20.04 years. Two
hundred and sixty-four were freshmen who had entered Michigan
State College in the fall term of 1949 and who had taken the
1949 edition of the American Council on Education
Psychological Examination and Form Y of the American Council
on Education Reading Comprehension Test at that time. The
remaining students were either freshmen who had taken entrance
examinations during the summer of 1949 or sophomores, juniors,
or seniors. These students had all been given alternate forms
of the American Council on Education Psychological Examination
and the American Council on Education Reading Comprehension
Test. Correlations of scores on Test I with scores on
psychological examinations and with scores on the reading test
were therefore based on the scores of the 264 students who had
taken the forms of the latter tests given in the fall of 1949.
This was done because it could not be assumed that scores on
the various forms of these tests were directly comparable, and
because raw scores were not available for any of the
examinations given prior to the fall of 1949. Prior to this
time only percentiles had been available.
The mean and the standard deviation were calculated for the
group which completed Test I in the spring of 1950. An
estimate of the reliability of the test for this group was
determined by correlating the scores on the odd-numbered items
with the scores on the even-numbered items. These correlations
were adjusted for the total test by means of the
Spearman-Brown formula. A second method used to determine the
reliability of the test was the Kuder-Richardson formula. This
calculation was done to compare the reliability obtained by
the split-half method with a method which gives a minimum
reliability. (This method will be discussed in greater detail
in Chapter V.) The test papers of the 500 students taking Test
I in May, 1950, were used for item analysis. These item
analysis data were utilized in the construction of Test IA.
In order to determine whether there was a difference in the
ability to think scientifically before and after the
completion of the course in Biological Science, Test I was
administered in September, 1950, to 240 students who had had
no biological science at the college level. These students
were beginning their first term of the three-term sequence of
biological science. This group was also different from any
previously mentioned. Of this group 144 were males and 86 were
females. The age range was 17 years to 34 years, with a mean
age of 19.18 years. The mean and the standard deviation of the
test scores were calculated for this group. The reliabilities
of the test were determined by the split-half method and by
the Kuder-Richardson formula. As there was no means of
predicting the exact length of a test of this nature to
fulfill the time requirement of one hour and fifty minutes,
the number of items used was purely arbitrary. The actual
execution of the test indicated that it was too long for all
students to complete. Of the 500 students taking the
examination in May, 1950, 54 or 10.8 percent failed to finish
in the allotted time. Of the 240 students taking the test in
September, 1950, 24 or 10 percent failed to complete the test.
Since the test was too long the poorer items, as determined by
item analysis, were eliminated. The remaining items
constituted Test IA.
This final form of the test, consisting of 125 items, was
administered to 330 students at the beginning of the
three-term sequence of biological science in September, 1950.
This is a different group from any previously mentioned in
this study, and included 182 males and 148 females. The age
range was from 16 years to 38 years with a mean of 18.62
years. Thirteen, or 3.7 percent, did not complete the test.
The mean and standard deviation of the test were calculated
for this group. The reliability of the test was estimated for
this group by correlating the scores made on the odd-numbered
items with those made on the even-numbered items. This
correlation was corrected by the Spearman-Brown formula. The
minimum reliability for this group was estimated by means of
the Kuder-Richardson formula.
The seventh step in the construction of the test was its
validation. The most important characteristic of a test is its
validity⁶ which may be defined as the extent to which a test
measures what it purports to measure.⁷ Chapter VI is devoted
to a discussion of this characteristic of the test. The
curricular validity of the test was based on the following
considerations: (1) designing the test to measure the specific
behaviors which attend the steps of scientific thinking, (2)
submitting the test to qualified judges for criticism, and (3)
using free responses of students as foils wherever feasible.

The test was validated statistically by correlating total
scores made on the battery of tryout tests with such traits as
(1) intelligence, (2) reading ability, and (3) knowledge of
biological facts. As previously mentioned, psychological
examination scores and reading test scores were available for
264 of the 500 students who took Test I, The Ability to Think
Scientifically, in the spring of 1950. These scores were
correlated with the scores made by these students on Test I.

⁶ Herbert E. Hawkes, E. F. Lindquist and C. R. Mann, The
Construction and Use of Achievement Examinations. Cambridge,
Mass.: Houghton Mifflin Company. 1936. p. 21.
⁷ Adkins, op. cit., p. 160.
Another method of validating the test was the comparison of
the scores made by students on Test I at the beginning of the
course in Biological Science with the scores made by another
group after taking three quarters of Biological Science. The
assumptions underlying this comparison will be discussed in
Chapter VI. Test IA was administered to 136 students at the
beginning and at the end of the first quarter of the
three-term Biological Science sequence. The scores made by
these students at these two times were compared.

Scores made by a group of 143 students on Test IA were
compared with ratings of these students on their ability to
think scientifically. The ratings were made by the instructors
who taught these students in Biological Science. The rating
sheet and the methods used to obtain scores from these ratings
and the statistical treatment of these data will be discussed
in detail in the chapter on the validation of the test.

CHAPTER IV
THE DEVELOPMENT OF THE TEST ITEMS

This chapter is devoted to a discussion of those steps in the
construction of the test which preceded and included the
writing of the preliminary items which were used in the tryout
tests. These steps included the formulation of the educational
objectives to be tested, the definition of the behaviors which
attend these objectives, the identification of situations in
which the students could be expected to display the types of
behaviors identified in step two and the writing of items
designed to appraise the behaviors identified.

THE FORMULATION OF THE EDUCATIONAL OBJECTIVES

The overall objective to be measured by the test was the
ability to think scientifically. As discussed in Chapter II,
scientific thinking involves a number of elements. The major
elements as outlined by Keeslar¹ have been reworded and are
presented here as the major objectives involved in the ability
to think scientifically.

1. The ability to sense a problem.
2. The ability to state a problem.
3. The ability to delimit a problem.
4. The ability to recognize facts which are related to the
   problem.
5. The ability to formulate hypotheses.
6. The ability to plan experiments to test hypotheses.
7. The ability to carry out experiments.
8. The ability to interpret data.
9. The ability to formulate generalizations based on data.
10. The ability to apply generalizations to new situations.

¹ Oreon Keeslar, "The elements of scientific method." Science
Education. 29:273-278, December, 1945.

Some of the above abilities are creative, others are critical,
while others involve both critical and creative aspects of
scientific thinking. For example, the sensing of a problem is
a creative activity. So also is the actual formulation of
hypotheses, but the detecting of illogical hypotheses is a
critical activity. The planning of experiments also has both
creative and critical aspects. As Burke points out,² there is
overlapping between critical and creative thinking, and the
decision as to where to draw the line must be based on
pragmatic considerations. Thus, he included the drawing of
valid inferences from data as critical thinking since it may
be measured by objective tests. The behaviors which have been
considered primarily critical will be discussed in detail in a
later portion of this chapter. The tests designed in this
study have been limited to the appraisal of the critical
aspects of scientific thinking because no method for
evaluating the creative behaviors was found in the literature,
nor did the writer find it possible to devise satisfactory
methods for evaluating these creative aspects of thinking.

² Paul J. Burke, "Testing for critical thinking in physics."
American Journal of Physics. 17:527-532, December, 1949.
According to Burke³ critical thinking is an abstraction and
can have concrete meaning only when applied to some subject
matter. Therefore, the behaviors which constitute the elements
of critical thinking must be thought of in relation to some
specific field; in this instance, the field was biology.

THE DEFINITION OF THE BEHAVIORS

Methods used to determine the behaviors. In order to determine
the kinds of behaviors attending the steps in the scientific
method several approaches were used.

The lists of steps in scientific thinking as presented by
Keeslar⁴ and as presented in the 46th Yearbook,⁵ both of which
were reviewed in Chapter II, offered a source for the
definition of many of the behaviors involved in scientific
thinking. The major steps constituted the primary objectives
while the minor steps, in many cases, implied specific
behaviors which could be measured.

³ Burke, loc. cit.
⁴ Keeslar, op. cit., pp. 273-278.
⁵ Science Education in American Schools. Forty-Sixth Yearbook
of the National Society for the Study of Education, Part I.
Chicago: University of Chicago Press, 1947. pp. 145-147.
A second source of behaviors was literature on tests and test
construction, committee reports on behaviors involved in
scientific thinking, and reports of research on behaviors of
persons doing scientific research.
In his book on the construction of achievement tests, Tyler⁶
discussed tests to measure the ability to use the scientific
method and the ability to infer. In these sections he
described some of the behaviors involved. This was a rather
early piece of work in the area of definitions of behaviors
and was included here more for its historic interest than for
its value as a source of behaviors.

Hawkes, Lindquist and Mann,⁷ in a chapter on examinations in
the natural sciences, discussed some of the behaviors which
give evidence of the student's ability to use reliable sources
of information, to recognize unsolved problems, to draw
reasonable generalizations from data, and to plan experiments.

⁶ Ralph W. Tyler, Constructing Achievement Tests. Columbus,
Ohio: Ohio State University. 1934. pp. 24-30.
⁷ Herbert E. Hawkes, E. F. Lindquist and C. R. Mann, The
Construction and Use of Achievement Examinations. Cambridge,
Mass.: Houghton Mifflin Company. 1936. pp. 231-247.

A very useful source of behaviors involved in scientific
thinking was "Science in General Education."⁸ A portion of one
chapter of this book is devoted to a discussion of the nature
of reflective thinking. Another chapter is devoted almost
entirely to the evaluation of students' growth in reflective
thinking. Situations are described which show the kinds of
behaviors expected of students who are proficient in the
ability to think reflectively. The objectives analyzed are:
(1) the ability to discover and define problems, (2) the
ability to observe accurately, (3) the ability to select facts
relevant to a problem, (4) the ability to collect and organize
facts, (5) the ability to draw inferences from facts, (6) the
ability to recognize proof, and (7) the ability to plan
experiments to test hypotheses.
In the report on the methods of evaluating student progress in
the Eight-Year Study, Smith and Tyler⁹ discuss in detail the
behaviors involved in the student's ability to interpret data
and in some detail the behaviors involved in an understanding
of the nature of proof. The behaviors involved in the ability
to interpret data were derived from discussions of the
committee on the interpretation of data. The committee was
comprised of a representative from each school interested in
this objective, and the members of the Evaluation Staff of the
Eight-Year Study. The work of this committee was quite
exhaustive. Most of the behaviors listed under interpretation
of data in the list of behaviors presented in this thesis are
either mentioned or implied in Smith and Tyler's discussions
of the interpretation of data and of the nature of proof.

⁸ Progressive Education Association, Science in General
Education. New York: D. Appleton-Century Company. 1938.
pp. 393-412.
⁹ Eugene R. Smith, Ralph W. Tyler and the Evaluation Staff,
Appraising and Recording Student Progress. New York: Harper
and Brothers. 1942. pp. 38-41, 126-130.
Johnson,¹⁰ in a discussion of her test of straight thinking,
presents the kinds of behaviors which her test purported to
measure. The major abilities discussed are: (1) the ability to
analyze a problem, (2) the ability to interpret data, (3) the
ability to evaluate arguments, (4) the ability to test
hypotheses through reasoning, and (5) the ability to recognize
valid causal relationships.

The Committee on Research in Secondary School Science¹¹
focused its attention on the development of problem-solving as
the area in which research was needed. The members of this
committee considered problem-solving to be a general type of
human behavior which included specific, inter-related
behaviors. They analyzed these behaviors in the following
areas:

1. Behaviors concerned with the identification of problems.
2. Behaviors related to the establishment of facts about the
   problem.
3. Behaviors related to the formulation of hypotheses.
4. Behaviors related to the testing of hypotheses.
5. Behaviors concerned with the results of testing hypotheses.

The behaviors listed by this committee were incorporated into
the list of behaviors presented in the present study.

¹⁰ Alma Johnson, "An experimental study in analysis and
measurement of reflective thinking." Speech Monographs.
10:83-96, (Annual) 1943.
¹¹ Committee on Research in Secondary School Science,
"Problem-solving as an objective of science teaching." Science
Education. 33:192-195, April, 1949.
Burke,¹² in discussing the development of test items to test
the ability to think scientifically, says that before any test
of critical thinking could be constructed, or before any
orderly attempt could be made to teach the scientific method,
the concept must be made more precise than it has been
previously. He presents an operational definition consisting
of a set of about 30 behaviors. He offers the list as a
tentative definition. Most of the behaviors in his list have
been incorporated in the outline of behaviors presented in
this chapter.

¹² Burke, op. cit., pp. 527-532.

A study sponsored by the American Institute for Research and
the American Council on Education and supervised by Flanagan¹³
was made of the activity of research workers on the job, to
identify and define the characteristics of effective
scientific personnel in terms of specific observations and
records of the work behavior of these personnel. The behaviors
were obtained not from the opinions or beliefs of supervisors
of research, but rather from actual experiences, in the form
of reports of behavior which led to success or failure of
individuals on various parts of their jobs. Reports of what
actually happened were turned in to the committee. About 500
research workers were contacted and were asked to describe
critical incidents in which a person had been effective or
ineffective in research techniques. Upon the completion of the
interviews the behaviors described were classified into groups
of similar behaviors.

On the basis of the classification of the behaviors a
comprehensive check list was prepared for the evaluation of
research workers. Each area was divided into sub-areas. In
addition to the check list, which included descriptions of
effective and ineffective behavior in each of the areas,
definitions of the areas were written to provide a general
description of the content of the area.

¹³ John C. Flanagan, Critical Requirements for Research
Personnel. Pittsburgh: American Institute for Research. 1949.
pp. 24-39.
Area I was the formulation of hypotheses and problems. This
area was defined as stressing creative behavior, and included
the sensing and exploring of new problem areas, delimiting
problems and the proposing of hypotheses to fit the available
facts. Within this area 21 effective and eleven ineffective
types of behaviors were described. These made up the items of
the check list.

Area II dealt with the planning and designing of an
investigation; Area III was concerned with the conducting of
the investigation and Area IV was the interpretation of
research results. Areas V, VI, VII, and VIII dealt with
preparing reports, administration of research, organizational
responsibility and personal responsibility; they were related
neither to scientific thinking nor to the present
investigation.

Although this work was outstanding in its thoroughness and
although over 100 behaviors relating to research ability were
presented, most of them have not been incorporated into the
outline presented in this chapter because many were creative
activities, and many others were manipulative activities. The
critical activities, however, were incorporated into the
outline of behaviors which will be presented later in this
chapter.
A third source used in the identification of behaviors
involved in scientific thinking was the interviewing of some
of the members of the department of Biological Science at
Michigan State College. These persons were asked to describe
the behaviors they had observed in students whom they believed
to show considerable ability to think scientifically, and the
kinds of behaviors they had observed in students who seemed to
them to be very inferior in their ability to think
scientifically. The major abilities mentioned in these
interviews were the ability to devise and evaluate
experiments, and the ability to interpret data. Specific
behaviors were described. (These will be discussed in greater
detail in Chapter VI, where the rating sheet devised for the
validation of the test is described.)
The final source used in the definition of behaviors
was the experience of the writer as an instructor in the
course of Biological Science at Michigan State College and
her experience as a member of the committee responsible for
the construction of departmental examinations.
An Outline of the Behaviors. Below is an analysis, in outline
form, of the types of behaviors involved in scientific
thinking which it was believed could be measured by objective
tests. It is not assumed that this is an all-inclusive list.
It is, however, a synthesis of the behaviors identified from
the above mentioned sources.
1.00 Ability to recognize problems.
1.10 Ability to recognize a problem or a perplexity
in the context of a paragraph or an article.
1.20 Ability to distinguish between a fact
(observation) and a perplexity or problem.
1.30 Ability to recognize a problem even when it
is stated in expository form rather than in
interrogatory form.
1.40 Ability to distinguish a problem from a possible
solution to a problem (hypothesis) even when
the hypothesis is presented in interrogatory form.
1.50 Ability to avoid becoming diverted from the
major problem into side issues.

2.00 Ability to delimit a problem.
2.10 Ability to distinguish between major and minor
problems.
2.20 Ability to isolate the single major problem or
single major idea in a problem.
2.30 Ability to see the relationship of minor problems
to the major problems.
2.40 Ability to distinguish between relevant and
irrelevant problems.
2.50 Ability to analyze the problem into its
essential parts.
2.60 Ability to concentrate on the main problem.
2.70 Ability to recognize the basic assumptions of a
problem.

3.00 Ability to recognize and accumulate facts related
to the solution of a problem.
3.10 Ability to select the kind of information needed
to solve the problem.
3.20 Ability to recognize valid evidence.
3.30 Ability to differentiate between reliable and
unreliable sources of information.
3.40 Ability to select data pertinent to the solution
of the problem.
3.50 Ability to recognize the difference between data
pertinent to the solution of the problem and
that which is unrelated.

4.00 Ability to recognize an hypothesis.

4.10 Ability to distinguish an hypothesis from a
problem.
4.20 Ability to differentiate between a statement
that describes an observation and a statement
which is an hypothesis about the fact.
4.30 Ability to distinguish between an hypothesis as
a possible solution to a problem and a
conclusion (probable solution to a problem).
4.40 Ability to recognize the tentativeness of an
hypothesis.

5.00 Ability to plan experiments to test hypotheses.
5.10 Ability to select the most reasonable hypothesis
to test.
5.20 Ability to differentiate between an uncontrolled
observation and an experiment involving controls.
5.30 Ability to recognize the fact that only one
factor in an experiment should be variable.
5.31 Ability to recognize what factors must be
controlled.
5.32 Ability to recognize the overall control.
5.33 Ability to recognize the partial controls.
5.34 Ability to recognize the variable factor.
5.35 Ability to understand why the overall control
was included in an experiment.
5.36 Ability to recognize the factor being held
constant in the overall control.
5.37 Ability to recognize the factors being held
constant in the partial controls.
5.40 Ability to recognize experimental and technical
problems inherent in the experiment.
5.50 Ability to criticize faulty experiments when:
5.51 The experimental design was such that it
could not yield an answer to the problem.
5.52 The experiment was not designed to test
the specific hypothesis stated.
5.53 The method of collecting the data was
unreliable.
5.54 The data were not accurate.
5.55 The data were insufficient in number.
5.56 Proper controls were not included.
5.57 No controls were included.
6.00 Ability to carry out experiments.
6.10 Ability to recognize existence of errors in
measurement.
6.20 Ability to recognize when the precision of
measurement given is warranted by the nature of the
problem.
6.30 Ability to make accurate observations.
6.31 Ability to observe differences in situations
which are similar.
6.32 Ability to observe similarities in situations
which are different.
6.40 Ability to organize facts into tables, graphs, etc.
for easy interpretation.

7.00 Ability to interpret data.
7.10 Ability to handle certain basic skills necessary
to the interpretation of data.
7.11 Ability to read tables and graphs.
7.12 Ability to perform simple computations.
7.20 Ability to evaluate relevancy of data.
7.21 Ability to recognize hypotheses and conclusions
contradicted by the data.
7.22 Ability to recognize hypotheses and conclusions
which are unrelated to the data.
7.23 Ability to select the hypothesis from a group
of hypotheses which most adequately explains
the data.
7.24 Ability to recognize facts which support an
hypothesis or a conclusion.
7.25 Ability to recognize facts which contradict an
hypothesis or a conclusion.
7.30 Ability to differentiate between facts and
inferences.
7.31 Ability to differentiate between an observation
and a conclusion drawn from the observation.
7.32 Ability to differentiate a conclusion from an
hypothesis.
7.33 Ability to distinguish between an assumption upon
which a conclusion depends and the conclusion itself.
7.34 Ability to distinguish a fact from an assumption.
7.40 Ability to recognize the limitations of data.
7.41 Ability to differentiate between what is
established by the data alone and what is
implied by the data.
7.42 Ability to recognize that a statement which goes
beyond the data cannot be absolutely true.
7.43 Ability to recognize that generalizations from
results of an experiment can only be extended
to new situations when there is considerable
similarity between the situations.
7.44 Ability to confine definite conclusions to the
evidence at hand.

7.50 Ability to consider as possibly true or probably
true inferences based on the data.
7.51 Ability to make inferences on the basis of trends.
7.52 Ability to extrapolate.
7.53 Ability to interpolate.
7.54 Ability not to be so overcautious that all
statements which go beyond the data are rejected
because of insufficient evidence.
7.60 Ability to perceive relationships in data.
7.61 Ability to make comparisons.
7.62 Ability to see elements in common to several items
of data.
7.63 Ability to recognize prevailing tendencies and
trends in data.
7.64 Ability to recognize that when two things vary
together there may be a relationship between them,
without assigning cause and effect judgments on
the basis of this relationship.
7.65 Ability to formulate reasonable generalizations
based upon the data.
7.70 Ability to recognize the nature of evidence.
7.71 Ability to recognize the difference between
direct and indirect evidence.
7.72 Ability to recognize a statement which is given
as evidence as not being evidence when the
statement contradicts the conclusion.
7.73 Ability to recognize a statement which is given
as evidence as not being evidence when the
statement is unrelated to the conclusion.
7.74 Ability to recognize evidence for an inference
and to choose such evidence from a series of
statements.
7.75 Ability to recognize the validity of the evidence
used to support conclusions.
7.80 Ability to recognize the assumptions involved in
the formulation of hypotheses and conclusions.
7.81 Ability to recognize assumptions which go beyond
the data but which are essential to the
formulation of an hypothesis.
7.82 Ability to recognize assumptions which must be
maintained in the drawing of a conclusion.
7.83 Ability to recognize assumptions which can be
checked experimentally.
7.84 Ability to recognize invalid assumptions.
8.00 Ability to apply generalizations to new situations.
8.10 Ability to refrain from applying generalizations
to new situations when the new situation does
not closely parallel the experimental situation.
8.20 Ability to be aware of the tentativeness of
predictions about new situations even when there is
a close parallel between the two situations.
8.30 Ability to recognize the assumptions which must
be made in applying a generalization to a new
situation.

THE LOCATION OF THE SOURCE MATERIALS
FROM WHICH THE ITEMS COULD BE CONSTRUCTED
The third step in the development of the test was
the identification of situations in which the student could
be expected to display the types of behaviors implied in
the steps of scientific thinking. Each major objective was
considered, and situations were sought which might be
utilized in the construction of items to test the abilities
involved in these objectives.
There were certain requirements which should be met
in the selection of the material. It was considered
reasonable that in all cases the material should be (1) of some
interest to the student, (2) free from technical terms, (3)
comprehensible to the student who had had no training in
biology, (4) on biological subjects, and (5) obtained from
valid sources.
It was thought that the abilities involved in the
recognition of a problem, an hypothesis, a fact, and a
conclusion could be discovered by having a student actually
locate them in his reading. In the development of an
objective test it seemed that one way in which these
behaviors could be measured was by the presentation of short
essays or paragraphs which contained problems, etc. and
having the student identify them. With this in mind,
popular and scientific journals were inspected for descriptions
of experiments or observations which contained problems,
hypotheses, experiments, observations and conclusions.
These were judged by the following criteria:
1. They should be of such a nature that they
   could be condensed into a paragraph or two.
2. They should each contain a problem or problems,
   hypotheses, observations and experiments, and
   a conclusion.

It was tentatively assumed that a student's ability
to delimit a problem might be measured by giving him a
comprehensive problem so stated that it could not be solved
unless it were broken down into a series of minor problems.
Such problems were located in textbooks and research journals,
and by interviews with members of the Department of Biological
Science of Michigan State College. The criteria used in
the selection of the problems were:
1. Unsolved problems were chosen so that the student
   could not know the solution to the problem.
2. The problems should be broad major problems.

In order to measure a student's ability to plan
experiments it was necessary to locate problems and hypotheses
already under investigation or those which might be
investigated, thus limiting the possibility of the student having
had experience with the problem. Some of these were found
in research journals and some were obtained by interviews
with staff members of the Department of Biological Science
at Michigan State College. The criteria by which they were
judged were:
1. They should be of such a nature that no technical
   apparatus would be needed to design an experiment.
2. They should be within the experience of the
   student; that is, the general problem should deal
   with situations which could reasonably be assumed
   to be familiar to him.

In order to test a student's understanding of
experimental design, actual experiments were located in which the
student could identify controls, partial controls, etc.
These experiments were located in scientific journals. It
was assumed that the experiments should be:
1. Entirely new to the students.
2. On a subject with which the student was familiar.

These assumptions were met by choosing experiments
from technical journals which the average student would not
have read, and by choosing experiments which were about
rather common subjects, such as food, plants, etc.

It was thought that the ability to organize data
could be tested by giving students raw data to graph. A
search of textbooks and journals produced this type of
material. The criteria used to judge the usability of the
data were:
1. The data must be in units familiar to the
   student.
2. The data must be such that only a few points
   would be needed to plot a curve, so that a
   number of curves could be plotted in a minimum
   of time.

Scientific journals and advanced textbooks were
examined for data which the student could interpret. It was
assumed that these data should be entirely new to the
student.
THE CONSTRUCTION OF THE EVALUATION INSTRUMENTS
The fourth step in the development of the test was
the selection of promising techniques, and the inventing of
new techniques, to obtain evidence concerning the attainment
of the objectives. Previous tests designed to test certain
phases of scientific thinking were examined. No tests for
biology were found which measured all of the objectives
listed. There were only a few which measured any of the
objectives. New techniques for appraising the desired
behaviors were devised, paragraphs from sources were rewritten,
and students were presented with some of the materials
identified in step three for free responses, which were culled
and classified. On the basis of this work nine tryout tests
were devised. The following discussion gives in more detail
the method used in the construction of each of the instruments
and the objectives and types of behavior which each was
intended to evaluate.

In the development of the test items certain
requirements regarding mechanics were set up. The first requirement
was that the test be easily scored. A five-response
machine-scored answer sheet was chosen as the most appropriate for
the purposes of this test. A second consideration was the
test form. A five-choice key was selected as the most
suitable form inasmuch as a single key for each test would enable
the student to answer a rather large number of items in a
fairly short time, thus increasing the reliability of the
test. He would become acquainted with the key and thus
reduce the reading time of the test. Each tryout test had a
separate key.
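
The connection drawn above between a larger number of items and greater reliability is commonly expressed by the Spearman-Brown formula. The text does not name this formula at this point, so the sketch below is only an illustration under that assumption; the function name and the starting reliability of 0.60 are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added items are comparable to the existing ones
    (the usual Spearman-Brown assumption; not stated in the source).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
predicted = spearman_brown(0.60, 2.0)
print(round(predicted, 2))
```

Under these assumptions, doubling the number of comparable items raises the predicted reliability, which is the sense in which a single reusable key, by allowing more items in the same time, could be expected to help.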
After the test items had been constructed they were
given to five experts for keying, criticism and suggestion.
The items were revised on the basis of these judgments, and
assembled into tryout tests. (See Appendix I.)

The first tryout test, hereafter referred to as Test
A, was designed to evaluate the student's ability to
recognize problems, hypotheses, experimental conditions and
conclusions. Five paragraphs were written, each on a different
subject and each based on short articles from popular
magazines. Certain parts of the paragraphs were underlined;
these underlined portions, preceded by a number indicating
the item number, constituted the 74 items of the test.
The directions given to the student, the key for the
test and a portion of one of the paragraphs follow:

TEST A
SOME STEPS IN SCIENTIFIC THINKING
This test is designed to measure your ability to
differentiate phases of thinking. These steps include
major problems or perplexities, possible solutions to
problems, observations which are not results of
experimentation but rather preliminary observations, results
of experimentation, and conclusions.
Certain parts of the paragraph are underlined, and
each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space on
the answer sheet.

Key
1. A major problem (either stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Observations (not experimental).
5. Conclusion (probable solution to problem).

Ever since the days of Hippocrates one of medicine's
big mysteries has been (1) the bodily process that transforms
disease into death. With a special type of equipment which
makes blood vessels transparent and three dimensional under
a microscope, one investigator began examining the blood of
healthy animals. The (2) blood cells of the healthy animals
are separate and move rapidly. One day while observing the
blood of a monkey dying of malaria, this researcher saw that
the (3) blood was flowing slowly.
Test B, designed to test the student's ability to
delimit problems, was constructed from free responses of
students. For example, several facts about colds were given
to the students. They were asked to read the paragraph and
state briefly the problem or problems presented. The major
problem was: What causes colds? In constructing the test
this problem was followed by other problems which the
students had suggested. Four major problems were presented,
each of which was followed by a series of questions. There
was a total of 67 such questions in this tryout test. A
portion of Test B follows:
TEST B
THE DELIMITATION OF PROBLEMS
This portion of the test is designed to test your
ability to delimit a problem. A problem is presented.
This is followed by a series of questions. Rate the
questions according to the following key.

Key
1. This question must be answered in order
   to solve the problem.
2. This question if answered might be useful
   in the solution of the problem.
3. The answer to this question, though related
   to the problem, would not help in the
   solution to the problem.
4. This question is completely unrelated to
   the problem.
5. This question if answered in the affirmative
   is a basic assumption of the problem.

PROBLEM: What causes colds?

QUESTIONS:
1. Do all people have colds?

Test C was designed to measure the student's
understanding of the experimental method. This test was also
constructed on the basis of free responses from students.
They were presented with a problem and hypotheses and were
instructed to design an experiment to test each hypothesis
presented. For example: Problem: What are some of the
requirements of sprouting seeds? Hypothesis: Oxygen is a
requirement of sprouting seeds.

The papers were cut so that the experiments designed
to test a single hypothesis could be sorted and these were
placed in piles according to the key which was used in Test C.
Some of the responses were satisfactory experiments, others
were faulty for one reason or another, some were faulty for
several reasons. Those which were faulty in more than one way
were discarded. Ten or eleven responses for each problem were
chosen as the test items. Six series of experiments with a
total of 62 items constituted Test C, a portion of which is
presented here:
TEST C
EXPERIMENTAL PROCEDURES
This test is designed to measure your ability to
recognize faulty experimental procedures and to test your
ability to select the best of a series of experiments. In
each case a problem and a possible solution to the problem
(an hypothesis) are presented. In each case the
experiments were designed by students to test the hypotheses.
Judge each experiment according to the following key.

Key
1. This experiment is satisfactory.
2. This experiment is unsatisfactory because it
   lacks a control or comparison.
3. This experiment is unsatisfactory because the
   control or comparison is faulty.
4. This experiment is unsatisfactory because it
   is unrelated to the hypothesis.
5. None of the above - the experiment or situation
   is unsatisfactory for reasons other than those
   listed in 2, 3, and 4.

PROBLEM: What are some of the requirements for the
sprouting of seeds?

HYPOTHESIS: Oxygen is a requirement for the sprouting of
seeds.

1. Plant one seed in a container where oxygen is
   available and place another seed in a container where all
   oxygen has been removed. Keep all other conditions
   the same.
Test D, designed to measure the student's ability to
organize data, contained twenty items similar to the one
illustrated here:
TEST D
ORGANIZATION OF DATA
This test is designed to test your ability to organize
data. Select from the key below the curve which best fits
the data. If none of the curves fit the data mark space five
on your answer sheet.

Key
[Curves 1 through 4: figure not reproduced]
5. None of the curves.

1. The horizontal axis represents temperature. The vertical
   axis represents the amount of Substance A derived from
   Substance B.

   Temperature     Amount of Substance A
   10°C.            4 grams
   25°C.            7 grams
   35°C.            9 grams
   60°C.           14 grams
Test E is similar to one described by Engelhart and
Lewis.¹⁴ It was designed to measure the student's
understanding of the relation of facts to the solution of a
problem. All of the 74 items of this test were related to the
overall problem: What factors are involved in the
transmission and development of Infantile Paralysis
(Poliomyelitis)? Six hypotheses were presented. Each hypothesis was
followed by a series of facts which constituted the items.
The data for the test were obtained from articles on
infantile paralysis in research journals and medical journals.
A portion of Test E follows:
TEST E
EVALUATION OF HYPOTHESES
This test is designed to measure your understanding
of the relation of facts to the solution of a problem. The
overall problem involved in this test is presented. This
is followed by a series of possible solutions to the problem
(hypotheses). After each hypothesis there are a number of
items, all of which are true statements of fact. Determine
how the statement is related to the hypothesis and mark each
statement according to the key which follows the hypothesis.

GENERAL PROBLEM: What factors are involved in the
transmission and development of Infantile Paralysis
(Poliomyelitis)?

HYPOTHESIS I: In man the disease is contracted by direct
contact with persons having the disease.

Key
For items 1 through 11 mark space if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.

1. Monkeys free from the disease almost never catch
   infantile paralysis from infected monkeys.
2. Most strains of infantile paralysis virus can be
   transferred from man only to monkeys and apes and not
   to other animals.

12. What is the status of hypothesis I?
    1. It is true.
    2. It is probably true.
    3. It is false.
    4. It is probably false.
    5. The data are contradictory, hence its truth or
       falsity cannot be judged.

¹⁴ Max D. Engelhart and Hugh B. Lewis, "An attempt
to measure scientific thinking." Educational and
Psychological Measurement, 1:289-294, Third quarter, 1941.
Test F was designed to measure the student's ability
to interpret data and to test his understanding of
experimentation. The directions for this tryout test and a
portion of the test are given below:
TEST F
EXPERIMENTATION AND THE INTERPRETATION OF DATA
This test was designed to measure your ability to
interpret data and to test your understanding of
experimentation. In each case the numbers in the first column are the
numbers which you will use as your answer. Thus the table
presented becomes both the source of data and your key for
the questions which follow it. In each case where a test
tube number or group number is called for, the one which gives
positive evidence for the statement should be given. Below
this the control or comparison is called for. This is the
test tube or group number of the data which offers a
comparison. For example:

1. Leaf in dark - no starch.
2. Leaf in light - starch.

Light is necessary for the production of starch.
You would mark space 2 because this is the positive
evidence, but it would be meaningless if it were not
compared with the leaf in the dark. Therefore, the
following item, "What is the control (comparison) for item 1?",
would be marked space 1.

Items 1 through 15 refer to the data presented
below. Some test tubes were set up and each contained
1 gram of fat. They were marked 1, 2, 3, 4, and 5. Mark
each item according to the test tube number called for.
Various substances were added to the tubes containing fat.
All substances were dissolved in water before they were
added to the fat. All test tubes were kept at 85° F.
(Water boils at 212° F.) For test tube 5, Substance A
was boiled and then allowed to cool before it was added
to the fat.
Test Tube                                  Amt. of Substance B
Number     Content of tube                 present after 24 hours
1          Fat plus Substance A            .1 gram
2          Fat plus Substance A
           plus Substance C                .5 gram
3          Fat plus Water                  .0 gram
4          Fat plus Substance C            .0 gram
5          Fat plus Substance A
           (boiled)                        .0 gram

1. Give the number of the test tube which acts as a
   control (comparison) for the entire experiment.
2. Give the number of the tube which gives evidence
   that fat does not break down spontaneously into
   Substance B in 24 hours.
3. Give the number of the tube used to show that a
   temperature of 85° F. was not sufficient to cause
   fat to be broken down into Substance B.
4. Give the test tube number of the tube which gives
   evidence that Substance A is the active substance
   in the breakdown of fat to Substance B.
5. Give the test tube number of the tube which is the
   control (comparison) for item 4.
Five such series of items were included in Test F.
The total number of items was 72.
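
The idea of a control or comparison used in these items, two conditions alike in everything but one factor, can be sketched as a search for pairs of tubes that differ in exactly one addition. The sketch below is an illustration only. It assumes the tube contents as read from the sample table, treats boiled Substance A as a separate factor, and does not reproduce the keyed answers, which are not given in the text; the names `tubes` and `single_factor_pairs` are hypothetical.

```python
from itertools import combinations

# Additions to the fat in each tube, as read from the sample table (assumed).
# Fat and water are common to every tube and are therefore left out.
tubes = {
    1: {"A"},
    2: {"A", "C"},
    3: set(),            # fat plus water only
    4: {"C"},
    5: {"A (boiled)"},   # modelling choice: boiled A counts as its own factor
}


def single_factor_pairs(contents):
    """Return (tube, tube, factor) for every pair differing in exactly one factor."""
    pairs = []
    for a, b in combinations(sorted(contents), 2):
        difference = contents[a] ^ contents[b]   # symmetric difference
        if len(difference) == 1:
            pairs.append((a, b, next(iter(difference))))
    return pairs


for first, second, factor in single_factor_pairs(tubes):
    print(f"Tubes {first} and {second} differ only in {factor}")
```

Each pair listed is a candidate comparison in the sense of the directions: whatever differs in the outcome of the two tubes can be attributed to the one factor that distinguishes them, provided all other conditions were in fact held constant.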
Test G is somewhat like the test described by
Teichman,¹⁵ which was constructed to evaluate conclusions
in terms of reasonableness, sufficiency and pertinent data.
This test was constructed from free responses of students.
A problem was presented. This was followed by data. For
example: A student was interested in developing a test for
a certain substance. In all 100 cases his test was
positive. The students were requested to state a conclusion.
In some instances, as in the above, there was no control
included so no conclusion was really possible. Some of the
students realized this; others wrote conclusions. The
answers were sorted into stacks according to the key used
for Test G. The most appropriate responses were chosen as
the 100 items for the test.
TEST G
DRAWING OF CONCLUSIONS
This test was designed to measure your ability to
make conclusions. When facts are analyzed and studied
they sometimes yield evidence which helps in the solution
of a problem. However, any conclusion must be checked
before it can be accepted. The following key includes
four ways in which conclusions may be faulty. Each of
the items presents a question or problem, a brief
description of an experiment and one or more conclusions drawn
from the experiment. Each experiment was repeated many
times. Read each problem, experiment and the conclusions.
Where several conclusions are given evaluate each
conclusion separately. Is the conclusion tentatively justified
by the data? If so, mark space 1 on your answer sheet.
If the conclusion is not justified determine whether 2,
3, 4, or 5 in the key is the best reason for its being
faulty and mark the proper space on your answer sheet.

Key
The conclusion is:
1. Tentatively justified.
2. Unjustified because it does not answer the
   problem.
3. Unjustified because the experiment lacks a
   control (comparison).
4. Unjustified because the data are faulty or
   inadequate, though a control was included.
5. Unjustified because it is contradicted by
   the data.

PROBLEM: A student was interested in developing a test
for a certain type of substance. In all 100 cases
his test was positive.

1. He concluded that the test was a specific test for
   the substance.

¹⁵ Louis Teichman, "The ability of science students
to make conclusions." Science Education, 28:268-279,
December, 1944.
The final tryout test was in reality two tests,
Test H and Test J, combined into one. In all, these tests
contained 168 items. Test H was devised to measure the
student's ability to interpret data. Data were presented
to the students. These were followed by a series of items
which were possible interpretations, restatements,
explanations, extensions, and comparisons of the data. These items
constituted Test J.

TEST H
INTERPRETATION OF DATA
This test was designed to measure your ability
to interpret data. Following the data you will find a
number of statements. You are to assume that the data
as presented are true. Evaluate each statement
according to the following key and mark the appropriate space
on your answer sheet.

Key
1. True: The data alone are sufficient to show
   that the statement is true.
2. Probably true: The data indicate that the
   statement is probably true, that it
   is logical on the basis of the data,
   but the data are not sufficient to
   say that it is definitely true.
3. Insufficient evidence: There are no data to
   indicate whether there is any degree of
   truth or falsity in the statement.
4. Probably false: The data indicate that the
   statement is probably false, that is,
   it is not logical on the basis of the
   data, but the data are not sufficient to
   say that it is definitely false.
5. False: The data alone are sufficient to show
   that the statement is false.

In freezing of vegetables the common practice for
both commercial and home frozen vegetables is to scald the
vegetables first by placing them in boiling water for two
to three minutes. The following data were obtained in an
experiment which measured the amounts of Vitamin C in fresh
vegetables, scalded vegetables before freezing, and
vegetables frozen for six months. One group of the frozen
vegetables was frozen without first scalding, the other
group was first scalded. The Vitamin C content of the
frozen vegetables was determined before and after they were
cooked. All figures indicate the amount of Vitamin C in mg.
per 100 cc.

Vegetable          Fresh    Scalded
Chard (greens)      60        37
Spinach             82        43
Peas                29        21
Green beans         34        29
Lima beans          21        20

[Frozen columns (Unscalded: Raw, Cooked; Scalded: Raw, Cooked);
the assignment of these values to rows and columns is not
recoverable: 14 24 2 20 16 1 10 27 16 20 14 10 17 23 13 25
14 20 18 26]

1. Scalding of all vegetables causes destruction of some
   of the Vitamin C content of the vegetables.
2. Spinach is a good source of Vitamin C.
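
The kind of reasoning item 1 calls for can be illustrated with the fresh and scalded figures from the sample data. The sketch below is only an illustration and does not give the keyed answer, which is not stated in the text. The lima bean fresh value is taken as 21, an assumed reading of a damaged figure, and the names `vitamin_c` and `percent_lost` are hypothetical.

```python
# Fresh and scalded Vitamin C values (mg. per 100 cc.) from the sample data.
# The lima bean fresh value of 21 is an assumed reading of a damaged figure.
vitamin_c = {
    "Chard (greens)": (60, 37),
    "Spinach": (82, 43),
    "Peas": (29, 21),
    "Green beans": (34, 29),
    "Lima beans": (21, 20),
}


def percent_lost(fresh, scalded):
    """Percentage of the fresh value that is missing after scalding."""
    return 100.0 * (fresh - scalded) / fresh


losses = {name: percent_lost(fresh, scalded)
          for name, (fresh, scalded) in vitamin_c.items()}

for name, loss in losses.items():
    print(f"{name}: {loss:.1f}% lost")

# Holds only for the five vegetables tabulated, not for vegetables in general.
print(all(loss > 0 for loss in losses.values()))
```

Every vegetable tabulated shows a lower value after scalding, but a statement about all vegetables extends beyond the five measured. That gap between what the data establish and what they merely suggest is the distinction the key for Test H asks the student to draw.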
TEST J
GENERALIZATIONS AND ASSUMPTIONS
Items 16 through 21 are a re-evaluation of some of
the items 1 through 15. Re-read items 1, 3, 9, 11, 13 and 15
and determine whether they are generalizations, extensions
of data, explanations of the data or merely restatements of
the data, etc. Answer each according to the following key:

Key
1. A generalization; that is, the data say it is
   true for this situation, a generalization says
   it is true for all similar situations.
2. The data indicate a trend which if continued
   in either direction would make the statement
   true.
3. An explanation of the data in terms of cause
   and effect.
4. A restatement of results.
5. None of the above.

16. Item 1.

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusions are the items which are to be evaluated
according to the following key. These items all relate to the
data presented for items 1 through 15.

Key
1. An assumption which must be made to make the
   conclusion valid (true).
2. An assumption which if made would make the
   conclusion false.
3. An assumption which has no relation to the
   validity (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion 1: The breakdown of Vitamin C proceeds
spontaneously but is a relatively slow process at low
temperature.

22. Vitamin C is a stable substance.
23. There is order in the universe.

ANALYSIS OF THE TRYOUT TESTS
IN TERMS OF THE BEHAVIORS INVOLVED
Table I has been prepared to indicate which of the
behaviors outlined earlier in this chapter each of the
tryout tests was designed to measure. The major objectives
are presented in the table. These are followed by the
behaviors reworded into shorter statements. The tests have
previously been described, but the descriptive titles are
presented here to facilitate the reading of the table.

Test A   Some Steps in Scientific Thinking
Test B   The Delimitation of Problems
Test C   Experimental Procedures
Test D   Organization of Data
Test E   Evaluation of Hypotheses
Test F   Experimentation and the Interpretation
         of Data
Test G   Drawing of Conclusions
Test H   Interpretation of Data
Test J   Generalizations and Assumptions

An inspection of this table indicates that an attempt
was made to cover most of the behaviors observable in persons
employing the critical aspects of scientific thinking in the
preliminary test battery. It will be seen that a few of the
behaviors were not well covered by the tests, such as the
ability to recognize valid and invalid data, valid and
invalid sources of data, and the ability to carry out
experiments. These two objectives were omitted chiefly because
little attempt has been made to teach these in the course in
Biological Science at Michigan State College.
138
TABLE I
BEHAVIORS MEASURED BY THE TRYOUT TESTS

                                                        Tests
Behaviors                                               A B C D E F G H J
Recognizes Problems
1.10 Recognizes problems in context
1.20 Distinguishes fact from problem
1.30 Recognizes problem in expository form
1.40 Distinguishes problem from hypothesis
1.50 Distinguishes problem from side issues
Delimits Problem
2.10 Distinguishes major problem from minor ones
2.20 Isolates major problem or major idea
2.30 Sees relation of minor problems to major one
2.40 Distinguishes relevant from irrelevant problems
2.50 Analyses problem into essential parts
2.60 Concentrates on main problem
2.70 Recognizes basic assumptions of problem

X
X
X
X
X
X

X

Recognizes Facts Related to Solution of Problem
3.10 Selects information needed to solve problem
3.20 Recognizes valid evidence
3.30 Recognizes reliable sources of information
3.40 Selects data pertinent to solution of problem
3.50 Distinguishes between pertinent and unrelated data

X

X
X
X
X
X
X
X

X

X

X

X X

X
X

X X

X

TABLE I (continued)
                                                        Tests
Behaviors                                               A B C D E F G H J
Recognizes Hypotheses
4.10 Distinguishes hypothesis from problem
4.20 Differentiates observation from hypothesis
4.30 Distinguishes hypothesis from conclusion
4.40 Recognizes tentativeness of hypothesis

X
X
X

X X

X

X X

X

X X

Plans Experiments
5.10 Selects proper hypothesis to test
5.20 Differentiates observation from experiment         X
5.30 Uses single variable factor
5.31 Controls proper factors
5.32 Recognizes overall control
5.33 Recognizes partial control
5.34 Recognizes variable factor
5.35 Understands reason for overall control
5.36 Recognizes constant factor of overall control
5.37 Recognizes constant factor of partial control
5.40 Recognizes problems inherent in experiment
5.50 Criticizes faulty experiments when
5.51 Not designed to answer problem
5.52 Not designed to test hypothesis
5.53 Methods were not reliable
5.54 Data were not accurate
5.55 Data were insufficient in number
5.56 Proper controls were not included
5.57 No controls were included

X

X X

X
X

X X
X
X
X
X
X
X
X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

TABLE I (continued)
                                                        Tests
Behaviors                                               A B C D E F G H J
Carries Out Experiments
6.10 Recognizes measurement errors
6.20 Recognizes precision of measurement necessary
6.30 Makes accurate observations
6.31 Observes differences in similar situations
6.32 Observes similarities in different situations
6.40 Organizes facts for interpretation
Interprets Data
7.10 Handles skills necessary to interpretation
7.11 Can read tables and graphs
7.12 Can perform simple computations
7.20 Evaluates relevancy of data
7.21 Recognizes inferences contradicted by data
7.22 Recognizes inferences unrelated to data
7.23 Selects best hypothesis to explain data
7.24 Recognizes facts supporting inference
7.25 Recognizes facts contradicting inference
7.30 Distinguishes facts from inferences                X
7.31 Distinguishes observation from conclusion          X
7.32 Distinguishes hypothesis from conclusion           X
7.33 Distinguishes assumption from conclusion           X
7.34 Distinguishes fact from assumption                 X

X
X
X
X

X
X

X

X

X

X

X

X
X

X
X

X

X

X X

X

X X
X

X X

X

X X

X
X X
X X

X

X

TABLE I (continued)
                                                        Tests
Behaviors                                               A B C D E F G H J
7.40 Recognizes limitations of data
7.41 Distinguishes data from what is implied by data
7.42 Recognizes inferences as not absolutely true
7.43 Recognizes limitations in applying generalizations
7.44 Confines definite conclusions to evidence
7.50 Makes inferences based on data
7.51 Makes inferences based on trends
7.52 Makes inferences based on extrapolations
7.53 Makes inferences based on interpolations
7.54 Is not too overcautious
7.60 Perceives relationships in data
7.61 Makes comparisons in data
7.62 Sees common elements in data
7.63 Recognizes tendencies and trends
7.64 Suspends cause and effect judgments
7.70 Recognizes nature of evidence
7.71 Distinguishes direct from indirect evidence
7.72 Recognizes evidence which contradicts conclusion
7.73 Recognizes evidence unrelated to conclusion
7.74 Recognizes evidence for inferences
7.75 Recognizes validity of evidence

X

X

X

X

X

X

X

X

X

X
X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X
X

X

X

X

X

X

X
X
X

X

X

X

X

X

X

X

X

X

X

X
X

X

X

TABLE I (continued)
                                                        Tests
Behaviors                                               A B C D E F G H J
7.80 Recognizes assumptions underlying inferences
7.81 Recognizes essential assumptions
7.82 Recognizes assumptions underlying conclusions
7.83 Recognizes testable assumptions
7.84 Recognizes invalid assumptions
Applies Generalizations
8.10 Is cautious in applying generalizations
8.20 Is aware of tentativeness of applications
8.30 Recognizes assumptions underlying applications

X
X
X X
X X
X

X

X

X

X X
X
X
X

X

X

X

CHAPTER V
THE STATISTICAL ANALYSES
OF THE TESTS AND THE TEST ITEMS
This chapter is devoted to a presentation of the statistical
analyses of the tests and the test items. The means, standard
deviations, and reliabilities of each of the tryout tests are
presented. Item analysis data for the items in the tryout tests have
been summarized. Intercorrelations of the tryout test scores have
been calculated and data concerning the degree of overlapping of the
tryout tests are discussed. This discussion is followed by analyses
of Tests I and IA and by the item analysis data on these tests.
METHODS USED IN ITEM-ANALYSIS
Item validity may be defined as a measure of the item's correlation
with a criterion.¹ In this case the criterion used was the scores on
the tryout test which included the particular item for which the
validity was to be determined. The purpose of determining item
validity is to identify good items to be retained and poor items to
be eliminated or revised. Poor items are generally defined as those
lacking in discriminative power while good items are discriminatory.
Good items are those missed more often by those persons who have a
low degree of the quality being measured (in this case, the ability
to think scientifically), and answered correctly more often by
persons having much of this same quality, whereas poor items are
answered correctly by the same number of persons, irrespective of
their ability. Item validity may be estimated by any one of several
methods. Test items are usually validated by comparing the
proportion of persons having high scores on the test who answer the
item correctly with the proportion of persons having low scores on
the test who answer the item correctly. Kelley² has shown that the
best estimates of the correlation of the item with the total test
score can be obtained by using the responses of the upper 27 percent
on total score and the lower 27 percent on total score of the group
for the calculations.

¹ Dorothy C. Adkins, Construction and Analysis of Achievement Tests.
Washington: U. S. Government Printing Office, 1947. P. 180.
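
The comparison of upper and lower groups described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure followed in the study: the function names, the rounding rule for the size of each group, and the ten-examinee data are all hypothetical.

```python
def upper_lower_groups(total_scores, fraction=0.27):
    """Return (upper, lower) lists of examinee indices by total score.

    The group size is rounded to the nearest whole examinee (an assumed
    rule); tied scores are kept in their original order.
    """
    n_group = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    return order[-n_group:], order[:n_group]


def percent_correct(item_responses, group):
    """Percent of a group answering one item correctly (responses are 0 or 1)."""
    return 100.0 * sum(item_responses[i] for i in group) / len(group)


# Hypothetical data: total scores of ten examinees and their
# right (1) or wrong (0) responses to a single item.
scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
item = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

upper, lower = upper_lower_groups(scores)
p_upper = percent_correct(item, upper)
p_lower = percent_correct(item, lower)
print(p_upper, p_lower)
```

Under this rounding rule a group of 168 examinees would place 45 papers in each tail. The two percents obtained in this way are the quantities with which a chart or table of estimated correlations is entered.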
The estimated item correlation was determined by two methods, both
of which required the determination of the percent in the upper 27
percent of the group and the percent in the lower 27 percent of the
group answering the items correctly. One method was devised by
Flanagan.³ By this method validity is read from a chart, the chart
being entered by the percent of successes of each of the groups. The
second method used for the estimates of the discrimination power of
the items was that of Davis.⁴ This method also involves the use of
the upper and lower 27 percent of the group. A table is entered by
percent of successes of each group; however, the percent successes
are calculated differently by Davis' than by Flanagan's method.
Straight percent successes are used in the Flanagan method whereas
the method devised by Davis involves a correction for guessing. In
addition, Davis' method yields a figure which he calls the
discrimination index, which is a linear function of the hyperbolic
arc tangent of the product-moment coefficient of correlation. He
believes that this figure is truly comparable from item to item
whereas the coefficient itself is not. The coefficient of
correlation cannot justifiably be averaged. However, a table is
included in his monograph whereby the discrimination index can be
converted to a coefficient of correlation for comparison with
results obtained by other methods.⁵

² Truman L. Kelley, "The selection of upper and lower groups for the
validation of test items." Journal of Educational Psychology,
30:17-24, January, 1939.
³ John C. Flanagan, "General considerations in the selection of test
items and a short method of estimating the product-moment
coefficient from data at the tails of the distribution." Journal of
Educational Psychology, 30:674-680, December, 1939.
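
The reason an index based on the hyperbolic arc tangent can be averaged while the coefficients themselves cannot may be illustrated as follows. The sketch works directly on the transformed scale (often called Fisher's z); the scaling constants of Davis' index are given in his monograph and are not reproduced here, so the function name and the three correlations are assumptions made purely for illustration.

```python
import math


def mean_correlation_via_z(correlations):
    """Average correlations on the atanh scale, then convert back with tanh."""
    z_values = [math.atanh(r) for r in correlations]
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)


# Hypothetical item-test correlations.
rs = [0.10, 0.50, 0.90]

naive_mean = sum(rs) / len(rs)        # simple average of the coefficients
z_mean = mean_correlation_via_z(rs)   # average taken on the transformed scale
print(round(naive_mean, 3), round(z_mean, 3))
```

Because the transformation stretches the upper end of the correlation scale, the mean obtained on the transformed scale is higher than the simple average whenever the coefficients are unequal and positive, as they are here.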
Item difficulties, stated in terms of the percent of persons
answering the items correctly, were estimated by the method proposed
by Davis.⁶ These were estimated from the percents of successes of
the upper and lower 27 percents of the group. Davis suggests the use
of a difficulty index, which like the discrimination index, is read
from the table included in his monograph. Like his discrimination
index, the difficulty index is corrected for chance. Because item
difficulties, when expressed as percents passing the item, cannot
justifiably be averaged he devised a difficulty index which is a
linear scale. The actual percents can be obtained by use of a table
to convert the difficulty indices to percents passing the item.

⁴ Frederick B. Davis, Item-Analysis Data. Cambridge: Graduate School
of Education, Harvard University, 1946. pp. 8-15.
⁵ Ibid., pp. 14-15.
⁶ Ibid., pp. 2-3.

ANALYSES OF TRYOUT TESTS
The tryout tests were administered to 168 students in the spring
term of 1950. The tests were scored on the basis of total number of
items answered correctly. No correction was made for guessing since
students were instructed to answer all items.
Analysis of Test A - Some Steps in Scientific Thinking.
Test A, designed to measure an understanding of some of the steps of
scientific thinking as described in Chapter IV (see page 126), was
comprised of a total of 74 items. The scores on this test ranged
from 24 to 67. The mean and its standard error were 50.60 ± 0.62 and
the standard deviation and its standard error, 8.13 ± 0.44. The
reliability, as estimated by the split-half method and adjusted by
the Spearman-Brown prophecy formula, was .87 ± .02.
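
The split-half estimate with the Spearman-Brown adjustment can be sketched as below. The text does not say how the halves were formed, so the odd-even split used here is an assumption, as are the function names and the four-student response matrix.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, length_factor=2.0):
    """Step a half-test correlation up to the reliability of the full test."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def split_half_reliability(item_matrix):
    """item_matrix[s][i] is 1 if student s answered item i correctly, else 0.

    Assumed split: items in odd positions against items in even positions.
    """
    first_half = [sum(row[0::2]) for row in item_matrix]
    second_half = [sum(row[1::2]) for row in item_matrix]
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical responses of four students to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
reliability = split_half_reliability(responses)
print(round(reliability, 3))
```

Under this formula a correlation of about .77 between the two halves would be stepped up to a full-test reliability of about .87.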
Complete item-analysis data for Test A are presented in Table XXXVI
of Appendix I. The range of item discrimination, as expressed in
terms of estimated coefficients of correlation with total test
score, was from .00 to .77. The range in terms of Davis'
discrimination indices was from 0 to 61. As previously mentioned,
the coefficients of correlation cannot justifiably be averaged,
whereas the discrimination indices can be averaged. The mean
discrimination index was 29.45. Davis' table of conversion of
indices to equivalent values of coefficients of correlation gave an
estimated mean correlation of .45.
The range of difficulty of the items of Test A was from 0 to 95
percent. The range of indices of difficulty was from 0 to 85. Since
the difficulty index is subject to statistical treatment these were
averaged, giving a mean of 55.51. This was equivalent to 60 percent
of the group answering these items correctly.
The item analysis data and the mean of the test indicate that the
test was rather easy; the item discrimination data gave evidence
that as a whole the items discriminated quite adequately between
those students having considerable understanding of the steps of
scientific thinking and those not having such an understanding. The
reliability coefficient of the test indicated that the test measured
whatever quality it was measuring quite consistently. The data for
this test are summarized in the following table.

TABLE II
PERTINENT DATA FOR TEST A
Number of items ....................... 74
Range of scores ....................... 24 - 67
Mean .................................. 50.60 ± 0.62
Standard deviation .................... 8.13 ± 0.44
Reliability coefficient ............... .87 ± .02
Range of discrimination indices ....... 0 - 61
Mean discrimination index ............. 29.45
Range of difficulty indices ........... 0 - 85
Mean difficulty index ................. 55.51
Analysis of Test B - The Delimitation of Problems.
Test B, devised to measure the ability to delimit problems (see page
127, Chapter IV), as presented originally contained 67 items.
Preliminary item analysis revealed that 17 of the items were either
lacking in discriminatory power or were negatively discriminating.
Since negatively discriminating items reduce the reliability of a
test it was deemed advisable to eliminate these 17 items, rescore
the papers and on this basis recalculate the item difficulties and
item discriminatory values. The scores on the fifty items remaining
ranged from 12 to 33; the mean was 22.46 ± 0.37. The standard
deviation was 4.77 ± 0.26 while the estimated reliability
coefficient was .61 ± .05. Complete item analysis data for this test
are presented in Table XXXVII of Appendix I. The range of item
discrimination when expressed as an estimated coefficient of
correlation was from .04 to .83. The range of discrimination indices
was from 4 to 72. The mean discrimination index was 27.08, which
when converted to an estimated coefficient of correlation was .44.
The range of difficulty expressed in percent of successes for Test B
was from 4 to 88. The range of the indices of difficulty was from 11
to 75, the mean being 39.40. When converted to percent of successes
this became 30 percent. The mean of the test and the percent of
successes indicated that this test was relatively difficult. The
standard deviation and the range of scores also gave evidence that
the items were not all functioning to discriminate between those
with superior ability to delimit problems and those inferior in this
ability. An inspection of the data presented in Table XXXVII
(Appendix I) and of the test items (Appendix I) shows that the most
discriminating items of the test were those involving the
recognition of the basic assumptions upon which the problem itself
rested. This point seemed to be of sufficient interest to present
these items separately. The following table gives the discrimination
and the difficulty indices of the seven items of the test which
purported to measure the student's ability to recognize assumptions
underlying problems.
TABLE III
ITEM ANALYSIS DATA ON THE SEVEN ITEMS OF TEST B WHICH
MEASURED ABILITY TO RECOGNIZE ASSUMPTIONS UNDERLYING PROBLEMS

Item Number     Discrimination Index     Difficulty Index
     5                  48                      45
     9                  48                      42
    21                  72                      44
    28                  39                      46
    38                  53                      44
    45                  52                      34
    58                  63                      40
  Mean                53.57                   42.14
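
The two means in Table III can be checked directly from the seven rows. The short sketch below simply repeats that arithmetic; the variable names are illustrative only.

```python
# Item numbers and indices as listed in Table III.
item_numbers = [5, 9, 21, 28, 38, 45, 58]
discrimination = [48, 48, 72, 39, 53, 52, 63]
difficulty = [45, 42, 44, 46, 44, 34, 40]

mean_discrimination = round(sum(discrimination) / len(discrimination), 2)
mean_difficulty = round(sum(difficulty) / len(difficulty), 2)

# The single most discriminating of the seven items.
most_discriminating = item_numbers[discrimination.index(max(discrimination))]

print(mean_discrimination, mean_difficulty, most_discriminating)
```

Both means agree with the tabled values, and the mean discrimination index of these seven items is roughly twice the mean of 27.08 reported for the test as a whole.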

These items were no more difficult than the other items of the test;
in fact, they were answered correctly slightly more often than was
the average item, but they were much more discriminating. They
accounted, to a large part, for the rather high mean discrimination
value of the items of the test. The average estimated coefficient of
correlation of these items with the total test score was .71 while
the mean difficulty of these items when expressed as percent of
successes was 35 percent. The pertinent data for Test B are
presented in Table IV.

TABLE IV
PERTINENT DATA FOR TEST B
Number of items ....................... 50
Range of scores ....................... 12 - 33
Mean .................................. 22.46 ± 0.37
Standard deviation .................... 4.77 ± 0.26
Reliability coefficient ............... .61 ± .05
Range of discrimination indices ....... 4 - 72
Mean discrimination index ............. 27.08
Range of difficulty indices ........... 11 - 75
Mean difficulty index ................. 39.40
Analysis of Test C - Experimental Procedures. Test C, designed to
measure an understanding of experimental procedures (see page 128,
Chapter IV), was comprised of 62 items. The scores ranged from 15 to
44; the mean of the test was 26.30 ± 0.41, and the standard
deviation was 5.31 ± 0.29. The reliability, as estimated by the
split-half method and adjusted by means of the Spearman-Brown
prophecy formula, was .59 ± .05.
The item analysis data for Test C are presented in Table XXXVIII of
Appendix I. The range of estimated correlations of the items with
the total test score was from -.17 to .78; the range of
discrimination indices was from -10 to 63. The mean discrimination
index was 21.52 which when changed to an estimated coefficient of
correlation was .34. The range of difficulty indices was from 0 to
59; the mean difficulty index was 34.37, or in terms of percent of
success, 23 percent. This low percent of success and the low mean of
the test both testify to the difficulty of this particular test. The
large number of non-functioning items, that is, those with low
discriminating power and those answered correctly by sufficiently
few students to be accounted for on the basis of chance alone, plus
the negatively discriminating items, may account for the rather low
reliability of Test C. However, there was a sufficiently large
number of satisfactory items in the test to warrant the use of some
of the items in the construction of Test I, The Ability to Think
Scientifically. Table V is concerned with the pertinent data on
Test C.

TABLE V
PERTINENT DATA FOR TEST C
Number of items ....................... 62
Range of scores ....................... 15 - 44
Mean .................................. 26.30 ± 0.41
Standard deviation .................... 5.31 ± 0.29
Reliability coefficient ............... .59 ± .05
Range of discrimination indices ....... -10 - 63
Mean discrimination index ............. 21.52
Range of difficulty indices ........... 0 - 59
Mean difficulty index ................. 34.37

Analysis of Test D - Organization of Data. Test D, designed to
measure ability to organize data (see page 129, Chapter IV), was
comprised of 20 items. The scores on this test ranged from one to
nineteen. The mean of the test was 10.94 ± 0.32; the standard
deviation was 4.12 ± 0.23. The test had a reliability of .93 ± .01
as determined by the method of split-halves and correction by the
Spearman-Brown formula.
The item analysis data for Test D are presented in Table XXXIX of
Appendix I. The range of item discriminations, as expressed by an
estimated coefficient of correlation with the total test score, was
from .14 to .90; the range of discrimination indices was from 22 to
90. The mean discrimination index was 52.60 which has a
corresponding value in terms of coefficient of correlation of .70.
The range of difficulty indices was from 22 to 55, the mean being
45.90. This value corresponds to 42 percent successes.
The item analysis data and the mean of the test indicate that the
test was of average difficulty. The items were unusually
discriminating. As previously mentioned, the tryout test scores were
used as the criteria for determining item validity. Since a test
score is simply the sum of the scores on individual items the
correlation between items and test score is related to the
inter-correlations of individual test items. As pointed out by
Conrad,⁷ high item validity is an indication that the items are
highly consistent or homogeneous with other items of the test, and
if all of the items are discriminating it means that there is
internal consistency or homogeneity of the entire test. Such
internal consistency results in a high split-half reliability
coefficient. That Test D had considerable internal consistency is
shown by the high item validity and the high reliability of the
test. An inspection of the test itself also gives evidence of its
internal consistency, since the items were all very similar. An
inspection of Table I in Chapter IV shows that this test was
designed to test a very limited range of behaviors. From the
standpoint of item analysis data and test reliability, Test D was
the most successful of the tryout tests. However, the fact that it
tested a very narrow range of abilities limited its usefulness as a
measure of the ability to think scientifically, since this ability
includes a wide range of abilities as shown by the analysis of
behaviors involved in scientific thinking as presented in Chapter
IV. Table VI presents a summary of the pertinent data for Test D.

⁷ Herbert S. Conrad, Characteristics and Use of Item-Analysis Data.
American Psychological Association, Psychological Monographs:
General and Applied, No. 295. 1948. p. 15.

TABLE VI
PERTINENT DATA FOR TEST D
Number of items ....................... 20
Range of scores ....................... 1 - 19
Mean .................................. 10.94 ± 0.32
Standard deviation .................... 4.12 ± 0.23
Reliability coefficient ............... .93 ± .01
Range of discrimination indices ....... 22 - 90
Mean discrimination index ............. 52.60
Range of difficulty indices ........... 22 - 55
Mean difficulty index ................. 45.90
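
The connection noted in the discussion of Test D between an item's correlation with the total score and its correlations with the other items can be written out explicitly. The notation below is introduced only for illustration and does not appear in the sources cited: x_i is the score on item i, X the total score, sigma_i and sigma_X their standard deviations, and r_ij the correlation between items i and j.

```latex
% Total score as the sum of item scores
X = \sum_{j} x_j

% Covariance of item i with the total score
\operatorname{Cov}(x_i, X) = \sigma_i^{2} + \sum_{j \neq i} r_{ij}\,\sigma_i\,\sigma_j

% Item-total correlation, obtained by dividing by \sigma_i \sigma_X
r_{iX} = \frac{\sigma_i + \sum_{j \neq i} r_{ij}\,\sigma_j}{\sigma_X}
```

The last line shows that, other things being equal, an item's correlation with the total score rises as its correlations with the remaining items rise, which is the sense in which uniformly high item validities indicate a homogeneous test.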

Analysis of Test E - Evaluation of Hypotheses. Test E was designed
to measure the ability to evaluate hypotheses (see page 130, Chapter
IV) and was comprised of 74 items. The scores on this test ranged
from 15 to 53. The mean of the test was 34.37 ± 0.49 and the
standard deviation was 6.38 ± 0.35. The estimated reliability as
calculated by the split-half method and adjusted by the
Spearman-Brown formula was .71 ± .04.
The item analysis data for this test are presented in Table XXXX of
Appendix I. The range of item discriminations expressed in estimated
coefficients of correlation of the items with the total test score
was from .00 to .71; the range of discrimination indices was from 0
to 54. The mean discrimination index was 24.60 which, when expressed
in terms of estimated coefficients of correlation, was .38. The
range of difficulty indices was from 0 to 77; the mean was 40.57.
This gave a value of 32 percent when expressed as percent of
successes.
The items were, as a whole, moderately successful as evidenced by
the mean discrimination index. However, the test was somewhat
difficult as shown by the fact that the mean of the test was less
than half of the total possible points and also by the relatively
low mean difficulty index. However, this was also true of most of
the tryout tests. Table VII presents a summary of the pertinent data
for Test E.

TABLE VII
PERTINENT DATA FOR TEST E
Number of items ....................... 74
Range of scores ....................... 15 - 53
Mean .................................. 34.37 ± 0.49
Standard deviation .................... 6.38 ± 0.35
Reliability coefficient ............... .71 ± .04
Range of discrimination indices ....... 0 - 54
Mean discrimination index ............. 24.60
Range of difficulty indices ........... 0 - 77
Mean difficulty index ................. 40.57

Analysis of Test F - Experimentation and Interpretation of Data.
Test F was designed to measure the ability to recognize experimental
controls and the ability to interpret data (see page 131, Chapter
IV). The scores on this test ranged from 18 to 62. The total number
of items was 72. The mean of Test F was 47.85 ± 0.66; the standard
deviation was 8.48 ± 0.46. The estimated reliability was .89 ± .02.
Item analysis data for Test F are presented in Table XXXXI of
Appendix I. The coefficients of correlation with total test scores
ranged from .00 to .75. The discrimination indices ranged from 0 to
59; the mean was 30.66. This gave an estimated mean coefficient of
correlation of items with total score of .47. The item difficulties
ranged from 0 to 100; the difficulty indices also ranged from 0 to
100, and the mean was 55.13. This gave a mean item difficulty of 59
percent of successes.
With the exception of Test D, this test was one of the most
successful tests of the tryout battery as evidenced by a relatively
high reliability and by the high item validity. The test was
somewhat easier than most of the tests of the tryout battery as
shown by the mean of the test and the item difficulty. A summary of
the pertinent data for Test F is presented in the following table.

TABLE VIII
PERTINENT DATA FOR TEST F
Number of items ....................... 72
Range of scores ....................... 18 - 62
Mean .................................. 47.85 ± 0.66
Standard deviation .................... 8.48 ± 0.46
Reliability coefficient ............... .89 ± .02
Range of discrimination indices ....... 0 - 59
Mean discrimination index ............. 30.66
Range of difficulty indices ........... 0 - 100
Mean difficulty index ................. 55.13
Analysis of Test G - Drawing of Conclusions. Test G, a hundred item
test, was designed to measure the ability to recognize logical
conclusions (see page 133, Chapter IV). The scores on this test
ranged from 6 to 64. The mean was 38.01 ± 0.92; the standard
deviation was 11.95 ± 0.65. The estimated reliability of Test G was
.90 ± .01.
Item analysis data for this test are presented in Table XXXXII of
Appendix I. Item validities ranged from -.07 to .88. Discrimination
indices ranged from -4 to 80; the mean discrimination index was
31.82. This figure represents a mean correlation of .48 of the items
with the total test score. The item difficulties ranged from 0 to 89
percent of successes and the difficulty indices ranged from 0 to 76.
The mean difficulty index was 32.54 or an average of 20 percent of
success.
The test mean and the percent successes indicate that this was a
very difficult test. However, the test seemed to offer considerable
promise since the reliability of the test was high and the items
were on the average quite discriminating. Table IX presents a
summary of the pertinent data for Test G.

TABLE IX
PERTINENT DATA FOR TEST G
Number of items ....................... 100
Range of scores ....................... 6 - 64
Mean .................................. 38.01 ± 0.92
Standard deviation .................... 11.95 ± 0.65
Reliability coefficient ............... .90 ± .01
Range of discrimination indices ....... -4 - 80
Mean discrimination index ............. 31.82
Range of difficulty indices ........... 0 - 76
Mean difficulty index ................. 32.54

Analysis of Test H - Interpretation of Data. Test H and Test J were
presented to the students as a single test of 168 items (see page
135, Chapter IV). However, for the purposes of analysis this single
test was considered as two tests: Test H, Interpretation of Data,
and Test J, Generalizations and Assumptions. The 75 items of the 168
item test which were answered by the key: true, probably true,
insufficient data, probably false, and false, constituted Test H.
The range of scores for this test was from 16 to 48. The mean of the
test was 32.19 ± 0.49 and the standard deviation was 6.38 ± 0.35.
The estimated reliability was .70 ± .04.
Complete item analysis data on Test H are presented in Table XXXXIII
of Appendix I. The range of item discriminations expressed as an
estimated coefficient of correlation with the total test score was
from -.27 to .76. The discrimination indices ranged from -17 to 60,
resulting in a mean of 24.69. This corresponds to an estimated
coefficient of correlation with the total test score of .39.
The range of item difficulties was from 0 to 89 percent of
successes. The range of indices of difficulty was from 0 to 76,
giving a mean difficulty index of 35.69 and 25 percent success on
the items. This figure and the mean of the test gave evidence that
the test as a whole was quite difficult. A summary of the pertinent
data for Test H is given in Table X.

TABLE X
PERTINENT DATA FOR TEST H
Number of items ....................... 75
Range of scores ....................... 16 - 48
Mean .................................. 32.19 ± 0.49
Standard deviation .................... 6.38 ± 0.35
Reliability coefficient ............... .70 ± .04
Range of discrimination indices ....... -17 - 60
Mean discrimination index ............. 24.69
Range of difficulty indices ........... 0 - 76
Mean difficulty index ................. 35.69

Analysis of Test J - Generalizations and Assumptions. Test J,
consisting of 93 items of the 168 items which constituted the
combination Tests H and J, was designed to measure an understanding
of generalizations and assumptions. The scores on this test ranged
from 16 to 59. The mean of Test J was 37.37 ± 0.71 while the
standard deviation was 9.31 ± 0.51 and the estimated reliability of
the test was .81 ± .03.
Complete item analysis data for Test J are presented in Table XXXXIV
of Appendix I. The range of item validity values was from -.04 to
.81. The discrimination indices ranged from 0 to 68. The mean
discrimination index was 25.76. This is equivalent to an estimated
coefficient of correlation of .40. The item difficulties ranged from
0 to 66 in terms of percents answering the item correctly. The range
of difficulty indices was 0 to 59 and the mean was 34.62. This
figure corresponds to a value of 23 percent when converted into
percent passing the item.
The mean of the test and mean item difficulty both testified that
this test, like Test H, was quite difficult. Table XI presents a
summary of the pertinent data for Test J.

TABLE XI
PERTINENT DATA FOR TEST J
Number of items ....................... 93
Range of scores ....................... 16 - 59
Mean .................................. 37.37 ± 0.71
Standard deviation .................... 9.31 ± 0.51
Reliability coefficient ............... .81 ± .03
Range of discrimination indices ....... 0 - 68
Mean discrimination index ............. 25.76
Range of difficulty indices ........... 0 - 59
Mean difficulty index ................. 34.62


160
The data on the means, standard deviations, and reliabilities for all of the tests of the tryout battery are summarized in Table XII. The two least reliable tests were (1) Test B, which purported to measure the ability to delimit problems and (2) Test C, which was designed to measure an understanding of experimental design. Test D was the most reliable. This test, designed to measure ability to organize data, contained items which probably tested a very narrow range of ability and items which were all very similar. Test A, purporting to measure knowledge of steps of scientific thinking, Test F, designed to measure ability to interpret data and an understanding of controls, and Test G, designed to measure ability to draw conclusions, were all fairly reliable.

TABLE XII
COMPARISON OF MEANS, STANDARD DEVIATIONS,
AND RELIABILITIES OF THE TRYOUT TESTS

        No. of                     Standard
Test    Items    Mean              Deviation        Reliability
A         74     50.60 ± .62        8.13 ± .44      .87 ± .02
B         50     22.46 ± .37        4.77 ± .26      .61 ± .05
C         62     26.30 ± .41        5.31 ± .29      .59 ± .05
D         20     10.94 ± .32        4.12 ± .23      .93 ± .01
E         74     34.37 ± .49        6.38 ± .35      .71 ± .04
F         72     47.85 ± .66        8.48 ± .46      .89 ± .02
G        100     38.01 ± .92       11.95 ± .65      .90 ± .01
H         75     32.19 ± .49        6.38 ± .35      .70 ± .04
J         95     37.37 ± .71        9.31 ± .51      .81 ± .03
A summary of item analysis data for all of the tests of the tryout battery is presented in Table XIII. Inspection of this table reveals that the mean item discrimination indices were all above the criterion value of 20 suggested by Davis.8 Test D, the test to measure ability to organize data, had the highest mean discrimination index of any of the tests. Tests A, F, and G, judged on the basis of mean discrimination indices, were the next most successful tests. Test C, judged on the same basis, was the poorest. It is of interest to note that the rank order of the mean discrimination indices is very similar to the rank order of the reliabilities of the tests.
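The similarity of the two rank orders can be checked numerically. The following is a minimal sketch, not a computation made in the thesis: it assumes the reliabilities of Table XII and the mean discrimination indices of Table XIII as transcribed, and uses Spearman's rank-difference coefficient as one convenient measure of agreement. The function names are illustrative only.

```python
# Values transcribed from Tables XII and XIII, in the order A, B, C, D, E, F, G, H, J.
reliability = [.87, .61, .59, .93, .71, .89, .90, .70, .81]
mean_disc_index = [29.45, 27.08, 21.52, 52.60, 24.60, 30.66, 31.82, 24.69, 25.76]


def ranks(values):
    """Rank 1 for the largest value; assumes no ties (true for these data)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank-difference correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(reliability, mean_disc_index)
print(round(rho, 2))  # 0.88
```

Only Tests B, E, and J change places between the two orderings, which is why the coefficient is high.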
TABLE XIII
COMPARISON OF MEAN ITEM VALIDITIES AND
MEAN ITEM DIFFICULTIES OF THE TRYOUT TESTS

        Mean              Mean              Mean       Mean
        Discrimination    Discrimination    Percent    Difficulty
Test    Coefficient       Index             Success    Index
A           .45              29.45            60         55.51
B           .44              27.08            30         39.40
C           .34              21.52            23         34.37
D           .70              52.60            42         45.90
E           .38              24.60            32         40.57
F           .47              30.66            59         55.13
G           .48              31.82            20         32.54
H           .39              24.69            25         35.69
J           .40              25.76            23         34.62

8 Davis, op. cit., p. 15.
The mean difficulty indices ranged from 32.54 to 55.51, indicating that the tests were all relatively difficult. A criticism of the tests as a whole might be that they were a little too difficult for the group for which they were intended.

Analysis of tryout tests considered as a single test. In all there were 620 items used in the determination of the scores on the total tryout battery. The range of scores was from 183 to 399. The mean for the entire battery of tests was 291.12 ± 3.48, while the standard deviation was 44.22 ± 2.26. The minimum reliability of the test, as estimated by the Kuder-Richardson9 formula, was .92 ± .01 for this group of students. Table XIV presents a summary of the pertinent data for the tryout test battery.
TABLE XIV
PERTINENT DATA FOR THE TRYOUT TEST BATTERY

Number of items ..................... 640
Range of scores ..................... 183 - 399
Mean ................................ 291.12 ± 3.48
Standard deviation .................. 44.22 ± 2.26
Reliability coefficient ............. .92 ± .01
Intercorrelation of tryout test scores. In order to determine whether there was sufficient overlapping in the tests to justify the elimination of any of the types of items presented in the tryout tests in the preparation of the final form of the test, intercorrelations were calculated for all of the tryout tests. These intercorrelations are presented in Table XV. The standard errors of these correlations were small; they ranged from .05 to .08.

9 Adkins, op. cit., p. 154.
TABLE XV
INTERCORRELATIONS OF TRYOUT TEST SCORES

                            Tests
Tests     B      C      D      E      F      G      H      J
A*       .18    .34    .27    .28    .37    .34    .44    .44
B               .21    .22    .30    .32    .18    .16    .11
C                      .22    .26    .39    .32    .35    .33
D                             .26    .26    .28    .29    .14
E                                    .47    .50    .45    .41
F                                           .47    .47    .45
G                                                  .50    .31
H                                                         .59

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
These data show that Test D, the test devised to measure ability to organize data, had a low correlation with all of the other tests of the battery. Tests H and J, the tests devised to measure interpretation of data, and the ability to recognize generalizations and assumptions respectively, which were presented to the students as a single test, had the highest intercorrelation of any of the tests. Was this due to the fact that the same subject matter was used for both tests? Or was it due to the fact that an understanding of generalizations and assumptions was necessary for correct interpretation of data? The data presented are not such that they suggest possible answers to these questions.
The correlation between two tests is considered to be lowered if the test scores are unreliable.10 In order to estimate the correlation between the true scores of two tests a correction known as the correction for attenuation11 is frequently made which takes the unreliability of both tests into account. This correction gives the maximum correlation which could be obtained between the two test scores if both measures were perfectly reliable; that is, if the reliability coefficient of each test was 1.00. It must be kept in mind, however, that this is a theoretical value. The intercorrelations corrected for attenuation are given in Table XVI.
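The correction can be illustrated with a short sketch. It assumes the usual form of the correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities; the function name is illustrative, and the input values are those reported for Tests A and B in Tables XII and XV.

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores, given the observed
    correlation and the reliability coefficient of each test."""
    return r_xy / sqrt(rel_x * rel_y)


# Tests A and B: observed r = .18, reliabilities .87 and .61.
r_ab = correct_for_attenuation(0.18, 0.87, 0.61)
print(round(r_ab, 2))  # 0.25
```

The result agrees with the entry for Tests A and B in Table XVI. Because the divisor is always less than 1.00 for imperfectly reliable tests, every corrected coefficient exceeds the observed one.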
A comparison of Tables XV and XVI reveals the fact that all of the correlations have been increased by the correction for attenuation. The comparison also shows that the correlations of tests which were quite reliable, as Test D, were increased much less than those of tests which were rather unreliable, like Tests B and C. In addition, it can be seen that the lower correlations were increased less than the higher correlations.

10 Henry E. Garrett, Statistics in Psychology and Education. New York: Longmans, Green and Company. 1947. P. 396.
11 Loc. cit.
TABLE XVI
INTERCORRELATIONS OF TRYOUT TEST SCORES
CORRECTED FOR ATTENUATION

                            Tests
Tests     B      C      D      E      F      G      H      J
A*       .25    .48    .30    .35    .42    .39    .56    .52
B               .35    .29    .45    .44    .24    .24    .17
C                      .30    .40    .53    .44    .55    .48
D                             .32    .29    .30    .36    .16
E                                    .59    .63    .63    .54
F                                           .53    .59    .53
G                                                  .63    .36
H                                                         .73

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
Since the purpose of these correlations was to determine whether there was sufficient overlapping of factors in the tests to warrant the omission of certain of these types of items in the preparation of the final form of the test, the degree of overlapping was determined by the coefficient of determination.12 This figure is obtained by squaring the coefficient of correlation. In order to obtain a figure representing the maximum overlap, the coefficients of correlation corrected for attenuation were used. The coefficient of determination denotes the percent of variance in one test associated with the other test. This figure is usually expressed as a percent. For example, the coefficient of determination between Test A and Test B is .06, which means that 6 percent of the variance of Test A is associated with Test B. The coefficients of determination for the tryout tests are given in Table XVII.

12 Ibid., p. 338.
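The arithmetic of the example can be sketched in a few lines. The function name is illustrative; the inputs are the corrected correlations reported in Table XVI for Tests A and B (.25) and for Tests H and J (.73).

```python
def coefficient_of_determination(r):
    """Proportion of variance in one test associated with the other."""
    return r ** 2


# Tests A and B, using the correlation corrected for attenuation.
share_ab = coefficient_of_determination(0.25)
print(f"{share_ab:.2f}")  # 0.06, that is, 6 percent

# Tests H and J, the largest corrected correlation in the battery.
share_hj = coefficient_of_determination(0.73)
print(round(share_hj * 100))  # 53 percent
```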
TABLE XVII
COEFFICIENTS OF DETERMINATION OF THE TRYOUT TESTS

                            Tests
Tests     B      C      D      E      F      G      H      J
A*       .06    .23    .09    .12    .18    .15    .31    .27
B               .12    .08    .20    .19    .06    .06    .03
C                      .09    .16    .28    .19    .30    .23
D                             .10    .08    .09    .13    .03
E                                    .35    .40    .40    .29
F                                           .28    .35    .28
G                                                  .40    .13
H                                                         .53

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
The coefficients indicate that the degree of overlapping in these tests is low. Since the maximum overlapping is only 53 percent all of the types of items represented in the tryout battery were used in the construction of Test I, The Ability to Think Scientifically.

To determine whether the correlation of any one of the tests of the tryout battery with the total was sufficiently high for that test to be used instead of the composite of the scores on the tryout test battery, the scores on each of the tests were correlated with the total scores. These correlations are given in Table XVIII.
TABLE XVIII
CORRELATION OF TOTAL SCORES ON TRYOUT TEST
BATTERY WITH EACH OF THE TRYOUT TESTS

                                      Tests
                 A      B      C      D            E      F      G      H      J
Tryout total    .62    .44    .55    [illegible]  .73    .74    .71    .71    .69

The standard errors of these coefficients ranged from .04 to .07. It is of interest to note that Test D, The Ability to Organize Data, had the lowest correlation with the total scores on the tryout test battery. This was to be expected on the basis of the nature of the test. Inspection of the test reveals that it was testing a much more restricted range of objectives than any of the other tests; therefore, it would not be expected that it would have as high a correlation with the composite score as a test measuring a wider range of behaviors. That Test F, Experimentation and the Interpretation of Data, would have the highest correlation with the scores on the total test battery was to be expected, since that test measured both understanding of experimentation and the ability to interpret data; that is, it measured a wider range of the behaviors measured by the battery of tests than did any other tryout test.
Multiple correlations of the scores on the total tryout test battery with each combination of two of the individual tests of the battery were calculated to determine which two tests would be the most satisfactory to use in appraising the ability to think scientifically. The following formula13 was used for the calculation of these multiple correlations:
13 William D. Baten, Elementary Mathematical Statistics. New York: John Wiley and Sons. 1938. p. 187.

TABLE XIX
MULTIPLE CORRELATION OF TRYOUT TOTAL
WITH TWO OF THE TRYOUT TESTS

[Triangular table of the multiple correlation of the tryout total with each pair of Tests A* through J. The cell positions cannot be recovered from this copy. The entry for Tests A and B is .69**; the other legible entries, in scan order, are .67, .84, .83, .82, .79, .77, .64, .54, .77, .77, .63, .82, .81, .76, .77, .74, .85, .85, .84, .87, .84, .86, .76, .74, .78, .72, and .77.]

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
** This is to be read: Multiple correlation of tryout total with Tests A and B.
Table XIX is significant in that it shows that any two tests of the battery gave fairly substantial correlation with the criterion. Multiple correlations involving Test D were lower than any of the other correlations. The highest multiple correlation was obtained with Tests G and J. This is interesting since neither of these tests had the highest correlation with the criterion. This can probably be explained by the fact that they had a relatively low correlation with each other as can be seen in Table XV.

In problems involving more than four variables the mechanics of calculating multiple correlations is almost prohibitive unless some systematic method of solution is used.14 The Wherry-Doolittle method,15 in addition to being a systematic method of calculating multiple correlations, corrects the correlation for chance errors. Table XX presents the results of this method of obtaining multiple correlations of the tryout tests with the criterion, which was the total score on the tryout test, and shows the correlations obtained by the addition of each successive test. In using this method the first test used, Test F, is the one with the highest simple correlation with the criterion.
TABLE XX
MULTIPLE CORRELATION OF TRYOUT TESTS
WITH THE CRITERION - OBTAINED BY
THE WHERRY-DOOLITTLE METHOD

Tests                      Multiple correlations
F                                 .740*
F, E                              .856
F, E, J                           .907
F, E, J, G                        .948
F, E, J, G, B                     .963
F, E, J, G, B, C                  .972
F, E, J, G, B, C, H               .977

* A simple correlation

14 Garrett, op. cit., p. 435.
15 Ibid., pp. 435 - 448.
Tests A and D were not added because the increase in the multiple correlation by the addition of Test H had been so slight that further additions seemed unnecessary. As shown by the data presented in Table XX each successive test added less to the correlation. It would appear that if a few of the individual tests of the tryout battery were to be used as a measure of the ability to think scientifically, Test E, The Evaluation of Hypotheses, Test F, Experimentation and the Interpretation of Data, Test J, Generalizations and Assumptions, and Test G, Drawing of Conclusions, would yield scores sufficiently like the ones obtained from the entire battery to justify the use of only these four tests.

Correlations of scores on tryout tests with scores on intelligence tests and reading tests. In order to determine whether the tryout tests were measuring intelligence or reading ability to a considerable extent, the scores made by students on each of the tryout tests were correlated with the quantitative score and with the linguistic score on the American Council on Education Psychological Examination and with the scores on the American Council on Education Reading Comprehension Test. These correlations are presented in Table XXI.

TABLE XXI
CORRELATIONS OF TRYOUT TEST SCORES WITH
INTELLIGENCE TEST AND READING TEST SCORES

                          Tests
Tests    Quantitative    Linguistic    Reading
A*           .17            .43          .25
B            .24            .26          .13
C            .28            .41          .39
D            .31            .11          .10
E            .33            .42          .41
F            .38            .34          .35
G            .37            .21          .29
H            .37            .18          .25
J            .37            .29          .35

* A, Steps in Scientific Thinking. B, Delimitation of Problems. C, Experimental Procedures. D, Organization of Data. E, Evaluation of Hypotheses. F, Experimentation and the Interpretation of Data. G, Drawing of Conclusions. H, Interpretation of Data. J, Generalizations and Assumptions.
The abilities measured by the tryout tests, although all positively correlated with the quantitative and linguistic factors of intelligence and with reading ability, do not appear to be identical with any of these mental functions. These data also give evidence that all of the types of items presented in the tryout battery could justifiably be included in the final form of the test since none of the tryout tests seemed to be measuring either of the factors of intelligence or reading ability.
The preparation of Test I - The Ability to Think Scientifically. Test I, presented in Appendix II, was constructed from items of the tryout tests. Because of the nature of the items it was necessary to choose blocks of items from the tryout tests rather than individual items. Therefore, it was necessary to select the best blocks of items from each of the tryout tests. Items within blocks were eliminated if the estimated coefficient of correlation of the item with the tryout test score was low. Certain items had to be retained even if the item correlation was low because the information given in them was essential to the development of an understanding of the entire block of items. An attempt was made to eliminate all items with a discrimination index of less than 20 which corresponds to a coefficient of correlation of .33. This is in accord with the recommendation of Davis.16

Test authorities do not agree on the best form of distribution of item difficulties.17 Some recommend all items as near 50 percent difficulty as possible; others recommend equal distribution of items from 0 to 100 percent difficulty.

16 Davis, op. cit., p. 15.
17 Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, The Construction and Use of Achievement Examinations. Cambridge: Houghton Mifflin Company. 1936. p. 32.
Flanagan18 has shown that on a theoretical basis the best test would be one composed of items which were all answered correctly by 50 percent of the group if the correlations between individual items were zero; but that in a theoretical case where the correlations between the individual items of the test were one, the items should range from zero to 100 percent of successes. Both of these situations are, of course, hypothetical. In reality, the situation is usually intermediate between these two extremes. In the present case items have been chosen from a range of from 10 - 95 percent difficulty. This is in accordance with the suggestion of Hawkes, Lindquist and Mann.19

Ten to 20 items were selected from each of the tryout tests with the exception that only four of the best items were selected from Test D. Test I, The Ability to Think Scientifically (see Appendix II), was made up of 150 items selected from a total of 637 items which comprised the tryout test battery. An attempt was made to include items to appraise most of the behaviors identified in Chapter IV; therefore only four items were used from Test D, despite the high discrimination index of the items.

18 Flanagan, op. cit., pp. 675-676.
19 Hawkes, Lindquist, and Mann, op. cit., p. 32.
ANALYSES OF TEST I AND TEST IA

Test I, The Ability to Think Scientifically, was administered to 500 students who had completed a year of Biological Science at the end of the spring term of 1950 and to 240 students who had not yet had Biological Science at the beginning of the fall term of 1950. Test IA, the final form of the test, The Ability to Think Scientifically, was administered to 330 other students who had not yet taken Biological Science at the beginning of the fall term of 1950. Test IA was also administered to 136 of this same group at the end of the first term of the course.

Analysis of Test I - The Ability to Think Scientifically. Test I was composed of a total of 150 items. The range of scores for the students who had completed the year course was from 30 to 117. The mean of the scores was 78.92 ± .73; the standard deviation was 15.41 ± .52. The reliability of the test for this group was .89 ± .01 as determined by the split-half method, corrected by the Spearman-Brown formula. By using the Kuder-Richardson formula a reliability of .85 ± .01 was obtained.

The range of scores for the students who had had no college biology was from 27 to 107. The mean for this group was 60.64 ± 1.18; the standard deviation was 17.32 ± .83. The reliability for this group was .91 ± .01 calculated by the split-half method, corrected by the Spearman-Brown formula. The Kuder-Richardson formula gave a value of .89 ± .01.
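The text does not say which Kuder-Richardson formula was applied. As a hedged illustration, the sketch below assumes formula 21, which needs only the number of items, the mean, and the standard deviation; the function name is illustrative. With the summary figures given above for Test I it returns values that round to the reported coefficients, which suggests, without proving, that this was the formula used.

```python
def kr21(n_items, mean, sd):
    """Kuder-Richardson formula 21 (assumes items of equal difficulty).

    r = k/(k-1) * (1 - M*(k - M) / (k * s^2))
    """
    k = n_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))


# Test I, 150 items.
after_year_course = kr21(150, 78.92, 15.41)   # 500 students
no_biology = kr21(150, 60.64, 17.32)          # 240 students
print(round(after_year_course, 2))  # 0.85
print(round(no_biology, 2))         # 0.89
```

Formula 21 tends to understate reliability when item difficulties vary, which is consistent with the Kuder-Richardson values falling below the split-half values for both groups.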
The complete item analysis data for Test I are presented in Table XXXXV of Appendix II. The papers of the 500 students who completed this test in the spring term of 1950 were used for the analysis. Item discriminations were determined by the methods of Flanagan20 and of Davis21 as described previously. The criterion used was the total score on Test I, The Ability to Think Scientifically.

The item discrimination values ranged from -.23 to .73. The value of -.23 is somewhat difficult to explain since discriminating items from the tryout test had been used in the construction of Test I. The discrimination indices ranged from -14 to 56, giving a mean discrimination index of 25.36. This corresponds to a value of .39 expressed as an estimated coefficient of correlation with the total test score. Item difficulties were estimated by the methods suggested by Davis.22 The difficulties expressed in percent of successes ranged from 0 to 86 percent. The indices of difficulty ranged from 0 to 73, with an average difficulty index of 43.17. This corresponds to an average of 32 percent successes.

20 Flanagan, op. cit., pp. 674-680.
21 Davis, op. cit., pp. 8-15.
22 Ibid., pp. 2-4.
The mean of the test and the mean item difficulty gave evidence that the test was probably a little too difficult for the group for which it was devised. The reliability of the test compares favorably with the requirements of most standardized tests. The pertinent data on Test I, The Ability to Think Scientifically, are presented in Table XXII.
TABLE XXII
PERTINENT DATA FOR TEST I

                                                    Group
                                          3 terms              No
                                          Biological           Biological
                                          Science              Science
Number of students ...................... 500                  240
Number of items ......................... 150                  150
Range of scores ......................... 30 - 117             27 - 107
Mean .................................... 78.92 ± .73          60.64 ± 1.18
Standard deviation ...................... 15.41 ± .52          17.32 ± .83
Reliability coefficient ................. .89 ± .01            .91 ± .01
Range of discrimination indices ......... -14 - 56
Mean discrimination index ............... 25.36
Range of difficulty indices ............. 0 - 73
Mean difficulty index ................... 43.17

A comparison of the discrimination indices and the
difficulty indices as determined for the same items in the
tryout tests and Test I is presented in Table XXIII.

The

data in this table constitute evidence that the test items
chosen from the tryout tests to make up Test I were, in
general, highly discriminating.

TABLE XXIII
COMPARISON OF DISCRIMINATION INDICES AND
OF DIFFICULTY INDICES OF IDENTICAL ITEMS
AS OBTAINED FROM ITEM ANALYSIS OF TRYOUT TESTS
AND AS OBTAINED FROM ITEM ANALYSIS OF TEST I

    Item Number         Discrimination Index      Difficulty Index
Tryout                  Tryout                    Tryout
Test        Test I      Test        Test I        Test        Test I
A-13           1          15          10            72          73
A-14           2          40          15            53          55
A-15           3          43          20            53          55
A-16           4          25          33            70          68
A-17           5          29          34            68          68
A-18           6          30          17            47          40
A-19           7          40          23            73          60
A-20           8          27          10            69          59
A-21          *9          19          -6            54          45
A-22          10          29          28            68          53
A-23          11          49          42            32          27
A-24         *12          16          -2            46          40
A-25          13          12          26            56          60
A-26          14          40          28            38          44
A-27          15          33          17            62          55
A-28          16          28          23            55          51
A-29          17          34          14            56          52
B-1          *18          18          20            38          43
B-5          *19          48          37            45          29
B-7          *20          43         -14            28           9
B-8          *21          25           8            34          37
B-9          *22          48          32            42          39
B-10         *23          22          10            55          52
B-11         *24          29          10            52          52
B-12         *25          16          23            49          46
B-13         *26          45          17            43          47
B-14         *27          34          22            53          54
B-19         *28          24          16            33          37
B-21         *29          72          33            44          35
B-22         *30          13          14            49          45
B-25         *31          46          27            30          36
B-28         *32          39          29            46          36
B-30         *33          29          12            38          45
C-3           34          63          37            40          25
C-6          *35          58           1            38           1
C-7           36          38          50            43          33
C-8           37          42          26            66          59
C-21          38          31          35            51          50
C-23          39          23          32            61          55

* Items eliminated from Test I in construction of Test IA

TABLE XXIII (continued)

    Item Number         Discrimination Index      Difficulty Index
Tryout                  Tryout                    Tryout
Test        Test I      Test        Test I        Test        Test I
C-26          40          31          27            44          40
C-28         *41          20           2            45          42
C-29          42          36          11            54          56
C-51          43          54          22            40          42
C-55         *44          16          46            34          30
C-56         *45          52           3            34          38
C-58         *46          41           0            34           0
C-62          47          34          24            53          46
D-13          48          90          24            48          46
D-8           49          71          20            51          56
D-16          50          58          13            38          45
D-5           51          28           9            55          56
E-1           52          31          14            42          32
E-4           53          51          50            33          33
E-5           54          26          22            51          51
E-6           55          42          23            34          34
E-7           56          29          20            64          63
E-8           57           9          16            70          67
E-9           58          37          16            52          44
E-10          59          11           8            44          45
E-11          60          15          21            41          39
E-12          61          38          48            32          38
E-47          62          40          19            38          45
E-49          63          38          27            60          60
E-50          64          51          33            33          36
E-51          65          23          23            61          47
E-52          66          45          31            36          20
E-53          67          18          15            26          15
E-54          68          54           3            35          22
E-55          69          34          27            38          35
E-57          70          20           8            35          39
E-60          71          23          11            44          46
F-58          72          25          23            66          55
F-59          73          60          56            56          40
F-60          74          49          24            58          47
F-61          75          50          34            45          22
F-62          76          56          35            54          49
F-63          77          55          37            53          46
F-64          78          59          35            56          47
F-71          79          33          34            78          68
F-72          80          25          29            63          60
F-40          81          49          46            53          55
F-41          82          36          43            54          49
TABLE XXIII (continued)

[On this page of the copy the Tryout Test and Test I values of each index are interleaved, and which of the two values in a pair belongs to which test cannot be determined; each pair is given in scan order.]

    Item Number         Discrimination Index      Difficulty Index
Tryout                  (Tryout Test and          (Tryout Test and
Test        Test I      Test I)                   Test I)
F-42          83          21, 29                    44, 44
F-43          84          44, 54                    64, 60
F-44          85          36, 51                    54, 52
F-45          86          26, 30                    74, 65
F-46          87          45, 33                    46, 33
F-52          88          41, 22                    44, 54
F-53          89          32, 56                    21, 37
F-54          90          46, 50                    61, 57
F-55          91          36, 37                    52, 58
G-1           92          28, 27                    41, 40
G-4           93          30, 59                    48, 53
G-5           94          22, 33                    50, 49
G-15          95          44, 47                    30, 33
G-17          96          32, 33                    66, 59
G-18          97          20, 47                    30, 13
G-20          98          31, 49                    61, 68
G-35          99          34, 23                    48, 55
G-39         100          41, 82                    44, 47
G-40         101          36, 43                    40, 33
G-41         102          52, 48                    34, 31
G-47         103           0, 0                      0, 0
G-48         104          68, 47                    30, 42
G-50         105          26, 30                    64, 57
G-51         106          40, 56                    33, 37
G-53         107          25, 49                    16, 32
G-54         108          22, 71                    48, 51
G-89         109          28, 29                    43, 49
G-90        *110          31, 35                    20, 23
H-42         111          21, 15                    62, 65
H-44         112          40, 47                    41, 42
H-46         113          72, 39                    42, 44
H-47         114          37, 25                    44, 49
H-48         115          21, 36                    76, 69
H-49         116          26, 37                    57, 55
H-53         117          30, 26                    64, 62
H-55         118           6, 24                    70, 49
H-59         119          29, 35                    44, 42
H-61         120          16, 16                    37, 47
J-63         121          21, 23                    42, 51
J-64         122          20, 26                    58, 45
J-67         123          19, 43                    44, 52
J-68         124          34, 27                    52, 53
J-70         125          12, 19                    56, 54
TABLE XXIII (continued)

    Item Number         Discrimination Index      Difficulty Index
Tryout                  Tryout                    Tryout
Test        Test I      Test        Test I        Test        Test I
J-71        *126          33          14            35           9
J-74         127          63          27            40          43
J-75         128          32          10            54          45
J-77         129          50          39            38          42
J-78         130          53          30            35          33
J-80         131          40          27            41          45
H-84         132          44          25            46          40
H-85         133          31          32            66          59
H-86         134          36          16            37          23
H-88         135          47          19            42          34
H-91         136          28          47            56          42
H-93         137          35          22            69          62
H-95         138          27          35            42          44
H-99         139          46          16            30          10
H-100        140          24           5            48          48
J-101        141          59          49            47          36
J-104        142          56          24            36          31
J-105        143          28          50            50          47
J-106        144          45          37            51          49
J-110        145          31          21            46          52
J-111        146          32          38            40          25
J-116        147          52          43            34          32
J-119        148          29          16            36          10
J-122        149          55          35            36          27
J-123        150          30          22            53          47

* Items eliminated from Test I in construction of Test IA

A few items of low discrimination were included because these items gave information necessary to the answering of subsequent items. This was especially true of item 103 which had no discriminative value. Omitting this single item the range of discrimination indices was from 9 to 90, which corresponds to a range of from .18 to .90 when expressed as an estimated coefficient of correlation with the criteria (the individual tryout test scores). The average discrimination index of the items, based on tryout data, was 37.90. This corresponds to an estimated coefficient of correlation of .56. This value represents relatively high item validity.

It is of interest to note that the discrimination indices obtained by using Test I as the criterion are, with a few exceptions, lower than the indices obtained by using the individual tryout tests as the criterion. Since each of the tryout tests was constructed to measure a rather narrow range of abilities, individual items would be expected to be more highly correlated with the score of the single tryout test, of which they were a part, than with the test on many of the abilities involved in scientific thinking. In other words, Test I had less internal consistency than the individual tryout test.

Table XXIV presents a comparison of (1) the item analysis data of the items of the tryout tests used in the construction of Test I, (2) the item analysis data on Test I using the total score on this test as the criterion and (3) the item analysis data on the items of Test I used in the construction of Test IA. (Item 103 has been omitted from these comparisons since it had been included because it was necessary to the development of the idea presented in the block of items of which it was a part).
TABLE XXIV
SUMMARY OF ITEM ANALYSIS DATA FOR TRYOUT TEST ITEMS USED
IN CONSTRUCTION OF TEST I, ITEMS OF TEST I, AND ITEMS
OF TEST I USED IN CONSTRUCTION OF TEST IA

                                         Tryout       Test I       Test IA
Number of items ....................     150          150          125
Range of discrimination indices ....     9 - 90       -14 - 56     3 - 56
Mean discrimination index ..........     37.90        25.36        27.22
Range of difficulty indices ........     26 - 78      0 - 73       10 - 73
Mean difficulty index ..............     47.82        43.17        44.64

Analysis of Test IA - The Ability to Think Scientifl
callv.

Since items were presented in blocks centering

around a particular problem or experiment,
be arranged in order of difficulty.

they could not

It was not intended

that the test be designed as a speed test, and the nature
of the sequence of items was not such that it could be ar ­
ranged as a power test.

Since the test was devised to meas

ure growth and was to be used as a means of evaluating in­
struction,

it seemed advisable to make the test of such a

length that all, or at least 99 percent,
could finish it in the allotted time.

of the students

As reported in

Chapter III, 10 percent of the students failed to complete
Test I in the hour and fifty minutes available; 25 of the
poorer items of Test I were eliminated on the basis of item
analysis to make Test IA. These included the 16 items of the
section on the test designed to measure the student's ability
to delimit a problem. This portion of the test had not proved
satisfactory. It was difficult for the student to determine
whether a question was or was not related to the problem and
whether, if related, it might or might not be useful in the
solution of the problem. The key, therefore, was probably not
satisfactory for answering the items, since there seemed to
be too much overlapping. It seemed to the writer that this
type of item had promise for future tests with revision of
the possible answers, that is the five choices available, but
because these items did not seem to contribute to the test as
a whole, it seemed advisable to eliminate them. Test IA is
presented in Appendix III. The items marked with an asterisk
in Table XXIII are the ones which were dropped from Test I in
the construction of Test IA.

Test IA was administered at the beginning of the fall term
to 350 students who had had no Biological Science. The range
of scores made by this group was from 23 to 101. The mean of
Test IA was 53.16 ± 1.11, while the standard deviation was
16.28 ± .78. The reliability as determined by the split-half
method with the Spearman-Brown correction was .91 ± .01. The
Kuder-Richardson formula gave a reliability
This test was administered to 136 of the same students at the
end of the fall term of 1950 after they had taken one term of
Biological Science. The range of scores for this group was
from 31 to 103. The mean was 69.94 ± 1.41, and the standard
deviation 16.43 ± .99. The reliability as determined by the
split-half method corrected by the Spearman-Brown formula was
.90 ± .02. The reliability as calculated by the
Kuder-Richardson formula was .89 ± .02. Table XXV presents
the pertinent data for Test IA.
TABLE XXV
PERTINENT DATA FOR TEST IA

                                          Group
                            No Biological    136 of same group after one
                            Science          term of Biological Science
Number of students ....     330              136
Number of items .......     125              125
Range of scores .......     23 - 101         31 - 103
Mean ..................     53.16 ± 1.11     69.94 ± 1.41
Standard deviation ....     16.28 ± 0.78     16.43 ± 0.99
Reliability coefficient     .91 ± .01        .90 ± .02
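
The split-half reliabilities reported above rest on the Spearman-Brown correction, which steps up the correlation between two half-tests to an estimate for the full-length test. The following is a minimal sketch of that correction. The half-test correlation of 0.83 is a hypothetical value chosen only for illustration, since the uncorrected half-test correlations are not reported in this section.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate the reliability of a test lengthened by `length_factor`
    from the correlation between its comparable parts (Spearman-Brown)."""
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


# Hypothetical correlation between odd and even halves (not from the thesis).
r_half = 0.83
full_test_reliability = spearman_brown(r_half)
print(round(full_test_reliability, 2))
```

With the default length factor of 2 the function reduces to the familiar form 2r / (1 + r) used for split-half estimates.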

Item analysis data were not collected for Test IA, since this
test was constructed by omitting items from Test I. Table XXIV
presents the range of discrimination indices, mean
discrimination index, the range of difficulty indices, and the
mean difficulty index for the items used in this test. As can
be seen from an inspection of this table, both the mean
discrimination index and the mean difficulty index were higher
than corresponding values for Test I. This probably accounts
for the fact that the reliability of Test IA was at least as
high as the reliability of Test I, even though it was 25 items
shorter.

CHAPTER VI
THE VALIDATION OF THE TEST
This chapter is concerned with the validation of the test,
The Ability to Think Scientifically. The methods used in the
curricular validation and the methods used in the statistical
validation are presented.
The most important characteristic of a test is its validity,
that is, the extent to which a test measures what it is
supposed to measure.¹ ² Validity is not a general term which
can be applied to a test, but is a very specific concept and
must be considered with reference to the purpose for which the
test is used. A test is valid only in so far as it
accomplishes its specific purpose for a particular group.
Remmers and Gage³ have discussed the kinds of criteria which
have been used in the validation of tests. They divide the
criteria into two interrelated classes: (1) criteria with
which to compare test content, and (2) criteria with which to
compare test scores.

¹ Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann,
The Construction and Use of Achievement Tests. Cambridge:
Houghton Mifflin Company. 1936. p. 21.
² Dorothy C. Adkins, Construction and Analysis of
Achievement Tests. Washington: U. S. Government Printing
Office. 1947. p. 160.
³ Hermann H. Remmers and N. L. Gage, Educational
Measurement and Evaluation. New York: Harper and Brothers.
1943. pp. 195-201.
They state that the criteria with which the content of a test
may be compared are: (1) analysis of courses of study,
(2) statements of instructional objectives, (3) analysis of
text books, (4) analysis of teacher's final examination
questions, (5) pooled judgments of competent persons,
(6) concepts of social utility, and (7) introspective logical
or psychological analysis of mental processes. These types of
criteria have been referred to as curricular criteria.
The criteria which Remmers and Gage mention to which scores
on the test may be compared are: (1) school marks,
(2) increases in percentage of success in successive ages or
grades, (3) differences in scores obtained by any two or more
groups known to be widely separated in ability, (4) ratings of
pupils by competent raters, and (5) correlations with other
tests. The validity obtained by these methods has been
referred to as statistical validity.
THE CURRICULAR VALIDATION OF THE TEST
The course of study for Biological Science at Michigan State
College was analyzed and objectives of the course were
considered in the construction of the test. The curricular
validity of the test was insured by the incorporation into the
test of the desired educational outcomes related to scientific
thinking which are emphasized in the course. In addition, a
detailed description of the behaviors involved in scientific
thinking was undertaken. This detailed analysis, presented in
Chapter IV, was based upon the analysis of behaviors involved
in scientific thinking as (1) described by persons
constructing tests designed to measure the ability to think
scientifically, (2) inferred from the elements of scientific
thinking, (3) described in committee reports on behaviors
involved in scientific thinking, and (4) described in reports
of research on behaviors of persons doing scientific research.
In all, 98 behaviors attending scientific thinking were
outlined. Test items were constructed from the outline of
behaviors presented in Chapter IV, and an attempt was made to
include as many of these behaviors as possible in the tryout
tests. An inspection of Table I (see pages 138 - 142)
indicates that most of the 98 abilities which were identified
as critical aspects of scientific thinking were appraised by
the tryout tests.
A number of tests designed to measure the abilities
involved in problem-solving were examined.

The analysis of

these tests revealed the kinds of techniques which had been
used to measure the ability to think scientifically, and thus
provided a basis for the curricular validity of the test. The
use of some of the techniques used previously and an attempt
to include items in the tryout tests which measured most of

the behaviors measured by previous tests should contribute
to the curricular validity of the test.
Another method consisted of submitting the tryout tests to
five competent judges for criticism. The judges agreed that
the items were valid measures of the abilities which each of
the tryout tests purported to measure. In addition, there was
substantial agreement as to the correct answer for each item.
Where there were disagreements among the judges the items were
discussed with each of them and these items were either
revised on the basis of the discussion or were eliminated.
Free responses of students were used as items of the
test whenever this method of obtaining items was feasible.
Situations were presented to students and students were
requested to indicate what problems were suggested by these
situations.

The problems suggested were utilized in the

construction of the test devised to measure the ability to
delimit problems.

Hypotheses were presented; students were

requested to describe experiments to test these hypotheses.
These experiments were used in the construction of the test
devised to measure the ability to plan experiments.

Data

were given to students; the students were instructed to draw
conclusions from the data.

These conclusions were used in

the construction of the test to appraise the ability to draw
conclusions.

In all cases the groups from which free re­

sponses were obtained were different groups from those to

which the tryout tests were administered.

The use of free

responses of students should contribute to the validity of
the test because items written by students should be compre­
hensible to other students and because the responses of the
students represent the kinds of answers which students give
on essay type examinations.
Careful selection of materials for the test items
should also contribute to the validity of the test.

The

criteria used in the selection of materials were discussed
in Chapter III.

The ones of importance to the validation

of the test were:
1. The material should be comprehensible to students
   who had had no training in biology.
2. Data used for interpretation should be entirely
   new to the student.
3. The material should be biological since the test
   was devised for a course in first year college
   biology.
The first of these criteria was met by the selection of
materials which were on subject matter which it was assumed
all students had encountered, such as colds, disease,
breathing, plants, etc. The second criterion was met by
choosing data from sources which the elementary student would
not be expected to read, such as scientific journals and
advanced text books. The third criterion was satisfied by
using materials of a biological nature.
Perhaps the most important method of validating a test is by
considering its social utility. The committee reports reviewed
in Chapter I testify to the fact that the ability to think
scientifically is one of the important objectives of general
education, and a primary objective of the teaching of science.
The test, designed to measure an objective of importance in a
democratic society, should have social usefulness.
THE STATISTICAL VALIDATION OF THE TEST
Validation by correlation with measures of intelligence,
reading ability, and factual information. The first method
used to establish the statistical validity of the test was the
correlation of scores made by students on the tests with other
kinds of tests. In a sense, this is a negative form of
validation because a high correlation of this test with
measures of such traits as intelligence, reading ability, and
knowledge of facts would indicate that the test could not then
measure in any considerable amount what it purported to
measure, assuming that the test was designed to measure
something different. It cannot be assumed, however, that the
test is a valid measure of ability to think scientifically
merely because of a lack of substantial relationship to any of
these factors.
Preliminary evidence concerning the statistical validity of
the test was obtained by correlating the total scores made by
162 students taking the tryout tests with: (1) the
quantitative scores on the psychological examinations, (2) the
linguistic scores of the psychological examinations, (3) the
total psychological examination scores, (4) total reading
examination scores, (5) the sums of the scores made on the
departmental term-end examinations for first and second terms,
(6) the scores on the factual portion of the comprehensive
examination, and (7) the scores on the portion of the
comprehensive examination which involved the use of scientific
thinking as well as biological information. It was assumed
that a high correlation between the scores on the battery of
tryout tests and the scientific thinking portion of the
comprehensive examination would give some positive evidence of
validity. It was also assumed that low correlations with the
first six tests would be desirable.
As previously mentioned, psychological examination scores and
reading examination scores were available for 264 of the 500
students who took Test I, The Ability to Think Scientifically,
in May, 1950. The scores made by these 264 students on Test I
were correlated with: (1) the quantitative scores on the
psychological examination, (2) the linguistic scores on the
psychological examination, (3) the total scores on the
psychological examination, and (4) the total reading test
scores. These correlations together with those obtained by
correlating the same four factors with the total tryout test
are given in Table XXVI. The standard errors of the
correlations ranged from .04 to .07.

TABLE XXVI
CORRELATION OF TRYOUT TEST SCORES AND SCORES ON TEST I
WITH PSYCHOLOGICAL EXAMINATION SCORES AND READING TEST SCORES

                                   Tests
                                               Total
Tests      Quantitative    Linguistic     Psychological    Reading
Tryout     .45             [illegible]    .51              .49
Test I     .48             .43            .51              .43

These data show that there was a moderate positive correlation
between the ability to think scientifically, as measured by
these tests, and both the quantitative and linguistic factors
of intelligence, and reading ability.
Since the tryout tests and the American Council on Education
Psychological Examination both depended to some extent on
reading ability, it seemed desirable to hold reading ability
as a constant factor in making a correlation between the
ability to think scientifically and intelligence. This was
accomplished by the use of a partial correlation. The formula⁴
used in the calculation of the partial correlation was:

r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}

The intercorrelations used to determine the partial
correlation are given in Table XXVII.

⁴ Quinn McNemar, Psychological Statistics. New York:
John Wiley and Sons. 1949. p. 141.

TABLE XXVII
INTERCORRELATIONS OF TRYOUT TEST,
PSYCHOLOGICAL EXAMINATION AND READING TEST

                             Tests
Tests             Tryout    Psychological    Reading
Tryout                      .51              .49
Psychological                                .59
Reading
Partial Correlation (r12.3) = .31
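
The first-order partial correlation can be checked directly from the three intercorrelations of the tryout battery, the psychological examination, and the reading test. The sketch below is an illustration rather than the author's worksheet: the function and variable names are assumptions, and the reading correlation of .49 is taken from Table XXVI because the corresponding entry in Table XXVII is poorly legible.

```python
import math


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return numerator / denominator


# 1 = tryout battery, 2 = psychological examination, 3 = reading test
r_tryout_psych = 0.51
r_tryout_reading = 0.49
r_psych_reading = 0.59

r_partial = partial_correlation(r_tryout_psych, r_tryout_reading, r_psych_reading)
print(round(r_partial, 2))
```

Rounded to two places, the result agrees with the partial correlation of .31 reported for the tryout battery.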

The correlation between the ability to think scientifically,
as measured by the battery of tryout tests, and intelligence,
with reading ability held constant, was .31. This indicated
that a part of the observed relationship between the ability
to think scientifically and intelligence, as measured in these
two tests, was due to the common dependence of both tests upon
reading but that most of the relationship still remained.
A partial correlation was also calculated for the relationship
between Test I, The Ability to Think Scientifically, and
intelligence with reading ability held constant. The
intercorrelations used in this computation are presented in
Table XXVIII. The partial correlation was .33. These partial
correlations give evidence that the ability to think
scientifically was not identical to intelligence but that the
abilities were related.

TABLE XXVIII
INTERCORRELATIONS OF TEST I,
PSYCHOLOGICAL EXAMINATION, AND READING TEST

                             Tests
Tests             Test I    Psychological    Reading
Test I                      .51              .43
Psychological                                .60
Reading
Partial Correlation (r12.3) = .33

In order to determine the degree to which reading ability and
the ability to think scientifically were related, a partial
correlation of the ability to think scientifically and reading
ability, with intelligence held constant, was calculated. For
the scores on the tryout test battery this partial correlation
was .28; for the scores on Test I the partial correlation was
.18. Both correlations are sufficiently low to show that the
tests were not primarily reading tests.
In order to give a more complete picture of the relationship
of the scores made on the tryout tests to various other
abilities Table XXIX has been prepared. In this table are
shown the correlations between (A) the total scores on the
battery of tryout tests, (B) total scores on the American
Council on Education Psychological Examination, (C) total
scores on the American Council on Education Reading
Comprehensive Test, (D) total scores on departmental term-end
examinations in Biological Science, (E) the scores on the
factual portion of the comprehensive examination for
Biological Science, and (F) the scores on the scientific
method portion of the comprehensive examination for Biological
Science.
TABLE XXIX
INTERCORRELATIONS OF TOTAL TRYOUT TEST SCORES
AND SCORES ON OTHER TESTS*

         B       C       D       E       F
A       .51     .49     .65     .35     .70
[Rows B through E are not legible in this copy. The surviving
entries, in the order in which they appear, are .59, .25, .58,
.47, .36, .59, .52, and .41.]

* A, Tryout Test Battery. B, A. C. E. Psychological
Examination. C, A. C. E. Reading Test. D, Term-end
examinations. E, Comprehensive, Fact. F, Comprehensive,
Scientific method.
An inspection of Table XXIX reveals that the ability
to think scientifically as measured by this battery of tests
was positively related to all of the other factors measured
by the other tests given.
A correlation of .70 (Table XXIX) obtained between the
scores on the scientific method portion of the comprehensive

examination and the total scores on the tryout tests is
evidence that the abilities involved in scientific think­
ing as defined by this investigator and the abilities in­
volved in scientific thinking as defined by the trained
examiner for the Department of Biological Science were in
substantial agreement.
About 25 to 30 percent of the items on the departmental
term-end examination are items designed to measure scientific
thinking. On the basis of this fact one would expect a
moderate degree of relationship between scores on the total
tryout test and scores on the term-end examinations; however,
the correlation of .65 may also indicate that there is a
higher relationship between knowledge of facts and ability to
think scientifically than the correlation between the tryout
tests and scores on the factual portion of the comprehensive
indicates. This correlation was .33. The relationship between
knowledge of facts and ability to think scientifically should
be further investigated.
Validation by comparison of scores of various groups.
Another method of statistical validation of the test was
the comparison of scores made by students who had not yet
taken Biological Science with scores made by students who
had taken Biological Science.
The scores of students at the beginning of the course in
Biological Science were compared with the scores made by
another group at the end of the three-term course in
Biological Science. This comparison involved the assumption
that the groups were both representative samples of the same
population. In reality, this assumption is not strictly true
since many persons proficient in Biological Science were
permitted to take the comprehensive examination before
completing three terms of the course, and hence were not
represented in the group that had completed the three-term
course in Biological Science. Also, more poor students drop
out of school than good students, thus eliminating some of the
lower scores. The lower standard deviation for this group
gives evidence that these factors were operative. Equated
groups might have been used to reduce the variability of the
groups, but psychological examination scores and reading
scores were not available at the time of administration of the
test to the group beginning Biological Science.
The scores of students at the beginning of the course
in Biological Science were also compared with their scores
at the end of one term.

This method relieves one of making

an assumption concerning the nature of the group, but in­
volves the assumption that memory would play no substantial
part in any observed increase in scores.

However,

if the

two methods gave substantially the same results valid infer­
ences concerning the validity of the test could probably be

drawn.

The validation of the test by these comparisons is

based on the assumption that increase in scores results
from instruction in the objective being tested and not on
a maturation factor.
As previously mentioned, Test I was administered to 500
students who had completed three terms of Biological Science.
Of this group 446 completed the test. The scores made by this
group were compared with the scores made by 216 other students
who completed the same test before taking Biological Science.
A comparison of the scores of the two groups, as presented in
Table XXX, gives evidence that there was improvement of scores
and that this improvement was highly significant. The critical
ratio of 13.15 showed that the difference between the two
means was not due to chance.
TABLE XXX
COMPARISON OF MEANS AND STANDARD DEVIATIONS OF TEST I
FOR A GROUP BEFORE TAKING BIOLOGICAL SCIENCE WITH ANOTHER
GROUP AFTER TAKING THREE TERMS OF BIOLOGICAL SCIENCE

Group                             Number    Mean     Standard Deviation
3 terms of Biological Science     446       78.92    15.41
No Biological Science             216       60.64    17.32

M1 - M2: Critical Ratio = 13.15
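
A critical ratio of this kind is the difference between two means divided by the standard error of that difference. The sketch below assumes the usual formula for two independent groups, with the standard error taken as the square root of the sum of the squared standard errors of the two means; the thesis does not state which variant was used, so this is an illustration rather than a reproduction of the author's computation.

```python
import math


def critical_ratio(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Difference between two independent means over its standard error."""
    se_difference = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_difference


# Values from Table XXX: three terms of Biological Science vs. none.
cr = critical_ratio(78.92, 15.41, 446, 60.64, 17.32, 216)
print(round(cr, 2))
```

Under this assumption the ratio comes out a little above 13, close to the reported 13.15; the small gap presumably reflects rounding or a slightly different standard-error formula.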

A comparison was also made of the scores made by 136 of the
group who took Test IA before taking Biological Science, that
is, a pre-test group, and the scores on the same test made by
this group after one term of Biological Science, a post-test
group. The data for this phase of the study are presented in
Table XXXI. The critical ratio of 8.62 gives further evidence
that the difference between the two means was not due to
chance.
TABLE XXXI
COMPARISON OF MEANS AND STANDARD
DEVIATIONS OF TEST IA ON THE PRE-TEST AND POST-TEST

Group        Number    Mean     Standard Deviation
Pre-test     136       55.60    15.84
Post-test    136       69.94    16.43

M1 - M2: Critical Ratio = 8.62

The range of improvement on Test IA is of interest.
Of the 136 students who retook the test, three did not
change their scores,

seven had scores from one to ten points

lower on the post-test, and the remaining 126 students im­
proved their scores from one to 41 points.
Since in both comparisons the differences between the means
were highly significant and both in the same direction we may
make the inference that the test had some validity in that
there was an increase in score attending instruction in the
methods of science. One is obliged, however, to hold this
inference as tentative until (1) the relationship between
increased knowledge of the subject matter of the course and
performance on the test is further investigated, (2) it is
demonstrated that maturation did not produce the observed
results, and (3) it is shown that other methods of instruction
do not produce the same results.
Validation by comparison of scores with ratings of students
by competent judges. The final method used in the statistical
validation of the test was the comparison of scores made on
the test with the rating of competent judges. A rating scale
for the ability to use the scientific method (Appendix IV) was
prepared.
Several members of the Department of Biological
Science at Michigan State College were interviewed in order
to determine the types of behaviors which they had observed
in students whom they considered to have superior ability to
think scientifically and the types of behavior which they had
observed in students whom they believed to be very inferior
in this ability.

The two areas in which they agreed that ratings of the
students could be made on the basis of observation of their
performance in laboratory classes were (1) the ability to
devise and evaluate experiments, and (2) the ability to
interpret data, including the ability to form hypotheses and
draw conclusions.
The instructions for rating the students were:
    Will you please rate the person whose name appears above
    on the two following characteristics? The two extremes of
    these characteristics are described. Place a cross (X) on
    the line indicating your judgment of the individual with
    respect to the qualities in question.
A person having a high degree of ability to evaluate and
devise experiments was described in the following manner:
    Includes control factors, controls all but one variable,
    understands problem and devises experiment to test
    hypothesis. Can devise experiments which will yield
    results, recognizes problems inherent in the experiment,
    and has an understanding of what is happening in the
    experiment.
A person having a low degree of ability to evaluate and
devise experiments was described:
    Experiments lack control or control is faulty, experiment
    unrelated to hypothesis. Student does not understand the
    experimental set-up, or the problems inherent in the
    experiment.
Proficiency in ability to interpret data could be recognized
by the following description of a person very superior in this
ability:
    Is able to make logical inferences from data, takes
    pertinent facts into consideration, applies previous
    knowledge to the new situation, is able to see
    relationships, especially cause and effect relationships.
    Knows what evidence for his inference is, and why it is
    evidence.

The person very inferior in this ability:
    Is unable to make logical inferences from data, does not
    differentiate between relevant and irrelevant data or
    between critical and non-critical data, is unable to see
    relationships.
The ratings were on a five point scale: very superior,
superior, average, inferior, and very inferior. One hundred
and forty-three students taking the first term of Biological
Science who were given Test IA at the beginning of the first
term of the three-term sequence of the course were rated on
their ability to think scientifically by their instructors.
Test IA was administered again to 136 of these same students
at the end of the first term. A part of these students were
taught by the present investigator and the remaining students
were taught by another instructor. Each of these students was
rated by his instructor on the rating scale described above.
Students taking Biological Science at Michigan State College
do not necessarily have the same instructor for more than one
term; therefore, during the second term most of these students
had a different instructor. These students were scattered
throughout the classes of the 16 instructors teaching the
second term of the three-term sequence. Some had failed the
first term's work and repeated it, hence they were in classes
of one of the three instructors teaching the first term of the
course. These instructors were requested to rate the students
on their ability to think scientifically by

using the rating sheet described above. In all, a total of 19
instructors were involved in the rating of the students. The
two instructors that taught the first term were responsible
for most of the ratings. Each instructor during the second
term rated a few students only.
In order to use these ratings in statistical computation,
composite ratings were calculated for each student. A very
superior rating was allotted 5 points, a superior rating 4
points, an average rating 3 points, an inferior rating 2
points, and a very inferior rating 1 point. Since each student
was rated on two abilities by two judges, a maximum of 20
points and a minimum of 4 points was the range of possible
scores on the composite rating.
An expectancy chart,⁵ which reveals the expected performance
of persons receiving various test scores, was one of the
methods used to describe the validity of the test in terms of
the rating of students by their instructors. A double entry
table was constructed with scores on the test as one axis and
scores on the ratings as the other axis. Because there were
very few rated by both raters as either very superior or very
inferior, the expectancy chart was constructed on the basis of
superior, average, and inferior ratings. Rating scores from 10
through 14 were considered average, scores below 10 were
considered inferior, and scores above 14 were considered
superior.

⁵ Adkins, op. cit., pp. 163-164.
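
The composite rating scheme described above can be sketched in a few lines. The point values and cut-offs are those given in the text; the function names and the sample ratings are hypothetical and serve only to illustrate the arithmetic.

```python
# Points allotted to each step of the five point rating scale.
POINTS = {
    "very superior": 5,
    "superior": 4,
    "average": 3,
    "inferior": 2,
    "very inferior": 1,
}


def composite_rating(ratings: list[str]) -> int:
    """Sum the points for four ratings (two abilities, each rated by two judges)."""
    if len(ratings) != 4:
        raise ValueError("expected two abilities rated by two judges")
    return sum(POINTS[r] for r in ratings)


def rating_group(score: int) -> str:
    """Collapse a composite score (4 to 20) into the three chart categories."""
    if score > 14:
        return "superior"
    if score >= 10:
        return "average"
    return "inferior"


# Hypothetical student: two abilities, each rated by two instructors.
sample = ["superior", "very superior", "superior", "average"]
score = composite_rating(sample)
print(score, rating_group(score))
```

Collapsing the twenty-point composite into three groups mirrors the reason given in the text: too few students were rated at either extreme by both raters to support a finer breakdown.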

The expectancy chart can be treated statistically by means of
the chi-square test.⁶ The hypothesis to be tested was that the
scores made by the students on Test IA were essentially
unrelated to the ratings of the students by their instructors
on their ability to think scientifically. The expectancy
charts, Tables XXXII and XXXIII, show the observed numbers of
persons in each category and, in parenthesis, the numbers
which would be expected in each of the categories if there
were no relationship between the scores on Test IA and the
ratings.
TABLE XXXII
EXPECTANCY CHART SHOWING THE COMPARISON
OF SCORES ON THE TEST IA PRE-TEST AND RATINGS

                             Ratings
Scores       Superior       Average       Inferior      Totals
75 - 100     14* (3.5)**    8 (12.6)      0 (5.9)       22
50 - 74      9 (10.6)       51 (37.8)     6 (17.5)      66
24 - 49      0 (8.9)        23 (31.5)     32 (14.6)     55
Totals       23             82            38            143

Degrees of Freedom - 4
Chi-square - 83.179
For these data chi-square is significant
at the 1 percent level at 13.277
*  Observed number
** Expected number

⁶ Henry E. Garrett, Statistics in Psychology and
Education. New York: Longmans, Green & Company. 1947.
pp. 252-253.
The expectancy chart for the scores of the students in the
pre-test group is presented as Table XXXII. The chi-square for
these data was 83.179. Since a chi-square of 13.277 is
required to make the results significant at the one percent
level, it is evident that the hypothesis that there was no
relationship between the test score and the rating must be
rejected. On the contrary, there was a highly significant
relationship between the scores on Test IA and the ratings of
the students by the judges.
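
The chi-square for an expectancy chart can be recomputed from the observed counts alone, since each expected count is the product of its row and column totals divided by the grand total. The sketch below uses the pre-test counts of Table XXXII. One entry is an assumption: the middle cell (scores 50 - 74, rated average) is illegible in this copy and is taken as 51, the value implied by both its row total (66) and its column total (82).

```python
def chi_square(table: list[list[int]]) -> tuple[float, int, list[list[float]]]:
    """Chi-square test of independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom, expected


# Rows: scores 75-100, 50-74, 24-49. Columns: superior, average, inferior.
observed = [
    [14, 8, 0],
    [9, 51, 6],   # 51 inferred from the marginal totals (assumption)
    [0, 23, 32],
]
chi2, dof, expected = chi_square(observed)
print(round(chi2, 1), dof)
```

Computed with unrounded expected counts, the statistic falls slightly below the reported 83.179 but far above the 13.277 needed for significance at the one percent level with 4 degrees of freedom; the small difference is consistent with hand computation from rounded expected values.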

TABLE XXXIII
EXPECTANCY CHART SHOWING THE COMPARISON
OF SCORES ON THE TEST IA POST-TEST AND RATINGS

                             Ratings
Scores       Superior       Average       Inferior      Totals
80 - 104     21* (6.9)**    19 (24.2)     0 (9.0)       40
55 - 79      2 (12.4)       56 (43.5)     14 (16.1)     72
30 - 54      0 (3.8)        6 (13.3)      16 (4.9)      22
Totals       23             81            30            134

Degrees of Freedom - 4
Chi-square - 84.471
For these data chi-square is significant
at the 1 percent level at 13.277
*  Observed number
** Expected number

The expectancy chart for the scores made by the students in the
post-test group is presented as Table XXXIII. The discrepancy in
numbers in Table XXXII and Table XXXIII is due to the fact that a
number of students were absent during the period when the test
was given the second time. It is of some interest to note that
the inferior group had a large number of absences. The chi-square
for these data was 84.47, supporting the inference that there was
a highly significant relationship between the scores on Test IA
and the rating of the students by the judges. These findings give
evidence that the test was valid, provided the ratings of the
judges were valid.
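
The chi-square computation behind an expectancy chart of this kind can be sketched in Python from the observed counts printed in Table XXXIII. This is only an illustrative sketch: the function names are hypothetical, and the rounding step is an assumption about how the reported figure was obtained (expected counts rounded to one decimal, as printed in the table), not a documented step of the study.

```python
# Observed counts transcribed from Table XXXIII
# (rows: score bands on Test IA; columns: Superior, Average, Inferior).
observed = [
    [21, 19, 0],   # scores 80 - 104
    [2, 56, 14],   # scores 55 - 79
    [0, 6, 16],    # scores 30 - 54
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


exact = expected_counts(observed)
# Assumption: the original calculation used expected counts rounded to one
# decimal place, as they appear in the table.
rounded = [[round(e, 1) for e in row] for row in exact]

chi2_exact = chi_square(observed, exact)
chi2_rounded = chi_square(observed, rounded)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"degrees of freedom: {dof}")
print(f"chi-square, unrounded expected counts: {chi2_exact:.3f}")
print(f"chi-square, rounded expected counts:   {chi2_rounded:.3f}")
```

Under the rounding assumption the result should fall very close to the tabled value of 84.471; with unrounded expected counts it is slightly lower. Either value is far above the 13.277 required at the one percent level with 4 degrees of freedom.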
A comparison of the means of these three groups on the two
administrations of the test is of interest. These are presented
in Table XXXIV.
TABLE XXXIV
MEAN GAINS OF STUDENTS RATED AS SUPERIOR,
INFERIOR AND AVERAGE ON TEST IA

Group             Pre-test                  Post-test
Ratings      No.   Mean    S. D.       No.   Mean    S. D.      Gains

Superior     22    77.00   10.10       22    92.00    7.07      15.00
Average      82    57.10   12.63       81    70.45   12.12      13.35
Inferior     38    39.39    8.18       30    55.17   12.15      16.22

The differences between the means and the critical ratios of
these differences were calculated. Table XXXV gives evidence
that the group rated as superior was superior in performance on
the test to a highly significant degree, and that the
performance of the group rated as inferior was poorer than the
performance of the group rated as average to a highly
significant degree.
TABLE XXXV
DIFFERENCES IN MEANS AND CRITICAL RATIOS OF DIFFERENCES
BETWEEN STUDENTS RATED SUPERIOR AND STUDENTS RATED AVERAGE
AND STUDENTS RATED AVERAGE AND STUDENTS RATED INFERIOR

               Superior - Average          Average - Inferior
Group          Dif. in mean    C.R.        Dif. in mean    C.R.

Pre-test       19.90           10.59       17.73           10.75
Post-test      21.55           12.75       15.28            8.08

Table XXXIV is also of interest in that it gives evidence that
the increase in scores discussed previously in this chapter was
not restricted to any particular group; the means of all of the
groups, superior, average, and inferior, being higher after a
term of Biological Science.

The final method used to indicate the validity of the test was
the determination of validity coefficients. Coefficients of
correlation were calculated between total scores on the rating
scale and (1) scores made on the test prior to taking Biological
Science and (2) scores made on the same test after one term of
Biological Science. These correlations were .77 ± .04 and
.72 ± .04 respectively. Such correlations give evidence that the
test had a considerable degree of validity insofar (in all these
comparisons) as one could assume that the judges' ratings were a
valid measure of the ability to think scientifically.

CHAPTER VII
SUMMARY AND CONCLUSIONS
SUMMARY
1. The purpose of this study was to devise a valid test to
measure some of the inductive aspects of the ability to think
scientifically, in the area of biological science.

2. The educational objectives to be measured by the test were
formulated from Keeslar's¹ list of elements of scientific
thinking. These objectives were:

     I.    The ability to sense a problem.
     II.   The ability to state a problem.
     III.  The ability to delimit a problem.
     IV.   The ability to recognize facts which are related
           to the problem.
     V.    The ability to formulate hypotheses.
     VI.   The ability to plan experiments to test hypotheses.
     VII.  The ability to carry out experiments.
     VIII. The ability to interpret data.
     IX.   The ability to formulate generalizations based
           on data.
     X.    The ability to apply generalizations to new
           situations.

3. The objectives were defined in terms of desired behaviors
involved in scientific thinking. In all, 98 behaviors were
recognized as attending the skills of scientific thinking.

¹ Oreon Keeslar, "The elements of scientific method."
Science Education. 29:273-278, December, 1945.
4. Situations in which the student could be expected to display
the behaviors defined were identified. The sources of such
situations were popular and scientific journals, textbooks, and
interviews with members of the Department of Biological Science
of Michigan State College.

5. Techniques for obtaining evidence concerning the attainment
of the educational objectives were developed. In some instances
techniques used previously were utilized and, in other cases,
new techniques were devised.

6. Nine tryout tests, consisting of a total of 637 items, were
constructed. These nine tryout tests were intended to measure
respectively:

     Test A. Some Steps in Scientific Thinking.
     Test B. The Delimitation of Problems.
     Test C. Experimental Procedures.
     Test D. Organization of Data.
     Test E. Evaluation of Hypotheses.
     Test F. Experimentation and the Interpretation of Data.
     Test G. Drawing of Conclusions.
     Test H. Interpretation of Data.
     Test J. Generalizations and Assumptions.

7. The tryout tests were administered to 168 students during
the spring term of 1950. The means, standard deviations, and
reliabilities were calculated for each of the tryout tests. The
reliabilities, determined by the method of split-halves with
correction by the Spearman-Brown formula, ranged from .59 to
.93. The mean, standard deviation, and the reliability of the
tryout tests considered as a single test were determined. The
reliability determined by the Kuder-Richardson formula was
.92 ± .01.

8. Item validity and item difficulty were calculated for each
item of the tryout tests. The scores on each of the tryout tests
were used as the criteria for item analysis. The purpose of
these determinations was to identify those items of the tryout
tests which were sufficiently discriminating and of suitable
difficulty to be included in a single test, The Ability to Think
Scientifically.
9. In order to determine whether there was a sufficient
overlapping in the tryout tests to justify the elimination of
some of the types of items in the construction of the single
test, The Ability to Think Scientifically, intercorrelations of
all of the tryout tests were calculated. These intercorrelations
ranged from .11 to .59. Intercorrelations corrected for
attenuation ranged from .17 to .73.

10. Coefficients of determination were calculated to determine
the degree of overlapping of the tests. The degree of
overlapping among the tryout tests ranged from 3 percent to 53
percent. These amounts of overlapping seemed to indicate that
there was not sufficient duplication to justify the elimination
of any of these types of items.

11. In order to determine whether any one of the tryout tests
was sufficiently similar to the score on the battery of tests to
justify its use instead of the score on the tryout battery, the
scores on each of the tryout tests were correlated with the
total score on the tryout test battery. These correlations
ranged from .41 to .74, indicating that all of the tests had some
relationship to the criterion (total tryout test scores) but
that no single test measured all of the abilities appraised by
the battery.

12. Multiple correlations between the total tryout test scores
and each combination of two of the individual tryout tests were
calculated. These multiple correlations ranged from .54 to .87,
showing that some of the pairs of tests were fairly adequate
measures of the abilities involved in scientific thinking,
whereas other pairs were quite inadequate.

13. A multiple correlation between the total tryout test scores
and seven of the nine tryout tests was calculated by the
Wherry-Doolittle method. A multiple correlation of .977 was
obtained. These data gave evidence that the abilities could be
measured quite adequately by fewer tests than had been used in
the tryout test battery.
14. Correlations between the scores on the tryout tests and the
scores on the quantitative portion of the American Council on
Education Psychological Examination ranged from .17 to .38,
while correlations between the scores on the tryout tests and
the scores on the linguistic portion of the American Council on
Education Psychological Examination ranged from .11 to .43.
Correlations between the scores on the tryout tests and the
scores on the American Council on Education Reading Test ranged
from .10 to .41.

15. Test I, The Ability to Think Scientifically, a single test
of 150 items, was constructed from items of the tryout tests.
This test was administered to 500 students who had completed
three terms of Biological Science at the end of the spring term
of 1950, and to 240 students at the beginning of the fall term
of 1950 who had had no Biological Science. The means, standard
deviations, and reliabilities of this test were determined for
both groups. The reliabilities of Test I for the two groups were
.89 and .91 respectively.
16. Item validities and item difficulties were calculated for
each item of this test, using the total score on the test as the
criterion for the item analysis.

17. Test I proved too long to be completed in a single
laboratory period of one hour and fifty minutes. Therefore, Test
IA, The Ability to Think Scientifically, was constructed from
Test I, by the deletion of twenty-five of the poorer items as
determined by item analysis. This test was administered at the
beginning of the fall term to 330 students who had had no
Biological Science and to 136 of these same students at the end
of the fall term of 1950. The means, standard deviations, and
reliabilities were determined for the entire group and for the
part of the group who took the test again at the end of the
term. The reliabilities were .91 ± .01 and .90 ± .02
respectively.
18. The curricular validity of the test was established by:

     1. Designing the test items to measure the behaviors
        involved in scientific thinking.
     2. Submission of the tryout tests to competent judges
        for criticism.
     3. Using free responses of students as items wherever
        feasible.
     4. Careful selection of materials utilized in the
        construction of the test items.

19. The statistical validity of the test was established by:

     1. Comparison of scores made on the tests with scores
        made on tests of (a) intelligence, (b) reading
        ability, and (c) knowledge of facts.
     2. Comparison of scores made by students having had no
        Biological Science with scores made by students
        having had Biological Science.
     3. Comparison of scores made on the test with ratings
        of the students by their instructors on their
        ability to think scientifically.

20. The correlation between scores on the total tryout test
battery and scores on the American Council on Education
Psychological Examination was .49. Scores on Test I and the
Psychological Examination gave the same correlation. The
correlation between scores on the tryout test battery and scores
on the American Council on Education Reading Test was .51,
whereas the correlation of Test I with the reading test was .43.

21. Since the tests of the ability to think scientifically and
the intelligence test both involved reading ability, partial
correlations, with reading ability partialed out, were
calculated. For the tryout tests the partial correlation was
.28; for Test I the partial correlation was .33. In order to
determine the degree to which reading ability and the ability to
think scientifically were related, partial correlations, with
intelligence partialed out, were calculated. The partial
correlation of the tryout tests was .31, while for Test I this
partial correlation was .18.

22. The correlation between scores on the total tryout test
battery and the portion of the comprehensive examination used to
measure overall achievement in basic Biological Science which
tested knowledge of facts was .33.

23. Scores made by the 500 students who took Test I after three
terms of Biological Science were compared with another group of
240 students who had had no Biological Science. The difference
between the means of these two groups was highly significant.
Test IA was given as a pre-test to 136 students before taking
Biological Science and as a post-test to these same students
after completion of one term of Biological Science. The
difference between the means for the pre-test and the post-test
was also highly significant, giving some evidence that if the
test was a valid measure of the ability to think scientifically,
the ability could be improved as a result of instruction.
24. One hundred and forty-three students taught by the present
investigator and one other instructor in Biological Science were
rated by means of the rating scale presented in Appendix IV on
their ability to think scientifically. These students were a
part of the 330 students who were given Test IA at the beginning
of the Fall term of 1950. As previously mentioned, 136 of these
students were given Test IA as a post-test at the end of the
Fall term of 1950. These students were also rated on their
ability to think scientifically by the instructors who taught
them during the Winter term of 1951.

25. The chi-square test revealed that there was a significant
relationship between the scores made on Test IA, both as a
pre-test and as a post-test, and the averaged ratings of the
judges.

26. The difference between the means of the test for those
students rated as superior and the means for those students
rated as average was highly significant. So also was the
difference of the means of those rated as average and those
rated as inferior.

27. The correlation between scores on the pre-test and the
ratings of the judges was .77 ± .04. Between scores on the
post-test and the judges' ratings the correlation was .72 ± .04.
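
Several of the statistics named in the summary follow standard textbook formulas, and a minimal Python sketch of three of them may help in reading the reported ranges: the Spearman-Brown correction for split-half reliability, the correction for attenuation, and the coefficient of determination as a percentage of overlap. The function names and the example inputs marked as hypothetical are illustrative assumptions and are not taken from the study; only the corrected intercorrelations of .17 and .73 come from the summary.

```python
import math


def spearman_brown(r_half):
    """Estimate full-length reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Correlation between two tests adjusted for their unreliability."""
    return r_xy / math.sqrt(r_xx * r_yy)


def percent_overlap(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return 100 * r ** 2


# Hypothetical split-half correlation, chosen only for illustration.
print(round(spearman_brown(0.80), 2))

# Hypothetical reliabilities (.85 and .90) for two tests correlating .59.
print(round(correct_for_attenuation(0.59, 0.85, 0.90), 2))

# Corrected intercorrelations reported in the summary: .17 and .73.
low, high = percent_overlap(0.17), percent_overlap(0.73)
print(round(low), round(high))
```

Squaring the corrected intercorrelations of .17 and .73 gives about 3 and 53 percent, which agrees with the range of overlap reported for the tryout tests.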

CONCLUSIONS

On the basis of these findings the conclusion may be drawn that
the test, The Ability to Think Scientifically, was sufficiently
reliable for individual use, and that the test had sufficient
validity to be used as a measure of the ability to think
scientifically.

The data presented here support the inferences drawn from
findings of previous studies that there is a moderate positive
relationship between the ability to think scientifically and
(1) intelligence, (2) reading ability, and (3) knowledge of
facts. The findings of this study also support the inference
that the ability to think scientifically is subject to
improvement when this is a specific objective of instruction.
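
The relationships with intelligence and reading ability were examined in the summary by partial correlation, which removes the influence of a third variable from the correlation between two others. The following Python sketch shows the usual first-order formula. The correlations of .49 (test with intelligence) and .51 (test with reading) are taken from the summary; the correlation of .60 between intelligence and reading ability is a hypothetical value assumed only for illustration, since it is not given in this chapter, so the output should not be read as a result of the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_test_intelligence = 0.49     # reported in the summary
r_test_reading = 0.51          # reported in the summary
r_intelligence_reading = 0.60  # hypothetical, assumed for illustration only

# Test score versus intelligence, with reading ability partialed out.
r_partial = partial_correlation(
    r_test_intelligence, r_test_reading, r_intelligence_reading
)
print(round(r_partial, 2))
```

The partial correlation is lower than the raw correlation whenever the third variable is positively related to both of the others, which is the pattern the summary describes.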

EDUCATIONAL IMPLICATIONS

Educational implications for Biological Science at Michigan
State College. The test, The Ability to Think Scientifically,
should be useful for the appraisal of the teaching of the
scientific method in Biological Science at Michigan State
College. The laboratory studies at Michigan State College have
been written with the expressed objective of teaching the
scientific method. However, no valid test had been available to
appraise the value of the studies now being taught in the
laboratory program, and of studies which may be written in the
future. The test should also be useful for diagnostic purposes.
The test might be administered as a pre-test, and students
making low scores on this pre-test might profit from remedial
instruction in this area. A pre-test to determine which students
should be allowed to take the comprehensive examination after
the completion of one term instead of after the three-term
sequence, might include this test as a measure of one of the
objectives of the course.

The relative merit of various methods of teaching scientific
thinking might be evaluated by the test. Some have claimed that
a course without a lecture would implement this objective, while
others have claimed that a lecture-demonstration method would be
as effective as any other. These and other methods might be
appraised by use of this test designed to measure the ability to
think scientifically.
The findings of a significant gain in scores after taking
Biological Science may stimulate further educational research.
An experiment should be carried out to determine what factors
are responsible for the increase in scores.

Educational implications for science courses in general
education. Since the ability to think scientifically is a stated
objective of almost all science courses in the general education
program, the test, The Ability to Think Scientifically, might be
useful in other courses in biology at other institutions or
modified for use for courses in the physical sciences.
Those test items which present a new technique for evaluating
the ability to think scientifically may stimulate further work
in the development of tests to measure this ability. The test as
designed is probably too difficult for use in the secondary
school; however, some of the techniques may be useful to persons
constructing tests for secondary school use.
The findings of improvement in the ability to think
scientifically, although not in itself conclusive, since no
control group was included in this study, tend to support the
conclusion that scientific thinking can be taught. The
accumulating evidence for this conclusion has far-reaching
educational implications, and encourages educators to make
further efforts to implement this important objective.
Other educational implications. The test, The Ability to Think
Scientifically, might have some value for prediction of success
in the field of science, or the techniques presented here might
be modified in the construction of such tests. The present need
for detecting of future scientists might be in some measure met
by portions of this test.

Some of the techniques used in this test might also be modified
for the construction of tests of critical thinking in other
areas, such as the social sciences.

PROBLEMS SUGGESTED BY THE STUDY

Since the purpose of this study was to construct a reliable and
valid measure of the ability to think scientifically, the study
presented more problems than it solved. It was not the primary
purpose of this study to investigate educability in the ability
to think scientifically, nor was it the purpose of this study to
investigate the relationship of ability to think scientifically
to other traits such as reading ability and intelligence.
However, in the validation of the test, some data relating to
the above mentioned problems were accumulated. These data
suggest a number of problems.
The very evident question which arises from this study is "Did
instruction in scientific thinking cause the significant
increase in scores on the test?" A controlled experiment should
be conducted. One group should be taught by the method used in
Biological Science at Michigan State College, a second group
should be taught the same subject matter by traditional methods,
and a third group should receive no science training. Such an
experiment might indicate whether the laboratory program in
Biological Science with the teaching of the scientific method as
its major objective is more effective in evoking changes in
behavior than traditional methods. It should also throw light on
the question of whether ability to think scientifically is a
by-product of the teaching of science. In addition, it should
show whether improvement in the ability to think scientifically
is merely a growth or maturation process.
Another problem arises from the finding of a moderate
correlation between the test, The Ability to Think
Scientifically, and intelligence. The problem suggested is "What
factors of intelligence are related to the ability to think
scientifically?" In order to answer this question Thurstone's
test of Primary Mental Abilities and the test, The Ability to
Think Scientifically, could be given to a group of students.
Factor analysis might reveal the loadings of various factors in
the test.
The finding of a correlation of .33 between knowledge of facts
and ability to think scientifically indicates that some
relationship exists between knowledge of facts and the ability
to think scientifically. Since the factual test used in this
correlation was not over the same subject matter as the test
itself, this may not reflect the true relationship, which might
be higher than reported in this study. In the construction of
the test reported in this study it was assumed that students
knew some general biological facts and vocabulary, such as the
terms vitamin and bacteria. This assumption may not have been
valid; therefore, a test should be devised to measure knowledge
of the facts and vocabulary which were assumed to be general
information. This information test should be administered just
prior to the administration of the test, The Ability to Think
Scientifically. A correlation between the two tests might reveal
a more valid relationship between knowledge and the ability to
think scientifically.
Test IA was administered to 136 students as a pre-test and as a
post-test after one term of Biological Science. A few students
made lower scores on the post-test than on the pre-test but most
of the students made gains. These gains ranged from one to 41
points. This was not unusual; a test-retest situation almost
always shows a similar trend. However, the question may be
asked, "Why do a few students fail to make any gain, while a few
others make gains of almost one hundred percent of their
original score?" Although the variation might be due to chance,
the problem seems to be worth investigating. Several hypotheses
are suggested. It might be that those students who participate
actively in the laboratory program make large gains while those
who do not participate actively in the laboratory program make
small gains. This hypothesis could be tested by individual case
study. A few students could be observed carefully by instructors
and ratings of their acceptance of the objective of the
laboratory program correlated with gains on the test.
Another hypothesis is that there may be a relationship between
gains on the test and gain in knowledge of biological facts.
This hypothesis could be tested by giving pre-tests and
post-tests. The test of the ability to think scientifically, and
a test of knowledge of biological facts taught in the course
could be used. Gains on the two tests could be correlated.
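
The proposed analysis of gains can be sketched in Python as follows. The student records are hypothetical and serve only to show the mechanics: a gain is computed for each student on each test, and the two sets of gains are then correlated with the product-moment formula. The function name and data layout are illustrative assumptions, not part of the study.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records, one per student:
# (thinking pre-test, thinking post-test, facts pre-test, facts post-test)
students = [
    (52, 70, 40, 61),
    (61, 66, 45, 52),
    (47, 75, 38, 66),
    (70, 79, 55, 63),
    (58, 59, 50, 49),
]

# Gain = post-test score minus pre-test score, computed per student.
thinking_gains = [post - pre for pre, post, _, _ in students]
facts_gains = [post - pre for _, _, pre, post in students]

r_gains = pearson_r(thinking_gains, facts_gains)
print(f"correlation between gains: {r_gains:.2f}")
```

With real data, a correlation near zero would count against the hypothesis, while a substantial positive value would be consistent with it; gain scores tend to be less reliable than the scores they are derived from, so a sample of adequate size would be needed.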
As discussed in Chapter IV, the test of ability to do scientific
thinking was limited to a measurement of the critical aspects of
scientific thinking. This limitation of the problem suggested a
field of investigation which is of interest. What is the
relationship between critical thinking and creative thinking?
What part does critical thinking play in creative thinking? How
can creative thinking be measured reliably and validly?
Problems of technique in test construction are also suggested by
this study. These are (1) to devise a valid and reliable test to
be administered in fifty minutes, (2) to devise a test centered
about a single problem, and (3) to devise several forms of such
a test. A shorter test would be desirable if it were to be used
as a pre-test and as a post-test each term, since it would not
necessitate the use of two entire laboratory periods. A test
revolving around a single problem would aid in the integration
of the materials, while several forms of the test would reduce
the possibility of memory playing a part in observed increase in
scores.

LITERATURE CITED

BOOKS

Adkins, Dorothy C. Construction and Analysis of Achievement
     Tests. Washington: U. S. Government Printing Office. 1947.
     Pp. 292.
Aikin, Wilford M. The Story of the Eight-Year Study. New York:
     Harper and Brothers. 1942. Pp. 157.
American Council on Education, Executive Committee of the
     Cooperative Study in General Education. Cooperation in
     General Education. Washington: American Council on
     Education. 1947. Pp. 220.
Baten, William D. Elementary Mathematical Statistics. New York:
     John Wiley and Sons. 1938. Pp. 338.
Bond, Austin D. M. An Experiment in the Teaching of Genetics
     with Special Reference to the Objectives of General
     Education. Contributions to Education, No. 797. New York:
     Bureau of Publications, Teachers College, Columbia
     University. 1940. Pp. 99.
Buros, Oscar K. The Nineteen Forty Mental Measurement Yearbook.
     Highland Park, New Jersey: The Mental Measurement
     Yearbooks. Pp. 674.
Curtis, Francis D. Some Values Derived from an Extensive
     Reading of General Science. Contributions to Education,
     No. 163. New York: Bureau of Publications, Teachers
     College, Columbia University. 1924. Pp. 142.
Daily, Benjamin W. The Ability of High School Pupils to Select
     Essential Data in Solving Problems. Contributions to
     Education, No. 190. New York: Bureau of Publications,
     Teachers College, Columbia University. 1925. Pp. 103.
Davis, Frederick B. Item-Analysis Data. Cambridge: Graduate
     School of Education, Harvard University. 1946. Pp. 42.
Dewey, John. How We Think. Boston: D. C. Heath and Company.
     1909. Pp. 244.

Educational Policies Commission, Education for All American
     Youth. Washington: National Education Association. 1944.
     Pp. 421.
Flanagan, John C. Critical Requirements for Research Personnel.
     Pittsburg: American Institute for Research. 1949. Pp. 66.
Gans, Roma. A Study of Critical Reading Comprehension in the
     Intermediate Grades. Contributions to Education, No. 811.
     New York: Bureau of Publications, Teachers College,
     Columbia University. 1940. Pp. 135.
Garrett, Henry E. Statistics in Psychology and Education.
     New York: Longmans, Green and Company. 1947. Pp. 487.
General Education in the American College. Thirty-eighth
     Yearbook of the National Society for the Study of
     Education, Part II, Pp. 380. Bloomington, Illinois: Public
     School Publishing Company, 1939.
Glaser, Edward M. An Experiment in the Development of Critical
     Thinking. Contribution to Education, No. 843. New York:
     Bureau of Publications, Teachers College, Columbia
     University. 1941. Pp. 212.
Gray, William S., editor. Recent Trends in American College
     Education. Chicago: University of Chicago Press. 1931.
     Pp. 249.
Harvard University. General Education in a Free Society.
     Cambridge: Harvard University Press. 1945. Pp. 257.
Hawkes, Herbert E., E. F. Lindquist, and C. R. Mann, The
     Construction and Use of Achievement Examinations.
     Cambridge: Houghton Mifflin Company. 1936. Pp. 497.
Judd, Charles H. Education as Cultivation of the Higher Mental
     Processes. New York: The Macmillan Company. 1936. Pp. 201.
McCall, William A. Measurement. New York: The Macmillan
     Company. 1939. Pp. 535.
McNemar, Quinn. Psychological Statistics. New York: John Wiley
     and Sons. 1949. Pp. 364.
Muskingum College. A College Looks at Its Program. Columbus:
     The Spahr and Glen Company. 1937. Pp. 306.

Noll, Victor H. The Habit of Scientific Thinking. A Handbook
     for Teachers. New York: Bureau of Publications, Teachers
     College, Columbia University. 1935. Pp. 27.
Noll, Victor H. The Teaching of Science in the Elementary and
     Secondary Schools. New York: Longmans, Green and Company.
     1939. Pp. 238.
Program for Teaching Science. Thirty-first Yearbook for the
     National Society for the Study of Education, Part I,
     Pp. 364. Bloomington, Illinois: Public School Publishing
     Company. 1932.
President's Commission on Higher Education. Higher Education
     for American Democracy. Volume I. Establishing the Goals.
     New York: Harper and Brothers. 1947. Pp. 103.
Progressive Education Association. Science in General
     Education. New York: D. Appleton-Century Company. 1938.
     Pp. 591.
Remmers, Hermann H. and N. L. Gage. Educational Measurement and
     Evaluation. New York: Harper and Brothers. 1943. Pp. 580.
Science Education in American Schools. Forty-sixth Yearbook of
     the National Society for the Study of Education, Part I,
     Pp. 298. Chicago: The University of Chicago Press. 1947.
Smith, Eugene R., Ralph W. Tyler and the Evaluation Staff.
     Appraising and Recording Student Progress. New York:
     Harper and Brothers. 1942. Pp. 550.
Spafford, Ivol, editor. Building a Curriculum for General
     Education. Minneapolis: The University of Minnesota Press.
     1943. Pp. 409.
Stroud, James B. Psychology in Education. New York: Longmans,
     Green and Company. 1946. Pp. 664.
Tyler, Ralph W. Constructing Achievement Tests. Columbus, Ohio:
     Ohio State University. 1934. Pp. 110.
Tyler, Ralph W. Service Studies in Higher Education. Columbus,
     Ohio: Ohio State University. 1932. Pp. 283.
Watson, Goodwin B. The Measurement of Fairmindedness.
     Contributions to Education, No. 176. New York: Bureau of
     Publications, Teachers College, Columbia University. 1925.
     Pp. 97.

230
MONOGRAPHS AND BULLETINS
Conrad, Herbert S.
Characteristics and Use of Item-Analysis Data. American Psychological Association,
Psychological Monographs: General and Applied.
No. 295.
1948. Pp. 13.
National Education Association.
Reorganization of Science
in Secondary Schools. U. S. Bureau of Education
Bulletin, 1920, No. 26, Washington: Government Print­
ing Office.
Pp. 62.
PERIODICAL LITERATURE
Alpern, Morris L.
"The ability to test hypotheses."
Science Education. 30:220-229, October, 1946.
Barnard, J. Darrell. "The lecture-demonstration versus the
problem-solving method of teaching a college science
course." Science Education. 26:121-132, October, 1942.
Arnold, Dwight. "Testing ability to use data in the fifth
and sixth grades." Educational Research Bulletin.
17:255-259, December, 1937.
Blair, Glenn M. and Max R. Goodson. "Development of
scientific thought in general science." School Review.
47:696-700, November, 1939.
Beauchamp, Wilber L. "Teaching scientific method." School
Science and Mathematics. 34:508-510, May, 1934.
Billings, Marion L. "Problem solving in different fields
of endeavor." American Journal of Psychology,
46:259-272, April, 1934.
Bingham, Eldred N. "A direct approach to the teaching of
the scientific method." Science Education. 33:241-249,
April, 1949.
Burke, Paul J. "Testing for critical thinking in physics."
American Journal of Physics. 17:527-532, December, 1949.
Committee on Research in Secondary-School Science.
"Problems related to the teaching of problem-solving
that need to be investigated."
Science Education.
34:180-184, April, 1950.

Crowell, Victor L. Jr. "The scientific method." School
Science and Mathematics. 37:525-531, May, 1937.
Curtis, Francis D. "Teaching scientific methods." School
Science and Mathematics. 37:816-819, November, 1934.
Davis, Ira C. "Is this the scientific method?" School
Science and Mathematics. 34:83-86, January, 1934.
Dewey, John. "Method in science teaching." Science
Education. 29:119-123, April, 1945.
Downing, Elliot R. "The elements and safeguards of scientific
thinking." Scientific Monthly. 26:231-243,
March, 1928.
Downing, Elliot R. "Some results of a test on scientific
thinking." Science Education. 20:121-128, October,
1936.
Downing, Elliot R. "Teaching scientific method." School
Science and Mathematics. 34:400-405, March, 1934.
Edwards, Thomas B. "Measurement of some aspects of
critical thinking." Journal of Experimental Education,
18:263-279, March, 1950.
Engelhart, Max D. and Hugh B. Lewis, "An attempt to measure
scientific thinking." Educational and Psychological
Measurement. 1:289-294, Third Quarter, 1941.
Flanagan, John C. "General considerations in the selection
of test items and a short method of estimating the
product-moment coefficient from data at the tails of
the distribution." Journal of Educational Psychology.
30:574-680, December, 1939.
Frutchey, Fred P., Ralph W. Tyler and B. Clifford Hendricks.
"Measuring the ability to interpret experimental data."
Journal of Chemical Education. 13:62-64, February,
1936.
Grant, Charlotte L. and Elsa M. Meder. "Some evaluation
instruments for biology students." Science Education.
28:106-110, March, 1944.
Grener, Norma and Louis E. Raths, "Thinking in third grade."
Educational Research Bulletin. 24:33-42, February,
1945.

Grim, Paul R. "Interpretation of data and reading ability
in social studies." Educational Research Bulletin.
19:372-374, September, 1940.
Hart, E. H. "Measuring critical thinking in a science
course." California Journal of Secondary Education.
14:334-338, October, 1939.
Hered, William and Herbert A. Thelen. "The high-school
chemistry test of the Armed Forces Institute."
Journal of Chemical Education. 21:507-515, October,
1944.
Herring, John P. "Measurement of some abilities in scientific
thinking." Journal of Educational Psychology,
9:535-558, December, 1918.
Higgins, Conwell D. "The educability of adolescents in
inductive ability." Science Education. 29:82-85,
March, 1945.
Howell, William S. "The effect of high school debating
on critical thinking." Speech Monographs. 10:96-102,
Annual, 1943.
Johnson, Alma. "An experimental study in the analysis and
measurement of reflective thinking." Speech Monographs.
10:83-96, Annual, 1943.
Keeslar, Oreon. "A survey of research studies dealing with
the elements of scientific method as objectives of
investigation in science." Science Education.
29:212-216, October, 1945.
Keeslar, Oreon. "The elements of scientific method."
Science Education. 29:273-278, December, 1945.
Kelley, Truman L. "The selection of upper and lower groups
for the validation of test items." Journal of Educational
Psychology. 30:17-24, January, 1939.
Le Sourd, Homer W. "Teaching scientific method." School
Science and Mathematics. 34:234-235, March, 1934.
Mallison, George G. "The implication of recent research
in the teaching of science at the secondary school
level." Journal of Educational Research. 43:321-342,
January, 1950.

Neuhof, Mark. "Integrated interpretation of data tests."
Science Education, 26:21-26, January, 1942.
Noll, Victor H. "Teaching the habits of scientific thinking."
Teachers College Record. 35:202-212, December,
1933.
Raths, Louis E., "A thinking test." Educational Research
Bulletin. 23:72-75, March, 1944.
Read, John G. "A non-verbal test of the ability to use
the science method as a pattern of thinking."
Science Education. 33:361-366, December, 1949.
Sinclair, James H. and Ruth S. Tolman. "An attempt to
study the effect of scientific training upon prejudice
and illogicality of thought." Journal of Educational
Psychology. 24:362-370, May, 1933.
Smith, Victor C. "A study of the degree of relationship
existing between ability to recall and two measures of
ability to reason." Science Education. 30:86-90,
March, 1946.
Strauss, Sam. "Some results of the test of scientific
thinking." Science Education. 16:89-93, December,
1931.
Teller, James D. "Improving ability to interpret educational
data." Educational Research Bulletin. 19:363-371,
September, 1940.
Teichman, Louis. "The ability of science students to make
conclusions." Science Education. 26:266-279, December,
1944.
Ter Keurst, Arthur J. and Robert E. Bugbee. "A test on
scientific method." Journal of Educational Research.
36:489-501, March, 1943.
Tyler, Ralph. "Measuring the results of college instruction."
Educational Research Bulletin. 11:253-260, May, 1932.
Ullsvik, Bjarne R. "An attempt to measure critical judgment."
School Science and Mathematics. 49:445-452, June, 1949.
Wood, Ben D. and F. S. Beers. "Knowledge versus thinking."
Teachers College Record. 37:487-499, March, 1936.

Weller, Florence. "Attitudes and skills in elementary
science." Science Education. 17:90-97, April, 1933.
Zyve, D. L.
"A test of scientific aptitude."
Journal of
Educational Psychology. 18:325-546, November, 1927.
TESTS
Love, Kenneth G. "Scientific Attitude - Thinking." Every
Pupil Test. Columbus, Ohio: The State Department of
Education. April, 1937.
UNPUBLISHED MATERIALS
Bedell, Ralph C.
"The Relationship Between the Ability to
Infer in Specific Learning Situations." Unpublished
Doctor's thesis, Department of Education, University
of Missouri.
1934.
Pp. 54.
Dunning, Gordon M.
"The Construction and Validation of a
Test to Measure Certain Aspects of Scientific Thinking
in the Area of First Year College Physics."
Unpublished Doctor's thesis, Department of Education,
Syracuse University.
1948.
Pp. 108.
Edwards, Thomas B.
"Measurement of Some Aspects of Critical
Thinking." Unpublished Doctor's thesis, Department of
Education, University of California.
1949.
Pp. 200.
Fleming, Maurice C.
"An Analytical Study of Certain Out­
comes of a Course for Orientation in Biological Science
at College Level." Unpublished Doctor's thesis, Depart­
ment of Education, New York University.
1942.
Pp. 324.
Furst, Edward J.
"Changes in Organization of Various
Abilities and Skills after Two Years of General Education
at the Secondary-School Level." Unpublished
Doctor's thesis, Department of Education, University of
Chicago.
1948.
Pp. 249.
Higgins, Conwell D. "Educability of Adolescents in Inductive
Ability." Unpublished Doctor's thesis, Department of
Education, New York University.
1942.
Pp. 206.

Hoff, Alfred G. "A Test for Scientific Attitude."
Unpublished Master's thesis, Department of Education,
University of Iowa.
1930.
Pp. 156.
Thelen, Herbert A.
"An Appraisal of Two Methods for
Teaching Scientific Method in General Chemistry."
Unpublished Doctor's thesis, Department of Education,
University of Chicago.
1944.
Pp. 370.
Weisman, Leah L.
"Some Factors Related to the Ability
to Interpret Data in Biological Science." Unpublished
Doctor's thesis, Department of Education, University
of Chicago.
1946.
Pp. 176.

APPENDIX I

TEST A
SOME STEPS IN SCIENTIFIC THINKING
This test is designed to measure your ability to
differentiate phases of thinking.
These steps include
major problems or perplexities, possible solutions to
problems, observations which are not results of experi­
mentation but rather preliminary observations, results of
experimentation, and conclusions.
Certain parts of the paragraph are underlined, and
each underlined item is a question.
Choose the proper
response from the key and blacken the appropriate space
on the answer sheet.

Key
1. A major problem (either stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Observations (not experimental).
5. Conclusion (probable solution to problem).

Ever since the days of Hippocrates one of medicine's
big mysteries has been (1) the bodily process that transforms
disease into death. With a special type of equipment
which makes blood vessels transparent and three dimensional
under a microscope, one investigator began examining the
blood of healthy animals. The (2) blood cells of the
healthy animals are separate and move rapidly. One day
while observing the blood of a monkey dying of malaria, this
researcher saw that the (3) blood was flowing slowly. Its
consistency changed before his eyes. The blood (4) cells
began to clump together in sluggish masses. The investigator
realized that this (5) altered blood might be a major
cause in the animal's illness. If the blood changes could
occur in malaria they might occur in other diseases as
well - perhaps all diseases. The investigator studied the
circulation in other diseased animals and found this
clumping of blood (which he called "sludged") in every
diseased animal and those suffering from severe injury or
disease. (6) What makes the red cells stick together? It
was seen that (7) during disease and injury the body
deposits a sticky substance on blood cells, causing the
blood cells to stick together and clog the circulation. If
the process continues unchecked death occurs. Other workers
had seen sludged blood before but its significance had been
missed. This researcher thinks that red cells (8) clumping
may account for many cases of mental illness, since he has
found (9) in a few psychiatric patients plugs in the brain
indicating that there has been sludging at one time. He
also suggests that aging and senility may (10) be accounted
for by accumulated damage from injury and illness. The
discovery that sludge is a critical factor in many diseases
may prove to be one of the great accomplishments of
medicine. It opens up new ways of fighting disease. (11) To
find drugs to break up the sludge and (12) to discover why
sludge forms are two approaches which are being followed.

The following key is to be used for the succeeding
paragraph. Certain parts of the paragraph are underlined,
and each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space on
the answer sheet.

Key
1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Result of experimentation.
4. Initial observation (not experimental).
5. Conclusion (probable solution of problem).

(13) How does a homing pigeon navigate over territory
it has never seen before? (14) Do air currents stimulate
the pigeon in some way? (15) Are the pigeons equipped with
some sort of magnetic compasses; that is, are they sensitive
to the earth's magnetism? Yeagley tested the latter by
fastening small magnets to the wings of well-trained pigeons.
(16) Most of these birds never got home. (17) Others, carrying
equal wing weights of non-magnetic copper, made the home
roost without trouble, (18) indicating that the earth's
magnetism is a factor in pigeon navigation. But the pigeon's
magnetic compass could not, by itself, bring him back to his
roost, because many places on the earth's surface have
identical magnetic conditions. Yeagley endeavored (19) to
determine the other guiding factor. (20) It might be the
sun or stars, but pigeons navigate under clouds. While
looking at a map which had lines representing the intensity
of the earth's magnetism, he noted that the lines were crossed
at varying angles by the parallels of latitude. (21) If
pigeons are sensitive to some factor connected with the lines
of latitude, they would have all they need to find their way
home. The next step was (22) to find some physical force,
something the pigeons might be able to detect, related to
the lines of latitude.

The effect of the earth's turning varies directly with
latitude; objects near the equator are carried daily around
the earth's circumference, moving at over 1,000 mi. per hr.
Objects near the poles are carried around more slowly. The
direction and variation of this circling can be recorded by
various man-made instruments. (23) Why shouldn't the pigeons
feel it, too? (24) If they could, they would have, along
with their magnetic compass, a satisfactory navigating
instrument. Yeagley trained hundreds of pigeons to return to
their home roosts at State College, Pa. Then he took them to
a part of Nebraska where the lines representing the earth's
magnetism cross the parallels of latitude at the same angle
as at State College. He released the pigeons to the east of
this spot. (25) The pigeons all flew west. Yeagley
believes that (26) pigeons are guided by both the earth's
magnetism and by its turning. (27) Just where the birds
keep their instruments is still unknown; but Yeagley found
that (28) birds have a mysterious organ in their eyes, at
the end of the optic nerve. (29) This organ may contain
the nerve fibers that pick up vibrations of magnetism and
the even more delicate sense that measures the earth's turning.

The following key is to be used for the succeeding
paragraph. Certain parts of the paragraph are underlined,
and each underlined portion is an item of the test. Choose
the proper response from the key and blacken the appropriate
space on the answer sheet.

Key
1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Initial observation (not experimental).
5. Conclusion (probable solution of problem).

(30) The residents of Deaf Smith County, Texas, are
amazingly free of tooth decay. (31) The vegetables grown
in this county are also unusual in that they attain a huge
size. (32) Tooth decay has always puzzled scientists. (33)
Could there be a relationship between the eating of these
vegetables and the prevention of tooth decay? (34) Could
the milk in this area be better for teeth? (35) Was the
water in some way responsible for both the freedom from
decay and the size of the vegetables? In Bauxite, Ark.,
dentists noted that (36) most of the residents had blemishes
on their teeth. Analysis of the water showed (37) it contained
fluorine. There was little doubt that (38) the
fluorine was responsible for the blemishes. Dentists also
noticed that the (39) children of the community had almost
no cavities in their teeth. On the assumption that (40)
tooth decay is related to the amount of fluorine in the
water, fluorine was used in a weak solution to paint the
teeth and gums of half of the children in a community where
no fluorine is normally found in the water. (41) These
children had 40% fewer cavities than the children not
receiving the treatment. Dental researchers have continued
(42) the search for the essential cause of decay. (43) Diet
deficiencies have always been considered to be a major
factor in tooth decay, but investigators found that (44) 124
patients suffering from diseases caused by dietary
deficiencies had only one-third as many cavities as well-fed
people. But why? It has been found (45) that a certain germ
called Lactobacillus acidophilus is found in the saliva of
persons with many cavities, while it is practically absent
from the mouths of those without cavities. One experimenter
fed a group of people with large numbers of these germs in
the mouths a six-week diet low in sugars and starches. He
found that (46) there were very few of the germs in the
mouths of these people from six months to two years after
the discontinuation of the treatment. He believes (47)
that a sugarless diet may encourage the growth of other
germs which fight the Lactobacillus acidophilus. That
(48) the prime cause of tooth decay is this Lactobacillus
is supported by the fact that fluorine is very potent in
reducing the number of them in the mouth.

The following key is to be used in the paragraphs
below. Certain parts of the paragraph are underlined,
and each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space on
the answer sheet.

Key
1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Observations (not experimental).
5. Conclusion (probable solution to problem).

The
smell.

(49)

It h a s

sense

least understood

been generally

believed that

i d e n t i f i e d od o r s by

chemical analysis.

suggested

it

(51)

measuring; o f
vapors.

that

It h a s

certain wave

is m o r e

infra-red

(Heat)

likely
rays

long been known

lengths

of

is t h e

(50)

Some
that

sense

smelling,

through vapor and note what wave

are

(53) W h y

absorbed.

same?

In a s t u d y o f

which do n o t ha v e
waves
can

between

absorb

absorb

7

%

to

waves

they

14 band

it may be

do t h e

upper nose
suggest

without

(56)

when

smellable.
smell

air passages.
pure

air

odors

Since

the

and

all of

long which do h a v e

those

The

lengths

odors

(54)

that the ability

smelling?

lie across

that

that

radiates heat waves

is w h a t m a k e s v a p o r s

the nose

which have

infra-red wave-lengths.

temperature

shoot

the h u m a n n o s e do

found

14 microns

infra-red whereas

these

at normal

substances

odors
to

shouldn't

absorb

Chemists

infra-red rays

is a

odorous

that many-gases
(52)

the nose

scientists

a b sorbed by

infra-red.

of

those
those

odors
do n o t

the h u m a n body

chiefly

at

t he

to a b s o r b h e a t

(55)

B u t h o w does

receptors
These

in t h e

researchers

is p a s s i n g t h r o u g h t he

nostrils the cells give no signal; they get rid of their
heat at a standard rate. (57) But when an odorous vapor is
present in the air it absorbs certain wave lengths of heat
from the cells. (58) The cells feel the change and the
stimulus produces a sensation of smell. To confirm this,
these scientists studied cockroaches which have their smell
receptors on their antennae (hence outside the body).
Cockroaches were known to be attracted by oil of cloves.
They put cockroaches in a gas tight box with a window made
of a material which was transparent to infra-red. (59) The
cockroaches responded just as strongly as if the window
were not there; they swarmed toward the window. Then a
window of glass, which does not allow infra-red to go
through it, was put in as a barrier. (60) The cockroaches
showed no more interest in the window than if the oil of
cloves were not there.

Next the researchers tried bees. (61) The bees
crawled all over the heat-transparent window with sweet
smelling honey vapor behind it, whereas (62) they ignored
the window which did not allow the heat waves to pass
through. Both (63) cockroaches and bees could smell vapors
at a distance from their antennae. This may explain how
(64) some creatures, such as male moths seeking females,
seem able to detect odors from considerable distance.

The following key is to be used in the following
paragraph. Certain parts of the paragraph are underlined,
and each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space on
the answer sheet.

Key
1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Results of experimentation.
4. Observations (not experimental).
5. Conclusion (probable solution to problem).

High blood pressure and hardening of the arteries
now afflict twice as many people as they did in 1900. (65)
To find some cause for this increase in deaths much research
has been conducted. (66) These conditions seem to run in
families. (67) Are the conditions inherited? (68) Apparently
diet is a factor in the production of the conditions
because hardening of the arteries has been produced in rats
by feeding them a diet high in cholesterol, a fat substance
found in foods. Some scientists believe that although (69)
people have a wonderful system to cope with emergencies,
the unrelenting stress of civilized life is too much for it.
The primary causes of these degenerative disorders, says one
worker, (70) are overwork, fear and exposure to the elements.
Any one of these may cause the pituitary gland at the base
of the brain to pour more of its secretion into the blood
stream. The pituitary secretion then stimulates the adrenal
glands located above each kidney. (71) Normally the adrenal
secretion causes a temporary rise in blood pressure during
these times of crisis. This worker believes that if the
crisis persists hardening of the arteries results. (72)
This experimenter noticed that people who had died after
lives of tension had abnormally large adrenal glands. He
then subjected animals to tensions to see if they developed
similar degenerative diseases. (73) He found that they did.
Although this work does not give a complete answer to
what causes degenerative diseases, (74) it does give
evidence that physically man is not quite adapted to
the civilization he has built.

TABLE XXXVI
ITEM ANALYSIS DATA FOR TEST A

      Percent Success               Discrimination     Difficulty
Item  Upper 27%     Lower 27%       r          Index   % Success  Index
        *    **       *    **       *    **
   1  100.0 100.0    86.7  83.3    .55  .50     33       91        78
   2   82.2  77.8    35.5  19.4    .48  .60     42       48        49
   3   82.2  77.8    71.1  63.9    .15  .17     10       70        61
   4   86.7  83.3    62.6  52.8    .31  .35     22       69        60
   5   93.3  91.7    77.8  72.2    .29  .32     20       82        69
   6   97.8  97.2    88.9  86.1    .30  .32     20       91        79
   7   71.1  63.9    37.7  22.2    .34  .43     28       43        46
   8   91.1  88.9    62.2  52.8    .39  .43     28       70        61
   9   82.2  77.8    46.7  33.3    .39  .46     30       55        53
  10   91.1  88.9    53.3  41.7    .47  .54     36       64        58
  11   77.8  72.2    51.1  38.9    .30  .34     21       55        53
  12   75.6  69.4    60.0  50.0    .18  .22     13       59        55
  13   93.3  91.7    82.2  77.8    .24  .24     15       85        72

*  Method of Flanagan
** Method of Davis

TABLE XXXVI (continued)

      Percent Success               Discrimination     Difficulty
Item  Upper 27%     Lower 27%       r          Index   % Success  Index
        *    **       *    **       *    **
  14   86.7  83.3    40.0  25.0    .51  .58     40       55        53
  15   88.9  86.1    40.0  25.0    .54  .61     43       55        53
  16   95.6  94.4    77.8  72.2    .37  .39     25       83        70
  17   95.6  94.4    73.3  66.7    .42  .45     29       80        68
  18   73.3  66.7    37.7  22.2    .37  .46     30       44        47
  19  100.0 100.0    80.0  75.0    .60  .58     40       86        73
  20   95.6  94.4    75.6  69.4    .38  .41     27       82        69
  21   77.8  72.2    53.3  41.7    .27  .31     19       57        54
  22   95.6  94.4    73.3  66.7    .43  .45     29       80        68
  23   51.1  38.9     8.9   0.0    .50  .67     49       19        32
  24   64.4  55.6    44.4  30.6    .22  .26     16       42        46
  25   77.8  72.2    62.6  52.8    .18  .20     12       61        56
  26   62.6  52.8    24.4   5.6    .39  .58     40       28        38
  27   93.3  91.7    62.6  52.8    .45  .50     33       71        62
  28   84.4  80.6    51.1  38.9    .38  .43     28       59        55

TABLE XXXVI (continued)

      Percent Success               Discrimination     Difficulty
Item  Upper 27%     Lower 27%       r          Index   % Success  Index
        *    **       *    **       *    **
  29   88.9  86.1    51.1  38.9    .46  .51     34       61        56
  30  100.0 100.0    82.2  77.8    .60  .55     37       88        75
  31  100.0 100.0    91.1  88.9    .50  .41     27       94        83
  32   97.8  97.2    68.9  61.1    .54  .60     42       79        67
  33   86.7  83.3    40.0  25.0    .51  .59     41       53        52
  34   91.1  88.9    35.5  19.4    .60  .69     51       53        52
  35   91.1  88.9    40.0  25.0    .57  .65     47       57        54
  36   97.8  97.2    77.8  72.2    .47  .50     33       85        72
  37   33.3  16.7     8.9   0.0    .36  .50     33       09        21
  38   73.3  66.7    42.2  27.8    .32  .38     24       46        48
  39   93.3  91.7    75.6  69.4    .32  .35     22       80        68
  40   82.2  77.8    60.0  50.0    .27  .31     19       64        58
  41   95.6  94.4    86.7  83.3    .25  .26     16       89        76
  42  100.0 100.0    84.4  80.6    .55  .52     35       89        76
  43   66.7  58.3    44.4  30.6    .23  .29     18       44        47

TABLE XXXVI (continued)

      Percent Success               Discrimination     Difficulty
Item  Upper 27%     Lower 27%       r          Index   % Success  Index
        *    **       *    **       *    **
  44   75.6  69.4    33.3  16.7    .43  .54     36       42        46
  45   68.9  61.1    17.8   0.0    .52  .77     61       30        39
  46   95.6  94.4    86.7  83.3    .25  .23     14       89        76
  47   86.7  83.3    73.3  66.7    .20  .22     13       74        64
  48   84.4  80.6    73.3  66.7    .16  .17     10       73        63
  49   73.3  66.7    42.2  27.8    .33  .38     24       46        48
  50   86.7  83.3    40.0  25.0    .52  .59     41       53        52
  51  100.0 100.0    71.1  63.9    .65  .66     48       82        69
  52   20.0   0.0    13.6   0.0    .12  .00      0        0         0
  53   57.8  47.2    20.0   0.0    .40  .71     54       24        35
  54   22.2   2.8    11.1   0.0    .18  .20     12       02         8
  55  100.0 100.0    93.3  91.7    .45  .38     23       95        85
  56   91.1  88.9    51.1  38.9    .49  .55     37       63        57
  57   68.9  61.1    42.2  27.8    .27  .34     21       44        47
  58   77.8  72.2    44.4  30.6    .36  .40     26       51        51
  59   95.6  94.4    77.8  72.2    .37  .39     25       83        70

TABLE XXXVI (continued)

      Percent Success               Discrimination     Difficulty
Item  Upper 27%     Lower 27%       r          Index   % Success  Index
        *    **       *    **       *    **
  60   97.8  97.2    68.9  61.1    .54  .59     41       79        67
  61   95.6  94.4    57.8  47.2    .55  .59     41       70        61
  62   95.6  94.4    51.1  38.9    .60  .64     46       66        59
  63   40.0  25.0    17.8   0.0    .27  .58     40       13        26
  64   48.8  36.1    35.5  19.4    .14  .20     12       28        38
  65   88.9  86.1    75.6  69.4    .22  .23     14       77        66
  66   86.7  83.3    77.8  72.2    .16  .15      9       77        66
  67   48.8  36.1    15.6   0.0    .38  .66     48       18        31
  68   75.6  69.4    46.7  33.3    .32  .36     23       51        51
  69   73.3  66.6    33.3  16.7    .41  .51     34       42        53
  70   75.6  69.4    53.3  41.7    .25  .29     18       55        53
  71   64.4  55.6    15.6   0.0    .52  .75     58       28        38
  72   88.9  86.1    40.0  25.0    .54  .61     43       55        53
  73  100.0 100.0    88.9  86.1    .50  .46     30       93        81
  74   75.6  69.4    48.8  36.1    .29  .34     21       53        52

*  Method of Flanagan
** Method of Davis
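
The percent-success columns of Table XXXVI can be illustrated with a short sketch in Python. It is offered only as an aid to reading the table and is not the procedure of Flanagan or of Davis. Three things in it are assumptions not stated in the text: a criterion group of 45 students, a five-choice key, and the usual rights-minus-wrongs correction for chance with negative values set to zero. Under those assumptions the second value of a pair (for example 83.3 beside 86.7) is reproduced, but the table itself does not say how it was obtained. All function names are hypothetical.

```python
def percent_success(n_correct, group_size):
    """Percent of a criterion group answering the item correctly."""
    return 100.0 * n_correct / group_size


def chance_corrected_percent(n_correct, group_size, n_choices=5):
    """Percent success after a rights-minus-wrongs/(k-1) correction.

    Assumption: every student attempted the item, so the number of
    wrong answers is group_size - n_correct. Negative results are
    floored at zero.
    """
    n_wrong = group_size - n_correct
    corrected = n_correct - n_wrong / (n_choices - 1)
    return max(0.0, 100.0 * corrected / group_size)


def upper_lower_groups(total_scores, fraction=0.27):
    """Return (upper, lower) lists of student indices ranked by total score."""
    n = max(1, round(fraction * len(total_scores)))
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    return ranked[:n], ranked[-n:]


def simple_discrimination(upper_pct, lower_pct):
    """Plain upper-minus-lower difference; not Flanagan's r or Davis's index."""
    return upper_pct - lower_pct


if __name__ == "__main__":
    # Hypothetical counts: 39 of an assumed 45 students answer correctly.
    print(f"{percent_success(39, 45):.1f}")           # 86.7
    print(f"{chance_corrected_percent(39, 45):.1f}")  # 83.3
```

The plain difference returned by `simple_discrimination` is included only for comparison; the r and Index columns of the table come from the published charts of the two methods and are not computed here.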


TEST B
THE DELIMITATION OF PROBLEMS
This test is designed to test your ability to
delimit a problem. A problem is presented. This is
followed by a series of questions. Rate the questions
according to the following key.
Key
1. This question must be answered in order
   to solve the problem.
2. This question if answered might be
   useful in the solution of the problem.
3. The answer to this question, though
   related to the problem, would not help
   in the solution of the problem.
4. This question is completely unrelated
   to the problem.
5. This question if answered in the
   affirmative is a basic assumption of
   the problem.

PROBLEM: What causes colds?

QUESTIONS:
1. Do all people have colds?
2. If one stays in bed with a cold does he get over
   the cold more rapidly?
3. Does one person "catch" a cold from another person
   who has a cold?
4. Why do some people have many colds and other
   people have few colds?
5. Is it possible to determine the cause of a cold?
6. Is there a germ present in persons with colds and
   absent from persons without colds?
7. Does aspirin help to cure a cold?
8. Can some germ be isolated which, when injected,
   will cause a cold?
9. Do colds have a cause?
10. Does getting one's feet wet cause a cold?
11. Does becoming chilled after being overheated cause a
    cold?
12. Why are colds more prevalent in the winter than in
    the summer?
13. Do other animals get colds?
14. Are people who are tired more susceptible to colds?
15. Are there people who do not have colds but who are
    "carriers"?
16. How can colds be prevented?

The thymus gland is located in the chest cavity
just above the heart. This gland is largest during the
growing period and becomes progressively smaller after
maturity.
PROBLEM: What is the function of the thymus gland?

QUESTIONS:
17. Does lack of activity of the gland cause it to
    become smaller?
18. What causes the gland to stop functioning?
19. Is the gland inactive after maturity?
20. Does the removal of the gland before maturity cause
    an animal to become mature earlier?
21. Does the gland have a function?
22. Can any substance be extracted from the gland which
    when injected into another animal causes growth?
23. Why does the gland decrease in size after maturity?
24. Does the removal of the gland from young animals
    stunt their growth?
25. If the gland is removed will the animal mature?
26. Do animals or people ever have disorders of this gland?
27. Does the gland ever completely disappear?
28. Can the function of the gland be determined?
29. What are the effects of the removal of the gland?
30. What causes the gland to grow smaller?

A plant appeared which was different from its
parents. The parent plants are essentially alike.
PROBLEM:

What caused the plant to be different from its
parents?

QUESTIONS:
31.

Were the parent plants from pure lines; that is,
were all of the known ancestors of both parents
like the parents?

32.

How does the plant differ from its parents?

33.

Was the soil in which this plant was grown the same
as the soil in which the parents were grown?

34.

Why did this plant differ from its parents?

35.

Was the difference due to the effects of the
environment?

36.

When did the change occur?

37.

Will this plant produce seeds which when planted
grow into plants like it?

38.

Is it possible to determine what caused the change?

39.

Is the change due to some change in the hereditary
make-up of the plant, i.e., was it due to mutation?

40.

What kind of a plant is it?

41.

Do all plants produce offspring which are different
from the parents?

42.

Under what circumstances did the change occur?

43.

Under what conditions did the
plant develop?

44.

Why would any plant be like
its parents?

45.

Was there any reason why the
plant was different from its
parents?

46.

Were any of the ancestors like this plant?

47.

Was the difference due to difference in the amount
of sunlight the plant had?

48.

How does this plant benefit man?

Abbreviated Key
1. Must be answered
2. Might be useful
3. Related, but would
not help
4. Unrelated
5. A basic assumption

Bacterial cultures are frequently grown on the
surface of a gelatin-like substance poured into a flat,
covered dish. Occasionally these bacterial cultures become
contaminated with molds. One scientist observed that
bacteria did not grow in the vicinity of a certain green
mold.
PROBLEM:

What caused the bacteria-free zone around the mold?

QUESTIONS:
49.

What kind of a mold was it?

50.

Is there a relationship between the presence of the
mold and the absence of bacteria?

51.

Does the mold use the bacteria as food?

52.

What kind of bacteria were they?

53.

Is mold of any use to man?

54.

Do the bacteria cause any disease?

55.

Is some substance produced by the mold which kills
the bacteria?

56.

Why is the mold green?

57.

Had the cultures of bacteria been properly prepared?

58.

Was there any reason for the bacteria not being in
the vicinity of the mold?

59.

Do all molds cause bacteriafree zones around them?

Abbreviated Key
1. Must be answered
2. Might be useful
3. Related, but would
not help
4. Unrelated
5. A basic assumption

60.

Where did the molds come
from?

61.

Do bacteria produce any
substance which kill the
mold?

62.

Does the mold harm the growth of the cultures?

63.

What substances cause bacteria-free zones?

64.

Under what conditions were the cultures kept?

65.

What is the green mold composed of?

66.

Did the green mold kill the bacteria or did it only
stop their growth?

67.

Would the green mold injure the cells of animals?

TABLE XXXVII
ITEM ANALYSIS DATA FOR TEST B

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
   1     53.3   41.7     33.3   16.7     .22    .29       18         28      38
   2     24.4    5.6     13.3    0.0     .17    .29       18         04      11
   3     33.3   16.7     17.8    0.0     .20    .50       33         08      21
   4     53.3   41.7     40.0   25.0     .14    .20       12         33      41
   5     77.8   72.2     26.7    8.3     .52    .66       48         40      45
   6     75.6   69.4     57.8   47.2     .20    .23       14         59      55
   7     42.2   27.8     17.8    0.0     .29    .61       43         15      28
   8     48.8   36.1     26.7    8.3     .24    .39       25         22      34
   9     73.3   66.7     24.4    5.6     .49    .66       48         35      42
  10     80.0   75.0     53.3   41.7     .31    .35       22         59      55
  11     80.0   75.0     44.4   30.6     .38    .45       29         53      52
  12     68.9   61.1     48.8   36.1     .22    .26       16         48      49
  13     73.3   66.7     26.7    8.3     .47    .63       45         37      43

*  Method of Flanagan
** Method of Davis
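
The values marked ** are numerically consistent with a standard correction for chance on a five-choice key (rights minus one-fourth of wrongs, floored at zero) applied to groups of 45 students. Both the formula and the group size are inferences from the tabled values, not statements made in the table, and the function name below is hypothetical. A minimal sketch under those assumptions:

```python
def chance_corrected_percent(n_right, n_group, n_choices=5):
    """Percent success after a correction for chance (assumed formula).

    Assumes every student in the group answered the item, so that
    wrong answers = n_group - n_right.
    """
    n_wrong = n_group - n_right
    corrected = n_right - n_wrong / (n_choices - 1)
    # A group scoring below chance level is reported as zero.
    return max(0.0, 100.0 * corrected / n_group)


# Item 1, upper group: 24 of 45 right is 53.3 percent uncorrected.
print(round(chance_corrected_percent(24, 45), 1))
# Item 1, lower group: 15 of 45 right is 33.3 percent uncorrected.
print(round(chance_corrected_percent(15, 45), 1))
```

For item 1 this gives 41.7 and 16.7, the ** entries that accompany 53.3 and 33.3.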

TABLE XXXVII (continued)

Percent Success: Upper 27%, Lower 27%
Discrimination: r, Index
Difficulty: % Success, Index

84.4
80.6

44.4
30.6

.45
.51

34

55

53

48.8
36.1

28.9
11.1

.21
.35

22

22

34

46.7
33.3

26.7
8.3

.22
.38

24

21

33

82.2
77.8

20.0
0.0

.62
.83

72

38

44

68.9
61.1

51.1
38.9

.19
.22

13

48

49

46.7
33.3

13.3
0.0

.40
.64

46

17

30

40.0
25.0

24.4
5.6

.18
.35

22

15

28

44.4
30.6

24.4
5.6

.22
.40

26

18

31

75.6
69.4

31.1
13.9

.45
.57

39

42

46

64.4
55.6

60.0
50.0

.05
.07

4

53

52

57.8
47.2

28.9
11.1

.30
.45

29

28

38

91.1
88.9

88.9
86.1

.04
.07

4

88

75

33

62.6
52.8

44.4
30.6

.19
.23

14

40

45

35

44.4
30.6

31.1
13.9

.15
.23

14

22

34

37

37.7
22.2

13.3
0.0

.32
.55

37

12

25

19
21
22

25
26

27
28
29
30
31

TABLE XXXVII (continued)

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
  38     77.8   72.2     24.4    5.6     .54    .70       53         38      44
  39     71.1   63.9     60.0   50.0     .12    .15        9         57      54
  40     46.7   33.3     37.7   22.2     .10    .14        8         28      38
  42     55.6   44.4     35.5   19.4     .21    .29       18         31      40
  43     60.0   50.0     31.1   13.9     .30    .41       27         31      40
  44     26.7    8.3      8.9    0.0     .28    .35       22         05      14
  45     55.6   44.4     17.8    0.0     .42    .70       52         22      34
  46     51.1   38.9     20.0    0.0     .34    .67       49         19      32
  47     75.6   69.4     55.6   44.4     .22    .26       16         57      54
  51     68.9   61.1     33.3   16.7     .36    .47       31         38      44
  53     86.7   83.3     80.0   75.0     .10    .12        7         79      67
  54     80.0   75.0     60.0   50.0     .24    .26       16         61      56
  56     51.1   38.9     40.0   25.0     .12    .14        8         31      40
  57     35.5   19.4     24.4    5.6     .13    .29       18         12      25
  58     71.1   63.9     20.0    0.0     .52    .78       63         31      40

TABLE XXXVII (continued)

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
  60     28.9   11.1     11.1    0.0     .27    .40       26         06      17
  61     35.5   19.4      8.9    0.0     .37    .52       35         10      23
  62     44.4   30.6     22.2    2.8     .25    .51       34         16      29
  63     28.9   11.1     13.3    0.0     .23    .40       26         06      17
  64     64.4   55.6     42.2   27.8     .23    .29       18         42      46
  65     55.6   44.4     26.7    8.3     .30    .47       31         25      36
  67     68.9   61.1     48.8   36.1     .21    .26       16         48      49

*  Method of Flanagan
** Method of Davis

TEST C
EXPERIMENTAL PROCEDURES
This test is designed to measure your ability to
recognize faulty experimental procedures and to test your
ability to select the best of a series of experiments. In
each case a problem and a possible solution to the problem
(an hypothesis) are presented. In each case the experiments
were designed by students to test the hypotheses.
Judge each experiment according to the following key.

Key

1. This experiment is satisfactory.
2. This experiment is unsatisfactory because
   it lacks a control or comparison.
3. This experiment is unsatisfactory because
   the control or comparison is faulty.
4. This experiment is unsatisfactory because
   it is unrelated to the hypothesis.
5. None of the above - the experiment or
   situation is unsatisfactory for reasons
   other than those listed in 2, 3, and 4.

PROBLEM:

What are some of the requirements for the
sprouting of seeds?

HYPOTHESIS:

Oxygen is a requirement for the sprouting of seeds.
1. Plant one seed in a container where oxygen is avail­
able and place another seed in a container where all
oxygen has been removed. Keep all other conditions
the same.
2. Put some seeds in soil in a flask from which all the
oxygen has been removed. Put an airtight stopper in
the flask to keep out all air. Then put some seeds
of the same type in soil in a flask that is open and
gets the oxygen from the air. See which sprouts or
if both sprout. Keep moisture, temperature and
amount of light, etc., the same in each flask.
3. If a seed lacked oxygen under a controlled experiment
the seed would not function properly and would soon
die.
4. Put two groups of seeds side by side, in the ground,
only put a jar over one group to keep the oxygen away
from them. Keep all other conditions the same for
each group.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

5.

Take two groups of seeds each appropriately labeled and put
one group in a compartment with the average amount of
oxygen in normal conditions and an excess amount of
oxygen in another sealed compartment. Keep all other
conditions, such as light, moisture, etc., the same for each.

6.

Take two packages of seeds. Allow oxygen to be in
contact with one package but keep the other package
of seeds protected from all oxygen.
Observe which
sprouts.

7.

Place growing plants in an air tight container.
Pump out the oxygen.
Place other growing plants in
containers with oxygen.
Keep temperature, light,
etc., the same for each.

8.

Plant seeds in a container with glass covering it
so that no oxygen can enter and see if they sprout.
Keep temperature, light and moisture normal.

9.

Two groups of bean seeds might be set up. One in
an air-tight container, absolutely free from oxygen.
The other group could be allowed free circulation of
air. After a specified length of time, the specimens
could be examined and the need of oxygen for sprout­
ing determined.

10.

Set up two seed beds in which the moisture, tempera­
ture, amount of light, and all other factors are the
same, except that the experimental seed bed has a
very restricted supply of oxygen, while the control
seed bed has a normal supply of oxygen.

PROBLEM:

To determine the effects of a deficiency of
Vitamin Y.

HYPOTHESIS:

Vitamin Y affects the rate of growth of animals.

1.

Get 40 young monkeys.
Keep all vitamins from 20 of
them, and feed the other 20 a normal supply of
vitamins.
Observe the weights and height of these
monkeys for a year.

2.

Take 60 young rabbits, divide them into three groups
of 20 each. Feed the first group of 20 a normal diet
of foods. Feed the second group a diet which contains
much Vitamin Y; feed the third group a diet completely
devoid of Vitamin Y. Keep an accurate record of weights
and length of the rabbits for 6 months.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

3.

Use two groups of young
animals with all the condi­
tions affecting the rate of growth of animals held
constant and in one group supply Vitamin Y or omit
Vitamin Y and observe the results in growth.

4.

Take three normal young white rats.
One is fed a
well-balanced diet. Another is deprived of Vitamin
Y only.
The third is given an excess of Vitamin Y
only. Make sure that all other conditions are kept
the same.

5.

Find some animals that are naturally without an ade­
quate supply of Vitamin Y.
Try and find out why.
From this you should be able to find out if Vitamin
Y affects the rate of growth of animals.

6.

Take different kinds of young animals and to one kind
feed a diet deficient in Vitamin Y, and to the other
kind a diet rich in Vitamin Y. Measure and weigh the
animals weekly.

7.

Give groups of animals identical diets for at least
2 weeks except for the omission of Vitamin Y from the
diet of one group. Make sure all other factors - size, age, living conditions, etc., are the same for
both.
Make careful observations on weights of the
animals.

8.

Start with 100 normal young animals.
Make the diet
of 50 of them deficient in Vitamin Y.
Observe the
differences between the two groups in rate of growth.

9.

Give Vitamin Y to a group of adults all about the
same age. To another group of the same age give no
Vitamin Y. Make certain that the diet and other
living conditions of the two groups is the same in
all other respects.
Continue the experiment for a
year.
Keep weekly records of weight.

10.

Give Vitamin Y to a group of children who are living
under favorable conditions (favorable to growth) and
see whether Vitamin Y affects the growth of this
group.

PROBLEM:

A minute insect (aphid) is suspected of spreading
a virus disease of roses. How would you determine
whether this is true?

HYPOTHESIS:

The aphid spreads a virus disease of roses.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

1.

Put the insect among other kinds of plants other
than roses.
Leave another group of these plants
free from contact with the aphids.
Compare the
results.

2. I would expose rats or guinea pigs to the roses to
determine if the aphid is spreading a virus disease
of roses.
If the animals became ill then I would
continue on to determine whether or not it was true.
3. Since aphids travel through the air, a plot of roses
must be entirely protected from them, and another
exposed to aphids which in turn have been exposed
to roses afflicted with the virus disease.
All must
be under constant conditions of soil, atmosphere, etc.
4.

Put some roses in a room; half which have the
disease and half which do not.
Put some of the
aphids in the room.
Observe and draw conclusions.

5.

I would place 3 plants of roses in one room; one
with a virus disease.
In this room should also be
the insects, aphid.
Allow insects to go from in­
fected plants to one of the other plants.
These
plants should be watched to see if the virus spreads.

6.

Take sample rose with the virus disease.
Obtain
same kind of rose with no disease. Use microscope
to aid in detection of the disease. Use some sort
of spray. Note results.

7.

Take 2 sets of the same kind of roses and expose
one set of them to aphids.
Keep the plants under
the same conditions at all times and if the roses
with the aphids contract the disease while the
isolated ones do not, then the aphids are carriers.

8.

Use rose plants which are known not to be diseased.
In the same area place rose plants which are diseased
but which have been treated to destroy the aphid.
Note whether the disease still spreads after the
aphids have been killed.

9.

In order to determine whether the aphid spreads a virus
disease in roses, a group of roses should be put in
a hot house free from aphids to see whether they get such
a virus disease.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

10.

Select numerous roses free
from the virus and from the same soil or area.
Divide these and expose one group to aphids and
isolate the other group.
One would try to keep all
environmental conditions for both groups alike with
the exception of exposing the one group of roses to
the insect.

PROBLEM:

To find a prevention for disease X.

HYPOTHESIS: A newly developed vaccine will prevent the
disease.
1.

Use animals such as rats, rabbits and inject them
with the vaccine.
If it is successful then use on
a human.
The vaccine should be successful upon many
people before the vaccine can be declared a prevent­
ative.

2.

Use 500 people who have disease X and do nothing for
them.
Then take 500 other people who have disease X
and give them the newly developed vaccine.

3.

In an area where the disease is prevalent inject half
of the population with the vaccine.
Do not inject
the other half.
Make sure that the two groups are
about the same in other ways.
Compare the number of
cases in the two groups.

4.

With 4 guinea pigs, inject 2 of them with the
vaccine, then place them in a contaminated place
where they will be susceptible to the disease. Place
2 un-vaccinated in also.
If the 2 un-vaccinated get
the disease and the vaccinated do not - after several
such experiments it is probable - if both get disease
X, the vaccine is no good.

5.

Put 2 animals into a region where disease X is prev­
alent.
In one inject the vaccine and if the vari­
able shows no signs of the disease the hypothesis
is true.

6.

Inject 20 mice with the new vaccine, leaving 20
untreated. Then inject all of the mice with disease X.
If all of the mice get "x" vaccine is no good. If out
of the 20 you injected with new vaccine none got the
disease and the 20 control mice did, then you have a
good vaccine.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

7.

You have to make sure the vaccine cures the disease
in animals first, as close to a natural condition in
humans as possible.
Then to try it on a human being
and see if it reacts the same way. You cannot tell
until you have tried it on a human.

8.

Inject a number of animals with disease X. With a
similar needle inject the other half with the
special vaccine.
Note that everything must be the
same except the vaccine.
If the vaccine injected
group does not get the disease and the others do it
will substantiate the hypothesis.

9.

Take a diseased animal and expose him to a group of
animals that have been innoculated with this new
vaccine.
Then take the same diseased animal and
expose him to a group of healthy animals that have
not been Innoculated.
(Rats preferably - but any
type would suffice, as long as they are susceptible
to the disease.)

PROBLEM:

What are some of the requirements for the sprout­
ing of seeds?

HYPOTHESIS:

Seeds sprout within a certain temperature range.

1.

Set up 7 seed beds in which the moisture, ventilation,
and amount of light are the same.
All factors are
same, except one bed will be kept at 0°F., another
at 20°, another at 40°, 60°, and 80°, 100°, and
120°F.
Observe which sprout first, and which, if
any, never do sprout.
Try this same experiment with
several different kinds of seeds.

2.

Place seeds in various temperatures: warm, hot, cold,
freezing, moderate.
This will determine the range
in which certain seeds will sprout.
Different types
of seeds may sprout in different temperature ranges.
Keep all other conditions such as moisture, light,
etc., the same.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

3.

One seed under temperature from 0-20°C.
2nd seed under temperature from 20-40°C.
3rd seed under temperature from 40-60°C.
4th seed under temperature from -20-0°C.
Determine the temperature at which the seeds sprout
the best.
4.

Plant three seeds.
Keep one above normal tempera­
tures, the second below normal and the third at
normal temperature.
The seed that sprouts will
tell which temperature range is the best.

5.

Take about 10 sets of seeds of the same type planted
in the same condition.
Subject each set to a different
temperature ranging from 0° to 100°C.
Observe
if seeds sprout at a certain temperature.

6.

Put seeds in pots, and then put these pots in places
where the temperature can be properly adjusted.
Put
one of these pots at every 10°, keeping all other
conditions constant, some of these plants will
sprout.

7.

Put different seeds at varying degrees of temperature.
See at which temperature they sprout.

8.

Set up 2 conditions similar except one set of seeds
planted would be placed where the temperature would
be about 2°C., the other remain at about room
temperature.

9.

If the seed was placed in the earth at freezing
temperatures it would not grow.
I would say that the
temperatures of 70°-100° would sprout the seed.

10.

Take 2 groups of seed.
Attempt to sprout seeds
within this certain temperature.
Attempt to sprout
seeds under adverse temperature.

PROBLEM:

To determine the cause of illness which appears when
large numbers of people are being confined to a small
space.

HYPOTHESIS:

Lack of oxygen causes the people to become ill.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

1.

Examine the ill people and trace back the illness to
whether it is caused by lack of oxygen or what.

2.

One might check the oxygen by placing a number of
people in a confined place where there was a control
amount. Other checks would have to be made also
such as the purity of food, the purity of water and
whether or not proper sanitation rules were followed.

3.

Put 50 normal people into a small space under normal
conditions. Put 50 normal people into a small space
with a large forced supply of oxygen. Compare the
two groups after a considerable time.

4.

Take 50 monkeys or mice and put a group where the
oxygen is low and put a group where the oxygen is
kept higher. If lack of oxygen causes people to
become ill it may make the monkeys or mice ill.

5.

Have a person work and live normally in a room with
insufficient oxygen. Another person work and live
normally in a room with sufficient oxygen. Compare
the effects.

6.

Confine one group to a small space in which there is
a limited supply of oxygen. Let the other group have
unlimited supply of oxygen and a large space. Let
their diets and other items be the same. If the cause
of the illness is as stated the confined group will
be ill from lack of oxygen.

7.

Set two groups of people, one with plenty of oxygen
and the other in a normal environment. Determine
which group becomes ill.

8.

Take 3 groups. Group 1 will be confined in small
space, with the usual things. This is the control
group. Group 2 will be confined in an equally small
and crowded place, only they shall have excellent
ventilation. Group 3 will be confined to spacious
(relatively) quarters, and they shall not have good
ventilation. Keep careful records and see what
results suggest.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to
hypothesis
5. None of the above

9.

Put a lot of rabbits in a small space for a period of
time. Put a few rabbits in the same amount of space.
Observe the rabbits and draw conclusions.

10.

First tests should be made on the air to see if there
is a lack of oxygen. If there is a lack of oxygen
and there is no other reason for the people being
ill then the hypothesis would be true.

11.

Observe the effects of a large number of people in
a small room. Then add pure oxygen to the same room
with the same people. If the illnesses were cured,
it would be likely that the lack of oxygen was the
cause.

12.

Put different groups of people in different rooms.
Give one group a greater amount of carbon dioxide
than oxygen, the second group a normal amount and
a third group a greater amount of oxygen than carbon
dioxide and check for results.

13.

Put one group of people in a room with an excessive
amount of carbon dioxide and another group in a
room with a normal amount of carbon dioxide. Keep
the oxygen concentration the same in both rooms.

TABLE XXXVIII
ITEM ANALYSIS DATA FOR TEST C

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
   1     17.8    0.0      0.0    0.0     .55    .00        0          0       0
   2     77.8   72.2     66.7   58.3     .14    .15        9         64      58
   3     71.1   63.9     17.8    0.0     .54    .78       63         31      40
   4     66.7   58.3     48.8   36.1     .19    .22       13         46      48
   5     33.3   16.7     17.8    0.0     .20    .50       33         08      21
   6     64.4   55.6     15.6    0.0     .52    .75       58         28      38
   7     71.1   63.9     28.9   11.1     .43    .56       38         37      43
   8     97.8   97.2     64.4   55.6     .58    .61       43         77      66
   9     77.8   72.2     68.9   61.1     .12    .14        8         66      59
  10     57.8   47.2     51.1   38.9     .08    .08        5         43      46
  11     33.3   16.7     17.8    0.0     .20    .50       33         08      21
  12     93.3   91.7     91.1   88.9     .05    .07        4         90      77
  13     60.0   50.0     48.8   36.1     .12    .15        9         42      46
  14      4.4    0.0     11.1    0.0    -.18    .00        0          0       0

*  Method of Flanagan
** Method of Davis

TABLE XXXVIII (continued)

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
  15     44.4   30.6     22.2    2.8     .25    .51       34         16      29
  16     73.3   66.7     31.1   13.9     .43    .55       37         40      45
  17      2.2    0.0      6.7    0.0    -.16    .00        0          0       0
  18     51.1   38.9     37.7   22.2     .14    .20       12         31      40
  19     26.7    8.3     20.0    0.0     .13    .35       22         05      14
  20     82.2   77.8     62.6   52.8     .24    .27       17         66      59
  21     80.0   75.0     42.2   27.8     .40    .47       31         51      51
  22     57.8   47.2     44.4   30.6     .14    .17       10         38      44
  23     88.9   86.1     64.4   55.6     .34    .36       23         70      61
  24     53.4   41.7     33.5   16.7     .21    .29       18         29      38
  25      0.0    0.0      8.9    0.0    -.45    .00        0          0       0
  26     68.9   61.1     33.3   16.7     .37    .47       31         38      44
  27     15.1    0.0      2.2    0.0     .38    .00        0         00       0
  28     64.4   55.6     40.0   25.0     .25    .32       20         40      45
  29     86.7   83.3     44.4   30.6     .48    .54       36         57      54
  30     13.3    0.0      8.9    0.0     .13    .00        0          0       0

TABLE XXXVIII (continued)

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
  31     82.2   77.8     42.2   27.8     .43    .50       33         53      52
  32     33.3   16.7     11.1    0.0     .32    .50       33         08      21
  33     62.6   52.8     44.4   30.6     .19    .23       14         40      43
  34     17.8    0.0      4.4    0.0     .31    .00        0          0       0
  35     44.4   30.6      6.7    0.0     .50    .62       44         16      29
  36     15.4    0.0      0.0    0.0     .55    .00        0          0       0
  37     51.1   38.9     28.9   11.1     .23    .36       23         26      36
  38     60.0   50.0     26.7    8.3     .34    .51       34         28      38
  39     73.3   66.7     46.7   33.3     .27    .34       21         50      50
  40     93.3   91.7     73.3   66.7     .35    .36       23         79      67
  41     68.9   61.1     37.7   22.2     .32    .40       26         42      46
  42     44.4   30.6      2.2    0.0     .64    .62       44         16      29
  43     42.2   27.8      2.2    0.0     .62    .61       43         15      28
  44     86.7   83.3     57.8   47.2     .35    .39       25         66      59
  45     71.1   63.9     46.7   33.3     .25    .31       19         48      49
  46     40.0   25.6     20.0    0.0     .24    .58       40         14      27

TABLE XXXVIII (continued)

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
  47     48.8   36.1     40.0   25.0     .09    .14        8         30      39
  48     53.3   41.7     37.7   22.2     .16    .23       14         31      40
  49     33.3   16.7     13.3    0.0     .28    .50       33         08      21
  50     35.5   19.4     33.3   16.7     .03    .05        3         17      30
  51     71.1   63.9     22.2    2.8     .49    .71       54         33      40
  52     17.8    0.0     22.2    2.8    -.03   -.17      -10         02       8
  53     51.1   38.9     40.0   25.0     .12    .17       10         31      40
  54     17.8    0.0     11.1    0.0     .11    .00        0          0       0
  55     46.7   33.3     31.1   13.9     .17    .26       16         22      34
  56     55.6   44.4     17.8    0.0     .43    .70       52         22      34
  57     31.1   13.9     37.7   22.2    -.07   -.12       -7         18      31
  58     53.3   41.7     22.2    2.8     .34    .59       41         22      34
  59     48.8   36.1     20.0    0.0     .32    .66       48         18      31
  60     13.3    0.0      6.7    0.0     .10    .00        0          0       0
  61     40.0   25.0     33.3   16.7     .08    .12        7         21      33
  62     84.4   80.6     44.4   30.6     .44    .51       34         55      53

*  Method of Flanagan
** Method of Davis

TEST D
ORGANIZATION OF DATA
This test is designed to test your ability to organize
data. Select from the key below the curve which
best fits the data. If none of the curves fit the data
mark space five on your answer sheet.

[Key: curves 1 to 4; 5. none of the curves]

The horizontal axis represents temperature. The
vertical axis represents the amount of Substance
A derived from Substance B.

Temperature     Amount of Substance A
10°C.           4 grams
25°C.           7 grams
35°C.           9 grams
60°C.           14 grams
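
One way to see what kind of curve a set of paired values suggests is to compare the slopes between successive points. The sketch below does this for the temperature and Substance A values given above. The function names, the tolerance, and the trend labels are illustrative assumptions and are not part of the test or its key.

```python
def successive_slopes(xs, ys):
    """Slope of each straight segment joining neighbouring data points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def describe_trend(xs, ys, tol=1e-9):
    """Label how the slope changes from one segment to the next."""
    slopes = successive_slopes(xs, ys)
    changes = [b - a for a, b in zip(slopes, slopes[1:])]
    if all(abs(c) <= tol for c in changes):
        return "constant slope"
    if all(c > tol for c in changes):
        return "slope increasing"
    if all(c < -tol for c in changes):
        return "slope decreasing"
    return "mixed"


temperature = [10, 25, 35, 60]   # degrees C
amount = [4, 7, 9, 14]           # grams of Substance A
print(successive_slopes(temperature, amount))
print(describe_trend(temperature, amount))
```

Each segment here rises 0.2 gram per degree, which is the pattern a straight line would give.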
The horizontal axis represents the amount of oxygen
in the experimental gas mixtures. The vertical
axis represents the amount of oxygen taken up by
red cells in these experiments.
Oxygen in gas mixtures

0
10
20

3.

Oxygen taken up by red cells
0

n

90
30
98
50
The horizontal axis represents the percent of carbon
dioxide in gas mixtures breathed in; the vertical
axis represents the percent increase in total amount
of gas breathed per minute.
Carbon dioxide percent

0
1
2
3
5
7

Percent Increase

0
10
25
50
100
200

[Abbreviated Key: curves 1 to 4; 5. none]

The horizontal axis is the concentration of salt
(sodium chloride). The vertical axis is the percent
of red cells destroyed in these concentrations of
salt.

Concentration of salt     Percent red cells destroyed
.27                       98
.36                       75
.41                       10
.50                       1

The horizontal axis represents the amount of thyroprotein
fed daily to cows. The vertical axis represents
the percent increase in milk production.

Thyroprotein fed     Percent increase
.15 grams            18
.20 grams            23
.24 grams            27
.30 grams            33

The horizontal axis represents age in years. The
vertical axis is the percent increase in the weight
of the brain from birth to twenty years of age.

Age       Percent increase
1 yr.     40
4         80
12        98

The horizontal axis represents the time in minutes
to kill bacteria in a weak solution of silver
nitrate. The vertical axis are the temperatures
to which the bacteria in the silver nitrate solutions
were subjected.

Temperature     Time in minutes
15°C.           160
20°C.           80
30°C.           40
45°C.           0

8.

The horizontal axis repre­
sents age in years. The
vertical axis is the per­
cent increase in the weight
of the ovaries and other
female sex organs from
birth to 20 years.
*s2
4

9.

Percent increase
‘

8
12

14
18

20
80

The horizontal axis represents time in seconds; the
vertical axis represents the amount of heat developed
in a single contraction of a single muscle fiber.

.0
.1
.2
.4
.8

Heat
0.0
10.0
15.0
14.0
4.0

The horizontal axis represents the number of days
since the memorization of certain nonsense syllables;
the vertical axis is the percent of the nonsense
syllables forgotten.
Time in days

11.

5. none

10

Time

10.

Abbreviated Key

Percent forgotten

1

45

2

60

6
12

80
84

The horizontal axis represents age of girls in
years; the vertical axis is the strength index of
these girls in pounds.
Age

1
5

Strength index

.5
2.0

8

5.0

15

12.0

20

12.5

[Abbreviated Key: curves 1 to 4; 5. none]

12.

The horizontal axis represents the successive number
of trials in the learning of a puzzle. The vertical
axis is the time in seconds of each trial.

Trial     Time in seconds
1st       420
5th       419
10th      240
14th      60
18th      50

13.

The horizontal axis represents the time in hours
after the injection of sugar into the blood; the
vertical axis is the amount of sugar in the blood.

Time after injection     Blood sugar
1                        35
3                        12
6                        8

14.

The horizontal axis is the time in minutes after
pint jars of corn have been put in boiling water
and kept boiling; the vertical axis is the temperature
in the center of the pint jar.

Time     Temperature
5        20
10       21
30       55
60       90
100      99

15.

The horizontal axis is age in years. The vertical
axis is the metabolic rate of an individual expressed
in calories per day.

Age     Calories
2       60
5       40
15      30
25      25
40      23

[Abbreviated Key: curves 1 to 4; 5. none]

16.

The horizontal axis represents time in days; the
vertical axis is the number of yeast cells in
millions (starting with 100 yeast cells).

Time in days     Number of yeast cells in millions
4                25
8                150
12               390
20               400

17.

The horizontal axis is the temperature in Centigrade.
The vertical axis represents the amount of enzyme
activity of a certain type of bacteria in arbitrary
units.

Temperature     Enzyme activity
10              0
30              1
50              2
70              3
90              2.5

18.

The horizontal axis represents age in weeks; the
vertical axis represents the weight of an animal,
in kilograms.

Age     Weight
1       .05
3       .15
8       .80
12      1.6
16      2.4
25      2.8

19.

The horizontal axis represents the external temperature;
the vertical axis represents the amount of
oxygen absorbed by a frog at the various temperatures.

Temperature     Oxygen
10              104 mg.
14              130 mg.
20              160 mg.
25              208 mg.

[Abbreviated Key: curves 1 to 4; 5. none]

20.

The horizontal axis represents the time in hours.
The vertical axis the temperature inside a
thermos bottle containing germinating pea seeds.

Time     Temperature
0        20°C.
12       24°C.
24       30°C.
36       32°C.

TABLE XXXIX
ITEM ANALYSIS DATA FOR TEST D

               Percent Success              Discrimination       Difficulty
Item      Upper 27%       Lower 27%           r        Index  % Success   Index
            *     **        *     **       *     **
   1     75.6   69.4     42.2   27.8     .35    .40       26         48      49
   2     95.6   94.4     37.7   22.2     .67    .74       57         57      54
   3     80.0   75.0     24.4    5.6     .56    .71       54         40      45
   4     88.8   86.1     20.0    0.0     .68    .87       80         42      46
   5     84.4   80.6     51.1   38.9     .38    .43       28         59      55
   6     91.1   88.9     26.7    8.3     .66    .78       63         48      49
   7     77.8   72.2     11.1    0.0     .67    .81       69         37      43
   8     95.6   94.4     26.7    8.3     .73    .83       71         51      51
   9     93.3   91.7     31.1   13.9     .67    .75       59         53      52
  10     93.3   91.7     31.1   13.9     .67    .75       59         53      52
  11     75.6   69.4     35.5   19.4     .41    .51       34         44      47
  12     93.3   91.7     33.3   16.7     .65    .74       57         53      52
  13     95.6   94.4     15.6    0.0     .80    .90       90         46      48

*  Method of Flanagan
** Method of Davis

280
TABLE XXXIX (continued)

Percent Success
Item
14
15
16
17
18
19
20

*
**

Upper

2 7 %

Lower

2 7 %

Discrimination
r

Index

Difficulty
%

Success

Index

75.6
69.4

33.3
16.7

.43
.54

36

42

46

97.8
97.2

33.3
16.7

.75
.81

68

57

54

64.4
55.6

20.0
0.0

.46
.75

58

28

38

77.8
72.2

26.7
8.3

.52
.66

48

40

45

68.9
61.1

26.7
8.3

.43
.59

41

35

42

33.5
16.8

22.2
2.8

.14
.35

22

09

22

*42.2
**27.8

22.2
2.8

.23
.48

32

15

28

Method of Flanagan
Method of Davis
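
The paired percentages in the table above can be checked against one another. Each Davis (**) percentage is close to the corresponding Flanagan (*) percentage after the usual correction for chance success on a five-choice item, with negative results set to zero (for example, 75.6 becomes roughly 69.4, and 20.0 becomes 0.0). The sketch below illustrates that relationship only. The function name, the five-choice default, and the assumption that no examinee omitted the item are illustrative and are not stated in the thesis.

```python
def corrected_percent(percent_right, n_choices=5):
    """Percent success corrected for chance (hypothetical helper).

    Assumes every examinee attempted the item, so that
    percent wrong = 100 - percent right.
    """
    percent_wrong = 100.0 - percent_right
    # Subtract the share of right answers expected from blind guessing.
    corrected = percent_right - percent_wrong / (n_choices - 1)
    # Negative values are reported as zero, as in the tabled entries.
    return max(0.0, corrected)


if __name__ == "__main__":
    for raw in (75.6, 42.2, 20.0):
        print(raw, round(corrected_percent(raw), 1))
```

Small differences from the tabled values are to be expected, since the table was presumably computed from counts of examinees rather than from rounded percentages.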

TEST E
EVALUATION OF HYPOTHESES

This test is designed to measure your understanding
of the relation of facts to the solution of a problem.  The
over-all problem involved in this test is presented.  This
is followed by a series of possible solutions to the problem
(hypotheses).  After each hypothesis there are a number of
items, all of which are true statements of fact.  Determine
how the statement is related to the hypothesis and mark each
statement according to the key which follows the hypothesis.

GENERAL PROBLEM:
What factors are involved in the transmission and
development of Infantile Paralysis (Poliomyelitis)?

HYPOTHESIS I.
In man the disease is contracted by direct contact
with persons having the disease.

For items 1 through 11 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

1.  Monkeys free from the disease almost never catch
    infantile paralysis from infected monkeys.

2.  Most strains of infantile paralysis virus can be
    transferred from man only to monkeys and apes and
    not to other animals.

3.  The virus has been isolated from the nasopharyngeal
    washings of humans and monkeys.

4.  The curve of number of cases of the disease in a
    given area is the same shape as the curve for the
    fly population in that area, the infantile paralysis
    incidence curve lagging behind the fly population
    curve by about two weeks.

5.  The virus has never been isolated from the blood.

6.  The virus is not found in the nasal secretion, nor
    in the saliva.

7.  The incubation period for infantile paralysis is
    from 4 to 21 days.

8.  Most persons in contact with the diseased individual
    do not develop the disease.

9.  The incidence of infantile paralysis is higher in
    rural districts than in the cities.

10. Cases of infantile paralysis have been found to
    follow the roads of communication of the population,
    that is, the disease spreads from populated areas
    along roads or rivers to other areas.

11. Even during epidemics cases are spotty; it is
    usually impossible to trace one case from another.

12. What is the status of hypothesis I?
    1. It is true.
    2. It is probably true.
    3. It is false.
    4. It is probably false.
    5. The data are contradictory, hence its truth
       or falsity cannot be judged.

HYPOTHESIS II.
The disease is spread by the excrement (excreted
material) of persons harboring the virus.

For items 13 through 23 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

13. The virus is always found in the stools of persons
    who have the disease.

14. In the stools of persons not in contact with persons
    with the disease the virus is found in only one
    person in 100.

15. During an epidemic non-paralytic cases outnumber
    paralytic cases ten to one.

16. The curve of number of cases of the disease in a
    given area is the same shape as the curve for the
    fly population in the area, the infantile paralysis
    incidence curve lagging behind the fly population
    curve by about two weeks.

17. The incubation period for infantile paralysis is
    from 4 to 21 days.

18. Nine out of 14 adult contacts had virus in the
    stool; almost all child contacts have virus in the
    stools.

19. The virus has been isolated from streams carrying
    sewage.

20. Cases of the disease have been found to follow the
    roads of communication of the population, that is,
    the disease spreads from populated areas along
    roads or rivers to other areas.

21. The virus of the disease has been found in the
    stools and vomit of flies up to two days after
    eating an infected meal.

22. Even during epidemics cases are spotty.

23. It is usually impossible to trace one case from
    another.

24. What is the status of hypothesis II?
    1. It is true.
    2. It is probably true.
    3. The data are contradictory, so the truth
       or falsity cannot be judged.
    4. The hypothesis is probably false.
    5. It is definitely false.

HYPOTHESIS III.
The olfactory nerve (nerve from nose to brain) is
the route of entry of the virus.

For items 25 through 34 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

25. The virus has been isolated from nasopharyngeal
    washings of humans and monkeys.

26. A plug of cotton, saturated with virus, placed in
    the nose of the monkey invariably causes the monkey
    to contract the disease.  If the olfactory nerve is
    cut the monkey does not contract the disease when a
    plug saturated with the virus is placed in the nose.

27. If the nose of a monkey is sprayed with zinc
    sulphate the monkey (with virus plug inserted)
    does not contract the disease.

28. The virus is not found in the nasal secretion or
    in the saliva.

29. The virus has been isolated from the spinal cord
    of 71% of the cases autopsied, and from the olfactory
    nerve in 5% of the cases autopsied.

30. The virus has been found in the nasopharynx from
    several days before the onset of the disease until
    about 3 days after the onset of the disease.

31. Many doctors recommended the use of zinc sulphate
    nasal spray (administered only by the physician).

32. The virus is not affected by freezing.

33. Most strains of the virus can be transferred only
    to monkeys and apes.

34. The percentage of cases of infantile paralysis among
    persons receiving the nasal spray of zinc sulphate
    was the same as the percentage of cases in the total
    population.

35. What is the status of hypothesis III?
    1. It is true.
    2. It is probably true.
    3. The data are contradictory, hence truth or
       falsity of the hypothesis cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS IV.
The higher the degree of sanitation the greater are
the chances of epidemic forms of the disease.

For items 36 through 45 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

36. Monkeys free of the disease almost never catch
    infantile paralysis from infected monkeys.

37. The virus has been isolated from streams carrying
    sewage.

38. In India epidemics seldom occur.

39. In India children under five are about the only
    ones affected.

40. During the war there was one epidemic among the
    European and American soldiers in India; the incidence
    among the soldiers was extremely high.

41. The percent of cases of infantile paralysis in
    whites is about four times that in colored people.

42. In the south (U.S.) there are three times as many
    cases under five years as over five years of age.

43. The percent of cases of infantile paralysis is
    higher in rural districts than in the cities.

44. In the north (U.S.) about 50% of the cases are over
    5 years of age.

45. During an epidemic non-paralytic cases outnumber
    paralytic cases ten to one.

46. What is the status of hypothesis IV?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, hence the truth
       or falsity of the statement cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS V.
Healthy persons having had contact with diseased
individuals may carry the disease from one person
to another.

For items 47 through 59 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

47. Monkeys free of the disease almost never catch
    infantile paralysis from infected monkeys.

48. During an epidemic non-paralytic cases outnumber
    paralytic cases ten to one.

49. It has been found that exertion prior to or at the
    time of infection increases the incidence of the
    disease.

50. Even during epidemics cases are spotty; it is
    usually impossible to trace one case from another.

51. The virus is always found in the stools of people
    who have the disease.

52. Most persons in contact with the diseased individual
    do not develop the disease.

53. Nine out of 14 adult contacts had virus in stools;
    almost all child contacts have virus in stools.

54. Up to two months after contact the virus is found
    in the stools of persons who contacted the victims,
    but who did not contract the disease.

55. In the stools of non-contacts the virus was found
    in only one person in 100.

56. Data on families each with one case of infantile
    paralysis in the family:  39% of other children in
    family from 1-4 years of age and 30% of other
    children in family 5-9 years of age had minor
    illnesses.  Only 9% of children in other homes
    showed similar illnesses.

57. The percent of cases of infantile paralysis is higher
    in rural districts than in the cities.

58. Under twenty years of age the percent of cases in
    males is three times the percent of cases in females.

59. Flies were allowed to feed on contaminated food.  The
    flies were then placed in contact with food which was
    fed to monkeys.  The feces of the monkeys contained
    the virus.

60. What is the status of hypothesis V?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, so the truth or
       falsity cannot be judged.
    4. It is probably false.
    5. It is definitely false.

HYPOTHESIS VI.
An immunity to the disease may be developed.

For items 61 through 71 mark space if the item offers:
1. direct evidence in support of the hypothesis.
2. indirect evidence in support of the hypothesis.
3. evidence which has no bearing on the hypothesis.
4. indirect evidence against the hypothesis.
5. direct evidence against the hypothesis.

61. Most strains of the infantile paralysis virus can be
    transferred only from man to monkeys and apes and
    not to other animals.

62. During an epidemic non-paralytic cases outnumber
    paralytic cases ten to one.

63. The incubation period of infantile paralysis is from
    4 to 21 days.

64. Even during epidemics cases are spotty; it is usually
    impossible to trace one case from another.

65. Most persons in contact with the diseased individual
    do not develop the disease.

66. Up to two months after contact the virus is found in
    the stools of persons who contacted the victims, but
    who did not contract the disease.

67. In the stools of persons not in contact with persons
    with the disease the virus was found in only one
    person in 100.

68. Data on families each with one case of infantile
    paralysis in the family:  39% of the other children
    in family from 1-4 years of age and 30% of the other
    children in family from 5-9 years of age had minor
    illnesses.  Only 9% of children in other homes
    showed similar illnesses.

69. Epidemics seldom occur in India and the disease is
    almost entirely among children under 5 years of age.

70. The percent of cases in whites is about four times
    the percent of cases in colored people.

71. Cases of infantile paralysis may continue into the
    winter, but an epidemic never arises anew during
    the winter.

72. What is the status of hypothesis VI?
    1. The hypothesis is true.
    2. It is probably true.
    3. The data are contradictory, so the truth
       or falsity cannot be judged.
    4. It is probably false.
    5. It is definitely false.

73. How many of the six hypotheses are acceptable?
    1. 1
    2. 2
    3. 3
    4. 4
    5. 5

74. How many of the hypotheses are not acceptable?
    1. 1
    2. 2
    3. 3
    4. 4
    5. 5

TABLE XXXX
ITEM ANALYSIS DATA FOR TEST E

              Percent Success           Discrimination       Difficulty
         Upper 27%       Lower 27%         r      Index   % Success  Index
Item      *     **        *     **       *   **

   1     66.7   58.3     31.1   13.9    .38  .47     31       35       42
   2     57.8   47.2     40.0   25.0    .18  .24     15       35       42
   3     40.0   25.0     31.1   13.9    .10  .17     10       19       32
   4     53.3   41.7     17.0    0.0    .41  .69     51       21       33
   5     77.8   72.2     44.4   30.6    .36  .40     26       51       51
   6     53.3   41.7     22.2    2.8    .34  .60     42       22       34
   7     93.3   91.7     66.7   58.3    .40  .45     29       74       64
   8     91.1   88.9     84.4   80.6    .13  .15      9       83       70
   9     82.2   77.8     40.0   25.0    .45  .55     37       53       52
  10     57.8   47.2     44.4   30.0    .13  .18     11       38       44
  11     55.6   44.4     37.7   22.2    .20  .24     15       33       41
  12     48.8   36.1     22.2    2.8    .28  .56     38       19       32
  13     71.1   63.9     71.1   63.9    .00  .00      0       64       58
  14     53.3   41.7     17.8    0.0    .38  .69     41       21       33
  15    100.0  100.0     84.4   80.6    .50  .52     35       89       76
  16     53.3   41.7     15.6    0.0    .41  .69     51       21       33
  17     84.4   80.6     80.0   75.0    .05  .08      5       77       66
  18     28.9   11.1     20.0    0.0    .12  .39     25        5       16
  19     66.7   58.3     33.3   16.7    .36  .44     27       35       42
  20     53.3   41.7     44.4   30.6    .08  .14      8       35       42
  21     53.3   41.7     22.2    2.8    .34  .60     42       22       34
  22     60.0   50.0     48.8   36.1    .12  .15      9       42       46
  23     38.7   22.0     17.8    0.0    .27  .55     37       12       25
  24     84.4   80.6     51.1   38.9    .38  .43     28       59       55
  25     48.8   36.1     28.9   11.1    .21  .34     21       24       35
  26     95.6   94.4     84.4   80.6    .25  .29     18       87       74
  27     11.1    0.0     11.1    0.0    .00  .00      0        0        0
  28     53.3   41.7     20.0    0.0    .36  .69     51       21       33
  29     44.4   30.6     31.1   13.9    .15  .22     13       22       34
  30     57.8   47.2     33.3   16.7    .26  .38     24       31       40
  31     46.7   33.3     24.4    5.6    .24  .41     27       19       32
  32     97.8   97.2     84.4   80.6    .40  .41     27       89       76
  33     91.1   88.9     75.6   69.4    .27  .27     17       79       67
  34     55.6   44.4     37.7   22.2    .20  .24     15       33       41
  35     62.6   52.8     24.4    5.6    .40  .58     40       28       38
  36     75.6   69.4     44.4   30.6    .34  .39     25       50       50
  37     17.8    0.0     17.8    0.0    .00  .00      0        0        0
  38     55.6   44.4     17.8    0.0    .42  .70     52       22       34
  39     31.1   13.9     20.0    0.0    .14  .46     30        8       20
  40     46.7   33.3     24.4    5.6    .24  .43     28       19       32
  41     62.6   52.8     28.9   11.1    .35  .47     31       31       40
  42     26.7    8.3     26.7    8.3    .00  .00      0        8       20
  43     46.7   33.3     13.3    0.0    .40  .65     47       17       30
  44     15.6    0.0     13.3    0.0    .07  .00      0        0        0
  45    100.0  100.0     77.8   72.2    .55  .61     43       85       72
  46     33.3   16.7     13.3    0.0    .30  .48     32        8       21
  47     62.6   52.8     24.4    5.6    .40  .58     40       28       38
  48     15.6    0.0     13.3    0.0    .07  .00      0        0        0
  49     93.3   91.7     55.6   44.4    .49  .56     38       69       60
  50     53.3   41.7     17.8    0.0    .49  .56     38       69       60
  51     88.9   86.1     64.4   55.6    .33  .36     23       70       61
  52     57.8   47.2     22.2    2.8    .38  .63     45       25       36
  53     35.5   19.4     24.4    5.6    .14  .29     18       13       26
  54     57.8   47.2     20.0    0.0    .40  .71     54       24       35
  55     57.8   47.2     26.7    8.3    .32  .51     34       28       38
  56     44.4   30.6     37.7   22.2    .06  .10      6       25       36
  57     48.8   36.1     28.9   11.1    .21  .32     20       24       35
  58     95.6   94.4     88.9   86.1    .20  .18     11       90       77
  59     35.5   19.4     20.0    0.0    .18  .54     36       11       24
  60     64.4   55.6     37.7   22.2    .28  .37     23       38       44
  61     46.7   33.3     40.0   25.0    .07  .10      6       28       38
  62     60.0   50.0     46.3   33.3    .13  .17     10       42       46
  63     95.6   94.4     88.9   86.1    .20  .18     11       90       77
  64     37.7   22.2     35.5   19.4    .03  .02      2       21       33
  65     68.9   61.1     24.4    5.6    .44  .64     46       34       41
  66     48.8   36.1     35.5   19.4    .13  .17     10       28       38
  67     66.7   58.3     42.2   27.8    .27  .31     19       42       46
  68     48.8   36.1     35.5   19.4    .14  .20     12       28       38
  69     51.1   38.9     33.3   16.7    .18  .27     17       27       37
  70     46.7   33.3     24.4    5.6    .24  .43     28       20       32
  71     73.3   66.7     46.7   33.3    .28  .34     21       50       50
  72     71.1   63.9     37.7   22.2    .36  .43     28       42       46
  73     42.2   27.8     13.3    0.0    .35  .61     43       15       28
  74     40.0   25.0     26.7    8.3    .17  .31     19       17       30

 *  Method of Flanagan
**  Method of Davis
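
The "Upper 27%" and "Lower 27%" columns in these tables report percent success on an item within the highest-scoring and lowest-scoring groups of examinees. A minimal sketch of how such percentages might be obtained from raw answer data follows. The function name, the data layout, and the rounding and tie-handling rules are assumptions made for illustration; the thesis does not describe them at this point, and the sketch does not reproduce the Flanagan or Davis index tables.

```python
def upper_lower_percent_success(total_scores, item_correct, fraction=0.27):
    """Percent success on one item in the upper and lower groups.

    total_scores: one total test score per examinee.
    item_correct: 1 if that examinee answered the item correctly, else 0.
    Both lists are hypothetical inputs in the same examinee order.
    """
    n = len(total_scores)
    # Size of each extreme group; rounding rule is an assumption.
    group_size = max(1, round(n * fraction))
    # Examinee positions ranked from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    def percent(group):
        return 100.0 * sum(item_correct[i] for i in group) / len(group)

    return percent(upper), percent(lower)


if __name__ == "__main__":
    scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    correct = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    print(upper_lower_percent_success(scores, correct))
```

A large gap between the two percentages corresponds to the high discrimination values in the tables; identical percentages correspond to the rows where r is .00.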

TEST F
EXPERIMENTATION AND THE INTERPRETATION OF DATA

This test was designed to measure your ability to
interpret data and to test your understanding of
experimentation.  In each case the numbers in the first column
are the numbers which you will use as your answer.  Thus
the table presented becomes both the source of data and
your key for the questions which follow it.  In each case
where a test tube number or group number is called for the
one which gives positive evidence for the statement should
be given.  Below this the control or comparison is called
for.  This is the test tube or group number of the data
which offers a comparison.  For example:

     1.  Leaf in dark - no starch.
     2.  Leaf in light - starch.

     Light is necessary for the production of starch.

You would mark space 2 because this is the positive
evidence, but it would be meaningless if it were not compared
with the leaf in the dark.  Therefore, the following item,
"What is the control (comparison) for item 1?" would be
marked space 1.

Items 1 through 15 refer to the data presented
below.  Some test tubes were set up and each contained 1
gram of fat.  They were marked 1, 2, 3, 4, and 5.  Mark
each item according to the test tube number called for.
Various substances were added to the tubes containing fat.
All substances were dissolved in water before they were
added to the fat.  All test tubes were kept at 85° F.
(Water boils at 212° F.)  For test tube 5, Substance A
was boiled and then allowed to cool before it was added
to the fat.

Test Tube                                        Amt. of Substance B
Number     Content of tube                       present after 24 hrs.
   1       Fat plus Substance A                       .1 gram
   2       Fat plus Substance A
             plus Substance C                         .5 gram
   3       Fat plus Water                             .0 gram
   4       Fat plus Substance C                       .0 gram
   5       Fat plus Substance A (boiled)              .0 gram

1.  Give the number of the test tube which acts as a
    control (comparison) for the entire experiment.

2.  Give the number of the tube which gives evidence
    that fat does not break down spontaneously into
    Substance B in 24 hours.

3.  Give the number of the tube used to show that a
    temperature of 85 degrees F. was not sufficient
    to cause fat to be broken down into Substance B.

4.  Give the test tube number of the tube which gives
    evidence that Substance A is the active substance
    in the breakdown of fat to Substance B.

5.  Give the test tube number of the tube which is the
    control (comparison) for item #4.

6.  Give the number of the tube which provides evidence
    that Substance C alone is ineffective in the
    breakdown of fats.

7.  What is the control for item #6?

8.  Which test tube gives evidence that Substance C
    accelerates the rate of activity of Substance A?

9.  Give the tube which is the control for item #8.

10. Which tube gives evidence that Substance A is a
    substance whose properties can be destroyed?

11. Give the control for the tube in item #10.

12. Which tube gives evidence that Substance C affects
    the fat in some way so that Substance A can more
    easily act upon it?

13. Which tube is the control for #12?

14. Which tube gives evidence that Substance A is not
    a stable substance?

15. What is the control for item #14?

Items 16 through 28 refer to the data presented
below.  Mark each item according to the group called for.
Each group contained 100 persons fed on the diets indicated.

Group    Diet                                        Cases of Beri Beri
  1      whole rice (i.e. rice with hulls)                 none
  2      polished rice (i.e. rice with hulls removed)      60%
  3      polished rice plus Vitamin B1                     none
  4      polished rice plus Vitamin B2                     60%
  5      polished rice plus Vitamin B complex              none

16. Give the number of the group which is the control
    (comparison) for the entire experiment.

17. Give the group which gives evidence that rice hulls
    contain a beri beri preventing substance.

18. Give the control for item 17.

19. Give the number of the group which provides
    evidence that Vitamin B is not a single entity.

20. Give the control for item 19.

21. Give the number of the group which indicates that
    rice hulls may contain Vitamin B.

22. Give the control for item 21.

23. Give the number of the group which provides evidence
    that rice hulls may contain Vitamin B1.

24. Give the number of the group which is the control
    for item 23.

25. Which group gives evidence that a deficiency of
    Vitamin B causes beri beri?

26. What is the control for item 25?

27. What group gives evidence that Vitamin B2 is not the
    active factor in the prevention of beri beri?

28. What is the control for item 27?

Items 29 through 39 refer to the data presented
below.  Mark each item according to the group number
called for.  When a person ascends to high altitudes his
blood cell count increases after about 10 days.  The
following data were obtained from a study of altitude
effects on rats.  760 mm. of mercury is atmospheric
pressure at sea level.  Air is composed of about 20%
oxygen and 80% nitrogen.

Group    Atmospheric pressure    % O2    % N    Red cell count
  1              760              10      90    increased
  2              380              20      80    increased
  3              760              20      80    normal
  4              760              40      60    decreased
  5              380              40      60    normal

29. Give the number of the group which is the control
    for the entire experiment.

30. Give the number of the group that gives evidence
    that a decrease in atmospheric pressure causes
    an increase in red cell count at high altitude.

31. Which group is the control (comparison) for item 30?

32. Which group gives evidence that it is the decrease
    of oxygen pressure which is responsible for the
    increase in cell count at high altitudes?

33. Which of the groups is the best control for item 32?

34. Which of the groups gives evidence that a decrease
    in atmospheric pressure is not the cause of an
    increased red cell count at high altitudes?

35. What is the control for item 34?

36. Give the number of the group which gives evidence
    that a decrease in nitrogen pressure is not
    responsible for the increased red cell count at
    high altitudes.

37. Give the number of the group that is the control
    for item 36.

38. Which group gives evidence that an increase in
    oxygen pressure decreases the red cell count?

39. What is the control for item 38?


Items 40 through 57 refer to the data presented
below.  Mark each item according to the leaf number called
for.  Plant A normally stores starch in its leaves while
plant B does not normally store starch in its leaves.  The
following experiments were performed in a dark room at 72
degrees F.  Glucose (sugar) solutions were made with 20
grams of glucose per 100 cubic centimeters of water.
Leaves of plant A taken from a plant that had been in the
dark for 48 hours were floated in the 5 solutions listed
below and left in the glucose solution for an hour.

Leaf    Solution                                  Analysis of leaf after 4 hours
  1     Glucose                                   Starch in leaf
  2     Water                                     No starch in leaf
  3     Glucose plus juice from Plant B           No starch in leaf
  4     Glucose plus juice from Plant C           No starch in leaf
  5     Glucose plus boiled juice from Plant B    Small amount of starch in leaf

40. Give the number of the leaf which showed that starch
    does not develop spontaneously in the leaf in the
    dark.

41. This leaf indicates that a temperature of 72 degrees
    F. does not cause starch to form in the leaf.

42. Give the number of the leaf which is the control
    (comparison) for the entire experiment.

43. Give the number of the leaf which gives evidence
    that Plant A is capable of manufacturing starch
    from glucose.

44. Give the number of the leaf which is the control
    for item 43.

45. Give the number of the leaf which gives evidence
    that the juice of Plant B is capable of preventing
    the manufacture of starch from glucose.

46. What is the control for item 45?

47. Give the number of the leaf which gives evidence
    that Plant A is normally able to store starch in
    its leaves.

48. What is the control for item 47?

49. Give the number of the leaf which gives evidence that
    Plant C does not normally form starch in its leaves.

50. Give the leaf number of the control for item 49.

51. Which leaf shows that water does not cause the
    production of starch in the leaf?

52. Give the number of the leaf which gives evidence
    that the juices of Plant B contain a substance which
    inhibits the production of starch in its leaves.

53. Give the leaf which is the control for item 52.

54. This leaf gives evidence that the inhibitory
    substance is not a stable substance.

55. What is the control for item 54?

56. Give the number of the leaf which shows that
    boiling destroys the activity of the juice of Plant B.

57. Give the control for item 56.

Items 58 through 72 refer to the data presented
below.  Five test tubes, each containing a gram of protein,
were set up.  Mark each item according to the test tube
number called for.  All substances were dissolved in water.
All test tubes were kept at 37° C. (water boils at 100° C.)
For test tube 5, Substance X was boiled and then cooled
before it was added to the protein.

                                                 Amt. of Substance W
Tube    Contents of tubes                        present after 24 hours
  1     Protein plus Substance X                      .05 gram
  2     Protein plus Water                            .00 gram
  3     Protein plus Substance X
          plus hydrochloric acid                      .08 gram
  4     Protein plus Hydrochloric acid                .00 gram
  5     Protein plus Substance X (boiled)             .00 gram

58. Give the number of the test tube which acts as a
    control (comparison) for the entire experiment.

59. Give the number of the test tube which gives
    evidence that protein does not break down spontaneously
    into Substance W.

60. Give the number of the test tube which gives
    evidence that Substance X is the active substance in
    the break down of proteins.

61. Give the number of the tube which is the control
    for item 60.

62. Give the number of the test tube which shows that
    a temperature of 37° C. does not cause protein to
    break down into Substance W.

63. Which test tube gives evidence that Substance X
    is not a stable substance?

64. Which tube is the control for item 63?

65. Which tube gives evidence that acid accelerates
    the activity of Substance X?

66. Which tube is the control for item 65?

67. Which tube gives evidence that Substance X is a
    substance whose properties can be destroyed?

68. Give the test tube number of the control for item 67.

69. Which test tube gives evidence that acid affects
    the protein in some way so that Substance X can
    act upon it more easily?

70. Give the tube number which is the control for
    item 69.

71. Give the number of the test tube which indicates
    that hydrochloric acid alone is ineffective in
    breaking down proteins.

72. Give the control for item 71.


TABLE XXXXI
ITEM ANALYSIS DATA FOR TEST F

              Percent Success           Discrimination       Difficulty
         Upper 27%       Lower 27%         r      Index   % Success  Index
Item      *     **        *     **       *   **

   1     91.1   88.9     77.8   72.2    .24  .24     15       80       68
   2     77.8   72.2     24.4    5.6    .55  .70     52       38       44
   3     64.4   56.6     22.2    2.8    .44  .68     50       28       38
   4     88.9   86.1     60.0   50.0    .37  .41     27       68       60
   5     31.1   13.9     24.4    5.6    .08  .18     11       10       23
   6    100.0  100.0     73.3   66.7    .62  .64     46       82       69
   7     86.7   83.3     51.1   38.9    .43  .48     32       61       56
   8    100.0  100.0     82.2   77.8    .52  .55     37       88       75
   9     88.9   86.1     64.4   55.6    .34  .36     23       70       61
  10     97.8   97.2     80.0   75.0    .45  .47     31       86       73
  11     93.3   91.7     73.3   66.7    .34  .38     24       79       67
  12     95.6   94.4     80.0   75.0    .35  .35     24       79       67
  13     80.0   75.0     51.1   38.9    .33  .36     23       57       54
  14     91.1   88.6     48.8   36.1    .50  .63     38       61       56
  15     88.9   86.1     53.3   41.7    .44  .48     32       63       57
  16     33.3   16.7      8.9    0.0    .38  .48     32        8       21
  17     77.8   72.2     55.6   44.4    .25  .29     18       57       54
  18     77.8   72.2     53.3   41.7    .28  .31     19       59       54
  19     71.1   63.9     35.5   19.4    .36  .46     30       42       46
  20     28.9   11.1     15.6    0.0    .18  .31     19       59       54
  21     62.6   52.8     53.3   41.7    .11  .10      6       46       48
  22     15.6    0.0      6.7    0.0    .16  .00      0        0        0
  23     42.2   27.8     42.2   27.8    .00  .00      0       28       38
  24     35.5   19.4     22.2    2.8    .16  .45     29       10       23
  25    100.0  100.0     86.7   83.3    .47  .50     33       91       78
  26     46.7   33.3     25.9   11.1    .19  .32     20       22       34
  27     95.6   94.4     53.3   41.7    .60  .63     45       69       60
  28     53.3   41.7     28.9   11.1    .25  .39     25       25       36
  29     97.8   97.2     73.3   66.7    .52  .52     35       82       69
  30     93.3   91.7     80.0   75.0    .25  .29     18       83       70
  31     71.1   63.9     33.3   16.7    .38  .51     34       40       45
  32     91.1   88.9     68.9   61.1    .33  .37     23       74       64
  33     77.8   72.2     40.0   25.0    .38  .47     31       48       49
  34     93.3   91.7     71.1   63.9    .37  .39     25       77       66
  35     53.3   41.7     35.5   19.4    .18  .23     14       31       40
  36     64.4   55.6     35.5   19.4    .29  .39     25       38       44
  37     20.0    0.0      6.7    0.0    .25  .00      0        0        0
  38    100.0  100.0     88.9   86.1    .45  .46     30       93       81
  39     71.1   63.9     35.5   19.4    .36  .43     25       38       44
  40     91.1   88.9     35.5   22.2    .61  .67     49       55       53
  41     86.7   83.3     44.4   30.6    .48  .54     36       57       54
  42     62.6   52.8     35.5   19.4    .27  .34     21       38       44
  43    100.0  100.0     62.6   52.8    .68  .71     54       74       64
  44     93.3   91.7     40.0   25.0    .62  .69     51       57       54
  45     97.8   97.2     80.0   75.0    .45  .46     30       87       72
  46     80.0   75.0     28.9   11.1    .52  .63     45       42       46
  47     91.1   88.9     73.3   66.7    .29  .31     19       77       66
  48     86.7   83.3     86.7   83.3    .00  .00      0       83       70
  49    100.0  100.0    100.0  100.0    .00  .00      0      100      100
  50      0.0    0.0      0.0    0.0    .00  .00      0        0        0
  51    100.0  100.0     86.7   83.3    .48  .50     33       91       79
  52     88.9   86.1     42.2   27.8    .53  .59     41       57       54
  53     62.6   52.8     11.1    0.0    .55  .73     56       27       37
  54     97.8   97.2     55.6   44.4    .65  .68     50       70       61
  55     91.1   88.9     53.3   41.7    .48  .54     36       64       58
  56    100.0  100.0     75.6   69.4    .58  .62     44       84       71
  57     93.3   91.7     64.4   55.6    .42  .47     31       73       63
  58     93.3   91.7     71.1   63.9    .36  .39     25       77       66
  59     97.8   97.2     40.0   25.0    .72  .76     60       61       56
  60     95.6   94.4     46.7   33.3    .64  .67     49       64       58
  61     80.0   75.0     26.7    8.3    .53  .68     50       40       45
  62     95.6   94.4     37.7   22.2    .67  .73     56       57       54
  63     93.3   91.7     35.5   19.4    .60  .72     55       55       53
  64     97.8   97.2     42.2   27.8    .72  .75     59       61       56
  65    100.0  100.0     91.1   88.9    .40  .41     27       93       82
  66     82.2   77.8     44.4   30.6    .42  .48     32       53       52
  67    100.0  100.0     80.0   75.0    .55  .58     40       86       73
  68    100.0  100.0     73.3   66.7    .62  .64     46       82       69
  69     97.8   97.2     88.9   86.1    .32  .32     20       91       79
  70     84.4   80.6     44.4   30.6    .45  .51     34       55       53
  71    100.0  100.0     86.7   83.3    .45  .50     33       91       78
  72     91.1   88.9     66.7   58.3    .35  .39     25       73       63

 *  Method of Flanagan
**  Method of Davis

306
TEST G
DRAWING OF CONCLUSIONS

This test was designed to measure your ability to make
conclusions. When facts are analyzed and studied they
sometimes yield evidence which helps in the solution of a
problem. However, any conclusion must be checked before it
can be accepted. The following key includes four ways in
which conclusions may be faulty. Each of the items presents
a question or problem, a brief description of an experiment,
and one or more conclusions drawn from the experiment. Each
experiment was repeated many times. Read each problem,
experiment, and the conclusions. Where several conclusions
are given, evaluate each conclusion separately. Is the
conclusion tentatively justified by the data? If so, mark
space 1 on your answer sheet. If the conclusion is not
justified, determine whether 2, 3, 4, or 5 in the key is the
best reason for its being faulty and mark the proper space
on your answer sheet.

Key
The conclusion is:
1. Tentatively justified.
2. Unjustified - it does not answer the problem.
3. Unjustified - the experiment lacks a control
   (comparison).
4. Unjustified - the data are faulty or inadequate,
   though a control was included.
5. Unjustified - it is contradicted by the data.
PROBLEM: A student was interested in developing a test
for a certain type of substance. In all 100
cases his test was positive.

1. He concluded that the test was a specific test for
the substance.

PROBLEM: A student knew that a purple color develops when
iodine is added to starch and that this is a
specific test for starch. He wished to determine
whether a certain food contained starch. He added
iodine to the food and found that it turned purple.

2. He concluded that the food was fattening.

3. Another student concluded that iodine is a test
for starch.

PROBLEM: An investigator wanted to know what causes people
to breathe faster when they are running rapidly.
He found that breathing more carbon dioxide
increased the breathing rate, but that the
breathing of air deficient in oxygen did not
increase the breathing rate.

4. He concluded that people breathe faster when they
are running because they need more oxygen.

5. Someone else concluded that running increases the
rate of breathing.

6. Another person said that people running rapidly
take in more carbon dioxide, causing them to breathe
more rapidly.

7. Still another claimed that it is harder for the
heart to pump faster without sufficient oxygen.

8. Another concluded that carbon dioxide affects the
breathing rate.

9. Someone else concluded that people who are
exercising must breathe pure carbon dioxide to cause an
increase in breathing rate.

PROBLEM: An individual, wishing to determine whether
oxygen is used during sleep, analyzed the expired
air of a large number of sleeping persons. He
found that the expired air contained oxygen.

10. He concluded that oxygen is not used during sleep.

11. Another concluded that oxygen is needed for life.

12. Someone else claimed that people breathe while they
are sleeping.

13. Still another person concluded that oxygen is given
off as well as taken in during sleep.

14. Another person said that this proved that oxygen is
used during sleep.

308
PROBLEM:

An investigator wished to determine whether
temperature increased the rate of a certain
reaction.
On repeated tests he found that if
he started out with a certain amount of his
original substances he would obtain, after one
hour, 1 gram of the substance produced by the
reaction at 0°C., 2 grams at 20°C., 5 grams at
40°C. and 3 grams at 60°C.

15.

He concluded that increased temperature increased
the rate of the reaction.

16.

Another person claimed that this shows that an in­
crease in temperature increases the amount of the
original substance.

PROBLEM:

A person wanted to determine whether bile aided
in the digestion of fats.
He found that whenever
he mixed pancreatic juice with fats a small part
of the fat was digested, but whenever he mixed
pancreatic juice and bile with fat, he found that
the fat was completely digested.
When he mixed
bile alone with fat he found that there was no
digestion.

17.

He concluded that bile aided in the digestion of fats.

18.

Another concluded that pancreatic juice was necessary
for digestion of fats.

19.

One person concluded that it was necessary that the
bile and pancreatic juice work together, in order
that fats may be digested.

20.

Someone else claimed that bile does not aid in the
digestion of fat.

PROBLEM:

In order to find out if all foods contained starch,
ten foods were tested by the iodine test which
was known to be a specific test for starch.
All
of the foods tested contained starch.

21.

The conclusion drawn was that all foods contain
starch.

22.

Another conclusion was that iodine is a good reagent
to determine the presence of starch.

23.

Another conclusion was that the iodine test proved
that starch was present.

309
PROBLEM:

In order to determine whether corticosterone
caused a certain disease, a person analyzed the
blood of several hundred patients suffering from
the disease. He found that in each case the
blood contained cortin.

24.

He concluded that the disease was caused by
corticosterone.

PROBLEM:

In order to determine the cause of increased red
blood cell count at high altitude, experimenters
subjected rats, dogs and guinea pigs at sea level
to a reduced total atmospheric pressure.
The red
cell count was higher in these than in the same
kinds of animals not subjected to reduced atmos­
pheric pressure.

25.

Conclusion:
A decrease in the oxygen in the air
breathed at high altitude causes the increase in
red cell count.

26.

Another conclusion:
The red cell count varies in­
versely with the atmospheric pressure.

PROBLEM:

Two students desired to know whether certain
types of mosquitos or whether all mosquitos
spread malarial fever.
They captured many speci­
mens of three kinds of wild mosquitos, types A,
B, and C.
They examined the digestive tracts of
all three types.
They found malarial parasites
only in type A mosquitos.

27.

Conclusion:
Malarial fever is spread by type A
mosquitos but not by types B and C.

28.

Another conclusion: Not all mosquitos carry malaria
parasites.

29.

Another conclusion: Not all mosquitos have malarial
parasites.

PROBLEM:

A student interested in frozen food preservation
wanted to determine whether extremely low
temperatures killed the kind of bacteria that spoil meat.
He cut a number of pieces of various types of
meat into two pieces, leaving one piece of each
sample at room temperature and the other of each
sample in a locker at a temperature of 40 degrees
below freezing. All samples were sealed in
bacteria-proof containers. After thirty days he
opened the packages. He found the room temperature
samples badly decomposed. The frozen samples
were in their original condition except for
being frozen solid.

30.

Conclusion: A temperature 40 degrees below freez­
ing will kill the bacteria that are responsible for
the decay of meat.

31.

Another conclusion:
Heat is a controlling factor in
the preservation of foods.

32.

Another conclusion:
Meat kept in a temperature of
40 degrees below freezing does not become decomposed.

33.

Another conclusion:
Room temperature causes meats
to spoil, whereas frozen meats are preserved.

34.

Still another conclusion:
Bacteria must not have
been present in the frozen packages.

PROBLEM:

A person wanted to know what caused a certain
disease. He examined 1000 patients with the
disease.
All had a certain bacteria (Bacteria A)
in the digestive tract.

35.

He concluded that Bacteria A was the cause of the
disease.

36.

Another conclusion: The disease starts in the
digestive tract.

37.

Another conclusion: Bacteria A is necessary for
digestion.

38.

Another conclusion: The cause of the disease was
spoilage of food.

PROBLEM:

A person wanted to know why plants bend toward
the light. He placed one group of plants in the
light with the light source at the right. He
placed another group of similar plants in the
dark.
The plants in the dark grew straight, the
plants in the light were bent to the right.

39.

He concluded that plants bend toward the light.

40.

Another concluded that plants bend toward the light
because they need light to grow.

41.

Someone else concluded that light influences the
direction in which plants grow.

42.

Another concluded that plants bend toward the sun
in order to get the beneficial rays of the sun.

PROBLEM:

An investigator wanted to know what caused fish
to swim against the current.
He placed fish in
a bottle.
If the bottle was moved to the right
the fish moved to the left and vice versa. Blind
fish did not respond to the water currents in the
bottle, but fish do orient against the current in
a stream at night.

43.

He concluded that fish can see at night.

44.

Another concluded that fish swim against the current
because fish will drown if water enters the rear of
the gills with force over a long period.

45.

Another concluded that normal fish swim against the
current.

46.

Someone else concluded that blind fish do not swim
against the current because they cannot see.

PROBLEM:

Investigator A wanted to know what caused people
to become ill if confined in large numbers to a
small closed area.
He found on repeated tests
that the air in very crowded closed areas contained
about 5% carbon dioxide, while normal air
contains .03% carbon dioxide.

47.

He concluded that excessive carbon dioxide caused
the illness.

48.

Another investigator concluded that the illness was
caused by insufficient oxygen.

49.

Another investigator claimed that the illness was
caused by the germs exhaled by the people in the room.

PROBLEM:

Investigator B, in an attempt to solve the same
problem, repeated the experiment done by
investigator A but in addition had people in uncrowded
rooms breathe air containing 3% carbon dioxide.
No ill effects were noted among those in the
uncrowded rooms.

50.

He also concluded that excessive carbon dioxide
caused the illness.

312
51.

Another investigator claimed that this showed that
the disease was caused by insufficient oxygen.

52.

The investigator who claimed the disease was due
to germs was convinced by this experiment that he
was correct.

53.

Another conclusion was that 5% carbon dioxide will
produce no ill effects.

54.

Still another claimed that people live better in
uncrowded areas.

PROBLEM: To find out if all foods contain sugar. Benedict's
solution is a specific test for sugar. Ten foods
were tested with Benedict's solution. All of the
foods contained sugar.

55. Conclusion: Benedict's solution is a good test for
sugar.

56. Another conclusion: All foods contain sugar.

57. Another conclusion: The Benedict test showed that
sugar was present.

PROBLEM: To determine whether a certain bacteria uses
oxygen. The Winkler test is an oxygen test. A
broth in which bacteria were grown was tested
for oxygen. The broth was shown, by the Winkler
test, to contain oxygen.

58. Conclusion: This type of bacteria does not use
oxygen.

59. Another conclusion: This type of bacteria gives off
oxygen as a waste product.

60. Still another conclusion: The presence of oxygen
does not stop the growth of bacteria.

61. Another person concluded that this proves that
oxygen is needed by bacteria.

PROBLEM: To determine the cause of disease X. One thousand
persons with the disease were examined. Bacteria
A was found in the mouth of all of the persons
with the disease.

62. One conclusion: Bacteria A causes the disease.

63. Another conclusion: This disease starts in the
mouth.

64. Another conclusion: This disease is caused by
bacteria introduced into the mouth from
contaminated food.

PROBLEM: To determine the reaction of insects to light.
Flies were placed in a jar, the upper half of
which was covered with black paper. A light
was placed near the jar. All of the flies
flew to the lower half of the jar and toward
the illuminated side.

65. Conclusion: Insects are attracted to light.

66. Another conclusion: Insects are attracted to heat.

67. Another conclusion: The flies needed light for
warmth.

PROBLEM: To determine some of the requirements for the
sprouting of seeds. Two groups of plants were
planted in flower pots. Conditions of both were
the same except that one pot was put in the
greenhouse at 40 degrees; the other group was put in a
greenhouse at 70 degrees. Those in the cold room
did not sprout, those in the warm room sprouted.
Many kinds of seeds were used in each group.

68. Conclusion: A temperature of 70 degrees is required
for seeds to sprout.

69. Another conclusion: Plants need heat to live.

70. Another conclusion: Moisture is one of the
requirements for the sprouting of seeds.

71. Another conclusion: For anything to grow energy is
needed.

72. Another conclusion: A temperature of 40 degrees
keeps seeds from sprouting.

PROBLEM: To determine some of the requirements for the
sprouting of seeds. Two groups of seeds were
planted. Conditions were the same for both
groups except that one group was planted in
stoppered bottles, the other group in open bottles.
Only the seeds in the open bottles sprouted. Many
different kinds of seeds were included in each
group.

73. Conclusion: Seeds require oxygen to sprout.

74. Another conclusion: One of the requirements for the
sprouting of seeds is moisture.

75. Another conclusion: The seeds in the stoppered
bottles were dormant.

76. Another conclusion: Energy from the outside is
necessary for growth.

77. Another conclusion: Carbon dioxide is a requirement
of sprouting seeds.

PROBLEM: What are some of the requirements for seeds to
sprout? A student put many different kinds of
seeds in pots containing garden soil and many
different kinds of seeds in pots containing the
same type of soil with all of the potassium salts
removed. The plants in the garden soil grew and
developed well. The plants in the other pots were
small and soon died. All other conditions were
the same for both groups.

78. Conclusion: Potassium salts are required for seeds
to sprout.

79. Another conclusion: Heat and moisture are necessary
for seeds to sprout.

80. Another conclusion: Minerals are essential for the
germination of seeds.

81. Another conclusion: Potassium salts contain some
important energy for plants.

82. Another conclusion: When the plants had used up
their supply of food they couldn't replace it.

83. Another conclusion: Potassium salts as well as
other minerals are essential to plants and their
lack will slow down growth.

315
PROBLEM: What are some of the requirements for seeds to
sprout? The student placed two groups of seeds
in two pots and watered one pot daily. The
other group he watered on alternate days. All
of the seeds sprouted. Many types of seeds
used, other conditions same for both groups.

84. Conclusion: Water is necessary if seeds are to
sprout but it is not necessary to water them every
day.

85. Another conclusion: Seeds will sprout with a
limited amount of water.

86. Another conclusion: One of the requirements of
seeds to sprout is moisture.

87. Another conclusion: Water is a minor factor in the
sprouting of seeds.

88. Another conclusion: Both groups of plants had an
adequate amount of water.

PROBLEM: What are some of the requirements for seeds to
sprout? The same student planted two groups of
seeds of different types in pots and placed one
group of the pots in the light, the others in
the dark. Those plants in the light were green,
those in the dark were yellow. Other conditions
were the same for both groups.

89. Conclusion: Light is necessary for sprouting of
seeds.

90. Another conclusion: Plants require light to mature
properly.

91. Another conclusion: Light makes the plants green.

PROBLEM: An investigator wanted to determine whether
increased light increased the rate of a certain
reaction. On repeated tests it was found that
a certain amount of the original substance (X),
after one hour, would produce 1 gram of
substance Y with 10 photons (units of light) of
illumination, 2 grams with 20 photons, 4 grams
with 30 photons and 3 grams with 40 photons.

92. Conclusion: Increased amount of light increases
the rate of the reaction.

316
93. Another conclusion: Heat increased the rate of the
reaction.

PROBLEM: A student wanted to determine whether plants
grow more rapidly in the light or in the dark.
Two groups of seeds were planted. After two
weeks the plants were measured. Those in the
light were green and a few inches long. Those
in the dark were yellow and a foot long. All
other conditions were the same for both groups.
The experiment was repeated with several kinds
of seeds. The results were the same as given
above.

94.

Conclusion:
The plants in the dark put all their
energy into height trying to reach light while the
other ones put their energy into strength.

95.

Another conclusion: Light is necessary for faster
and better growth of plants.

96.

Another conclusion: The plants grown in the light
were more healthy.

97.

Another conclusion: Plants grow more rapidly in the
dark.

98.

Another conclusion: Light is necessary for the
development of the green color of plants.

PROBLEM: A student wanted to determine whether a certain
beverage contained sugar. Benedict's solution,
which is blue, turns yellow when added to sugar
and heated. (It is known to be a specific test
for sugar.) Benedict's was added to the beverage
and heated. The solution turned yellow.

99. Conclusion: The beverage is not fattening.

100. Another student concluded that Benedict's solution
is a good test for sugar.

317
TABLE XXXXII
ITEM ANALYSIS DATA FOR TEST G

         Percent Success        Discrimination     Difficulty
Item   Upper 27%  Lower 27%       r      Index   % Success  Index

  1     *62.6       31.1         .33      28        33        41
       **52.8       13.9         .43

  2      77.8       64.4         .17       9        64        58
         72.2       55.6         .15

  3      60.0       24.4         .37      38        28        38
         50.0        5.6         .56

  4      95.6       35.5         .72      59        55        53
         94.4       19.4         .75

  5      80.0       40.0         .42      33        50        50
         75.0       25.0         .50

  6      20.0        4.4         .36       0         0         0
          0.0        0.0         .00

  7      37.7       35.5         .02       2        21        33
         22.2       19.4         .04

  8      20.0        8.8         .20       0         0         0
          0.0        0.0         .00

  9      33.3       28.9         .04       4        14        27
         16.7       11.1         .07

 10      46.7       26.7         .21      24        21        33
         33.3        8.3         .38

 11      91.1       60.0                  30        68        60
         88.9       50.0         .46

 12      88.9       40.0         .54      45        57        54
         86.1       25.0         .63

 13      22.2        6.7         .32       7         2         5
          2.8        0.0         .12

 * Method of Flanagan
** Method of Davis

318
TABLE XXXXII (continued)

Percent Success
Upper 27% Lower 27%
14

Discrimination
Difficulty
r
Index
% Success Index
.14
2
.12
7
5

22.2
2.8

13.3

15

46.7
33.0

17.8
0.0

.33
.65

47

17

30

16

35.5
19.4

26.7
8.3

.13
.23

14

14

27

17

95.6
94.4

68.9

.47
.50

33

78

66

46.7
33.0

4.4

.58
.65

47

17

30

19

22.2

4.4

2.8
0.0

.37
.12

7

2

5

20

100.0
100.0

68.9
61.1

.65
.67

49

80

68

4.4

24.4
5-6

-.05
-.07

-4

5

15

22

93.3
91.7

33.3
16.7

.65
.72

56

55

53

23

77.8

8.9

72.2

0.0

.69
.81

69

37

43

.23
.31

19

35

42

30

8

20

0

0

0

18

21

20.0

0.0

61.1
0.0

60.0

37.7

50.0

22.2

25

31.1
13-9

13.3
0.0

.25
.46

26

15.6
0.0

6.7

.18

0.0

.00

37.7

28.9

.10

24

27

22.2

11.1

.18

11

17

30

26.7

8.3

.50
.35

22

5

14

2.2

0.0

319
TABLE XXXXII (continued)

Item

Percent Success
Upper 27% Lower 27%

Discrimination
Difficulty
r
Index
% Success Index

22.2

37.7

2.2
0.0

.55

37

12

25

26.7
8.3

26.7
8.3

.00
.00

0

8

20

48.8

11.1
0.0

.46
.66

48

18

31

60.0
50.0

8.9
0.0

.58
.72

55

25

36

62.5

20.0
0.0

.44
.73

56

27

37

33.3
16.7

6.7

.40

0.0

•

00

32

9

21

35

71.1
63.9

42.2
27.8

.30
.36

23

46

48

36

60.0
50.0

31.1
13.9

.30
.41

27

31

40

37

77.8
72.2

53.3
41.7

.29
.31

19

57

54

13.3

8.9

.00
.00

0

0

0

82

44

47

43

31

40

29
30
31

36.1

32
33

52.8

34

38

.61

0.0

0.0

39

91.1
88.9

15.6
0.0

40

66.7
58.3

24.4
5.6

41

55.6
44.4

6.7
.0.0

.60

.70

52

22

34

42

42.2
27.8

22.2
2.8

.23
.46

30

16

29

43

86.7
83.3

33.3
16.7

.57
.67

49

50

50

.72
.88

.43
.61

320
TABLE XXXXII (continued)

Item
44

Percent Success
Upper 27% Lower 27%

Discrimination
Difficulty
Index
r
% Success Index
.18
14
40
.23
31

53.3
41.7

35.5
19.4

45

75.6
69.4

11.1
0.0

.68

81

35

42

46

73.3
66.7

17.8
0.0

.56
.79

65

33

41

47

6.7
0.0

0.0
0.0

.00

0

0

0

48

75.6
69.4

17.8
0.0

.58
.81

68

35

42

49

66.7
58.3

42.2
27.8

.26

.31

19

42

46

50

95.6
94.4

66.7
58.3

.49
.46

30

74

64

51

62.6
52.8

17.8

.48
.73

56

27

37

52

48.8

24.4
5.6

.43

29

21

33

36.6

0.0

.65

.35

.26

53

51.1
38.9

8.9
0.0

.51
.67

49

19

32

54

95.6
94.4

26.7
8.3

.74
.83

71

51

51

55

95.6
94.4

26.7
8.3

.74
.83

71

51

51

56

20.0
0.0

20.0
0.0

.00
.00

0

0

0

57

84.4
80.6

0.0
0.0

.87
.84

74

40

45

58

37.7

13.3

.32
.55

37

12

25

22.2

0.0

321
TABLE XXXXII

59
60

61
62

Percent Success
Upper 27% Lower 27%
26.7
6.7
0.0
8.3

(continued)

Discrimination
Difficulty
r
Index
% Success Index
.34
22
14
5
.35

86.7
83.3

35.5
19.4

.55
.63

45

51

51

35.5
19.4

22.2
2.8

.15
.39

25

10

23

66.7
58.3

40.0
25.0

.28
.34

21

42

46

53.3
41.7

37.7

.16

22.2

.23

14

31

40

64

20.0
0.0

22.2
2.8

.00
-.12

7

2

5

65

4.4

4.4

0.0

0.0

.00
.00

0

0

0

40.0

17.8

25.0

0.0

.58

40

13

26

67

48.8
36.1

31.1
13.9

.18
.29

18

25

36

68

51.1
38.9

28.9

.23
.35

22

24

35

69

46.7
33.3

17.8

.33
.64

46

17

30

70

82.2
77.8

33-3
16.7

.61

43

46

48

71

60.0
50.0

17.8

.45
.72

55

25

36

72

33.3
16.7

4.4
0.0

.48
.48

32

8

21

22.2
2.8

13.3

.14
7

2

5

63

66

11.1
0.0

0.0

0.0

.26

.50

.12

-:

322

TABLE XXXXII (continued)

Percent Success
Item

27%

Lower 27%

r

Index

Difficulty
%

Success

Index

GO

73.3
35.5

35.5
19.4

•

74

Upper

Discrimination

.46

30

44

47

48.8

22.2
2.8

.30
.56

38

19

32

17.8
0.0

.31
.62

44

16

29

22.2

17.8
0.0

.23
.55

37

12

25

26.7
8.3

22.2
2.8

.06

.17

10

5

15

79

80.0
75.0

24.4
5.6

.56
.71

54

40

45

80

28.9

8.9

11.1

0.0

.32
.40

26

6

16

33.3
16.7

6.7

.42
.48

32

8

21

.34
.41

27

46

48

75

36.1

76

44.4
30.6

77
78

81

37.7

0.0

73.3
66.7

40.0

S3

11.1
0.0

2.2
0.0

.34
.00

0

0

0

84

6.7

4.4

0.0

0.0

.08
.00

0

0

0

83

20.0
0.0

8.9
0.0

.20
.00

0

0

86

20.0
0.0

4.4

.34

0.0

.00

0

0

A
0

87

51.1
38.9

24.4
5.6

.29
.47

31

22

34

88

55.6
44.4

6.7

.58
.70

52

22

34

82

25.0

0.0

A

0

323
TABLE XXXXII (continued)

         Percent Success        Discrimination     Difficulty
Item   Upper 27%  Lower 27%       r      Index   % Success  Index

 89      75.6       40.0         .37      28        48        49
         69.4       25.0         .43

 90      35.5        6.7         .45      35        10        23
         19.4        0.0         .52

 91      46.7        4.4         .57      46        17        30
         33.0        0.0         .64

 92      51.1        6.7         .55      49        19        32
         38.9        0.0         .67

 93      53.3       20.0         .36      51        21        33
         41.7        0.0         .69

 94      44.4        4.4         .57      44        17        30
         30.6        0.0         .62

 95      86.7       62.6         .32      22        68        60
         83.3       52.8         .35

 96      77.8       17.8         .60      69        37        43
         72.2        0.0         .81

 97      91.1       91.1         .00       0        89        76
         88.9       88.9         .00

 98      86.7       11.1         .74      77        40        45
         83.3        0.0         .86

 99      77.8       53.3         .28      19        57        54
         72.2       41.7         .31

100     *97.8       40.0         .72      61        61        56
       **97.2       25.0         .77

 * Method of Flanagan
** Method of Davis

324
TEST H
INTERPRETATION OF DATA

TEST J
GENERALIZATIONS AND ASSUMPTIONS

This test was designed to measure your ability to
interpret data. Following the data you will find a number
of statements. You are to assume that the data as
presented are true. Evaluate each statement according to the
following key and mark the appropriate space on your answer
sheet.

Key
1. True: The data alone are sufficient to show
   that the statement is true.
2. Probably true: The data indicate that the
   statement is probably true, that it is logical
   on the basis of the data but the data are not
   sufficient to say that it is definitely true.
3. Insufficient evidence: There are no data to
   indicate whether there is any degree of truth
   or falsity in the statement.
4. Probably false: The data indicate that the
   statement is probably false, that is, it is not
   logical on the basis of the data but the data
   are not sufficient to say that it is definitely
   false.
5. False: The data alone are sufficient to show
   that the statement is false.

In freezing of vegetables the common practice for
both commercial and home frozen vegetables is to scald the
vegetables first, by placing them in boiling water for two
or three minutes. The following data were obtained in an
experiment which measured the amounts of Vitamin C in fresh
vegetables, scalded vegetables before freezing, and
vegetables frozen for six months. One group of the frozen
vegetables was frozen without first scalding, the other
group was first scalded. The Vitamin C content of the
frozen vegetables was determined before and after they were
cooked. All figures indicate the amount of Vitamin C in mg.
per 100 cc.

325

                                          Frozen
                                  Unscalded       Scalded
Vegetable        Fresh  Scalded   Raw  Cooked   Raw  Cooked
Chard (greens)     60      37      20     2      24    14
Spinach            82      43      10     1      16    27
Peas               29      21      14    10      20    16
Green beans        34      29      25    13      23    17
Lima beans         2?      20      26    18      20    14

1. Scalding of all vegetables causes destruction of some
of the Vitamin C content of the vegetables.

2. Spinach is a good source of Vitamin C.
3. Leafy green vegetables are a better source of Vitamin C
than the pod type vegetables.
4.

Leafy green vegetables are a better source of Vitamin C
than root vegetables.

5.

The practice of scalding leafy vegetables before freezing
should be eliminated because scalding destroys some of
the Vitamin C.

6.

Lima beans should be frozen without scalding provided
the quality of the unscalded product is equal to the
scalded in other respects.

7.

A better tasting product is obtained if lima beans are
scalded before freezing.

8.

After commercially frozen peas have been cooked they are
as good a source of Vitamin C as commercially frozen chard
which has been cooked.

9.

The percentage of the total Vitamin C destroyed by scald­
ing is about the same for all vegetables.

10.

Since the vitamin content of food is an important
consideration in its purchase, in buying frozen green vegetables
one should be careful in choosing the kind of vegetables
because the Vitamin C content of different frozen
vegetables varies considerably.

11. The breakdown of Vitamin C is hastened by heating.
12. Since frozen leafy vegetables are much easier to prepare,
the practice of using them exclusively is justified from
the dietary standpoint.

326
13.

Frozen orange juice contains somewhat less Vitamin C
than freshly extracted orange juice.

14.

(Fresh spinach is usually cooked for about ten
minutes.) Cooked spinach (unfrozen) contains less
Vitamin C than scalded spinach.

15.

Heating causes some change to occur in the Vitamin C
molecule.

Items 16 through 21 are a re-evaluation of some of
the items 1 through 15. Re-read items 1, 3, 9, 11, 13 and
15 and determine whether they are generalizations,
extensions of data, explanations of the data or merely
restatements of the data, etc. Answer each according to the
following key:

1. A generalization; that is, the data say it is true
   for this situation, and a generalization says it is
   true for all similar situations.
2. The data indicate a trend which if continued in
   either direction would make the statement true.
3. An explanation of the data in terms of cause and
   effect.
4. A restatement of results.
5. None of the above.

16. Item 1.          19. Item 11.
17. Item 3.          20. Item 13.
18. Item 9.          21. Item 15.

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusion are the items which are to be evaluated
according to the following key. These items all relate to the
data presented for items 1 through 15.

1. An assumption which must be made to make the
   conclusion valid (true).
2. An assumption which if made would make the
   conclusion false.
3. An assumption which has no relation to the
   validity (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

327
Conclusion I: The breakdown of Vitamin C proceeds
spontaneously but is a relatively slow process at low
temperature.

22. Vitamin C is a stable substance.

23. There is order in the universe.

24. Vitamin C is not destroyed by the freezing process.

25. Vitamin C responds in a similar way to the environment
no matter what the source of Vitamin C is.

26. The Vitamin C content of all the vegetables studied
was reduced after being frozen for six months.

27. All chard is similar in its reactions to the chard
studied in this experiment.

28. Vitamin C is gradually destroyed by freezing and is
not suddenly destroyed.

Conclusion II: The breakdown of Vitamin C is hastened by
heating.

29. All vitamins react in the same way.

30. Vitamin C evaporates when heated.

31. All beans are similar in their reaction to the ones
studied in this experiment.

32. Heating causes some change to occur in the Vitamin C
molecule.

33. Vitamin C reacts in the same way no matter what the
source of the Vitamin C.

34. Pod type vegetables have a basic similarity.

Conclusion III: The Vitamin A content of vegetables is
affected by heating.

35. Pod type vegetables have a basic similarity.

36. Vitamin C is gradually destroyed by heating.

37. All vitamins react in a similar way to heat.

38. There is a direct relationship between the amount of
Vitamin C and Vitamin A in foods.

328
39. There is order in the universe.

40. Heating affected the amount of Vitamin C in the
vegetables studied.

41. In all cases studied cooking reduced the Vitamin C
content of the vegetables.

This test was designed to measure your ability to
interpret data. Following the data you will find a number
of statements. You are to assume that the data as presented
are true. Evaluate each statement according to the
following key and mark the appropriate space on your answer sheet.

Key
1. True: The data alone are sufficient to show that
   the statement is true.
2. Probably true: The data indicate that the statement
   is probably true, that it is logical on the basis of
   the data but the data are not sufficient to say that
   it is definitely true.
3. Insufficient evidence: There are no data to indicate
   whether there is any degree of truth or falsity in
   the statement.
4. Probably false: The data indicate that the statement
   is probably false, that is, it is not logical on the
   basis of the data but the data are not sufficient to
   say that it is definitely false.
5. False: The data alone are sufficient to show that
   the statement is false.

Items 42 through 61 refer to the following graph. Use
the key above to answer the items. The lizard is considered
to be cold blooded, the others warm blooded.

[Graph not reproduced. Surviving labels: axis ticks at
10°, 20°, 30°, and 40°; axis title "External".]

329
42. The body temperature of the cat varies more than the
body temperature of the anteater.

43. The cat and anteater have some type of mechanism
which regulates the body temperature.

44. When the external temperature is 50°C. the temperature
of the lizard is also 50°C.

45. The body temperature of warm blooded animals is
unaffected by the external temperature.

46. At an external temperature of 50°C. the temperature of
the cat is 50°C.

47. When the external temperature is 50°C. the temperature
of the anteater would be higher than the temperature
of the cat.

48. The temperature of a mouse would be about half way
between that of the cat and the anteater.

49. At no time during the experiment did any of the animals
have the same body temperature.

50. The anteater exhibits a closer relationship to the
lizard than to the opossum.

51. The sharp rise in the body temperature of the lizard
indicates that the lizard uses food at a faster rate
than the cat.

52. The ability of the cat to maintain its temperature is
due to its coat of hair.

53. There is a close correlation between the body
temperature of the lizard and that of the external environment.

54. The heart rate of the lizard would increase with
temperature in the same way as the body temperature increases.

55. The body temperature of the cat showed the least
variation in temperature during the experimental period.

56. The temperature of all of the warm blooded animals was
always higher than the external temperature.
57.

The w a r m b l o o d e d a n i m a l s are s u f f i c i e n t l y
c o n s e r v e heat.

i n s u l a t e d to

58.

W a r m b l o o d e d a n i m a l s csui withstsuid c old b e t t e r thsui
c o l d b l o o d e d ani m a l s .

330
59.

At 20 degrees below 0°C. the lizard would be frozen.

60.

The normal body temperature of the duckbill is higher
than that of the echidna.

61.

If the temperature of other cold blooded animals were
plotted it would resemble that of the lizard.

Items 62 through 68 are a re-evaluation of some of
the items 42 through 61. Re-read items 43, 44, 47, 50, 52,
55 and 61 and determine whether they are generalizations,
extensions of the data, explanations of the data or merely
restatements of the data, etc. Answer each according to
the following key:

Key
1.  A generalization, that is the data says it is true
    for this situation, a generalization says it is true
    for all similar situations.
2.  The data indicates a trend which if continued in
    either direction would make the statement true.
3.  An explanation of the data in terms of cause and
    effect.
4.  A restatement of results.
5.  None of the above.

62.  Item 43.
63.  Item 44.
64.  Item 47.
65.  Item 50.
66.  Item 52.
67.  Item 55.
68.  Item 61.

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data). The statements which follow the
conclusion are the items which are to be evaluated accord­ing
to the following key. These items all relate to the
data presented for items 41 through 61.

Key
1.  An assumption which must be made to make the
    conclusion valid (true).
2.  An assumption which if made would make the conclusion
    false.
3.  An assumption which has no relation to the validity
    (truth) of the conclusion.
4.  Not an assumption; a restatement of fact.
5.  Not an assumption; a conclusion.
Conclusion I: Warmblooded animals have some type of heat
regulating mechanism.

69.  All cats react similarly to changes in temperature.

70.  It is possible for animals to have some type of heat
     regulating mechanism.

71.  The cat and the duckbill are very different in their
     reaction to the external environment.

72.  A man and a cat react similarly to the external
     temperature.

73.  The lizard has no heat regulating mechanism.

74.  The opossum had a lower body temperature than the cat.

Conclusion II: Anteaters and duckbills are more closely
related than anteaters and cats.

75.  Similarity of reaction of living things indicate a
     relationship.

76.  All anteaters react similarly to changes in external
     temperature.

77.  The temperature of the anteater varied more with the
     external temperature than did that of the cat.

78.  The degree of closeness of similarity of response of
     living things runs parallel with the closeness of
     kinship.

79.  Close relationship means that two living things have a
     common ancestor.

80.  The temperature of the cat varied less than that of
     the anteater and duckbill with change of temperature.

This test was designed to measure your ability to
interpret data. Following the data you will find a number
of statements. You are to assume that the data as presented
are true. Evaluate each statement according to the
following key and mark the appropriate space on your answer
sheet.

Key
1.  True: The data alone are sufficient to show that
    the statement is true.
2.  Probably true: The data indicate that the statement
    is probably true, that it is logical on the basis of
    the data but the data are not sufficient to say that
    it is definitely true.
3.  Insufficient evidence: There are no data to indicate
    whether there is any degree of truth or falsity in
    the statement.
4.  Probably false: The data indicate that the statement
    is probably false, that is, it is not logical on the
    basis of the data but the data are not sufficient to
    say that it is definitely false.
5.  False: The data alone are sufficient to show that
    the statement is false.

Analyses were made of the Vitamin C content of red
ripe and green tomatoes as soon as they were picked. Mature
green tomatoes were stored at the temperatures indicated in
the following table. Those which had ripened by the end of
the first week were analyzed for their Vitamin C content;
those ripened at the end of the second week were analyzed
at the end of the second week, etc. In addition some mature
green tomatoes were analyzed each week.

Condition                     No. of    Stage of
when taken     Temp. when     weeks     ripeness         Vitamin C
from field     stored         stored    when analyzed    mg/100 grams

mature green   not stored       0       mature green        15.0
red ripe       not stored       0       red ripe            16.2
mature green   70°F.            1       red ripe            14.4
mature green   70°F.            2       red ripe            12.9
mature green   70°F.            3       red ripe             8.2
mature green   80°F.            1       red ripe            14.0
mature green   80°F.            2       red ripe             9.8
mature green   80°F.            3       red ripe             7.1
mature green   70°F.            1       mature green        10.0
mature green   70°F.            2       mature green         7.2

81.  At the time of harvest the green tomatoes were only
     slightly lower in Vitamin C content than the red ripe
     ones.

82.  Tomatoes which ripened during the first week of storage
     were almost as high in Vitamin C as those which were
     ripe at the time of harvest.

83.  Tomatoes ripening during the second week of storage
     were lower in Vitamin C content than those which
     ripened during the first week.

84.  Tomatoes ripened at 90°F. would have less Vitamin C
     after three weeks than those stored at 80°F.

85.  Tomatoes could not be stored at 90°F. because at this
     high a temperature they would rot or spoil.

86.  The lower the temperature at which tomatoes are stored
     the less is the breakdown of Vitamin C.

87.  At 75°F. there would be about 14 mg/100 grams of
     Vitamin C after a week of storage.

88.  Heat causes a breakdown of the Vitamin C molecule.

89.  If tomatoes are to be stored for a considerable length
     of time they should be held at as low a temperature as
     possible, but high enough to avoid freezing.

90.  When one buys tomatoes in the winter the Vitamin C
     content of the tomatoes compares favorably with the
     Vitamin C content of those bought fresh in the summer.

91.  After four weeks of storage tomatoes stored at 70°F.
     would contain less than 7 mg/100 grams of Vitamin C.

92.  Vitamin C does not develop in the tomatoes as they
     change from mature green to red ripe on the vine.

93.  Some mature green tomatoes ripen in storage within a
     week.

94.  (Tomatoes are often picked green and allowed to ripen
     during the early fall). The Vitamin C content of these
     tomatoes is about the same as when they were picked.

95.  The green tomatoes which did not ripen in a week had
     lost about the same amount of Vitamin C as those which
     ripened during the week.

96.  Vitamin C breaks down spontaneously at room temperature.

97.  The Vitamin C content of other vegetables decreases if
     stored at high temperatures.

98.  Boiling of vegetables destroys some of the Vitamin C.

99.  Vitamin C is a stable substance.

100. Vitamin C is manufactured some place else in the plant
     than in the fruit (tomato) and is stored in the fruit.

Items 101 through 107 are a re-evaluation of some
of the items 81-100. Re-read items 82, 84, 86, 88, 91, 93,
and 97 and determine whether they are generalizations,
extensions of the data, explanations of the data or merely
restatements of the data, etc. Each of these items is to
be answered according to the following key.

Key
1.  A generalization, that is the data says it is true
    for this situation, a generalization says it is true
    for all similar situations.
2.  The data indicates a trend which if continued in
    either direction would make the statement true.
3.  An explanation of the data in terms of cause and
    effect.
4.  A restatement of results.
5.  None of the above.

101. Item 82.
102. Item 84.
103. Item 86.
104. Item 88.
105. Item 91.
106. Item 93.
107. Item 97.

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data). The statements which follow the
conclusion are the items which are to be evaluated accord­ing
to the following key. These items all relate to the
data presented for items 81 through 100.

1.  An assumption which must be made to make the
    conclusion valid (true).
2.  An assumption which if made would make the conclusion
    false.
3.  An assumption which has no relation to the validity
    (truth) of the conclusion.
4.  Not an assumption; a restatement of fact.
5.  Not an assumption; a conclusion.

Conclusion I: Sunlight causes an increase in the Vitamin C
content of tomatoes as they ripen on the vine.
108. The test used to measure the amount of Vitamin C in
     this experiment was a specific test for Vitamin C.

109. The increase of Vitamin C in tomatoes ripening on the
     vine was caused by the action of sunlight on the
     leaves.

110. The tomatoes which were analyzed when green ripe would
     have contained more Vitamin C if they had been allowed
     to ripen on the vine.

111. The test used to measure the amount of Vitamin C
     accurately measures the amount.

112. The same results would not have been obtained if the
     plants had been kept in the dark for the week during
     which the tomatoes ripened.

113. All tomatoes would yield the same type of results as
     those obtained in this experiment.

114. The Vitamin C content of the tomatoes used in this
     experiment increased as the tomatoes ripened on the vines.

115. The Vitamin C was formed in the roots and was
     transported to the fruits.

116. The Vitamin C content of ripe tomatoes on the vine was
     higher than the Vitamin C content of the green ripe
     tomatoes on the vine.

117. The plant is capable of manufacturing Vitamin C.

118. Some change takes place in the Vitamin C molecule at
     high temperatures.

Conclusion II: Vitamin C breaks down spontaneously at room
temperature.

119. Vitamin C reacts similarly in all plants in which it
     is found.

120. Tomatoes are all similar in the amount of Vitamin C
     they contain.

121. The Vitamin C content of all tomatoes would decrease
     when stored at room temperature.

122. When the tomatoes were stored at room temperature the
     Vitamin C content decreased.

123. All vitamins react similarly to storage at room
     temperature.

124. There is order in the universe.

125. Vitamin C evaporates at room temperature.

126. The Vitamin C molecule undergoes changes which change
     the properties of the substance.

This test was designed to measure your ability to
interpret data. Following the data you will find a number
of statements. You are to assume that the data as presented
are true. Evaluate each statement according to the follow­ing
key and mark the appropriate space on your answer sheet.

1.  True: The data alone are sufficient to show that the
    statement is true.
2.  Probably true: The data indicate that the statement
    is probably true, that it is logical on the basis of
    the data but the data are not sufficient to say that
    it is definitely true.
3.  Insufficient evidence: There are no data to indicate
    whether there is any degree of truth or falsity in
    the statement.
4.  Probably false: The data indicate that the statement
    is probably false, that is, it is not logical on the
    basis of the data but the data are not sufficient to
    say that it is definitely false.
5.  False: The data alone are sufficient to show that the
    statement is false.

The following data is concerned with the temperature
at which various seeds germinate (sprout). Three kinds of
seeds were used, seeds from Species A, Species B and Species
C. The number of seeds germinating at various temperatures
in two weeks is given in the table. No seeds germinated at
temperatures below 40°F. or above 95°F.

                   Temperatures in Degrees Fahrenheit
Species  35°  40°  45°  50°  55°  60°  65°  70°  75°  80°  85°  90°  95° 100°
   A      0    0    0    0    0    0    5   18   50   70   84   65   30    0
   B      0    6   20   41   92   65   30    5    0    0    0    0    0    0
   C      0    0    0    4    7   16   43   72   90   81   52   34    6    0

127. Plant B should be planted early in the spring but not
     in midsummer in middle western states, such as Illinois,
     Iowa, etc.

128. Plant C is a tropical plant.

129. More seeds of Plant A will germinate at 82° than at
     any other temperature.

130. None of the seeds of Plant A will germinate below 65°.

131. Seeds do not germinate at freezing temperature.

132. The higher the temperature, the more seeds will
     germinate.

133. One would not get a crop from plants of the A type in
     the climate of the northern states, such as Michigan,
     Minnesota, etc.

134. The optimum temperature for the growth of plants of
     the C type is 70°.

135. Some seeds of the C variety will germinate at 95°.

136. The optimum temperature for the germination of seeds
     of the B type is about 56°.

137. Plants of the A type are found in hot wet climates.

138. The rate at which seeds germinate is affected by the
     temperature.

139. A decrease in moisture reduces the number of seeds
     germinating more than does a decrease in temperature.

140. If Plant B takes a relatively long time to mature,
     seeds should be started in greenhouses and set out
     later if a crop of this type plant is desired in
     northern states.

141. Plant A could be watermelon.

142. No plants germinate at temperatures above 100°.

143. More seeds would have germinated at lower temperatures
     if they had been left for a longer time.

144. An increase of 10° above 85° resulted in a much greater
     reduction in the number of type A seeds germinating
     than did a reduction of 10°.

145. If one were desirous of raising all three of these
     plants in one greenhouse one should keep the greenhouse
     at about 72°.

146. A temperature of 100° will kill plants of the B
     and C types.

Items 147 through 151 are a re-evaluation of some of
the items 127 through 146. Re-read items 131, 138, 139,
142 and 144 and determine whether they are generalizations,
extensions of the data, interpretations of the data or merely
restatements of the data, etc. Each of these items is to be
answered according to the following key:

Key
1.  A generalization, that is the data says it is true for
    this situation, a generalization says it is true for
    all similar situations.
2.  The data indicates a trend which if continued in
    either direction would make the statement true.
3.  An explanation of the data in terms of cause and
    effect.
4.  A restatement of results.
5.  None of the above.

147. Item 131.
148. Item 138.
149. Item 139.
150. Item 142.
151. Item 144.

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data). The statements which follow the
conclusion are the items which are to be evaluated accord­ing
to the following key. These items all relate to the
data presented for items 127-146.

1.  An assumption which must be made to make the
    conclusion valid (true).
2.  An assumption which if made would make the conclusion
    false.
3.  An assumption which has no relation to the validity
    (truth) of the conclusion.
4.  Not an assumption; a restatement of fact.
5.  Not an assumption; a conclusion.

Conclusion I: Seeds will germinate only in the range of
temperature from 35°F. to 100°F.
152. The seeds used in this experiment are representative
     of the extremes of germinating temperatures of seeds.

153. No seeds of Species B ever germinate below 35°F.

154. None of the seeds which were planted of Species A
     germinated above 100°F.

155. Too few seeds were used in the experiment to make
     it valid.

156. All seeds of Species A behave similarly in their
     response to temperature to the ones used in this
     experiment.

157. The seeds from Species C germinated at a higher
     temperature than the seeds of Species B.

158. Plants which do not germinate at high temperatures
     will not grow at high temperatures even when
     germinated at lower temperatures.

159. Seeds will germinate only in a limited temperature
     range.

Conclusion II: Some seeds of Species B will germinate at
80°F.

160. The seeds used in this experiment are completely
     representative of seeds of Species B.

161. A larger sample would yield a greater range of
     germination temperature.

162. All seeds of a species are exactly alike in their
     response to temperature.

163. Some seeds of C germinate at 80°F.

164. The entire range in which seeds of Species B will
     germinate is not represented by this experiment.

165. Species B is a cold climate plant.

TABLE XXXXIII
ITEM ANALYSIS DATA FOR TEST H

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

   1    40.0   25.0    20.0    0.0    .24    .58      40         13      26
   2    37.7   22.2    24.4    5.6    .16    .31      19         14      27
   3    60.0   50.0    44.4   30.6    .17    .20      12         40      45
   4    91.1   88.9    48.8   36.1    .51    .56      38         61      56
   5    48.8   36.1    13.3    0.0    .42    .66      48         18      31
   6    64.4   55.6    33.3   16.7    .32    .42      27         35      42
   7    95.6   94.4    71.1   63.9    .45    .45      29         79      67
   8    62.6   52.8    37.7   22.2    .26    .32      20         37      43
   9    71.1   63.9    51.1   38.9    .21    .26      16         51      51
  10    17.7    0.0     2.0    0.0    .43    .00       0          0       0
  11    44.4   30.6    20.0    0.0    .28    .62      44         16      29
  12     6.7    0.0     2.2    0.0    .22    .00       0          0       0
  13    22.2    8.9     8.9    0.0    .23    .18      11          3      10

*   Method of Flanagan
**  Method of Davis

TABLE XXXXIII (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  14    11.1    0.0    15.6    0.0   -.07    .00       0          0       0
  15    64.4   55.6    27.2    8.6    .38    .55      37         31      40
  42    91.1   88.9    71.1   63.9    .33    .34      21         76      65
  43    51.1   38.9    37.7   22.2    .15    .20      12         30      39
  44    71.1   63.9    24.4    5.6    .46    .65      47         35      42
  45    60.0   50.0    31.1   13.9    .30    .41      27         31      40
  46    82.2   77.8    20.0    0.0    .62    .83      72         38      44
  47    66.7   58.3    35.5   19.4    .32    .39      25         38      44
  48   100.0  100.0    84.4   80.6    .50    .54      36         89      76
  49    88.9   86.1    46.7   33.3    .48    .55      37         59      55
  50    26.7    8.3     8.9    0.0    .28    .35      22          5      14
  51    71.1   63.9    68.9   61.1    .02    .01       1         61      56
  52    13.3    0.0    22.2    2.8   -.18   -.12       7          2       5
  53    91.1   88.9    64.4   55.6    .38    .40      26         71      62
  54    82.2   77.8    77.8   72.2    .04    .10       6         76      65
  55    95.6   94.4    77.8   72.2    .37    .38      24         83      70

TABLE XXXXIII (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  56    57.8   47.2    31.1   13.9    .28    .38      24         30      39
  57    44.4   30.6    26.7    8.3    .20    .35      22         19      32
  58    33.3   22.2    16.7    2.8    .14    .35      22          9      22
  59    66.7   58.3    28.9   11.1    .38    .52      35         35      42
  60     0.0    0.0     0.0    0.0    .00    .00       0          0       0
  61    66.7   58.3    46.7   33.3    .20    .26      16         44      47
  81    88.9   86.1    80.0   75.0    .15    .17      10         80      68
  82    75.6   69.4    66.7   58.3    .12    .12       7         63      57
  83    93.3   91.7    80.0   75.0    .25    .29      18         83      70
  84    80.0   75.0    31.1   13.9    .50    .62      44         42      46
  85    95.6   94.4    68.9   61.1    .35    .47      31         77      66
  86    57.8   47.2    24.4    5.6    .35    .54      36         27      37
  87    62.6   52.8    40.0   25.0    .23    .31      19         38      44
  88    71.1   63.9    24.4    5.6    .47    .65      47         35      42
  89    60.0   50.0    37.7   22.2    .23    .31      19         35      42
  90    42.2   27.8    46.7   33.0   -.05   -.04      -2         44      47

TABLE XXXXIII (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  91    86.7   83.3    53.3   41.7    .40    .43      28         61      56
  92    17.8    0.0     8.9    0.0    .17    .00       0          0       0
  93    97.8   97.2    73.3   66.7    .48    .52      35         82      69
  94    40.0   25.0    11.1    0.0    .38    .58      40         13      26
  95    64.4   55.6    33.3   16.7    .32    .41      27         35      42
  96    40.0   25.0    20.0    0.0    .24    .58      40         13      26
  97    28.9   11.1    11.1    0.0    .27    .40      26          5      17
  98    37.7   22.2    22.2    2.8    .19    .47      31         12      25
  99    46.7   33.0    13.3    0.0    .40    .64      46         17      30
 100    73.3   66.7    42.2   27.8    .32    .38      24         47      48
 127    64.4   55.6    15.6    0.0    .50    .75      58         28      38
 128    64.4   55.6    13.3    0.0    .54    .75      58         28      38
 129    28.9   11.1    11.1    0.0    .27    .40      26          5      17
 130     6.7    0.0     4.4    0.0    .10    .00       0          0       0
 131    37.7   22.2    15.6    0.0    .30    .55      37         12      25

TABLE XXXXIII (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

 133    51.1   38.9    28.9   11.1    .23    .36      23         25      36
 134     4.4    0.0     6.7    0.0   -.10    .00       0          0       0
 135     2.2    0.0     6.7    0.0   -.20    .00       0          0       0
 136    37.7   22.2     8.9    0.0    .40    .55      37         12      25
 137    88.9   86.1    80.0   75.0    .15    .17      10         80      68
 138     8.9    0.0    11.1    0.0    .05    .00       0          0       0
 139   100.0  100.0    75.6   69.4    .55    .62      44         83      70
 140    53.3   41.7    20.0    0.0    .36    .69      51         21      33
 141    20.0    0.0    24.4    5.6   -.06   -.27     -17          4      12
 142    33.3   16.7    13.3    0.0    .28    .50      33          8      21
 143    13.3    0.0    13.3    0.0    .00    .00       0          0       0
 144     8.9    0.0    11.1    0.0    .05    .00       0          0       0
 145    66.7   58.3    15.6    0.0    .54    .76      60         30      39
 146    51.1   38.9     6.7    0.0    .46    .67      49         19      32

*   Method of Flanagan
**  Method of Davis

TABLE XXXXIV
ITEM ANALYSIS DATA FOR TEST J

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  16    84.4   80.6    60.0   50.0    .30    .34      21         64      58
  17    44.4   30.6    20.0    0.0    .27    .62      44         16      29
  18    48.8   36.1    28.9   11.1    .22    .34      21         22      34
  19    33.3   16.7    33.3   16.7    .00    .00       0         17      30
  20    66.7   58.3    64.4   55.6    .04    .02       1         57      54
  21    46.7   33.3    24.4    5.6    .25    .41      27         19      32
  22    82.2   77.8    64.4   55.6    .23    .24      15         66      59
  23    11.1    0.0     6.7    0.0    .10    .00       0          0       0
  24    35.5   16.7    17.8    0.0    .23    .50      33          8      21
  25    48.8   36.1    17.8    0.0    .35    .66      48         18      31
  26    53.3   41.7    24.4    5.6    .32    .50      33         24      35
  27    48.8   36.1    22.2    2.8    .30    .57      39         19      32
  28    28.9   11.1     4.4    0.0    .45    .40      26          6      17

*   Method of Flanagan
**  Method of Davis

TABLE XXXXIV (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  29    80.0   75.0    60.0   50.0    .24    .26      16         61      56
  30    13.3    0.0    17.8    0.0   -.10    .00       0          0       0
  31    51.1   38.9    15.6    0.0    .39    .67      48         19      32
  32    20.0    0.0    11.1    0.0    .16    .00       0          0       0
  33    53.3   41.7    24.4    5.6    .32    .50      33         24      35
  34    15.6    0.0    20.0    0.0   -.04    .00       0          0       0
  35    26.7    8.3     8.9    0.0    .29    .35      22          5      14
  36    64.4   55.6    57.8   47.2    .07    .07       4         51      51
  37    51.1   38.9    24.4    5.6    .29    .48      32         22      34
  38    55.6   44.4    57.8   47.2   -.02   -.02      -1         45      47
  39    15.6    0.0     8.9    0.0    .09    .00       0          0       0
  40    64.4   55.6    24.4    5.6    .42    .59      39         35      42
  41    46.7   33.3     8.9    0.0    .47    .64      46         17      30
  62    53.3   41.7    28.9   11.1    .25    .39      25         28      36
  63    62.9   52.8    35.5   19.4    .28    .36      23         35      42
  64    68.9   61.1    37.7   22.2    .33    .40      26         40      45

TABLE XXXXIV (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

  65    40.0   25.0    20.0    0.0    .24    .58      40         13      26
  66    71.1   63.9    42.2   27.8    .30    .36      23         46      48
  67    86.7   83.3    37.7   22.2    .54    .61      43         53      52
  68    84.4   80.6    44.4   30.6    .44    .51      34         55      53
  69    73.3   66.7    51.1   38.9    .23    .29      18         51      51
  70    77.8   72.2    53.3   41.7    .27    .31      19         57      54
  71    53.3   41.7    24.4    5.6    .32    .50      33         24      35
  72    66.7   58.3    46.7   33.3    .23    .26      16         44      47
  73    20.0    0.0     6.7    0.0    .27    .00       0          0       0
  74    71.1   63.9    20.0    0.0    .52    .78      63         31      40
  75    84.4   80.6    46.7   33.3    .43    .48      32         57      54
  76    48.8   36.1    31.1   13.9    .18    .30      18         25      36
  77    64.4   55.6    22.2    2.8    .43    .68      50         28      38
  78    57.8   47.2    20.0    0.0    .40    .70      53         24      35
  79    15.6    0.0    11.1    0.0    .10    .00       0          0       0
  80    66.7   58.3    26.7    8.3    .41    .58      40         33      41

TABLE XXXXIV (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

 101    84.4   80.6    24.4    5.6    .60    .75      59         44      47
 102    82.2   77.8    35.5   19.4    .48    .59      41         48      49
 103    26.7    8.3    24.4    5.6    .03    .07       4          7      19
 104    60.0   50.0    15.6    0.0    .48    .73      56         25      36
 105    77.8   72.2    42.2   27.8    .38    .43      28         50      50
 106    86.7   83.3    35.5   19.4    .56    .63      45         51      51
 107    57.8   47.2    26.7    8.3    .33    .51      34         28      38
 108    64.4   55.6    40.0   25.0    .25    .32      20         40      45
 109    13.3    0.0    11.1    0.0    .04    .00       0          0       0
 110    73.3   66.7    35.5   19.4    .38    .47      31         42      46
 111    62.6   52.8    28.9   11.1    .34    .48      32         31      40
 112    66.7   58.3    40.0   25.0    .27    .34      21         42      46
 113    64.4   55.6    35.5   19.4    .30    .39      25         38      44
 114    37.7   22.2    35.5   19.4    .03    .05       3         21      33
 115    53.3   41.7    22.2    2.8    .33    .59      41         22      34
 116    55.6   44.4    17.8    0.0    .45    .70      52         22      34

TABLE XXXXIV (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

 117    60.0   50.0    35.5   19.4    .25    .35      22         35      42
 118    62.6   52.8    33.3   16.7    .29    .40      26         33      41
 119    53.3   41.7    26.7    8.3    .28    .45      29         25      36
 120    64.4   55.5    44.4   30.6    .22    .27      17         42      46
 121    71.1   63.9    48.8   36.1    .24    .29      18         50      50
 122    60.0   50.0    17.8    0.0    .45    .72      55         25      36
 123    82.2   77.8    46.7   33.3    .40    .46      30         55      53
 124    20.0    0.0     6.7    0.0    .38    .00       0          0       0
 125    17.8    0.0    15.6    0.0    .04    .00       0          0       0
 126    13.3    0.0     6.7    0.0    .06    .00       0          0       0
 147    53.3   41.7    37.7   22.2    .16    .23      14         31      40
 148    40.0   25.0    28.9   11.1    .12    .22      13         18      31
 149    82.2   77.8    42.2   27.8    .43    .50      33         53      52
 150    55.6   44.4    31.1   13.9    .26    .36      23         28      38
 151     8.9    0.0     8.9    0.0    .00    .00       0          0       0

TABLE XXXXIV (continued)

            Percent Success          Discrimination         Difficulty
        Upper 27%      Lower 27%           r
Item       *     **       *     **      *     **   Index  % Success   Index

 152    75.6   69.4    53.3   41.7    .25    .29      18         55      53
 153    57.8   47.2    53.3   41.7    .05    .05       3         44      47
 154    53.3   41.7    37.7   22.2    .16    .23      14         31      40
 155    68.9   61.1    28.9   11.1    .41    .56      38         35      42
 156    84.4   80.6    35.5   19.4    .52    .60      42         50      50
 157    64.4   55.6    17.8    0.0    .48    .75      58         28      38
 158    80.0   75.0    51.1   38.9    .33    .36      23         57      54
 159    42.2   27.8    20.0    0.0    .26    .61      43         15      28
 160    68.9   61.1    24.4    5.6    .44    .64      46         33      41
 161    77.8   72.2    33.3   16.7    .46    .56      38         44      47
 162    64.4   55.5    28.9   11.1    .36    .61      43         33      41
 163    22.2    2.8    13.3    0.0    .14    .15       9          3      10
 164    75.6   69.4    15.6    0.0    .60    .81      68         35      42
 165    17.8    0.0    11.1    0.0    .13    .00       0          0       0

*   Method of Flanagan
**  Method of Davis
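
The paired starred and double-starred percentages in Tables XXXXIII and XXXXIV appear consistent with the usual correction for chance on a five-choice item (for example 40.0 becomes 25.0, and 20.0 becomes 0.0). The sketch below illustrates that arithmetic only. It is an inference from the tabled values, not a procedure stated in this appendix; the function name and the assumption of five answer choices per item are hypothetical.

```python
def chance_corrected_percent(raw_percent, n_choices=5):
    """Correct a percent-success figure for guessing.

    Assumption (not stated in the source): each item has `n_choices`
    options and wrong answers are spread evenly, so the corrected
    proportion is p - (1 - p) / (n_choices - 1), floored at zero.
    """
    p = raw_percent / 100.0
    corrected = p - (1.0 - p) / (n_choices - 1)
    return max(0.0, corrected) * 100.0


if __name__ == "__main__":
    # Raw upper-group and lower-group percentages as printed for
    # item 1 of Test H (Table XXXXIII).
    for raw in (40.0, 20.0):
        print(raw, "->", round(chance_corrected_percent(raw), 1))
```

The printed table values are rounded to one decimal place, so agreement with this formula should be expected only to about 0.1 of a percentage point.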

APPENDIX II

TEST I
THE ABILITY TO THINK SCIENTIFICALLY
GENERAL DIRECTIONS
1.  Place your name, age and sex in the spaces provided
    on the answer sheet.

2.  Place your student number in the space provided for
    "date of birth".

3.  On the space marked "school" place your major.

4.  In the space marked "1" below "school" give courses
    you have had in science in high school, in the space
    marked "2" give any courses you have had in science
    in college in addition to biological science.

5.  Answer all items; if you don't know - guess.

6.  Do not mark on the test booklet. Use scratch paper
    if you wish.

7.  Be sure to mark dark on the answer sheet; the machine
    does not pick up light markings.

8.  Each item has only one answer; do not mark more than
    one.

This test has been devised to measure your ability
to think scientifically. It is divided into several parts,
each of these parts tests a different phase of scientific
thinking. This portion of the test is designed to measure
your ability to differentiate phases of thinking. These
steps include major problems or perplexities, possible
solutions to problems, observations which are not results
of experimentation but rather preliminary observations,
results of experimentation, and conclusions.

The following key is to be used for the succeeding
paragraph. Certain parts of the paragraph are underlined,
and each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space in
the answer sheet.

1.  A major problem (stated or implied).
2.  Hypothesis (possible solution to problem).
3.  Result of experimentation.
4.  Initial observation (not experimental).
5.  Conclusion (probable solution of problem).

(1) How does a homing pigeon navigate over territory
it has never seen before? (2) Do air currents stimulate
the pigeon in some way? (3) Are the pigeons equipped with
some sort of magnetic compasses; that is, are they sensitive
to the earth's magnetism? Yeagley tested the latter by
fastening small magnets to the wings of well-trained pigeons.
(4) Most of these birds never got home. (5) Others, carrying
equal wing weights of non-magnetic copper, made the home
roost without trouble, (6) indicating that the earth's
magnetism is a factor in pigeon navigation. But the pigeon's
magnetic compass could not, by itself, bring him back to his
roost; because many places on the earth's surface have
identical magnetic conditions. Yeagley endeavored (7) to
determine the other guiding factor. (8) It might be the
sun or stars, but pigeons navigate under clouds. While
looking at a map which had lines representing the intensity
of the earth's magnetism, he noted that the lines were
crossed at varying angles by the parallels of latitude.
(9) If pigeons are sensitive to some factor connected with
the lines of latitude, they would have all they need to find
their way home. The next step was (10) to find some physical
force, something the pigeons might be able to detect, related
to the lines of latitude. The effect of the earth's turning
varies directly with latitude; objects near the equator are
carried daily around the earth's circumference, moving at
over 1,000 mi. per hr. Objects near the poles are carried
around more slowly. The direction and variation of this
circling can be recorded by various man-made instruments.
(11) Why should not the pigeons feel it, too? (12) If they
could, they would have, along with their magnetic compass, a
satisfactory navigating instrument. Yeagley trained hundreds
of pigeons to return to their home roosts at State College,
Pa. Then he took them to a part of Nebraska where the lines
representing the earth's magnetism cross the parallels of
latitude at the same angle as at State College, Pa. He
released the pigeons to the east of this spot. (13) The
pigeons all flew west. Yeagley believes that (14) pigeons
are guided by both the earth's magnetism and by its turning.
(15) Just where the birds keep their instruments is still
unknown; but he found that (16) birds have a mysterious
organ in their eyes, at the end of the optic nerve. (17) This
organ may contain the nerve fibers that pick up vibrations
of magnetism and the even more delicate sense that measures
the earth's turning.

Abbreviated Key
1.  A major problem
2.  Hypothesis
3.  Results
4.  Observations
5.  Conclusions

This portion of the test is designed to test your
ability to delimit a problem. A problem is presented.
This is followed by a series of questions. Rate the
questions according to the following key.

Key
1. This question must be answered in order
   to solve the problem.
2. This question if answered might be useful
   in the solution of the problem.
3. The answer to this question, though related
   to the problem, would not help in the
   solution of the problem.
4. This question is completely unrelated to
   the problem.
5. This question if answered in the affirmative
   is a basic assumption of the problem.

PROBLEM:

What causes colds?

QUESTIONS:
18.

Do all people have colds?

19.

Is it possible to determine the cause of a cold?

20.

Does aspirin help to cure a cold?

21.

Can some germ be isolated which, when injected, will
cause a cold?

22.

Do colds have a cause?

23.

Does getting one's feet wet cause a cold?

24.

Does becoming chilled after being overheated cause
a cold?

25.

Why are colds more prevalent in the winter than in
the summer?

26.

Do other animals get colds?

27.

Are people who are tired more susceptible to colds?

PROBLEM:

What is the function of the thymus gland?
(The thymus gland is located in the chest cavity
just above the heart.) This gland is largest
during the growing period and becomes progressively
smaller after maturity.

QUESTIONS:
28.

Is the gland inactive after maturity?

29.

Does the gland have a function?

30.

Can any substance be extracted from the gland which
when injected into another animal causes growth?

31.

If the gland is removed will the animal mature?

32.

Can the function of the gland be determined?

33.

What causes the gland to grow smaller?

This portion of the test is designed to measure
your ability to recognize faulty experimental procedures.
In each case a problem and a possible solution to the
problem (an hypothesis) are presented. In each case the
experiments were designed by students to test the
hypotheses. Judge each experiment according to the
following key.

Key
This experiment is:
1. Satisfactory
2. Unsatisfactory because it lacks a control
   or comparison.
3. Unsatisfactory because the control or
   comparison is faulty.
4. Unsatisfactory because it is unrelated to
   the hypothesis.
5. None of the above - the experiment is
   unsatisfactory for reasons other than listed
   in 2, 3, and 4.
PROBLEM:

What are some of the requirements for the sprout­
ing of seeds?

HYPOTHESIS:

Oxygen is a requirement for the sprouting of
seeds.

34.

If a seed lacked oxygen under a controlled experiment
the seed would not function properly and would soon
die.

35.

Take two packages of seeds. Allow oxygen to be in
contact with one package but keep the other package of
seeds protected from all oxygen. Observe which sprouts.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to hypothesis
5. None of the above

36.

Place growing plants in an air tight container.
Pump out the oxygen. Place other growing plants in
containers with oxygen. Keep temperature, light, etc.,
the same for each.

37.

Plant seeds in a container with glass covering it
so that no oxygen can enter and see if they sprout.
Keep temperature, light and moisture normal.

PROBLEM:

A minute insect (aphid) is suspected of spreading
a virus disease of roses. How would you determine
whether this is true?

HYPOTHESIS:

The aphid spreads a virus disease of roses.

38.

Put the insect among other kinds of plants other than
roses. Leave another group of these plants free from
contact with the aphids. Compare the results.

39.

Since aphids travel through the air, a plot of roses
must be entirely protected from them, and another
exposed to aphids which in turn have been exposed to
roses afflicted with the virus disease. All must be
under constant conditions of soil, atmosphere, etc.

40.

Take sample rose with the virus disease. Obtain
same kind of rose with no disease. Use microscope
to aid in detection of the disease. Use some sort
of spray. Note results.

41.

Use rose plants which are known not to be diseased.
In the same area place rose plants which are diseased
but which have been treated to destroy the aphid.
Note whether the disease still spreads after the
aphids have been killed.

42.

In order to determine whether the aphid spreads a
virus disease in roses, a group of roses should be
put in a hot house free from aphids to see whether
they get such a virus disease.

PROBLEM:

To determine the cause of illness which appears
when large numbers of people are confined
to a small space.

HYPOTHESIS:
Lack of oxygen causes the people to become ill.

Abbreviated Key
1. Satisfactory
2. Lacks control
3. Control faulty
4. Unrelated to hypothesis
5. None of the above

43.

One might check the oxygen by placing a number of
people in a confined place where there was a control
amount. Other checks would have to be made also such
as the purity of food, the purity of water and whether
or not proper sanitation rules were followed.

44.

Confine one group to a small space in which there
is a limited supply of oxygen. Let the other group
have unlimited supply of oxygen and a large space.
Let their diets and other items be the same. If the
cause of the illness is as stated the confined group
will be ill from lack of oxygen.

45.

Set two groups of people, one with plenty of oxygen
and the other in a normal environment. Determine
which group becomes ill.

46.

Put a lot of rabbits in a small space for a period
of time. Put a few rabbits in the same amount of
space. Observe the rabbits and draw conclusions.

47.

Put one group of people in a room with an excessive
amount of carbon dioxide and another group in a
room with a normal amount of carbon dioxide. Keep
the oxygen concentration the same in both rooms.

This portion of the test is designed to test your
ability to organize data. Select from the key below the
curve which best fits the data. If none of the curves
fits the data mark space five on your answer sheet.
The curves need not have the same amount of slope as the
curves presented in the key. Use scratch paper if you
wish.

[Key: curves 1 through 4 (figure not reproduced); 5. None of the curves]

48.

The horizontal axis represents the time in hours after
the injection of sugar into the blood; the vertical axis
is the amount of sugar in the blood.

[Abbreviated Key: curves 1 through 4 (figure not reproduced); 5. None]

Time after injection
1
3

6

49.

Percent increase
8

10

12

14
18

20
80

The horizontal axis represents time in days; the
vertical axis is the number of yeast cells in millions
(starting with 100 yeast cells).
Number of

Time in days

51.

8

The horizontal axis represents age in years.
The
vertical axis is the percent increase in the weight
of the ovaries and other female sex organs from
birth to 20 years.
Age
4

50.

Blood sugar
35
12

yeast cells in millions

4

25

8

130

12

390

20

400

The horizontal axis represents the amount of thyroprotein
fed daily to cows. The vertical axis represents the
percent increase in milk production.

Thyroprotein fed      Percent increase
.15 grams             18
.20 grams             23
.24 grams             27
.30 grams             33
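
As an aid to readers checking the thyroprotein figures, the change between successive rows can be computed directly. The Python sketch below is an illustration only and is not part of the test or its scoring key; the function name `successive_slopes` is hypothetical.

```python
def successive_slopes(xs, ys):
    """Return the slope between each pair of neighbouring data points."""
    points = list(zip(xs, ys))
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


# Values transcribed from the thyroprotein table.
grams_fed = [0.15, 0.20, 0.24, 0.30]
percent_increase = [18, 23, 27, 33]

slopes = successive_slopes(grams_fed, percent_increase)
print([round(s) for s in slopes])
```

A slope that stays roughly constant from row to row is what a straight-line relation looks like; a slope that changes steadily points to one of the curved forms.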

This test is designed to measure your understanding
of the relation of facts to the solution of a problem. The
over-all problem involved in this test is presented. This
is followed by a series of possible solutions to the
problem (hypotheses). After each hypothesis there are a
number of items, all of which are true statements of fact.
Determine how the statement is related to the hypothesis
and mark each statement according to the key which follows
the hypothesis.

GENERAL PROBLEM:
What factors are involved in the transmission and
development of Infantile Paralysis (Poliomyelitis)?
HYPOTHESIS I:
In man the disease is contracted by direct contact
with persons having the disease.
For items 52 through 60 mark space if the item offers
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.
52.

Monkeys free from the disease almost never catch
infantile paralysis from infected monkeys.

53.

The curve of number of cases of the disease in a
given area is the same shape as the curve for the
fly population in that area, the infantile paralysis
incidence curve lagging behind the fly population
curve by about two weeks.

54.

The virus has never been isolated from the blood.

55.

The virus is not found in the nasal secretion, nor
in the saliva.

56.

The incubation period for infantile paralysis is
from 4 to 21 days.

57.

Most persons in contact with the diseased individual
do not develop the disease.

58.

The incidence of infantile paralysis is higher in
rural districts than in the cities.

59.

Cases of infantile paralysis have been found to
follow the roads of communication of the population,
that is, the disease spreads from populated areas
along roads or rivers to other areas.

60.

Even during epidemics cases are spotty; it is
usually impossible to trace one case from another.

61.

What is the status of hypothesis I?
1. It is true.
2. It is probably true.
3. The data are contradictory, so the truth or
   falsity cannot be judged.
4. The hypothesis is probably false.
5. It is definitely false.

HYPOTHESIS II:
Healthy persons having had contact with diseased
individuals may carry the disease from one person
to another.
For items 62 through 70 mark space if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.
62.

Monkeys free of the disease almost never catch
infantile paralysis from infected monkeys.

63.

It has been found that exertion prior to or at the
time of infection increases the incidence of the
disease.

64.

Even during epidemics cases are spotty; it is
usually impossible to trace one case from another.

65.

The virus is always found in the stools of people
who have the disease.

66.

Most persons in contact with the diseased individual
do not develop the disease.

67.

Nine out of 14 adult contacts had virus in stools,
almost all child contacts have virus in stools.

68.

Up to two months after contact the virus is found
in the stools of persons who contacted the victims,
but who did not contract the disease.

69.

In the stools of non-contacts the virus was found
in only one person in 100.

70.

The percent of cases of infantile paralysis is
higher in rural districts than in the cities.

71.

What is the status of hypothesis II?
1. The hypothesis is true.
2. It is probably true.
3. The data are contradictory, so the truth or
   falsity cannot be judged.
4. It is probably false.
5. It is definitely false.

This portion of the test was designed to measure
your ability to interpret data and to test your
understanding of experimentation. In each case the numbers
in the first column are the numbers which you will use as
your answer. Thus the table presented becomes both the
source of data and your key for the questions which follow
it. In each case where a test tube number or group number
is called for the one which gives positive evidence for the
statement should be given. Below this the control or
comparison is called for. This is the test tube or group
number of the data which offers a comparison.
For example:
1. Leaf in dark - no starch.
2. Leaf in light - starch.

"Light is necessary for the production of starch."
You would mark space 2 because this is the positive
evidence, but it would be meaningless if it were not
compared with the leaf in the dark. Therefore, the
following item, "What is the control (comparison) for
item 1?" would be marked space 1.
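
The pairing of positive evidence with its control in the leaf example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the helper name `evidence_and_control` are hypothetical and are not part of the test.

```python
# Each observation maps an answer-space number to
# (condition, whether starch was found).
observations = {
    1: ("leaf in dark", False),
    2: ("leaf in light", True),
}


def evidence_and_control(obs):
    """Return (evidence, control) for a two-row comparison.

    The evidence is the row in which the effect appears; the control is
    the row in which it does not, which is what gives the evidence meaning.
    """
    evidence = [n for n, (_, found) in obs.items() if found]
    control = [n for n, (_, found) in obs.items() if not found]
    return evidence[0], control[0]


print(evidence_and_control(observations))
```

The sketch assumes exactly one row with the effect and one without; the tables that follow have five rows, so the control for each item must be chosen by deciding which single factor differs between two rows.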
Items 72 through 80 refer to the data presented
below. Five test tubes, each containing a gram of protein,
were set up. Mark each item according to the test tube
number called for. All substances were dissolved in water.
All test tubes were kept at 37° C. (water boils at 100° C.).
For test tube 5, Substance X was boiled and then cooled
before it was added to the protein.
Test Tube   Contents of Tubes                    Amt. of Substance W
                                                 present after 24 hours
1           Protein plus Substance X             .05 gram
2           Protein plus water                   .00 gram
3           Protein plus Substance X plus        .08 gram
            hydrochloric acid
4           Protein plus hydrochloric acid       .00 gram
5           Protein plus Substance X (boiled)    .00 gram

72.

Give the number of the test tube which acts as a
control (comparison) for the entire experiment.
73.

Give the number of the test tube which gives evidence
that protein does not break down spontaneously into
Substance W.

74.

Give the number of the test tube which gives evidence
that Substance X is the active substance in the break­
down of proteins.

75.

Give the number of the tube which is the control
for item 74.

76.

Give the number of the test tube which shows that
a temperature of 37 degrees C. does not cause
protein to break down into Substance W.

77.

Which test tube gives evidence that Substance X is
not a stable substance?

78.

Which tube is the control for item 77?

79.

Give the number of the test tube which indicates that
hydrochloric acid alone is ineffective in breaking
down proteins.

80.

Give the control for item 79.

Items 81 through 91 refer to the data presented
below. Mark each item according to the leaf number called
for. Plant A normally stores starch in its leaves while
Plant B does not normally store starch in its leaves.
The following experiments were performed in a dark
room at 72° F. Glucose (sugar) solutions were made with
20 grams of glucose per 100 cubic centimeters of water.
Leaves of plant A taken from a plant that had been in the
dark for 48 hours were floated in the 5 solutions listed
below and left in the glucose solution for an hour.

Leaf   Solution                              Analysis of leaf
                                             after 4 hours
1      Glucose                               Starch in leaf
2      Water                                 No starch in leaf
3      Glucose plus juice from Plant B       No starch in leaf
4      Glucose plus juice from Plant C       No starch in leaf
5      Glucose plus boiled juice from        Small amount of
       Plant B                               starch in leaf
81.

Give the number of the leaf which showed that starch
does not develop spontaneously in the leaf in the dark.

82.

This leaf indicates that a temperature of 72° F. does
not cause starch to form in the leaf.

83.

Give the number of the leaf which is the control
(comparison) for the entire experiment.

84.

Give the number of the leaf which gives evidence that
Plant A is capable of manufacturing starch from glucose.

85.

Give the number of the leaf which is the control
for item 84.

86.

Give the number of the leaf which gives evidence
that the juice of Plant B is capable of preventing
the manufacture of starch from glucose.

87.

What is the control for item 86?

88.

Give the number of the leaf which gives evidence
that the juices of Plant B contain a substance which
inhibits the production of starch in its leaves.

89.

Give the leaf which is the control for item 88.

90.

This leaf gives evidence that the inhibitory
substance is not a stable substance.

91.

What is the control for item 90?

This portion of the test was designed to measure
your ability to make conclusions. When facts are analyzed
and studied they sometimes yield evidence which helps in the
solution of a problem. However, any conclusion must be
checked before it can be accepted. The following key
includes four ways in which conclusions may be faulty. Each
of the items presents a question or problem, a brief
description of an experiment and one or more conclusions
drawn from the experiment. Each experiment was repeated many
times. Read each problem, experiment and the conclusions.
Where several conclusions are given evaluate each
conclusion separately. Is the conclusion tentatively
justified by the data? If so, mark space 1 on your answer
sheet. If the conclusion is not justified determine whether
2, 3, 4, or 5 in the key is the best reason for it being
faulty and mark the proper space on your answer sheet.

Key
The conclusion is:
1. Tentatively justified.
2. Unjustified because it does not answer the problem.
3. Unjustified because the experiment lacks a control
   comparison.
4. Unjustified because the data are faulty or inadequate,
   though a control was included.
5. Unjustified because it is contradicted by the data.

PROBLEM:
A student was interested in developing a test
for a certain type of substance. In all 100 cases
his test was positive.
92.

He concluded that the test was a specific test for
the substance.

PROBLEM:
An investigator wanted to know what causes
people to breathe faster when they are running
rapidly. He found that breathing more carbon dioxide
increased the breathing rate, but that the breathing
of air deficient in oxygen did not increase the
breathing rate.
93.

He concluded that people breathe faster when they
are running because they need more oxygen.

94.

Someone else concluded that running increases the
rate of breathing.

PROBLEM:
An investigator wished to determine whether
temperature increased the rate of a certain reaction. On
repeated tests he found that if he started out with
a certain amount of his original substances he would
obtain, after one hour, 1 gram of the substance
produced by the reaction at 0° C., 2 grams at 20° C.,
5 grams at 40° C., and 3 grams at 60° C.
95.

He concluded that increased temperature increased
the rate of the reaction.

PROBLEM:
A person wanted to determine whether bile aided
in the digestion of fats. He found that whenever he
mixed pancreatic juice with fats a small part of the
fat was digested, but whenever he mixed pancreatic
juice and bile with fat, he found that the fat was
completely digested. When he mixed bile alone with
fat he found that there was no digestion.
96.

He concluded that bile aided in the digestion of
fats.

97.

Another concluded that pancreatic juice was
necessary for digestion of fats.

98.

Someone else claimed that bile does not aid in the
digestion of fat.

PROBLEM:
A person wanted to know what caused a certain
disease. He examined 1000 patients with the
disease. All had a certain bacteria (Bacteria A)
in the digestive tract.

99.

He concluded that Bacteria A was the cause of
the disease.

PROBLEM:
A person wanted to know why plants bend toward
the light. He placed one group of plants in the
light with the light source at the right. He
placed another group of similar plants in the dark.
The plants in the dark grew straight, the plants
in the light were bent to the right.
100.

He concluded that plants bend toward the light.

101.

Another concluded that plants bend toward the light
because they need light to grow.

102.

Someone else concluded that light influences the
direction in which plants grow.

PROBLEM:
Investigator A wanted to know what caused people
to become ill if confined in large numbers to a
small closed area. He found on repeated tests that
the air in very crowded closed areas contained about
5% carbon dioxide, while normal air contains .03%
carbon dioxide.
103.

He concluded that excessive carbon dioxide caused
the illness.

104.

Another investigator concluded that the illness was
caused by insufficient oxygen.

PROBLEM:
Investigator B in an attempt to solve the same
problem repeated the experiment done by investigator
A but in addition had people in uncrowded rooms
breathe air containing 5% carbon dioxide. No ill
effects were noted among those in the uncrowded
rooms.
105.

He also concluded that excessive carbon dioxide
caused the illness.

106.

Another investigator claimed that this showed that
the disease was caused by insufficient oxygen.

107.

Another conclusion was that 5% carbon dioxide will
produce no ill effects.

108.

Still another claimed that people live better in
uncrowded areas.

PROBLEM:

What are some of the requirements for seeds
to sprout? The same student planted two groups
of seeds of different types in pots and placed
one group of the pots in the light, the others
in the dark. Those plants in the light were
green, those in the dark were yellow. Other
conditions were the same for both groups.

109.

Conclusion:
Light is necessary for sprouting of seeds.

110.

Another conclusion:
Plants require light to mature properly.

This portion of the test was designed to measure
your ability to interpret data. Following the data you
will find a number of statements. You are to assume that
the data as presented are true. Evaluate each statement
according to the following key and mark the appropriate
space on your answer sheet.
Key
1. True:
   The data alone are sufficient to show that the
   statement is true.
2. Probably true:
   The data indicate that the statement is probably
   true, that it is logical on the basis of the
   data but the data are not sufficient to say that
   it is definitely true.
3. Insufficient evidence:
   There are no data to indicate whether there is
   any degree of truth or falsity in the statement.
4. Probably false:
   The data indicate that the statement is probably
   false, that is, it is not logical on the basis
   of the data but the data are not sufficient to
   say that it is definitely false.
5. False:
   The data alone are sufficient to show that the
   statement is false.

Items 111 through 131 refer to the following graph.
Use the key above to answer the items.
The lizard is con­
sidered to be cold blooded, the others warm blooded.

[Graph not reproduced: body temperature plotted against
external temperature in degrees Centigrade, with both axes
marked from 10° to 40°.]

111.

The body temperature of the cat varies more than the
body temperature of the ant eater.

112.

When the external temperature is 50°C., the tempera­
ture of the lizard is also 50°C.

113.

At an external temperature of 50°C., the temperature
of the cat is 50°C.

114.

When the external temperature is 50°C., the temperature
of the ant eater would be higher than the temperature
of the cat.

115.

The temperature of a mouse would be about half way
between that of the cat and the ant eater.

116.

At no time during the experiment did any of the
animals have the same body temperature.

117.

There is a close correlation between the body temperature
of the lizard and that of the external environment.

118.

The body temperature of the cat showed the least
variation in temperature during the experimental period.

119.

At 20 degrees below 0°C., the lizard would be frozen.

120.

If the temperature of other cold blooded animals were
plotted it would resemble that of the lizard.

Items 121 through 124 are a re-evaluation of some of
the items 111 through 120. Re-read items 112, 114, 118 and
120 and determine whether they are generalizations,
extensions of the data, explanations of the data or merely
restatements of the data, etc. Answer each according to the
following key:
Key
1. A generalization, that is the data says it is true
   for this situation, a generalization says it is
   true for all similar situations.
2. The data indicates a trend which if continued in
   either direction would make the statement true.
3. An explanation of the data in terms of cause and
   effect.
4. A restatement of results.
5. None of the above.

121.

Item 112

122.

Item 114

123.

Item 118

124.

Item 120

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusion are the items which are to be evaluated
according to the following key. These items all relate to
the data presented for items 111 through 120.
Key
1. An assumption which must be made to make the
   conclusion valid (true).
2. An assumption which if made would make the
   conclusion false.
3. An assumption which has no relation to the validity
   (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I: Warmblooded animals have some type of heat
regulating mechanism.

125.

It is possible for animals to have some type of heat
regulating mechanism.

126.

The cat and the duckbill are very different in their
reaction to the external environment.

127.

The opossum had a lower body temperature than the
cat.

Conclusion II:
Ant eaters and duckbills are more closely related
than ant eaters and cats.
128.

Similarity of reaction of living things indicate a
relationship.

129.

The temperature of the ant eater varied more with
the external temperature than did that of the cat.

130.

The degree of closeness of similarity of response of
living things runs parallel with the closeness of
kinship.

131.

The temperature of the cat varied less than that of
the anteater and duckbill with change of temperature.

This portion of the test was designed to measure your
ability to interpret data. Following the data you will find
a number of statements. You are to assume that the data as
presented are true. Evaluate each statement according to
the following key and mark the appropriate space on your
answer sheet.

1. True: The data alone are sufficient to show that the
   statement is true.
2. Probably true: The data indicate that the statement
   is probably true, that it is logical on the basis of
   the data but the data are not sufficient to say that
   it is definitely true.
3. Insufficient evidence: There are no data to indicate
   whether there is any degree of truth or falsity in
   the statement.
4. Probably false: The data indicate that the statement
   is probably false, that is, it is not logical on the
   basis of the data but the data are not sufficient to
   say that it is definitely false.
5. False: The data alone are sufficient to show that
   the statement is false.

Analyses were made of the Vitamin C content of red
ripe and green tomatoes as soon as they were picked.
Mature green tomatoes were stored at the temperatures
indicated in the following table. Those which had ripened
by the end of the first week were analyzed for their
Vitamin C content; those ripened at the end of the second
week were analyzed at the end of the second week, etc. In
addition some mature green tomatoes were analyzed each week.

Condition       Temp. when    No. of    Stage of         Vitamin C
when taken      stored        weeks     ripeness         mg/100 grams
from field                    stored    when analyzed

mature green    not stored    0         mature green     15.0
red ripe        not stored    0         red ripe         16.2
mature green    70°F.         1         red ripe         14.4
mature green    70°F.         2         red ripe         12.9
mature green    70°F.         3         red ripe          8.2
mature green    80°F.         1         red ripe         14.0
mature green    80°F.         2         red ripe          9.8
mature green    80°F.         3         red ripe          7.1
mature green    70°F.         1         mature green     10.0
mature green    70°F.         2         mature green      7.2
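
For readers who wish to compare the stored samples, the drop in Vitamin C relative to the freshly picked mature green tomatoes can be tabulated as in the Python sketch below. It is an illustration only and is not part of the test; the choice of the unstored mature green value as the baseline and the names `STORED`, `BASELINE` and `loss_from_baseline` are assumptions made here.

```python
# Rows transcribed from the storage table:
# (storage temperature in deg F, weeks stored,
#  stage when analyzed, Vitamin C in mg/100 grams)
STORED = [
    (70, 1, "red ripe", 14.4),
    (70, 2, "red ripe", 12.9),
    (70, 3, "red ripe", 8.2),
    (80, 1, "red ripe", 14.0),
    (80, 2, "red ripe", 9.8),
    (80, 3, "red ripe", 7.1),
    (70, 1, "mature green", 10.0),
    (70, 2, "mature green", 7.2),
]

# Assumed baseline: mature green tomatoes analyzed when picked.
BASELINE = 15.0


def loss_from_baseline(rows, baseline=BASELINE):
    """Map (temp, weeks, stage) to mg/100 g lost relative to the baseline."""
    return {
        (temp, weeks, stage): round(baseline - vitamin_c, 1)
        for temp, weeks, stage, vitamin_c in rows
    }


losses = loss_from_baseline(STORED)
for key in sorted(losses):
    print(key, losses[key])
```

Laying the losses side by side by temperature and by week makes it easier to separate what the data state outright from what they merely suggest, which is the distinction the key asks for.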

132.

Tomatoes ripened at 90°F. would have less Vitamin C
after three weeks than those stored at 80°F.

133.

Tomatoes could not be stored at 90°F. because at this
high a temperature they would rot or spoil.

134.

The lower the temperature at which tomatoes are stored
the less is the breakdown of Vitamin C.

135.

Heat causes a breakdown of the Vitamin C molecule.

136.

After four weeks of storage tomatoes stored at 70°F.,
would contain less than 7 mg/100 grams of Vitamin C.

137.

Some mature green tomatoes ripen in storage within a
week.

138.

The green tomatoes which did not ripen in a week had
lost about the same amount of Vitamin C as those which
ripened during the week.

139.

Vitamin C is a stable substance.

140.

Vitamin C is manufactured some place else in the plant
than in the fruit (tomato) and is stored in the fruit.

Items 141 through 144 are a re-evaluation of some
of the items 132 - 140. Re-read items 133, 135, 136 and
137 and determine whether they are generalizations,
extensions of the data, explanations of the data or merely
restatements of the data, etc. Each of these items is to be
answered according to the following key:
Key
1. A generalization, that is the data says it is true
   for this situation, a generalization says it is true
   for all similar situations.
2. The data indicates a trend which if continued in
   either direction would make the statement true.
3. An explanation of the data in terms of cause and
   effect.
4. A restatement of results.
5. None of the above.
141.

Item 133

142.

Item 135

143.

Item 136

144.

Item 137

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusion are the items which are to be evaluated
according to the following key. These items will relate to
the data presented for items 81 through 100.
Key
1. An assumption which must be made to make the
   conclusion valid (true).
2. An assumption which if made would make the conclusion
   false.
3. An assumption which has no relation to the validity
   (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I:
Sunlight causes an increase in the Vitamin C content
of tomatoes as they ripen on the vine.

145.

The tomatoes which were analyzed when green ripe
would have contained more Vitamin C if they had
been allowed to ripen on the vine.

146.

The test used to measure the amount of Vitamin C
accurately measures the amount.

147.

The Vitamin C content of ripe tomatoes on the vine
was higher than the Vitamin C content of the green
ripe tomatoes on the vines.

Conclusion II:
Vitamin C breaks down spontaneously at room temperature.
148.

Vitamin C reacts similarly in all plants in which
it is found.

149.

When the tomatoes were stored at room temperature
the Vitamin C content decreased.

150.

All vitamins react similarly to storage at room
temperature.

TABLE XLV
ITEM ANALYSIS DATA FOR TEST I

        Percent Success         Discrimination      Difficulty
Item    Upper 27%  Lower 27%     r       Index     % Success  Index

  1      *92.8       86.4        .16
        **91.0       83.0        .17      10          86        73
  2       77.6       56.8        .24
          72.0       46.0        .24      15          59        55
  3       80.0       55.2        .28
          75.0       44.4        .32      20          59        55
  4       96.8       72.0        .48
          96.0       65.0        .50      33          80        68
  5       96.8       71.2        .50
          96.0       64.0        .51      34          80        68
  6       56.0       36.0        .21
          45.0       20.0        .27      17          31        40
  7       88.0       63.2        .33
          85.0       54.0        .36      23          68        60
  8       78.4       67.2        .14
          73.0       59.0        .17      10          66        59
  9       48.8       56.0       -.10
          36.0       45.0       -.10      -6          40        45
 10       80.8       47.2        .37
          76.0       34.0        .43      28          55        53
 11       41.6       20.8        .24
          27.0        1.0        .60      42          14        27
 12       44.8       47.2       -.03
          31.0       34.0       -.05      -2          31        40
 13       88.8       60.8        .37
          86.0       51.0        .40      26          68        60

*  Method of Flanagan
** Method of Davis

374
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  14   66.4  58.0    34.4  18.0     .38   .43     28        38        44
  15   78.4  73.0    57.6  47.0     .24   .27     17        59        55
  16   76.0  70.0    46.4  33.0     .33   .36     23        51        51
  17   72.0  65.0    53.6  42.0     .21   .23     14        53        52
  18   61.6  52.0    37.6  22.0     .24   .32     20        37        43
  19   44.8  31.0    21.6   2.0     .26   .55     37        16        29
  20   18.4   0.0    23.2   4.0    -.05  -.23    -14         3         9
  21   46.4  33.0    37.6  22.0     .10   .14      8        27        37
  22   60.0  50.0    28.0  10.0     .33   .48     32        30        39
  23   69.6  62.0    56.8  46.0     .14   .17     10        53        52
  24   71.2  64.0    58.4  48.0     .14   .17     10        53        52
  25   68.8  61.0    40.0  25.0     .30   .36     23        24        46
  26   67.2  59.0    45.6  32.0     .22   .27     17        25        47
  27   80.0  75.0    52.8  41.0     .31   .35     22        57        54
  28   50.4  38.0    33.6  17.0     .16   .26     16        27        37
  29   53.6  42.0    24.8   6.0     .33   .50     33        24        35

375
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  30   62.4  53.0    44.8  31.0     .17   .23     14        40        45
  31   53.6  42.0    27.2   9.0     .28   .41     27        25        36
  32   55.2  44.0    27.2   9.0     .29   .45     29        25        36
  33   60.8  51.0    45.6  32.0     .16   .20     12        40        45
  34   36.8  21.0    10.4   0.0     .37   .55     37        12        25
  35   20.8   1.0    11.2   0.0     .17   .02      1         1         1
  36   52.8  41.0    20.0   0.0     .36   .68     50        21        33
  37   88.0  85.0    60.0  50.0     .36   .40     26        66        59
  38   80.8  76.0    39.2  24.0     .44   .52     35        50        50
  39   84.8  81.0    48.8  36.0     .41   .48     32        59        55
  40   60.0  50.0    31.2  14.0     .30   .41     27        31        40
  41   49.6  37.0    47.2  34.0     .03   .04      2        35        42
  42   76.0  70.0    61.6  52.0     .16   .18     11        61        56
  43   60.0  50.0    35.2  19.0     .26   .35     22        35        42
  44   48.0  35.0    16.0   0.0     .38   .64     46        17        30
  45   44.8  31.0    41.6  27.0     .04   .05      3        28        38

376
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  46   14.4   0.0    16.0   0.0    -.03   .00      0         0         0
  47   84.4  81.0    49.6  37.0     .40   .46     30        57        54
  48   72.0  62.0    41.6  25.0     .31   .38     24        42        46
  49   80.8  75.0    57.6  46.0     .27   .32     20        61        56
  50   60.8  51.0    43.2  29.0     .18   .22     13        40        45
  51   76.8  69.0    64.0  55.0     .15   .15      9        61        56
  52   42.4  28.0    29.6  12.0     .14   .23     14        19        32
  53   52.0  40.0    16.8   0.0     .39   .68     50        21        33
  54   74.4  68.0    47.2  34.0     .28   .35     22        51        51
  55   48.8  36.0    28.0  10.0     .23   .36     23        22        34
  56   89.6  87.0    68.8  61.0     .32   .32     20        73        63
  57   90.4  88.0    76.0  70.0     .23   .26     16        79        67
  58   61.6  51.0    40.0  27.0     .22   .26     16        38        44
  59   56.8  46.0    47.2  34.0     .11   .14      8        40        45
  60   56.0  45.0    32.8  16.0     .24   .34     21        30        39

377
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  61   62.4  53.0    22.4   3.0     .42   .66     48        28        38
  62   64.0  55.0    40.8  26.0     .24   .31     19        40        45
  63   89.6  87.0    60.8  51.0     .39   .41     27        68        60
  64   55.2  44.0    21.6   7.0     .36   .50     33        25        36
  65   69.6  62.0    40.8  26.0     .32   .36     23        44        47
  66   32.0  15.0    18.4   0.0     .18   .47     31         8        20
  67   26.4   8.0    21.6   2.0     .05   .24     15         5        15
  68   28.0  10.0    26.4   8.0     .03   .05      3         9        22
  69   52.8  41.0    27.2   9.0     .27   .41     27        24        35
  70   48.8  36.0    39.2  24.0     .11   .14      8        30        39
  71   61.2  52.0    47.2  34.0     .14   .18     11        42        46
  72   81.6  77.0    54.4  43.0     .32   .36     23        59        55
  73   70.4  63.0    21.6   2.0     .48   .73     56        31        40
  74   71.2  64.0    41.6  27.0     .32   .38     24        44        47
  75   34.4  18.0    15.2   0.0     .25   .51     34         9        22

378
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  76   80.0  75.0    38.4  23.0     .44   .52     35        48        49
  77   75.2  69.0    32.0  15.0     .43   .55     37        42        46
  78   75.2  69.0    34.4  18.0     .42   .52     35        44        47
  79   96.8  96.0    71.2  64.0     .55   .51     34        80        68
  80   88.8  86.0    60.0  50.0     .37   .45     29        68        60
  81   92.0  90.0    42.2  28.0     .58   .64     46        59        55
  82   84.0  80.0    34.4  18.0     .52   .61     43        48        49
  83   68.8  61.0    34.4  18.0     .36   .45     29        38        44
  84   96.0  95.0    55.2  44.0     .58   .62     44        68        60
  85   84.0  80.0    40.8  26.0     .47   .54     36        53        52
  86   92.8  91.0    68.8  61.0     .39   .40     26        76        65
  87   51.2  39.0    39.0   5.0     .29   .50     33        21        33
  88   64.8  56.0    38.4  23.0     .27   .35     22        38        44
  89   32.8  16.0    10.4   0.0     .34   .48     32         9        21
  90   94.4  93.0    48.0  35.0     .58   .64     46        63        57

379
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
  91   84.8  81.0    40.8  26.0     .53   .55     37        53        52
  92   60.8  51.0    32.0  15.0     .31   .41     27        31        40
  93   76.8  71.0    40.0  25.0     .38   .46     30        46        48
  94   72.8  66.0    45.6  32.0     .28   .35     22        48        49
  95   52.8  41.0    21.6   2.0     .34   .62     44        21        33
  96   91.2  89.0    56.8  46.0     .45   .48     32        66        59
  97   25.6   7.0     5.6   0.0     .38   .32     20         4        13
  98   90.4  88.0    60.4  51.0     .40   .47     31        70        61
  99   88.0  85.0    49.6  37.0     .45   .51     34        59        55
 100   73.6  67.0    33.6  11.0     .41   .59     41        38        44
 101   51.2  39.0    23.2   4.0     .30   .54     36        21        33
 102   49.6  37.0    18.4   0.0     .36   .66     48        18        31
 103   20.0   0.0     6.4   0.0     .28   .00      0         0         0
 104   48.0  35.0    18.4   0.0     .35   .65     47        17        30
 105   85.6  82.0    55.2  44.0     .37   .40     26        63        57

380
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
 106   52.8  41.0    22.4   3.0     .34   .58     40        21        33
 107   28.0  10.0    14.4   0.0     .20   .39     25         5        16
 108   72.0  65.0    44.0  30.0     .28   .35     22        46        48
 109   66.4  58.0    32.8  16.0     .34   .45     29        37        43
 110   32.0  15.0    17.6   0.0     .20   .47     31         8        20
 111   85.6  82.0    69.6  62.0     .23   .24     15        71        62
 112   66.4  58.0    26.4   8.0     .41   .58     40        33        41
 113   69.6  62.0    28.0  10.0     .43   .57     39        35        42
 114   81.6  77.0    37.6  22.0     .47   .55     37        48        49
 115   94.4  93.0    78.4  73.0     .31   .34     21        82        69
 116   86.4  83.0    56.0  45.0     .37   .40     26        63        57
 117   93.6  92.0    65.6  57.0     .44   .46     30        74        64
 118   63.2  54.0    55.2  44.0     .08   .10      6        48        49
 119   68.8  61.0    35.2  19.0     .34   .45     29        38        44
 120   50.4  38.0    33.6  17.0     .17   .26     16        27        37

381
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
 121   75.2  69.0    48.8  36.0     .28   .34     21        51        51
 122   84.0  80.0    60.8  51.0     .28   .32     20        64        58
 123   61.6  52.0    38.4  23.0     .24   .31     19        38        44
 124   79.2  74.0    44.2  31.0     .36   .41     27        53        52
 125   76.0  70.0    60.8  51.0     .17   .20     12        61        56
 126   23.2   4.0    20.0   0.0     .04   .23     14         3         9
 127   65.6  57.0    34.4  18.0     .35   .41     27        37        43
 128   58.4  48.0    44.8  31.0     .13   .17     10        40        45
 129   70.4  63.0    28.0  10.0     .42   .57     39        35        42
 130   49.6  37.0    24.8   6.0     .27   .46     30        21        33
 131   68.0  60.0    36.0  20.0     .33   .41     27        40        45
 132   59.2  49.0    32.0  15.0     .29   .39     25        31        40
 133   92.0  90.0    55.2  44.0     .48   .48     32        66        59
 134   32.8  16.0    24.0   5.0     .11   .26     16        10        23
 135   48.0  35.0    29.6  12.6     .19   .31     19        22        34

382
TABLE XLV (continued)

           Percent Success           Discrimination        Difficulty
Item   Upper 27%     Lower 27%         r        Index   % Success   Index
          *    **       *    **       *    **
 136   72.0  65.0    24.8   6.0     .47   .65     47        35        42
 137   89.6  87.0    66.4  58.0     .34   .35     22        71        62
 138   72.0  65.0    32.0  15.0     .41   .52     35        38        44
 139   24.0   5.0    12.8   0.0     .17   .26     16         5        10
 140   60.8  51.0    54.4  43.0     .07   .08      5        46        48
 141   59.2  49.0    21.6   2.0     .38   .67     49        25        36
 142   44.8  31.0    25.6   7.0     .21   .38     24        18        31
 143   83.2  79.0    28.8  11.0     .55   .68     50        44        47
 144   81.6  77.0    37.6  22.0     .47   .55     37        48        49
 145   76.8  71.0    50.4  38.0     .30   .34     21        53        52
 146   40.0  25.0    17.6   0.0     .28   .56     38        12        25
 147   51.2  39.0    21.6   2.0     .32   .61     43        19        32
 148   24.0   5.0    19.2   0.0     .06   .26     16         3        10
 149   41.6  27.0    21.6   2.0     .23   .52     35        14        27
 150   69.6  62.0    41.6  27.0     .29   .35     22        44        47

*  Method of Flanagan
** Method of Davis
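
The percent-success columns of Table XLV rest on two steps: examinees are ranked by total score and the upper and lower 27 percent are set apart, and the percent of each group answering an item correctly is then found. The Python sketch below illustrates those two steps only. Its function names and sample data are hypothetical and are not taken from the study. The chance correction it shows (wrong answers divided by the number of choices less one, assuming five choices per item) is likewise an assumption: it agrees with the paired entries reported for item 1 (92.8 and 91.0, 86.4 and 83.0), but the formula itself is not stated in this section. The r values and the discrimination and difficulty indices are not computed here, since they depend on the published aids of Flanagan and Davis.

```python
def upper_lower_groups(total_scores, fraction=0.27):
    """Return (upper, lower) lists of examinee positions.

    Examinees are ranked by total score; the top and bottom
    `fraction` of the group are returned (at least one each).
    """
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    return order[:n], order[-n:]


def percent_success(item_correct, group):
    """Percent of the examinees in `group` who answered the item correctly.

    `item_correct` holds 1 for a right answer and 0 otherwise.
    """
    return 100.0 * sum(item_correct[i] for i in group) / len(group)


def chance_corrected(percent, n_choices=5):
    """Percent success corrected for guessing (assumed formula).

    Subtracts the proportion wrong divided by (choices - 1) and
    floors the result at zero.
    """
    p = percent / 100.0
    corrected = p - (1.0 - p) / (n_choices - 1)
    return max(0.0, 100.0 * corrected)


# Hypothetical data: ten examinees, total scores and one item's results.
scores = [10, 3, 7, 1, 9, 5, 8, 2, 6, 4]
item = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]

upper, lower = upper_lower_groups(scores)
upper_pct = percent_success(item, upper)
lower_pct = percent_success(item, lower)
```

With real answer sheets the same three functions would be applied item by item; the plain difference between `upper_pct` and `lower_pct` gives only a rough indication of discrimination and is not a substitute for the tabled values.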

APPENDIX III

383
TEST IA

THE ABILITY TO THINK SCIENTIFICALLY
GENERAL DIRECTIONS
1. Place your name, age and sex in the spaces provided
   on the answer sheet.
2. Place your student number in the space provided for
   "date of birth".
3. On the space marked "school" place your major.
4. In the space marked "1" below "school", give courses
   you have had in science in high school; in the space
   marked "2" give any courses you have had in science
   in college in addition to biological science.
5. Answer all items; if you don't know - guess.
6. Do not mark on the test booklet. Use scratch paper
   if you wish.
7. Be sure to mark dark on the answer sheet; the machine
   does not pick up light markings.
8. Each item has only one answer; do not mark more than
   one.

384
This test has been devised to measure your ability
to think scientifically. It is divided into several parts;
each of these parts tests a different phase of scientific
thinking.

This portion of the test is designed to measure
your ability to differentiate phases of thinking. These
steps include major problems or perplexities, possible
solutions to problems, observations which are not results
of experimentation but rather preliminary observations,
results of experimentation, and conclusions.

The following key is to be used for the succeeding
paragraph. Certain parts of the paragraph are underlined,
and each underlined item is a question. Choose the proper
response from the key and blacken the appropriate space in
the answer sheet.
KEY
1. A major problem (stated or implied).
2. Hypothesis (possible solution to problem).
3. Result of experimentation.
4. Initial observation (not experimental).
5. Conclusion (probable solution of problem).

(1) How does a homing pigeon navigate over territory
it has never seen before? (2) Do air currents stimulate
the pigeon in some way? (3) Are the pigeons equipped with
some sort of magnetic compasses; that is, are they sensitive
to the earth's magnetism? Yeagley tested the latter by
fastening small magnets to the wings of well-trained pigeons.
(4) Most of these birds never got home. (5) Others, carrying
equal wing weights of non-magnetic copper, made the home
roost without trouble. (6) Indicating that the earth's
magnetism is a factor in pigeon navigation. But the pigeon's
magnetic compass could not, by itself, bring him back to his
roost; because many places on the earth's surface have
identical magnetic conditions. Yeagley endeavored (7) to
determine the other guiding factor. (8) It might be the sun or
stars, but pigeons navigate under clouds. While looking at a
map which had lines representing the intensity of the earth's
magnetism, he noted that the lines were crossed at varying
angles by the parallels of latitude. If pigeons are sensitive
to some factor connected with the lines of latitude, they
would have all they need to find their way home. The next
step was (9) to find some physical force, something the
pigeons might be able to detect, related to the lines of
latitude. The effect of the earth's turning varies directly
with latitude; objects near the equator are carried daily
around the earth's circumference, moving at over 1,000 mi.
per hr. Objects near the poles are carried around more
slowly. The direction and variation of this circling can
be recorded by various man-made instruments. (10) Why
shouldn't the pigeons feel it, too? If they could, they
would have, along with their magnetic compass, a satisfactory
navigating instrument. Yeagley trained hundreds of pigeons
to return to their home roosts at State College, Pa. Then
he took them to a part of Nebraska where the lines representing
the earth's magnetism cross the parallels of latitude at
the same angle as at State College. He released the pigeons
to the east of this spot. (11) The pigeons all flew west.
Yeagley believes that (12) pigeons are guided by both the
earth's magnetism and by its turning. (13) Just where the
birds keep their instruments is still unknown; but Yeagley
found that (14) birds have a mysterious organ in their
eyes at the end of the optic nerve. (15) This organ may
contain the nerve fibers that pick up vibrations of magnetism
and the even more delicate sense that measures the
earth's turning.

Abbreviated Key
1. A major problem
2. Hypothesis
3. Results
4. Initial observation
5. Conclusions

This portion of the test is designed to measure your
ability to recognize faulty experimental procedures. In
each case a problem and a possible solution to the problem
(an hypothesis) are presented. In each case the experiments
were designed by students to test the hypotheses.
Judge each experiment according to the following key.

Key
This experiment is:
1. Satisfactory.
2. Unsatisfactory because it lacks a control or comparison.
3. Unsatisfactory because the control or comparison is faulty.
4. Unsatisfactory because it is unrelated to the hypothesis.
5. None of the above - the experiment is unsatisfactory for
   reasons other than those listed in 2, 3, and 4.

PROBLEM: What are some of the requirements for the sprouting
of seeds?
HYPOTHESIS: Oxygen is a requirement for the sprouting of seeds.
16.

If a seed lacked oxygen under a controlled experiment
the seed would not function properly and would soon
die.

17.

Place growing plants in an air tight container. Pump
out the oxygen. Place other growing plants in containers
with oxygen. Keep temperature, light, etc.,
the same for each.

18.

Plant seeds in a container with glass covering it so
that no oxygen can enter and see if they sprout. Keep
temperature, light and moisture normal.

387
PROBLEM: A minute insect (aphid) is suspected of spreading
a virus disease of roses. How would you determine whether
this is true?

HYPOTHESIS: The aphid spreads a virus disease of roses.

19.

Put the insect among other kinds of plants other than
roses. Leave another group of these plants free from
contact with the aphids. Compare the results.

20.

Since aphids travel through the air, a plot of roses
must be entirely protected from them, and another
exposed to aphids which in turn have been exposed
to roses afflicted with the virus disease. All must
be under constant conditions of soil, atmosphere, etc.

21.

Take sample rose with the virus disease. Obtain same
kind of rose with no disease. Use microscope to aid
in detection of the disease. Use some sort of spray.
Note results.

22.

In order to determine whether the aphid spreads a
virus disease in roses, a group of roses should be
put in a hot house free from aphids to see whether
they get such a virus disease.

PROBLEM: To determine the cause of illness which appears
when large numbers of people are confined to a small space.

HYPOTHESIS: Lack of oxygen causes the people to become ill.

23.

One might check the oxygen by placing a number of
people in a confined place where there was a control
amount. Other checks would have to be made also such
as the purity of food, the purity of water and whether
or not proper sanitation rules were followed.

24.

Put one group of people in a room with an excessive
amount of carbon dioxide and another group in a room
with a normal amount of carbon dioxide. Keep the
oxygen concentration the same in both rooms.

388
This test is designed to measure your understanding
of the relation of facts to the solution of a problem. The
over-all problem involved in this test is presented. This
is followed by a series of possible solutions to the problem
(hypotheses). After each hypothesis there are a number
of items, all of which are true statements of fact. Determine
how the statement is related to the hypothesis and mark
each statement according to the key which follows the
hypothesis.

GENERAL PROBLEM: What factors are involved in the transmission
and development of Infantile Paralysis (Poliomyelitis)?

HYPOTHESIS I: In man the disease is contracted by direct
contact with persons having the disease.

For items 25 through 34 mark space if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.
25.

Monkeys free from the disease almost never catch
infantile paralysis from infected monkeys.

26.

The curve of number of cases of the disease in a given
area is the same shape as the curve for the fly population
in that area, the infantile paralysis incidence
curve lagging behind the fly population curve by about
two weeks.

27.

The virus has never been isolated from the blood.

28.

The virus is not found in the nasal secretion nor in
the saliva.

29.

The incubation period for infantile paralysis is from
4 to 21 days.

30.

Most persons in contact with the diseased individual
do not develop the disease.

31.

The incidence of infantile paralysis is higher in
rural districts than in the cities.

32.

Cases of infantile paralysis have been found to follow
the roads of communication of the population, that is,
the disease spreads from populated areas along roads
or rivers to other areas.

389
33.

Even during epidemics cases are spotty; it is usually
impossible to trace one case from another.

34.

What is the status of hypothesis I?
1. It is true.
2. It is probably true.
3. The data are contradictory, so the truth or
falsity cannot be judged.
4. The hypothesis is probably false.
5. It is definitely false.

HYPOTHESIS II: Healthy persons having had contact with
diseased individuals may carry the disease from one
person to another.
For items 35 through 44 mark space if the item offers:
1. Direct evidence in support of the hypothesis.
2. Indirect evidence in support of the hypothesis.
3. Evidence which has no bearing on the
hypothesis.
4. Indirect evidence against the hypothesis.
5. Direct evidence against the hypothesis.
35.

Monkeys free of the disease almost never catch infantile
paralysis from infected monkeys.

36.

It has been found that exertion prior to or at the time
of infection increases the incidence of the disease.

37.

Even during epidemics cases are spotty; it is usually
impossible to trace one case from another.

38.

The virus is always found in the stools of people who
have the disease.

39.

Most persons in contact with the diseased individual
do not develop the disease.

40.

Nine out of 14 adult contacts had virus in stools,
almost all child contacts have virus in stools.

41.

Up to two months after contact the virus is found in
the stools of persons who contacted the victims, but
who did not contract the disease.

42.

In the stools of non-contacts the virus was found in
only one person in 100.

43.

The percent of cases of infantile paralysis is higher
in rural districts than in the cities.

390
44.

What is the status of hypothesis II?
1. The hypothesis is true.
2. It is probably true.
3. The data are contradictory, so the truth or
falsity cannot be judged.
4. It is probably false.
5. It is definitely false.

This portion of the test was designed to measure
your ability to interpret data and to test your understanding
of experimentation. In each case the numbers in the
first column are the numbers which you will use as your
answer. Thus the table presented becomes both the source
of data and your key for the questions which follow it.
In each case where a test tube number or group number is
called for, the one which gives positive evidence for the
statement should be given. Below this the control or comparison
is called for. This is the test tube or group
number of the data which offers a comparison. For example:

1. Leaf in dark - no starch
2. Leaf in light - starch.

"Light is necessary for the production of starch."
You would mark space 2 because this is the positive evidence,
but it would be meaningless if it were not compared
with the leaf in the dark. Therefore, the following item,
"What is the control (comparison) for item 1?" would be
marked space 1.

Items 45 through 53 refer to the data presented below.
Five test tubes, each containing a gram of protein,
were set up. Mark each item according to the test tube
number called for. All substances were dissolved in water.
All test tubes were kept at 37° C. (water boils at 100° C.)
For test tube 5, Substance X was boiled and then cooled
before it was added to the protein.

Test Tube   Contents of Tubes                     Amt. of Substance W
                                                  present after 24 hours
    1       Protein plus Substance X              .05 gram
    2       Protein plus Water                    .00 gram
    3       Protein plus Substance X plus         .08 gram
            hydrochloric acid
    4       Protein plus Hydrochloric acid        .00 gram
    5       Protein plus Substance X (boiled)     .00 gram

391
45.

Give the number of the test tube which acts as a
control (comparison) for the entire experiment.

46.

Give the number of the test tube which gives evidence
that protein does not break down spontaneously into
Substance W.

47.

Give the number of the test tube which gives evidence
that Substance X is the active substance in the break
down of proteins.

48.

Give the number of the tube which is the control for
item 47.

49.

Give the number of the test tube which shows that a
temperature of 37° C. does not cause protein to break
down into Substance W.

50.

Which test tube gives evidence that Substance X is
not a stable substance?

51.

Which test tube is the control for item 50?

52.

Give the number of the test tube which indicates that
hydrochloric acid alone is ineffective in breaking
down proteins.

53.

Give the control for item 52.

Items 54 through 64 refer to the data presented below.
Mark each item according to the leaf number called for.
Plant A normally stores starch in its leaves while Plant B
does not normally store starch in its leaves. The following
experiments were performed in a dark room at 72° F. Glucose
(sugar) solutions were made with 20 grams of glucose per 100
cubic centimeters of water. Leaves of Plant A taken from a
plant that had been in the dark for 48 hours were floated in
the 5 solutions listed below and left in the glucose solution
for an hour.

Leaf   Solution                                  Analysis of leaf
                                                 after 4 hours
  1    Glucose                                   Starch in leaf
  2    Water                                     No starch in leaf
  3    Glucose plus juice from Plant B           No starch in leaf
  4    Glucose plus juice from Plant C           No starch in leaf
  5    Glucose plus boiled juice from Plant B    Small amount of
                                                 starch in leaf

392
54.

Give the number of the leaf which showed that starch
does not develop spontaneously in the leaf in the dark.

55.

This leaf indicates that a temperature of 72° F. does
not cause starch to form in the leaf.

56.

Give the number of the leaf which is the control
(comparison) for the entire experiment.

57.

Give the number of the leaf which gives evidence that
Plant A is capable of manufacturing starch from glucose.

58.

Give the number of the leaf which is the control for
item 57.

59.

Give the number of the leaf which gives evidence that
the juice of Plant B is capable of preventing the
manufacture of starch from glucose.

60.

What is the control for item 59?

61.

Give the number of the leaf which gives evidence that
the juices of Plant B contain a substance which
inhibits the production of starch in its leaves.

62.

Give the leaf which is the control for item 61.

63.

This leaf gives evidence that the inhibitory substance
is not a stable substance.

64.

Give the control for item 63.

This portion of the test was designed to measure
your ability to make conclusions. When facts are analyzed
and studied they sometimes yield evidence which helps in the
solution of a problem. However, any conclusion must be
checked before it can be accepted. The following key includes
four ways in which conclusions may be faulty. Each
of the items presents a question or problem, a brief description
of an experiment and one or more conclusions drawn from
the experiment. Each experiment was repeated many times.
Read each problem, experiment and the conclusions. Where
several conclusions are given evaluate each conclusion
separately. Is the conclusion tentatively justified by the
data? If so, mark space 1 on your answer sheet. If the
conclusion is not justified determine whether 2, 3, 4, or 5
in the key is the best reason for it being faulty and mark
the proper space on your answer sheet.

393
The conclusion is:
1. Tentatively justified.
2. Unjustified because it does not answer the problem.
3. Unjustified because the experiment lacks a control
   comparison.
4. Unjustified because the data are faulty or inadequate,
   though a control was included.
5. Unjustified because it is contradicted by the data.

PROBLEM: A student was interested in developing a test for
a certain type of substance. In all 100 cases his test
was positive.
65.

He concluded that the test was a specific test for the
substance.

PROBLEM: An investigator wanted to know what causes people
to breathe faster when they are running rapidly. He
found that breathing more carbon dioxide increased the
breathing rate, but that the breathing of air deficient
in oxygen did not increase the breathing rate.
66.

He concluded that people breathe faster when they are
running because they need more oxygen.

67.

Someone else concluded that running increases the rate
of breathing.

PROBLEM: An investigator wished to determine whether temperature
increased the rate of a certain reaction. On repeated
tests he found that if he started out with a certain
amount of his original substances he would obtain, after
one hour, 1 gram of the substance produced by the reaction
at 0°C., 2 grams at 20°C., 5 grams at 40°C. and 3 grams
at 60°C.

68.

He concluded that increased temperature increased the
rate of the reaction.

PROBLEM: A person wanted to determine whether bile aided in
the digestion of fats. He found that whenever he mixed
pancreatic juice with fats a small part of the fat was
digested, but whenever he mixed pancreatic juice and
bile with fat, he found that the fat was completely
digested. When he mixed bile alone with fat he found
that there was no digestion.

69.

He concluded that bile aided in the digestion of fats.

70.

Another concluded that pancreatic juice was necessary
for digestion of fats.

71.

Someone else claimed that bile does not aid in the
digestion of fat.

PROBLEM: A person wanted to know what caused a certain
disease. He examined 1000 patients with the disease.
All had a certain bacteria (Bacteria A) in the digestive
tract.
72.

He concluded that Bacteria A was the cause of the
disease.

PROBLEM: A person wanted to know why plants bend toward
the light. He placed one group of plants in the light
with the light source at the right. He placed another
group of similar plants in the dark. The plants in the
dark grew straight, the plants in the light were bent
to the right.
73.

He concluded that plants bend toward the light.

74.

Another concluded that plants bend toward the light
because they need light to grow.

75.

Someone else concluded that light influences the
direction in which plants grow.

PROBLEM: Investigator A wanted to know what caused people
to become ill if confined in large numbers to a small
closed area. He found on repeated tests that the air
in very crowded closed areas contained about 5% carbon
dioxide, while normal air contains .03% carbon dioxide.
76.

He concluded that excessive carbon dioxide caused the
illness.

77.

Another investigator concluded that the illness was
caused by insufficient oxygen.

PROBLEM: Investigator B in an attempt to solve the same
problem repeated the experiment done by investigator A
but in addition had people in uncrowded rooms breathe
air containing 5% carbon dioxide. No ill effects were
noted among those in the uncrowded rooms.

395
78.

He also concluded that excessive carbon dioxide
caused the illness.

79.

Another investigator claimed that this showed that
the disease was caused by insufficient oxygen.

80.

Another conclusion was that 5% carbon dioxide will
produce no ill effects.

81.

Still another claimed that people live better in
uncrowded areas.

PROBLEM: What are some of the requirements for seeds to
sprout? The same student planted two groups of seeds
of different types in pots and placed one group of
the pots in the light, the others in the dark. Those
plants in the light were green, those in the dark were
yellow. Other conditions were the same for both groups.
82.

Conclusion:

Light is necessary for sprouting of seeds.

This portion of the test was designed to measure
your ability to interpret data. Following the data you
will find a number of statements. You are to assume that
the data as presented are true. Evaluate each statement
according to the following key and mark the appropriate
space on your answer sheet.

Key
1. True: The data alone are sufficient to show that
   the statement is true.
2. Probably true: The data indicate that the statement
   is probably true, that it is logical on the basis of
   the data but the data are not sufficient to say that
   it is definitely true.
3. Insufficient evidence: There are no data to indicate
   whether there is any degree of truth or falsity in
   the statement.
4. Probably false: The data indicate that the statement
   is probably false, that is, it is not logical on the
   basis of the data but the data are not sufficient to
   say that it is definitely false.
5. False: The data alone are sufficient to show that
   the statement is false.

396
Items 83 through 102 refer to the following graph.
Use the key above to answer the items. The lizard is
considered to be cold blooded, the others warm blooded.

[Graph not reproduced. Horizontal axis: external temperature
(degrees centigrade).]

83.

The body temperature of the cat varies more than the
body temperature of the ant eater.

84.

When the external temperature is 50° C., the temperature
of the lizard is also 30° C.

85.

At an external temperature of 50° C., the temperature
of the cat is 50° C.

86.

When the external temperature is 50° C., the temperature
of the ant eater would be higher than the temperature
of the cat.

87.

The temperature of a mouse would be about half way
between that of the cat and the ant eater.

88.

At no time during the experiment did any of the animals
have the same body temperature.

89.

There is a close correlation between the body temperature
of the lizard and that of the external environment.

90.

The body temperature of the cat showed the least variation
in temperature during the experimental period.

91.

At 20 degrees below 0° C. the lizard would be frozen.

92.

If the temperature of other cold blooded animals were
plotted it would resemble that of the lizard.

397
Items 93 through 96 are a re-evaluation of some of
the items 83 through 92. Re-read items 93, 94, 95 and 96
and determine whether they are generalizations, extensions
of the data, explanations of the data or merely restatements
of the data, etc. Answer each according to the following
key:

Key
1. A generalization, that is the data says it is true
   for this situation; a generalization says it is true
   for all similar situations.
2. The data indicates a trend which if continued in
   either direction would make the statement true.
3. An explanation of the data in terms of cause and
   effect.
4. A restatement of results.
5. None of the above.

93.

Item 84

94.

Item 86

95.

Item 90

96.

Item 92

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusion are the items which are to be evaluated according
to the following key. These items all relate to the
data presented for items 83 through 92.

Key
1. An assumption which must be made to make the conclusion
   valid (true).
2. An assumption which if made would make the conclusion
   false.
3. An assumption which has no relation to the validity
   (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I: Warmblooded animals have some type of heat
regulating mechanism.
97.

It is possible for animals to have some type of heat
regulating mechanism.

98.

The opossum had a lower body temperature than the cat.

Conclusion II: Ant eaters and duckbills are more closely
related than ant eaters and cats.

99.

Similarity of reaction of living things indicates a
relationship.

100.

The temperature of the ant eater varied more with
the external temperature than did that of the cat.

101.

The degree of closeness of similarity of response of
living things runs parallel with the closeness of
kinship.

102.

The temperature of the cat varied less than that of
the ant eater and duckbill with change of temperature.

This portion of the test was designed to measure your
ability to interpret data. Following the data you will find
a number of statements. You are to assume that the data as
presented are true. Evaluate each statement according to
the following key and mark the appropriate space on your
answer sheet.

Key
1. True: The data alone are sufficient to show that the
   statement is true.
2. Probably true: The data indicate that the statement
   is probably true, that is, it is logical on the basis of
   the data but the data are not sufficient to say that
   it is definitely true.
3. Insufficient evidence: There are no data to indicate
   whether there is any degree of truth or falsity in
   the statement.
4. Probably false: The data indicate that the statement
   is probably false, that is, it is not logical on the
   basis of the data but the data are not sufficient to
   say that it is definitely false.
5. False: The data alone are sufficient to show that the
   statement is false.

Analyses were made of the Vitamin C content of red
ripe and green tomatoes as soon as they were picked. Mature
green tomatoes were stored at the temperatures indicated in
the following table. Those which had ripened by the end of
the first week were analyzed for their Vitamin C content;
those ripened at the end of the second week were analyzed
at the end of the second week, etc. In addition some mature
green tomatoes were analyzed each week.

Condition       Temp.        No. of    Stage of         Vitamin C
when taken      when         weeks     ripeness         mg/100 grams
from field      stored       stored    when analyzed

mature green    not stored   0         mature green     15.0
red ripe        not stored   0         red ripe         16.2
mature green    70°F.        1         red ripe         14.4
mature green    70°F.        2         red ripe         12.9
mature green    70°F.        3         red ripe          8.2
mature green    80°F.        1         red ripe         14.0
mature green    80°F.        2         red ripe          9.8
mature green    80°F.        3         red ripe          7.1
mature green    70°F.        1         mature green     10.0
mature green    70°F.        2         mature green      7.2
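
The table above can be tabulated and the week-to-week change in Vitamin C computed, as in the following minimal sketch. It assumes Python and the values as transcribed in the table; the names ROWS, series and weekly_changes are illustrative and are not part of the original test.

```python
# Rows transcribed from the tomato storage table.
# Fields: condition when picked, storage temperature in degrees F (None if
# not stored), weeks stored, stage when analyzed, Vitamin C in mg/100 grams.
ROWS = [
    ("mature green", None, 0, "mature green", 15.0),
    ("red ripe",     None, 0, "red ripe",     16.2),
    ("mature green", 70,   1, "red ripe",     14.4),
    ("mature green", 70,   2, "red ripe",     12.9),
    ("mature green", 70,   3, "red ripe",      8.2),
    ("mature green", 80,   1, "red ripe",     14.0),
    ("mature green", 80,   2, "red ripe",      9.8),
    ("mature green", 80,   3, "red ripe",      7.1),
    ("mature green", 70,   1, "mature green", 10.0),
    ("mature green", 70,   2, "mature green",  7.2),
]


def series(temp_f, stage):
    """Return {weeks stored: Vitamin C} for one storage temperature and stage."""
    return {w: c for _, t, w, s, c in ROWS if t == temp_f and s == stage}


def weekly_changes(values_by_week):
    """Return the change in Vitamin C between consecutive listed weeks."""
    weeks = sorted(values_by_week)
    return [round(values_by_week[b] - values_by_week[a], 1)
            for a, b in zip(weeks, weeks[1:])]


if __name__ == "__main__":
    for temp in (70, 80):
        print(temp, weekly_changes(series(temp, "red ripe")))
```

The sketch only restates the tabulated differences. Whether a statement goes beyond the data is still for the reader to judge against the key.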

103.

Tomatoes ripened at 90°F. would have less Vitamin C
after three weeks than those stored at 80°F.

104.

Tomatoes could not be stored at 90°F. because at this
high a temperature they would rot or spoil.

105.

The lower the temperature at which tomatoes are stored
the less is the breakdown of Vitamin C.

106.

Heat causes a breakdown of the Vitamin C molecule.

107.

After four weeks of storage tomatoes stored at 70°F.
would contain less than 7 mg/100 grams of Vitamin C.

108.

Some mature green tomatoes ripen in storage within a
week.

109.

The green tomatoes which did not ripen in a week had
lost about the same amount of Vitamin C as those which
ripened during the week.

110.

Vitamin C is a stable substance.

111.

Vitamin C is manufactured some place else in the plant
than in the fruit (tomato) and is stored in the fruit.

Items 112 through 115 are a re-evaluation of some of
the items 103 through 111. Re-read items 105, 106, 107 and 108
and determine whether they are generalizations, extensions
of the data, explanations of the data or merely restatements
of the data, etc. Each of these items is to be answered
according to the following key:


Key
1. A generalization, that is the data says it is true
   for this situation; a generalization says it is true
   for all similar situations.
2. The data indicates a trend which if continued in
   either direction would make the statement true.
3. An explanation of the data in terms of cause and
   effect.
4. A restatement of results.
5. None of the above.

112.

Item 105

113.

Item 106

114.

Item 107

115.

Item 108

This phase of the test is designed to measure your
understanding of assumptions underlying conclusions. A
conclusion is given. (This conclusion is not necessarily
justified by the data.) The statements which follow the
conclusion are the items which are to be evaluated according
to the following key. These items all relate to the
data presented for items 103 through 111.
Key
1. An assumption which must be made to make the conclusion
   valid (true).
2. An assumption which if made would make the conclusion
   false.
3. An assumption which has no relation to the validity
   (truth) of the conclusion.
4. Not an assumption; a restatement of fact.
5. Not an assumption; a conclusion.

Conclusion I: Sunlight causes an increase in the Vitamin
C content of tomatoes as they ripen on the vine.

116.

The tomatoes which were analyzed when green ripe would
have contained more Vitamin C if they had been allowed
to ripen on the vine.

117.

The test used to measure the amount of Vitamin C
accurately measures the amount.

118.

The Vitamin C content of ripe tomatoes on the vine was
higher than the Vitamin C content of the green ripe
tomatoes on the vines.

Conclusion II: Vitamin C breaks down spontaneously at room
temperature.

119.

Vitamin C reacts similarly in all plants in which
it is found.

120.

When the tomatoes were stored at room temperature
the Vitamin C content decreased.

121.

All vitamins react similarly to storage at room
temperature.

This portion of the test is designed to test your
ability to organize data. Select from the key below the
curve which best fits the data. If none of the curves
fit the data mark space five on your answer sheet. The
curves need not have the same amount of slope as the curves
presented in the key. Use scratch paper if you wish.

[Key: curves numbered 1 through 4 (figure not reproduced)]
5. None of the curves

122.

The horizontal axis represents the time in hours
after the injection of sugar into the blood; the
vertical axis is the amount of sugar in the blood.

Time after injection    Blood sugar
1                       35
3                       12
6                        8

123.

The horizontal axis represents age in years. The
vertical axis is the percent increase in the weight
of the ovaries and other female sex organs from
birth to 20 years.

Age    Percent increase
4       8
10     12
14     20
18     80

124.

The horizontal axis represents time in days; the
vertical axis is the number of yeast cells in
millions (starting with 100 yeast cells).

Time in days    Number of yeast cells in millions
4                25
8               150
12              390
20              400

125.

The horizontal axis represents the amount of thyroprotein
fed daily to cows. The vertical axis represents
the percent increase in milk production.

Thyroprotein fed    Percent increase
.15 grams           18
.20 grams           23
.24 grams           27
.30 grams           33
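
One way to characterize the shape of each data set in items 122 through 125 is to compare the slopes of the straight segments joining consecutive points. The following is a minimal sketch in Python, with values transcribed from the four items. The function names, the wording of the descriptions and the 15 percent tolerance are illustrative assumptions. Because the key's curves are not reproduced here, the sketch does not select a key number.

```python
# Data sets transcribed from items 122-125: (horizontal values, vertical values).
DATA = {
    122: ([1, 3, 6], [35, 12, 8]),
    123: ([4, 10, 14, 18], [8, 12, 20, 80]),
    124: ([4, 8, 12, 20], [25, 150, 390, 400]),
    125: ([0.15, 0.20, 0.24, 0.30], [18, 23, 27, 33]),
}


def slopes(xs, ys):
    """Slope of each straight segment joining consecutive points."""
    points = list(zip(xs, ys))
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def describe(xs, ys, tol=0.15):
    """Rough verbal description of a curve from its segment slopes.

    tol is an assumed relative tolerance for calling the slope constant.
    """
    s = slopes(xs, ys)
    if all(v > 0 for v in s):
        direction = "rising"
    elif all(v < 0 for v in s):
        direction = "falling"
    else:
        direction = "mixed"
    mags = [abs(v) for v in s]
    if max(mags) - min(mags) <= tol * max(mags):
        trend = "roughly constant slope"
    elif all(a < b for a, b in zip(mags, mags[1:])):
        trend = "steepening"
    elif all(a > b for a, b in zip(mags, mags[1:])):
        trend = "flattening"
    else:
        trend = "no single trend in slope"
    return direction + ", " + trend


if __name__ == "__main__":
    for item, (xs, ys) in DATA.items():
        print(item, describe(xs, ys))
```

The descriptions are only an aid to sketching the points on scratch paper, which the directions permit.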

APPENDIX IV

RATING SCALE FOR ABILITY TO USE SCIENTIFIC METHOD
Person Rated                    Rater                    Date

Directions:

Will you please rate the person whose name appears above on the two
following characteristics. The two extremes of these characteristics
are described. Place a cross (X) on the line indicating your judgment
of the individual with respect to the qualities in question.

1. Ability to evaluate and devise experiments

Very superior     Superior     Average     Inferior     Very inferior

High degree of ability:
Includes control factors, controls all
but one variable, understands problem
and devises experiments to test hypothesis.
Can devise experiments
which will yield results, recognizes
problems inherent in the experiment,
and has an understanding of what is
happening in the experiment.

Low degree of ability:
Experiments lack control or
control is faulty, experiment
unrelated to hypothesis. Student
does not understand the experimental
set-up, or the problems
inherent in the experiment.

2. Ability to interpret data (ability to form hypotheses and draw conclusions)

Very superior     Superior     Average     Inferior     Very inferior

High degree of ability:
Is able to make logical inferences from
data, takes pertinent facts into consideration,
applies previous knowledge
to the new situation, is able to see
relationships, especially cause and effect
relationships. Knows what evidence for
his inference is, and why it is evidence.

Low degree of ability:
Is unable to make logical inferences
from data, does not differentiate
between relevant and
irrelevant data or between
critical and non-critical data,
is unable to see relationships.