A CLINICAL INVESTIGATION OF
COLLEGE STUDENTS' RELIANCE UPON
THE HEURISTICS OF AVAILABILITY
AND REPRESENTATIVENESS IN
ESTIMATING THE LIKELIHOOD
OF PROBABILISTIC EVENTS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
J. MICHAEL SHAUGHNESSY
1976

This is to certify that the

thesis entitled

A CLINICAL INVESTIGATION OF COLLEGE STUDENTS'
RELIANCE UPON THE HEURISTICS OF AVAILABILITY
AND REPRESENTATIVENESS IN ESTIMATING THE
LIKELIHOOD OF PROBABILISTIC EVENTS

presented by
J. Michael Shaughnessy
has been accepted towards fulfillment

of the requirements for

Ph.D. degree in Mathematics

Major professor

Date

ABSTRACT

A CLINICAL INVESTIGATION OF COLLEGE STUDENTS'
RELIANCE UPON THE HEURISTICS OF AVAILABILITY
AND REPRESENTATIVENESS IN ESTIMATING THE
LIKELIHOOD OF PROBABILISTIC EVENTS

BY
J. Michael Shaughnessy

The purpose of this study was to investigate college
students' reliance upon availability and representativeness
in estimating the likelihood of events. An experimental
activity-based course in elementary probability and statis-
tics was developed. Groups of college students who took
the activity-based course were compared to groups who took
a lecture-based course for their relative success in
overcoming reliance upon the heuristics of availability and
representativeness.

The subjects involved in the study were 85 undergraduate
students who had enrolled in a finite mathematics course at
Michigan State University. Four class sections were randomly
chosen and two each were randomly assigned to either the
experimental activity-based course or a lecture-based course.
The materials for the activity-based course had been piloted
during the quarter preceding the main study and had been
revised as a result of the pilot study.


The subjects were pretested and posttested for their
reliance upon the heuristics of availability and represent-
ativeness and for their knowledge of elementary probability
concepts. The instruments used had been piloted prior to
the main study and contained a probability concept subscale,
an availability subscale, and a representativeness subscale.
The data were analyzed by t-tests (α = .05) with the individual
as the unit of analysis on the pretest and class section
as the unit of analysis on the posttest.
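
To make concrete what taking the class section as the unit of analysis implies, the short Python sketch below computes a pooled two-sample t statistic from section means. The section means are hypothetical placeholders, not results from this study, and the pooled-variance form of the test is an assumption for illustration; the point is that with two sections per treatment the comparison has only 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return the pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance; statistics.variance divides by n - 1.
    sp2 = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(sp2 * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical section means, NOT results from this study:
# two activity-based sections and two lecture-based sections.
activity_sections = [10.0, 12.0]
lecture_sections = [8.0, 9.0]

t, df = pooled_t(activity_sections, lecture_sections)
print(f"t = {t:.3f} with {df} degrees of freedom")  # t = 2.236 with 2 degrees of freedom
```

With 2 degrees of freedom the two-tailed critical value of t at α = .05 is about 4.30, so a section-level comparison needs a large and consistent difference between section means before it reaches significance.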

The pretest analysis indicated that there were no sig-
nificant differences between the groups on any of the three
subscales prior to a formal course in probability. A
significant difference was found between the activity-based
groups and the lecture-based groups on the representativeness
subscale on the posttest. The experimental activity-based
groups scored significantly higher on the representativeness
subscale on the posttest than the lecture-based groups.
There was a tendency for the experimental groups to score
higher on the availability subscale on the posttest, but the
difference was not significant at the .05 level. There was
no significant difference between the two groups on the
probability subscale on the posttest. The activity-based
groups attained significantly higher mean gain scores than
the lecture-based groups on both the availability and
representativeness subscales.


The author concluded that course methodology appears
to be an important factor in helping college students to
replace heuristic principles with probability theory when
making estimates for the likelihood of events. Learning
elementary probability concepts may not be sufficient to
overcome reliance upon the heuristics of availability and
representativeness.

A CLINICAL INVESTIGATION OF COLLEGE STUDENTS'
RELIANCE UPON THE HEURISTICS OF AVAILABILITY
AND REPRESENTATIVENESS IN ESTIMATING THE
LIKELIHOOD OF PROBABILISTIC EVENTS

BY

J. Michael Shaughnessy

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Mathematics

1976

DEDICATION

To Joan. We really did it!


ACKNOWLEDGMENTS

I would like to thank my Committee Chairman, Professor
William M. Fitzgerald, and the members of the committee,
Professor Peter Lappan, Professor John Masterson, Professor
Bruce Mitchell, and Professor John Wagner, for their support
and constructive comments during this investigation.

The patience and encouragement of my wife Jean were
greatly appreciated throughout my studies and especially
during the writing of this thesis.

The assistance of Joseph Wisenbaker with the analysis
of data, and the assistance of Al Stickney, who taught one
of the experimental sections, were essential to the develop-
ment of this study and were most helpful.

A special thanks to Jill Hagan is merited for a superb
typing job and patience in deciphering my difficult scrawl.


TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. INTRODUCTION AND DEFINITION OF THE PROBLEM
  Definition of Heuristic
  The Representativeness Heuristic
  The Availability Heuristic
  Introduction to the Problem
  Purpose of the Study
  Importance of the Study
    i) Consequences of the Misuse of Probability and Statistics
    ii) Subjective Probability and Mathematical Probability:
        A Modeling Point of View
    iii) Psychological Theory and Educational Practice
  A Summary of the Procedure
  Overview of the Organization of the Study

II. REVIEW OF THE LITERATURE RELATED TO THE STUDY
  Introduction
  The Use of Heuristics to Estimate Probability
  Models of Human Judgment and Decision Making
  The Development of the Probability Concept in Young Children
  and Adolescents
  The Pilot Study

III. A DESCRIPTION OF THE DESIGN OF THE STUDY
  The Experimental Course
  A Description of the Control Course
  Comparison of the Experimental and Control Courses
  Rationale for the Experimental Course
  Subjects
  Procedure
  Measures
  Hypotheses
  Method of Analysis
  Summary

IV. ANALYSIS OF THE RESULTS OF THE STUDY
  Part I: Report on the Experimental Course
    Activities
    Misuses of Statistics
    Experimental Course Evaluation Forms
  Part II: Analysis of the Statistical Results
    Introduction
    Comparisons Between the Experimental and Control Groups
    on the Four Scales
    Notation
    Reliability
    Individual Item Statistics
    Correlation Matrices

V. SUMMARY, CONCLUSIONS, AND DISCUSSION
  Summary
  Limitations
  Results of the Hypothesis Testing
  Conclusions and Discussion
  Implications for Future Research

BIBLIOGRAPHY

APPENDICES
  A. OUTLINE OF DAILY PLAN FOR THE EXPERIMENTAL COURSE
  B. ACTIVITIES, PROBLEMS, AND NOTES TO THE INSTRUCTOR
  C. COURSE OUTLINE FOR THE CONTROL GROUPS
  D. THE INSTRUMENTS

LIST OF TABLES

Table

3.1   Order of Topics in the Experimental and Control Courses
3.2   Number and Sex of Subjects
3.3   Class Level and Major Field
3.4   Previous Mathematics Course Work
4.1   Scale Means and Standard Deviations for the Four Groups
      on the Pretest
4.1A  t-Test Results for Pretest Scale Total
4.1B  t-Test Results for Pretest Scale Probability
4.1C  t-Test Results for Pretest Scale Availability
4.1D  t-Test Results for Pretest Scale Representativeness
4.2   Scale Means and Standard Deviations for the Four Groups
      on the Posttest
4.2A  t-Test Results for Posttest Scale Total
4.2B  t-Test Results for Posttest Scale Probability
4.2C  t-Test Results for Posttest Scale Availability
4.2D  t-Test Results for Posttest Scale Representativeness
4.3   Mean Gain Scores on the Availability and Representativeness
      Scales
4.3A  t-Test Results on Pre-post Gain Scores on the Availability Scale
4.3B  t-Test Results on Pre-post Gain Scores on the
      Representativeness Scale
4.4A  Group Results on Item R1
4.4B  t-Test on Posttest Item R1
4.5A  Group Results on Item R2
4.5B  t-Test on Posttest Item R2
4.6A  Group Results for Item R3
4.6B  t-Test on Posttest Item R3
4.7A  Group Results on Item R4
4.7B  t-Test on Posttest Item R4
4.8A  Group Results on Item R5
4.8B  t-Test on Posttest Item R5
4.9A  Group Results on Item R6
4.9B  t-Test on Posttest Item R6
4.10A Group Results on Pretest Item N1
4.10B Group Results on Posttest Item N1
4.10C t-Test on Posttest Item N1
4.11A Group Results on Item N2
4.11B t-Test on Posttest Item N2
4.12A Group Results on Item N3
4.12B t-Test on Posttest Item N3
4.13A Group Results on Item N4
4.13B t-Test on Posttest Item N4
4.14A Group Results on Item A1
4.14B t-Test on Posttest Item A1
4.15A Group Results on Item A2
4.15B t-Test on Posttest Item A2
4.16  Group Results on Pretest Item N5
4.17A Group Results on Item A3
4.17B t-Test on Posttest Item A3
4.18A Group Results on Item A4
4.18B t-Test on Posttest Item A4
4.19A Group Results on Item P1
4.19B t-Test on Posttest Item P1
4.20A Group Results on Item P2
4.20B t-Test on Posttest Item P2
4.21A Group Results on Pretest Item P3
4.21B Group Results on Posttest Item P3
4.21C t-Test on Posttest Item P3
4.22A Group Results on Posttest Item P4
4.22B t-Test on Posttest Item P4
4.23A Group Results on Posttest Item P5
4.23B t-Test on Posttest Item P5
4.23  Group Results on Pretest Item P6
4.24  Group Results on Pretest Item P7
4.25  Group Results on Pretest Item P8
4.26  Group Results on Pretest Item P9
4.27  Group Results on Pretest Item P10
4.28  Group Results on Pretest Item P11
4.29  Group Results on Pretest Item P12
4.30  Group Results on Pretest Item P13
4.31  Group Results on Pretest Item P14
4.32  Scale-to-Scale Correlation Matrix
4.33  Availability Item-to-Scale Correlation Matrix
4.34  Representativeness Item-to-Scale Correlation Matrix
4.35  Probability Item-to-Scale Correlation Matrix
4.36  Representativeness Item-to-Item Correlation Matrix
4.37  Availability Item-to-Item Correlation Matrix
4.38  Availability and Representativeness Inter-Item
      Correlation Matrix

CHAPTER I
INTRODUCTION AND DEFINITION OF THE PROBLEM

In a series of studies, Daniel Kahneman and Amos Tversky
have found evidence in support of the hypothesis that human
beings rely upon certain specific principles when they are
asked to estimate the probability of complex events, to
predict the likelihood of outcomes, or to make judgments
under uncertainty (Tversky and Kahneman, 1971, 1973;
Kahneman and Tversky, 1972, 1973, 1974). Kahneman and
Tversky call these principles "heuristics". These heuristic
principles often lead to incorrect or biased estimates of
the likelihood of events.

Definition of Heuristic

A heuristic can be defined as a principle by which an
individual reduces a complex task to a simple one. In the
present study, a heuristic is a principle by which an in-
dividual reduces the complex task of assessing likelihood
or predicting outcomes to a simple judgment.

Two heuristics which were isolated and studied by
Kahneman and Tversky are the representativeness heuristic
and the availability heuristic. According to Kahneman
and Tversky, these two heuristic principles enable
human beings to decode complex probabilistic situations.

The Representativeness Heuristic

According to the representativeness heuristic, subjects
will make decisions about the relative likelihood of events
based upon how representative an event is of the distribution
of the parent population, or of the process by which the
outcomes are generated (Kahneman and Tversky, 1972).

For example, a long string of heads in tossing a coin
does not appear to be representative of the random process
of tossing a coin. Subjects who employ the representativeness
heuristic would tend to believe that tails will be more
likely than heads on a subsequent toss, even though the
tosses are independent of each other. Similarly, a subject
using the representativeness heuristic would judge the
outcome "two heads and two tails" in flipping four coins
to occur with a probability of 1/2, although its actual
probability is 6/16 = 3/8. The event "two heads and two
tails" appears to be representative of the distribution
of heads in the parent population of outcomes for
flipping one coin once.
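
The value 3/8 for "two heads and two tails" can be checked by listing all 2^4 = 16 equally likely outcomes of four fair coins. The Python sketch below is an illustration added here, not part of the original course materials; it assumes fair, independent coins and counts the outcomes with exactly two heads.

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely outcomes of tossing four fair coins.
outcomes = list(product("HT", repeat=4))

# Outcomes with exactly two heads (and therefore exactly two tails).
two_and_two = [o for o in outcomes if o.count("H") == 2]

p = Fraction(len(two_and_two), len(outcomes))
print(len(two_and_two), len(outcomes), p)  # 6 16 3/8
```

Only 6 of the 16 outcomes are balanced, so an estimate of 1/2 overstates the probability even though the balanced outcome "looks" most typical.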

Kahneman and Tversky (1974) mention that the representativeness
heuristic can be shown to account for fallacies in prediction
that arise from:

i) insensitivity to prior probabilities and dis-
regard for population proportions

ii) insensitivity to the effects of sample size
on predictive accuracy

iii) unwarranted confidence in a prediction that
is based upon invalid input data

iv) misconceptions of chance, such as the gambler's
fallacy

v) misconceptions about the tendency for data to
regress to the mean

When making predictions, subjects tended to ignore
sample size, population base-rate data, and the validity
of input information. On the other hand, according to
Kahneman and Tversky, subjects do make predictions for the
likelihood of events based upon how well the events reflect
the distribution of the parent population or the process by
which the outcomes are generated.

"People view chance as unpredictable but essenti-
ally fair. Thus, they expect that in a purely random
allocation of marbles each child will get approximately
(though not exactly) the same number of marbles.
Similarly, they expect even short sequences of coin
tosses to include about the same number of heads and
tails. More generally, a representative sample is
one in which the essential characteristics of the
parent population are represented not only globally
in the entire sample, but also locally in each of
its parts." (Kahneman and Tversky, 1972; 435)
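
The gambler's fallacy listed above, and the expectation that even short runs of tosses look "fair", can be confronted with observation. The Python sketch below is an illustrative simulation, not a procedure from the original studies; the seed and the number of tosses are arbitrary choices. It records the toss that follows every run of four heads, and because the tosses are independent the proportion of heads on that next toss stays near 1/2.

```python
import random


def heads_after_run(n_tosses, run_length=4, seed=1976):
    """Proportion of heads on the toss that follows `run_length` consecutive heads."""
    rng = random.Random(seed)  # arbitrary seed so the run is reproducible
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True means heads
    followers = [
        tosses[i + run_length]
        for i in range(n_tosses - run_length)
        if all(tosses[i:i + run_length])  # the previous run_length tosses were heads
    ]
    return sum(followers) / len(followers), len(followers)


proportion, runs = heads_after_run(200_000)
print(f"{runs} runs of four heads; the next toss was heads {proportion:.3f} of the time")
```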

The Availability Heuristic

According to the availability heuristic, subjects will
make decisions about the relative likelihood of events
based upon the ease with which instances of that event can
be constructed or called to mind (Tversky and Kahneman, 1973).

For example, if asked whether more distinct three-person
committees or more distinct nine-person committees can be
formed from a group of twelve people, subjects who employ
the availability heuristic will tend to favor the
three-person committees. It is easier to call to mind more
examples of three-person committees than of nine-person
committees, even though the number of distinct nine-person
committees is the same as the number of distinct
three-person committees: each choice of three members
leaves exactly one group of nine people who were not chosen.
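
The committee example rests on the counting identity C(12, 3) = C(12, 9) = 220. A quick check with Python's `math.comb` (available in Python 3.8 and later), added here as an illustration:

```python
from math import comb

three = comb(12, 3)  # distinct three-person committees from twelve people
nine = comb(12, 9)   # distinct nine-person committees from twelve people
print(three, nine)   # 220 220

# Choosing the 3 members is equivalent to choosing the 9 people left out,
# so the two counts must agree.
assert three == nine == comb(12, 12 - 3)
```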

Kahneman and Tversky claim that the availability
heuristic causes systematic bias in probability estimates
because subjects will tend to believe that those outcomes
which can easily be brought to mind will also be more
likely to occur (Tversky and Kahneman, 1973). If a subject
is asked to estimate the divorce rate in his city, or to
estimate the probability of being involved in an automobile
accident, the frequency of his own personal contact with
these events may lead to bias in the estimates that he
gives for the likelihood of these events.

"In general, availability is a useful clue for
assessing frequency or probability, because in-
stances of large classes are recalled better and
faster than instances of less frequent classes.
However, availability is also affected by other
factors besides frequency and probability. Con-
sequently, the reliance on availability leads to
predictable biases... ." (Tversky and Kahneman,
1974; 1128)

In summary, then, Kahneman and Tversky claim that
humans often do not apply the theory of mathematical prob-
ability in estimating the likelihood of events, nor do
humans conform to the laws of statistical decision theory
when making predictions. Instead, human subjects tend to
employ principles such as the representativeness heuristic
and the availability heuristic when they are asked to make
subjective probability estimates.

Most of the subjects involved in this series of
studies by Kahneman and Tversky were combinatorially naive
college students with no prior training in probability or
statistics. It is not surprising that these subjects util-
ized such heuristics as representativeness and availability
in their predictions of the likelihood of events. However,
Kahneman and Tversky also found that trained psychologists,
who had had a substantial background in probability and
statistics, were subject to the same types of bias and
fallacies as the combinatorially naive college students
(Tversky and Kahneman, 1971; Kahneman and Tversky, 1973).
Evidently, exposure to the theory of probability and sta-
tistics is not necessarily sufficient to overcome the biases
induced by the availability heuristic and the representative-
ness heuristic. Kahneman and Tversky found such strong and
widespread evidence for the use of these heuristics that
they suggest that misconceptions of probability and statis-
tics that arise from the use of these heuristics may be
extremely difficult, if not impossible, to overcome.

"Corrective experiences are those that provide
neither motive nor opportunity for spurious ex-
planation. Thus, a student in a statistics course
may draw repeated samples of a given size from a
population, and learn the effect of sample size
on sampling variability from personal observation.
We are far from certain, however, that expectations
can be corrected in this manner, since related biases,
such as the gambler's fallacy, survive considerable
contradictory evidence.

Even if bias cannot be unlearned, students can
learn to recognize its existence and take the nec-
essary precautions. Since the teaching of statistics
is not short on admonitions, a warning about
biased statistical intuitions may not be out
of place. The obvious precaution is computa-
tion." (Tversky and Kahneman, 1971; 109-110)
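
The corrective experience described in the first quotation, drawing repeated samples and watching sampling variability shrink as sample size grows, can be imitated with a short simulation. In the Python sketch below, the population proportion (0.5), the sample sizes (10 and 100), the number of repetitions, and the seed are all arbitrary illustrative choices. The observed spread of the sample proportions should come out close to the theoretical value sqrt(p(1 - p)/n).

```python
import random
from statistics import pstdev


def spread_of_sample_proportions(sample_size, repetitions=2000, p=0.5, seed=0):
    """Standard deviation of the sample proportion over repeated random samples."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        # Count "successes" in one sample drawn from a population with proportion p.
        successes = sum(rng.random() < p for _ in range(sample_size))
        proportions.append(successes / sample_size)
    return pstdev(proportions)


for n in (10, 100):
    observed = spread_of_sample_proportions(n)
    theoretical = (0.5 * 0.5 / n) ** 0.5
    print(f"n = {n:3d}: observed spread {observed:.3f}, theory {theoretical:.3f}")
```

Samples of 10 vary roughly three times as much as samples of 100, which is the effect of sample size that, according to the quotations, intuition tends to underestimate.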

"We surely do not mean to imply that man is in-
capable of appreciating the impact of sample
size on sampling variance. People can be taught
the correct rule, perhaps even with little diffi-
culty. The point remains that people do not fol-
low the correct rule, when left to their own
devices. Furthermore, the study of the conduct
of research psychologists (Cohen, 1962; Tversky
and Kahneman, 1971) reveals that a strong tendency
to underestimate the impact of sample size lingers
on despite knowledge of the correct rule and ex-
tensive statistical training. For anyone who would
wish to view man as a reasonable intuitive statis-
tician, such results are discouraging." (Kahneman
and Tversky, 1972; 444-445)

Introduction to the Problem

Kahneman and Tversky are psychologists who did their
research in the spirit of developing a model of how human
beings make judgments and decisions. The results of their
work have led them to conclude that neither a Bayesian model
(Edwards, 1968) nor a regression model (Hoffman, 1968) is
sufficient to describe the human decision-making process
under conditions of uncertainty. Neither of these models
accounts for the use of heuristic principles, such as
representativeness and availability, which human subjects
were found to employ. A detailed discussion of models of
human judgment will be presented in Chapter II.

The present study approaches the research of Kahneman
and Tversky from the viewpoint of mathematics education.
The psychological investigations of Kahneman and
Tversky, together with other studies (Cohen and Hansel,
1956; Cohen, 1957; Bruner, Goodnow, and Austin, 1956;
Edwards, 1968), have diagnosed some misconceptions that
human subjects have about how probability and statistics
work. These studies point out a set of entering charac-
teristics that can be observed in students who are about
to take an introductory course in probability and statistics.
The evidence from the studies above indicates that
students come to such a course with a set of misconceptions
about probability and statistics, and with a set of heuristic
principles that can propagate and maintain their misconcep-
tions of probability and statistics. Prior to any formal
training in probability, students have had experience with,
and have dealt almost exclusively with, "subjective proba-
bility". Suddenly, in their formal course work, they are
confronted with a completely mathematized model of "statis-
tical probability" (Carnap, 1953). From the viewpoint of
mathematics education, the problem that arises can be stated
as follows:

How should elementary probability and statistics
be taught so as to maximize the students' chances of
overcoming their misconceptions of probability and
statistics?

Given that students rely upon such heuristic
principles as representativeness and availability
to make estimates of the likelihood of events, what
is the best way to teach elementary probability so
that a student would learn to rely upon probability
theory in making estimates of the likelihood of events
rather than relying on heuristic principles which may
bias his estimates?

Purpose of the Study

The purpose of this study was to:

1. Develop an activity-based experimental course
in introductory probability and statistics at
the undergraduate level.

2. Examine the effects of an activity-based course
in probability and statistics upon college stu-
dents' use of the availability and representa-
tiveness heuristics in making estimates for the
likelihood of events.

3. Compare groups of college students who took the
experimental course to groups who took a lecture-
based course in probability in order to test the
relative effectiveness of these two approaches
in helping college students to overcome their
reliance upon the representativeness and availa-
bility heuristics when estimating the likelihood
of events.

In order to accomplish these three goals, the study
was done in two parts: a pilot study and a main
study.

In the pilot study, activities for the experimental
course were developed and taught. Instruments were devised
by the experimenter to measure reliance upon the availability
heuristic and the representativeness heuristic in estimating
the likelihood of events. Revisions of the activities and
the instruments were made as a result of the pilot study.

The main study compared two approaches to teaching
elementary probability, the experimental activity-based
course and a lecture-based course, and examined their rela-
tive effectiveness in helping students to overcome reliance
upon the heuristics of availability and representativeness
when making estimates for the likelihood of events.

Importance of the Study

There are three main factors that provided a rationale
for this study:

i) the importance of a basic reading knowledge
of elementary probability and statistics, and
the consequences of the misuses of probability
and statistics

ii) the distinction between subjective probability
and mathematical probability, and the need for
teaching elementary probability from a modeling
point of view

iii) the importance of anchoring educational research
and educational practice in psychological theory

Consequences of the Misuse of Probability and Statistics

The importance of teaching elementary probability and
statistics in the schools has been made clear by many authors.
The National Council of Teachers of Mathematics (1940, 1959),
Wilks (1958), Page (1959), Pieters and Kinsella (1959), The
College Entrance Examination Board (1959), and The Cambridge
Conference on School Mathematics (1963) have all recommended
that topics in probability and statistics be a part of every
student's school experience. The arguments for literacy
in probability and statistics that these authors have made
will not be repeated here.

Darrell Huff sums up the consequences of the misuses
of probability and statistics in his little book How to Lie
with Statistics.

"So it is with much that you read and hear. Averages
and relationships and trends and graphs are not al-
ways what they seem. There may be more in them than
meets the eye, and there may be a good deal less.
The secret language of statistics, so appealing
in a fact-minded culture, is employed to sensation-
alize, inflate, confuse, and oversimplify. Statis-
tical methods and statistical terms are necessary
in reporting the mass data of social and economic
trends, business conditions, 'opinion' polls, census.
But without writers who use the words with honesty
and understanding and readers who know what they
mean, the result can only be semantic nonsense."
(Huff, 1954; 8)

Exposure to the misuses of statistics and experiences
that enable students to confront their own misconceptions
about probability would seem to be essential to any intro-
ductory course in probability and statistics. Such misuses
and misconceptions affect the human decision making process,
which in turn affects the course of human lives. Material
on the misuses of statistics and activities designed to enable
students to come to grips with some of their own probabilistic
misconceptions form the basis for the experimental course in
introductory probability that was developed in this study.

Subjective Probability and Mathematical Probability:
A Modeling Point of View

In an attempt to define probability, Rudolf Carnap
points out the distinction between subjective probability
and mathematical probability.

"Most scientists will define it (probability) as sta-
tistical probability, which means the relative fre-
quency of a given kind of event or phenomena within
a given class of phenomena, usually called the
'population'. ...But you will find that there are
scientists who define probability in another way.
They prefer to use the term in a sense nearer to
everyday use, in which it means a measurement, based
on the available evidence, of the chances that some-
thing is true. ... This concept is called inductive
(or subjective) probability.... Statistical probability
characterizes an objective situation, e.g., a state
of a physical, biological, or social system. On
the other hand, inductive probability, as I see it,
does not occur in scientific statements but only
in judgments about such statements." (Carnap, 1953;
123)

The phenomenon of subjective probability suggests that
students enter elementary probability and statistics courses
with a set of preconceived notions of how probability works.
The experimental course developed in this study emphasized
the framework of mathematical model-building as a vehicle
to promote a gradual transition from subjective probability
to mathematical probability. Thus, rather than beginning
with the laws of probability and then attempting to apply
these laws to specific problems, the experimental course
begins with specific problems and encourages the students
to gradually build and constantly modify their own laws of
probability.
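
As a rough sketch of this observe-then-model cycle, the Python example below starts from Carnap's "statistical probability" as a relative frequency and compares it with the value predicted by a simple mathematical model. The problem chosen, the chance that two fair dice sum to 7, is a hypothetical illustration and is not taken from the experimental course materials; the seed and number of trials are also arbitrary.

```python
import random
from fractions import Fraction
from itertools import product

# Model: all 36 ordered outcomes of two fair dice are assumed equally likely.
sevens = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 7)
model_p = Fraction(sevens, 36)  # 6/36 = 1/6

# Observation: relative frequency of a sum of 7 in simulated rolls.
rng = random.Random(42)
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7)
relative_frequency = hits / trials

print(f"model: {model_p} = {float(model_p):.4f}; observed: {relative_frequency:.4f}")
```

If the observed frequency and the model disagreed badly, a student working from this point of view would revisit the model's assumptions (for example, whether the dice are fair) rather than simply apply a rule.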

There are two reasons for a modeling approach in a
course in elementary probability and statistics. First,
the transition from preconceptions of probability to
probabilistic laws could be greatly facilitated if students are
encouraged to see probability as a process of describing
observed phenomena more and more accurately, rather than
as a system of rules, axioms, and techniques which one
attempts to apply to problems. Second, the process of
building models gets students involved in a part of applied

mathematics that is sorely neglected in low level mathematics

12

courses. Henry Pollak's comments on the subject of appli-
cations indicate the need for a model-building approach.

"All too often, our teaching has failed to
present this open-ended and constructive nature
of both pure and applied mathematics. We most
always say to the student 'Here is a theorem.
Prove it.' and say very rarely 'Here is a situ-
ation. Think about it. Find out what the problem
should be, or what the theorem is that you ought
to be trying to prove.' Such a radical improvement
in pedagogy and student involvement will help the
teaching of mathematics from many angles, not just
the problem of applications." ... "Instead, the
mathematization has become so familiar to the
teacher that he forgets all about this part and
begins immediately with the mathematical model,
all built and ready to go. This deprives the
student of the essential experience of partici-
pating in the model building - and, incidentally,
tends to ossify the mathematical formulation."
(Pollak, 1968; 25-26)

Pollak is not alone in this opinion. There is con-
siderable support among mathematicians, mathematics edu-
cators, and operations research experts for elementary
courses taught from a modeling point of view. Thompson
(1974) suggests that students should be given a chance to
experience the process of doing applied mathematics. Ele-
mentary courses should, according to Thompson, spend time
encouraging students to identify precise problems from only
partially understood situations. Freudenthal (1968) points
out that teachers do not know how to teach mathematics so
that it will be useful because they have not experienced
the subject as a mathematization of reality, and consequently,
students do not get a chance to learn how to apply mathematics.

The secondary schools are virtually devoid of experiences in
the uses and applications of mathematics, as pointed out
by Fitzgerald (1975).

"In short, in spite of all the attention which
has been paid to the mathematics curriculum during
the past two decades, most of the mathematics teach-
ing occurring today in the schools in the United
States continues to be mechanistic, skill oriented,
and motivated principally by the supposed need for
those skills in the next mathematics class. The
results of these efforts can be seen when one looks
at the population which comes out the schools. With
the exception of a small percentage, most students
leave school with very little conceptual basis of
understanding of mathematics. They are not very
skillful at using mathematical ideas and have rather
negative attitudes about mathematics." (Fitzgerald,
1975: 40)

Klamkin (1968) complains that students trained in mathe-
matics who subsequently work in operations research are often
unable to solve problems. He makes a plea for the teaching
of mathematics so as to encourage students to think problems
through for themselves.

"I have thought for a long time that one of
the most important goals of education is to get
the students to 'think for themselves'. As I look
over the American education scene, it seems that
each year more and more material is being crowded
into the curriculum. The net result being that
most students hardly have any time to sit back and
think out various problems for themselves. Con-
sequently, most students will just parrot back the
material from their texts or from their classroom
notes. Or at best, students will get together to
independently work out their problems as a group.
This being the case, it is no wonder that when a
student or even a graduate is faced with a problem
that is not directly in the books, he will have
difficulties." (Klamkin, 1968; 131)

In a later paper, Klamkin (1971) provides an extensive
review of the literature on the education of industrial
mathematicians, and then outlines a modeling approach to
elementary geometry as an example of how the modeling
approach could be started in secondary school and continued
throughout graduate education in mathematics.

There have been several attempts to write materials
that develop mathematical models in elementary courses.
Among these are: Mathematical Uses and Models in
Our Everyday World (Bell, 1972); Man and His Technology
(Piel and Truxal, 1973); Statistics: A Guide to the Unknown
(Mosteller et al., 1972); and Statistics by Example
(Mosteller et al., 1973).

The experimental course developed in this study attempts
to integrate this emphasis on model building and on dealing
with problem situations, espoused above by Pollak (1968),
Fitzgerald (1975), and Klamkin (1968), with an activity-based
course methodology.

Psychological Theory and Educational Practice

This study attempts to formulate an approach to the
teaching of elementary probability and statistics that builds
upon the discoveries of psychological research. The work of
Kahneman and Tversky on bias in probabilistic estimation
provided an initial diagnosis of the preconceptions that a
student in an elementary probability and statistics course
is likely to have. It then becomes the task of mathematics
education to prescribe learning experiences which meet the
needs of the student, as suggested by psychological theory.

This interplay among the psychology of learning, the
subject matter content, and the method of teaching the
content is precisely what Joseph Schwab advocates when he
speaks of "practical deliberation" (Schwab, 1969). Schwab
suggests that the interplay between psychological theory
and educational practice should be a continual process.

The diagnosis of Kahneman and Tversky may be incomplete,
and a proposed project in mathematical curriculum development
may fall short of correcting the learning problems that are
brought out by psychological theory. The point is that it is
unlikely that either the psychologist or the mathematics
educator will get close to the truth by working independently.

In a paper on the psychology of school subjects, Shulman
(1974) expounds on the advantages of co-operation between
psychologists and content specialists.

"All these assertions lead to the conclusion
that, whereas the psychology of school subjects is
undoubtedly deserving of immediate disinterment, its
future vitality will be predicated on its no longer
remaining the exclusive province of psychologists.
It must become the joint focus of subject-matter
experts and psychologists, if its study is to be
fruitfully pursued, and if useful theoretical
statements are to emerge from that research."
(Shulman, 1974; 330)

Shulman goes on to say that such co-operation can best
be obtained if researchers adopt a view of the teacher as
clinician. He states the implications for research as
follows.

"We must be prepared to broaden the range of
methods we employ in our research, as we reformulate
the questions we propose to raise. Although
good experimental and correlational investigations
will continue to be useful, we need add more
varied kinds of studies - longitudinal case
studies, anthropological analysis of classrooms
and teachers, information-processing
modelings of the thought processes of teachers
and learners using methods of controlled introspection
and retrospection, investigations of
basic phenomena, such as transfer, under conditions
of varying subject matter, to name but
a few. We should be prepared to treat our subject
more clinically, both in terms of the
teacher and the investigator as clinicians."
(Shulman, 1974; 335)

The experimental course designed in this study specifies
the role of the instructor to be that of clinician. The
instructor in the experimental course diagnoses errors and
misconceptions that students encounter as they do the activ-
ities. He then suggests other problems for them to look at,
or asks them questions about their results, until they en-
counter sufficient dissonance with previous results to force
them to embark in a more profitable direction.

The role of the instructor in the experimental course
is, therefore, quite different from the usual role of
conveyor of vast amounts of information that is assumed in
many courses at the undergraduate level.

Summary of the Procedure

In the winter quarter of 1976 a pilot study was
conducted. An experimental activity-based course in elementary
probability and statistics was developed by the investigator
and taught to twenty-four undergraduate students at Michigan
State University. The subjects for the experimental course
had all enrolled in the same section of finite mathematics,
and were taught the experimental course instead of the
lecture-based course from Weiss and Yoseloff's Finite
Mathematics which was taught in all the other sections.
Students who take the course in finite mathematics at
Michigan State are primarily freshman business, agriculture,
and horticulture majors.

The experimental section and a section of the lecture-
based course in finite mathematics were both pretested and
posttested for knowledge of basic concepts in probability
and for reliance upon the heuristics of representativeness
and availability when giving estimates for the likelihood
of events. The results of the pilot study are reported in
chapter two of this thesis.

In the spring quarter of 1976, the main study was
carried out. Two sections of finite mathematics were randomly
assigned to the experimental course and two lecture sections
were randomly chosen to serve as a control group. Altogether
there were 85 undergraduates enrolled in the four sections.
One of the experimental sections was taught by the
investigator. Three other instructors each taught one of the
other sections involved in the study. All four sections were
pretested and posttested for knowledge of some basic concepts
in probability and for reliance upon the heuristics of
availability and representativeness. The instruments used were
devised by the investigator and had been revised as a result
of information gained in the pilot study. The instruments
that were used are presented in Appendix D of this study.

A representativeness subscale and an availability subscale
were constructed from the instruments. Comparisons between
the groups were made on the pretest scores and on the
posttest scores for each subscale, as well as for the whole
test. A comparison of the groups' responses on each
individual test item was also made. A detailed report of the
design of the study can be found in chapter three of this
thesis.

Overview of the Organization of the Study

The study will be organized and presented in five parts.
This first section has presented the problem, the purpose
of the study, and a rationale for the study. Chapter two
contains a review of the literature of psychology and mathe-
matics education that is related to this study, and a report
on the pilot study in which the activities for the experi-
mental course and the instruments used in the main study were
developed. The third chapter discusses the design of the
study, including a detailed description of the experimental
course, a description of the course taught to the control
groups, specific statements of the hypotheses tested, and a
report on the method of data analysis.

Chapter four reports the results of the main study,
both descriptive results and statistical results. The fifth
chapter gives a summary of the findings of the study and a
discussion of the results. A complete outline of the
experimental course, as well as the activities and notes
that formed the basis of the experimental course, can be
found in the appendices. An outline of the course taught
to the control group is also included in the appendices,
as are the instruments that were devised by the
investigator and used in the study.

CHAPTER II

REVIEW OF LITERATURE RELATED TO THE STUDY

Introduction

The review of the literature related to this study
will be presented in four parts. The first section dis-
cusses the work of Kahneman and Tversky on the use of
heuristics to estimate the likelihood of events. Related
results of other investigators will be included in this
discussion.

The second section contains the literature on models
of human judgment and decision making.

The third section discusses literature related to the
development of the probability concept in elementary and
secondary school children.

In section four, the results of a pilot study on
college students' use of heuristics to estimate the likelihood
of events are presented.

The Use of Heuristics to Estimate Probability

A series of studies that dealt with college students'
misconceptions of probability was carried out at the Hebrew
University of Jerusalem and at Oregon Research Institute at the
University of Oregon by two psychologists, Daniel Kahneman
and Amos Tversky (1971, 1972, 1973, 1974). These studies
demonstrated that college students who are combinatorially
naive tend to rely on strategies that simplify complex
probabilistic situations when estimating the likelihood of events.
Two of these simplification strategies are called the
representativeness heuristic and the availability heuristic. These
two heuristics were defined and briefly discussed in chapter
one of this study. A detailed discussion of the work of
Kahneman and Tversky is presented in this chapter. Their
research on representativeness is presented first, followed
by their work on the availability heuristic.

Tversky and Kahneman (1971) showed that even trained
scientists tend to have little regard for the effects of
sample size upon the validity of statistical results. When
asked how many subjects should be in a replication study to
test an experimentally significant result, researchers were
found to favor sample sizes smaller than in the original
study. The resulting loss of power in the statistical test
makes it only about half as likely that a significant
result will be obtained. When confronted with non-significant
results from a simulated replication study, these same
researchers failed to pool the results of the initial study
and the replication study in order to obtain support
for their hypotheses. Rather, they attempted to "explain"
the non-significant replication results by some quirk in the
sample.

Actually, a large sample should be used in a replication
study to ensure the power of the statistical test and
minimize the chances of missing a statistically significant
result. Kahneman and Tversky explain this behavior of
scientific researchers by the representativeness heuristic.
Apparently the non-significant replication studies are viewed
separately from the results of the initial study, rather than
as part of it. Thus, evaluators tend to treat each study as
representative of the same population, and they feel
significant results should appear regardless of the sample size
of the study. The representativeness heuristic causes people
to believe that a "law of small numbers" holds, much like
the mathematical law of large numbers.
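The cost of a smaller replication sample can be made concrete with a short power calculation. The Python sketch below assumes a two-sided one-sample z test at the .05 level and a hypothetical standardized effect size of 0.45; neither the test nor the effect size is taken from Tversky and Kahneman's study. Under these assumptions, halving the sample from 40 to 20 lowers the power from about .81 to about .52.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z test for a standardized effect size."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    shift = effect_size * n ** 0.5    # expected value of the z statistic
    # Probability that the observed z falls beyond either critical value.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

# Hypothetical effect size chosen for illustration only.
d = 0.45
for n in (40, 20):
    print(f"n = {n}: power = {power_two_sided_z(d, n):.2f}")
```

With no true effect (effect size 0) the function returns the significance level itself, .05, which is a quick check that the two tails are handled correctly.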

Tversky and Kahneman (1971) also mention that
representativeness may affect the way that people view random
sampling. A gambler tends to feel that even small deviations
from a 50-50 distribution of heads and tails in a
sequence of coin tosses will be corrected by subsequent
tosses. Tversky and Kahneman found evidence for this
gambler's fallacy among college subjects. The subjects were
told that the theoretical mean of all I.Q. scores is 100.
They were then told that in a sample of 50 recorded I.Q.
scores, the first score was 150. Since each of the remaining
49 scores has an expected value of 100, the expected mean for
this sample of 50 scores is $(150 + 49 \times 100)/50 = 101$,
greater than 100 in light of the unusually high score of 150.
However, when subjects were asked to estimate the mean I.Q.
score of the sample of 50, there was a tendency to stick with
100 as the expected mean of the sample. These college subjects
believed that the sample of 50 I.Q. scores was
representative of the entire population. They felt the remaining
scores in the sample should contain some very small entries
that would counterbalance the large score of 150.

Kahneman and Tversky point out that systematic use of
the representativeness heuristic leads researchers to gamble
research hypotheses on small samples (to overestimate the
power of small samples), to have unreasonably high
expectations for the success of replication studies (to
underestimate the breadth of confidence intervals), and to fail to
attribute deviations from expected results to sampling
variability. The results of this first paper, "Belief in the
Law of Small Numbers" (1971), encouraged Kahneman and Tversky
to continue their investigations on the use of the
representativeness heuristic.

In subsequent investigations, Kahneman and Tversky
(1972, 1973) report widespread employment of the
representativeness heuristic by college students. College students
were told that about one-half of all babies born are boys.
They were then asked to estimate the relative frequency of
families of six children in which the birth order B B B G G G
or B G G B G B would occur, where B stands for boy and G for
girl. Seventy-five of 92 responses favored the latter.
The sequence B G G B G B appears to be more representative
of the random process of having children. A similar tendency
was observed for the sequences B B B B G B and B G G B G B.
The latter sequence was again preferred, apparently
because the former sequence has too many boys to be
representative of the population proportion of 50% boys.
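If births are independent and a boy and a girl are equally likely (an idealization of the "about one-half" given to the subjects), every specific order of six births has probability (1/2)^6 = 1/64. The Python sketch below checks this for the three orders discussed above and then counts how many of the 64 possible orders contain three boys and how many contain five. Under these assumptions, what distinguishes B G G B G B from B B B G G G is only how typical it looks, not how likely it is.

```python
from fractions import Fraction
from itertools import product

P_BOY = Fraction(1, 2)  # assumption: "about one-half" taken as exactly 1/2

def sequence_probability(seq: str) -> Fraction:
    """Probability of one exact birth order such as 'BGGBGB', births independent."""
    p = Fraction(1)
    for child in seq:
        p *= P_BOY if child == "B" else 1 - P_BOY
    return p

for seq in ("BBBGGG", "BGGBGB", "BBBBGB"):
    print(seq, sequence_probability(seq))

# What does differ is how many orders share a given number of boys.
counts = {}
for order in product("BG", repeat=6):
    boys = order.count("B")
    counts[boys] = counts.get(boys, 0) + 1
print("orders with 3 boys:", counts[3], " orders with 5 boys:", counts[5])
```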

Kahneman and Tversky (1972) include further support
for their hypothesis that subjects rely upon representa-
tiveness and tend to neglect the effects of sample size
upon sampling distributions. They asked college students
the following question.

A certain town is served by two hospitals. In
the larger hospital about 45 babies are born
each day, and in the smaller hospital about 15
babies are born each day. As you know, about
50% of all babies are boys. The exact percentage
of baby boys, however, varies from day to day.
Sometimes it may be higher than 50%, sometimes
lower.

For a period of 1 year, each hospital recorded
the days on which (more/less) than 60% of the
babies born were boys. Which hospital do you
think recorded more such days? The larger
hospital, the smaller hospital, about the same?

Twenty-eight of fifty college students said that it
made no difference whether the hospital was large or small,
because the probability was still 1/2 that a baby would be a
boy. The remaining 22 subjects divided their answers about
equally between the other two responses. These subjects
tended to see no difference in the effects of sample size
on sampling variability within the two hospitals. In fact,
in the version asking about days with more than 60% boys,
the smaller hospital should record more such days, because
the proportion of boys in a small daily sample strays farther
from 50% than it does in a large one. Reliance upon the
representativeness of the 50:50 population proportion of boys
to girls tends to obscure the instability of the sampling
variation in the smaller hospital.
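The hospital question can be answered with the binomial distribution. The Python sketch below assumes that births are independent, that each baby is a boy with probability exactly 1/2, and that each hospital has exactly 15 or 45 births every day (the problem itself says "about"); it computes the chance that more than 60% of a day's births are boys. Under these assumptions the smaller hospital records such a day roughly twice as often as the larger one.

```python
from math import comb

def prob_more_than_percent(n_births: int, percent: int, p_boy: float = 0.5) -> float:
    """P(strictly more than `percent`% of n births are boys), births independent."""
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(n_births + 1)
        if 100 * k > percent * n_births  # integer test avoids rounding at the boundary
    )

small = prob_more_than_percent(15, 60)  # smaller hospital: 15 births a day
large = prob_more_than_percent(45, 60)  # larger hospital: 45 births a day
print(f"smaller: {small:.3f} of days, about {365 * small:.0f} days a year")
print(f"larger:  {large:.3f} of days, about {365 * large:.0f} days a year")
```

For 15 births, "more than 60%" means at least 10 boys; for 45 births it means at least 28. The integer comparison makes those cutoffs exact.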


Kahneman and Tversky (1973) also found that the
representativeness heuristic could account for biases in the
prediction of categorical events and of numerical events.
Given a list of nine fields of graduate study and a
descriptive personality profile, college subjects were asked to
rank the fields of specialization (business, law, education,
medicine, social science, etc.) according to the likelihood
that the person described in the profile would be in those
areas. There was an overwhelming tendency for the subjects
to rank the fields according to how well they "fit" the
stereotyped personality description. The subjects ignored
the actual relative frequencies with which graduate students
enter these fields. Thus, while education may have more
graduate students than any other area and engineering may
have the fewest, the personality description of the graduate
student could lead people to rank engineering as most likely
if a stereotypical engineer was described. Information given
about a person, regardless of its worth or validity, was
deemed more representative of a person's career pursuits
than were the actual percentages of people pursuing those
careers in graduate school.
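What the subjects neglected is the base rate in Bayes' rule: the probability that a student is in a field, given a description, is proportional to the base rate of the field times the probability of the description given the field. The Python sketch below uses invented numbers for two fields only (they are not data from Kahneman and Tversky). In it, a description four times as typical of engineering as of education is exactly offset when education has four times as many graduate students.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule over a finite set of fields.

    P(field | description) = P(description | field) * P(field) / P(description),
    where P(description) is the sum of the numerators over all fields.
    """
    joint = {field: likelihood[field] * prior[field] for field in prior}
    total = sum(joint.values())
    return {field: value / total for field, value in joint.items()}

# Hypothetical illustration only, restricted to two fields.
prior = {"education": 0.80, "engineering": 0.20}       # base rates within the two fields
likelihood = {"education": 0.10, "engineering": 0.40}  # P(description fits | field)
print(posterior(prior, likelihood))
```

Ranking by "fit" alone amounts to ranking by the likelihood column and ignoring the prior column.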

The representativeness heuristic is suggested by
Kahneman and Tversky to be a major reason why opinions are
formed on the basis of stereotypes. Similar results on
stereotypes have been obtained by Chapman and Chapman (1967)
and by Bruner, Goodnow and Austin (1956). Bruner et al. say
that preferential attributes lead to prejudicial
categorizing. Overreliance upon preferred but highly
unreliable cues, such as length of brow, biases subjects in
picking out highly intelligent people. Intelligence is,
thus, categorized by reference to the cue of length of
brow.

Kahneman and Tversky found that subjects ignore the
effects of regression in making numerical predictions.
Even graduate students in educational statistics courses
who had recently been exposed to the effects of regression
and sample variability ignored regression when making pre-
dictions.
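Regression toward the mean follows from the least-squares prediction rule: when two measures have correlation r, the predicted standardized score on one is r times the standardized score on the other, so predictions made from extreme scores should be pulled toward the mean. The Python sketch below uses hypothetical values (means of 100, standard deviations of 15, r = 0.5) rather than figures from the studies cited.

```python
def regressed_prediction(x, mean_x, sd_x, mean_y, sd_y, r):
    """Least-squares prediction of y from x: predicted z_y = r * z_x."""
    z_x = (x - mean_x) / sd_x
    return mean_y + r * z_x * sd_y

# Hypothetical example: a score two standard deviations above the mean on one
# measure predicts a score only one standard deviation above the mean on the other.
print(regressed_prediction(130, 100, 15, 100, 15, 0.5))  # 115.0
```

Only with r = 1 does the prediction stay as extreme as the predictor; ignoring regression amounts to acting as if r were 1.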

Results of Kahneman and Tversky (1972), Peterson,
DuCharme and Edwards (1968), and Edwards (1968) indicate
that college subjects are unable to construct optimal
binomial distributions. Edwards and his collaborators offer
substantial evidence for what they call "conservatism" in
human beings' construction of subjective probability
distributions. Humans behave suboptimally in their revisions
of probability distributions in light of new data. Edwards
suggests the theory that man is a conservative Bayesian; that
is, he revises his probability estimates in the direction that
a Bayesian model prescribes, but by less than the model
prescribes. A more detailed discussion of Edwards' point of
view is included in the next section of this chapter.

Kahneman and Tversky (1972) disagree with Edwards'
claim that man is a conservative Bayesian. They prefer to
explain humans' suboptimal performance in constructing
subjective binomial probability distributions by the use
of the representativeness heuristic. Subjects, according
to Kahneman and Tversky, use the representativeness
heuristic to simplify the complex task of estimating probabilities
in binomial contexts. The population proportion of successes,
p, is deemed representative of what should happen, even for
small samples of a binomial experiment. Thus, when subjects
were presented with p = 4/5 and N = 10 in a binomial
experiment, there was a significant tendency to judge 6
successes and 4 failures to be more likely than 10 successes.
The former is more representative of the population
proportion of successes, although the latter is actually more
likely to occur.
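The binomial probabilities behind this example can be checked directly. The Python sketch below uses exact fractions with p = 4/5 and N = 10, the values given above. It shows that 10 successes (about .107) is more probable than exactly 6 successes (about .088), and that 8 successes is the single most likely outcome.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p, n = Fraction(4, 5), 10
six, ten = binomial_pmf(6, n, p), binomial_pmf(10, n, p)
print(f"P(6 successes) = {float(six):.3f}, P(10 successes) = {float(ten):.3f}")

mode = max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))
print("most likely number of successes:", mode)
```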

In addition to the representativeness heuristic, Tversky
and Kahneman (1973) identify and discuss the heuristic of
availability. Subjects are said to employ the availability
heuristic when they judge probability according to the ease
with which instances of an event can be constructed.

Evidence that availability can yield accurate assessments
is presented by Tversky and Kahneman (1973) in experiments
with word construction. Some subjects were asked to estimate
the number of words that they thought they could form from
a list of letters in a two-minute interval. Other subjects
were asked to perform the task. The difficulty of the task
was controlled by varying the available letters. Correlations
between the estimators and the performers were all above .9.
Thus, people can be accurate assessors of availability.

While the use of availability can lead to correct
judgments, as above, it is also possible that availability
can bias the judgments that subjects make for the likeli-
hood of events. When asked whether the letters K, L, R,
N, and V are more likely to appear in the first position
or the third position in English words (Tversky and Kahneman,
1973), 105 of 152 college subjects favored the initial
position. In fact, each of these letters appears more
frequently in the third position in the English language.
Availability of instances where these letters start words
is probably responsible for the incorrect judgments made
by these subjects.

Kahneman and Tversky also found uses of availability
which cause errors in judgments given for combinatorial
outcomes. A path was defined as a sequence of line seg-
ments connecting a symbol in the top row to a symbol in
the bottom row, such that one and only one symbol was
touched in each row. Subjects were then asked whether grid
A or grid B (below) had more paths in it.

[Figure: Grid A and Grid B]

An easy application of the sequential counting prin-
ciple shows that each grid has 512 paths possible in it.
However, 46 of 54 subjects (Tversky and Kahneman, 1973)
responded that grid A had more paths in it. Evidently it
is easier to construct particular instances of paths in
grid A, i.e., subjects tend to see paths as more available
in grid A.
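The count of 512 follows from multiplying the number of choices in each row. The figure did not survive reproduction here, so the Python sketch below assumes a layout consistent with the stated total: grid A with 3 rows of 8 symbols and grid B with 9 rows of 2 symbols, since 8^3 = 2^9 = 512. These row lengths are an assumption, not a transcription of the original figure.

```python
from math import prod

def count_paths(row_lengths: list[int]) -> int:
    """Top-to-bottom paths touching exactly one symbol per row.

    Any symbol in a row may follow any symbol in the row above, so by the
    sequential counting principle the number of choices per row multiplies.
    """
    return prod(row_lengths)

grid_a = [8] * 3  # assumed layout: 3 rows of 8 symbols
grid_b = [2] * 9  # assumed layout: 9 rows of 2 symbols
print(count_paths(grid_a), count_paths(grid_b))  # 512 512
```

Grid A offers many choices per row but few rows; grid B offers few choices per row but many rows. The product is the same.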

In estimating the binomial coefficient $\binom{10}{r}$, the
number of ways of choosing a committee of r persons from 10,
118 college students indicated that $\binom{10}{r}$ was monotone
decreasing as r increased. The symmetry of the binomial
coefficient, $\binom{10}{r} = \binom{10}{10-r}$, escaped subjects.
There was an overwhelming tendency for subjects to view small
groups from 10 to be more available than large groups. Tversky
and Kahneman propose that it is easier to construct instances
of three-person committees than of seven-person committees.
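The symmetry that escaped the subjects is easy to display. The Python sketch below lists the number of possible committees of each size r drawn from a group of 10 and checks that committees of r and of 10 - r members are equally numerous.

```python
from math import comb

# Number of possible committees of r members drawn from 10 people.
committees = {r: comb(10, r) for r in range(11)}
print(committees)

# Choosing the r members who serve is the same as choosing the 10 - r who do not.
assert all(committees[r] == committees[10 - r] for r in range(11))
print(committees[3], committees[7])  # 120 120
```

The counts rise to a maximum of 252 at r = 5 and then fall again, rather than decreasing steadily as r grows.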

Availability also comes up when subjects are asked to
extrapolate beyond an initial pattern. College students
were exposed for 5 seconds to either the expression
$1 \times 2 \times 3 \times 4 \times 5 \times 6 \times 7 \times 8$
or the expression
$8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1$.
Median estimates were 512 for the first expression and 2,250
for the second. The correct answer is much larger, 40,320.
The availability of large or small initial partial products
had a significant effect on the size of the extrapolated
estimate, although both median estimates were very
conservative when compared to the correct answer.
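One way to see why the two orders produce different anchors is to look at the partial products a subject could compute in a few seconds. The Python sketch below prints the first four partial products in each order alongside the full product. The choice of four steps is arbitrary and only illustrative.

```python
from itertools import accumulate
from math import factorial, prod
import operator

ascending = list(range(1, 9))  # 1 x 2 x ... x 8
descending = ascending[::-1]   # 8 x 7 x ... x 1

# Partial products after the first few steps, roughly what a quick glance allows.
print(list(accumulate(ascending, operator.mul))[:4])   # [1, 2, 6, 24]
print(list(accumulate(descending, operator.mul))[:4])  # [8, 56, 336, 1680]
print(prod(ascending), prod(descending), factorial(8))  # 40320 40320 40320
```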


In estimating likelihoods in a binomial distribution,
subjects were also found to employ the availability
heuristic. Tversky and Kahneman presented college students
with the following grid.

XXOXXX
XXXXOX
XOXXXX
XXXOXX
XXXXXO
OXXXXX

Then the students were asked to estimate the relative
frequency of paths that hit 6 X's; 5 X's and 1 O; 4 X's and
2 O's; ...; 6 O's. Other students were simply asked to decide
whether paths with 6 X's or with 5 X's and 1 O would be more
likely to occur. In both cases Kahneman and Tversky found a
preponderance of subjects who favored the paths with 6 X's.
In fact, since each row contains five X's and one O, there are
$5^6$ paths with 6 X's but $6 \cdot 5^5$ paths with 5 X's and 1 O.
The explanation Kahneman and Tversky give for this phenomenon
is that subjects see the X's as more available than the O's.

When the mathematical structure of the above problem
was preserved but the task itself was altered,
college subjects were found to favor 5 X's and 1 O over
6 X's. Suppose that six players in a card game each receive a
single card at random from a deck in which 5/6 of the cards
are marked X and 1/6 are marked O. Which is more likely
to occur, 5 X's and 1 O or all 6 X's? In this context,
Tversky and Kahneman found that a majority of their subjects
favored 5 X's and 1 O. These subjects were presented
with the actual population proportion of X's, 5/6. Hence,
their judgments are "representative" of the population
proportion. The representativeness heuristic, rather than
the availability heuristic, was utilized by these subjects
in evaluating the likelihood of the card outcomes.
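Both versions of the problem share the same structure: six independent picks, each an X with probability 5/6. The Python sketch below enumerates all 6^6 = 46,656 paths through the grid shown above (one symbol per row) and tallies them by the number of X's hit. The resulting relative frequencies are the binomial probabilities that also govern the card game, assuming the cards are dealt independently with the stated proportions.

```python
from collections import Counter
from itertools import product

# The grid from the text: each row holds five X's and one O.
GRID = ["XXOXXX", "XXXXOX", "XOXXXX", "XXXOXX", "XXXXXO", "OXXXXX"]

# A path picks one symbol from each row; tally paths by how many X's they hit.
tally = Counter(sum(symbol == "X" for symbol in path) for path in product(*GRID))
total = sum(tally.values())  # 6**6 paths

for x_count in range(6, -1, -1):
    share = tally[x_count] / total
    print(f"{x_count} X's and {6 - x_count} O's: {tally[x_count]:>6} paths ({share:.3f})")
```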

Kahneman and Tversky (1972) propose that the major
difference between the evaluation of subjective probability
by representativeness and by availability lies in the nature of
the judgment called for by the task environment.
Representativeness encourages evaluators to judge the degree of
correspondence between the sample and the population. Thus,
representativeness emphasizes the generic features of the
task environment, the connotative distance between sample
and population. On the other hand, availability is employed
to judge subjective probability by the retrieval of
particular instances, and so it focuses on the denotative
distance between sample and population. These heuristics,
according to Kahneman and Tversky, are adopted not only
because they simplify complex tasks and thereby reduce
cognitive strain, but also because in many instances they serve
to produce quite accurate estimates by human judges.
Unfortunately, they can also lead to bias and misconception,
as pointed out above.

In addition to the investigations mentioned above, many
other authors who explore the areas of probability learning
and subjective probability have reported results which
support the hypothesis that human judges employ the heuristics
of availability and representativeness.

Reviews of the literature on probability-serial-learning
experiments by Tune (1964) and Vlek (1970) report that
subjects are overconfident of their ability to
predict events based upon a long series of observations.
A discussion of this phenomenon can be found in Howell
(1970). Tune cites considerable evidence for subjects'
overestimation of low frequency events and underestimation
of high frequency events. Reliance upon representativeness,
which tends to make subjects think outcomes should "even
out", may explain this behavior. Komorita (1959) also
provides experimental results verifying the overestimation of
low and underestimation of high probability events.

The work of John Cohen and Mark Hansel (Cohen and Hansel,
1956; Cohen, 1957, 1960) in the area of subjective probability
has also provided support for the use of availability and
representativeness.

Cohen and Hansel sampled one bead at a time from an
urn filled with blue and yellow beads in a 3:1 ratio. Their
subjects were not told what the ratio was, but were asked
to predict the color of the next bead after each trial.
Cohen and Hansel found a tendency among adults to first
predict the color which had appeared less often, and then
the color which had appeared more often. This behavior is
entirely compatible with the representativeness heuristic.
Subjects first assume that things should "even out",
and then predict outcomes based on the observed population
proportion.

In another experiment, the beads were sampled in groups
of 4 and placed in hidden beakers. Cohen and Hansel told
their subjects, who ranged in age from 6 years to adulthood,
that the population proportion of blue beads to yellow
was 1:1, or 3:1, or 2:1. Then they asked
their subjects to estimate the number of beakers among 16
which would contain 4 blues, 3 blues and a yellow, etc.,
down to 4 yellows. The subjects were, thus, generating
subjective sampling distributions. The chief effect noticed
was an induced preference among the subjects for the number
of beakers containing the exact proportion of the population
in the large urn. Cohen and Hansel's subjects manifested
the judgment of representativeness, because the small samples
of size 4 were judged to be very representative of the 1:1 or
3:1 population proportions.
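For comparison with the subjects' estimates, the expected sampling distribution can be computed. The Python sketch below assumes that the four beads in each beaker are drawn independently with the stated population proportion (for example, from a very large urn). It gives the expected number of the 16 beakers holding 4, 3, 2, 1, or 0 blue beads. Even for a 1:1 urn, only 6 of the 16 beakers are expected to show the exact 2:2 split.

```python
from fractions import Fraction
from math import comb

def expected_beakers(p_blue: Fraction, beakers: int = 16, size: int = 4) -> dict:
    """Expected number of beakers holding k blue beads, for k = size down to 0.

    Assumes the beads in each beaker are independent draws with P(blue) = p_blue.
    """
    return {
        k: beakers * comb(size, k) * p_blue**k * (1 - p_blue) ** (size - k)
        for k in range(size, -1, -1)
    }

for label, p in (("1:1", Fraction(1, 2)), ("2:1", Fraction(2, 3)), ("3:1", Fraction(3, 4))):
    expected = expected_beakers(p)
    print(label, {k: round(float(v), 2) for k, v in expected.items()})
```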

Cohen and Hansel mention that subjects fail to grasp the
independence of sequential events. They claim that subjects
feel forced to believe that failure is more likely to occur
after a long run of successes, and vice versa. "Even those
familiar with the theory of statistical independence often
involuntarily share this belief" (Cohen and Hansel, 1956;
10). This tendency to be influenced by recent events in a
series is characteristic of the "local representativeness"
attributed by subjects even to small samples. Jarvik (1951)
obtained results similar to those of Cohen and Hansel when
subjects were asked to predict the next event based upon a
sequence of observed events.

Cohen (1957, 1960) reports that 15-year-old subjects
feel that uncertainty from two or more sources reduces the
probability of success more than does a single source of
uncertainty. Cohen also reports that when subjects from 15
years of age to adulthood are asked if they would prefer to
try to pull a winning ticket out of a box of ten tickets, or
get ten tries at pulling a winning ticket from a box of 100
tickets (with replacement), the subjects prefer the smaller
sample size and the single attempt by 4 to 1. This behavior
suggests the use of the availability heuristic. People may
view their chances of winning as more "available" in the
1 in 10 situation than in the additive context of 10 chances
at 1 in 100.
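The two ticket options can be compared directly. Assuming each box contains exactly one winning ticket (the text does not say so explicitly), the Python sketch below computes the chance of at least one win under each option. The options are close: the single draw from ten wins with probability .100, ten draws with replacement from 100 win at least once with probability of about .096, and the expected number of wins is .1 in both cases.

```python
from fractions import Fraction

# Assumption: each box holds exactly one winning ticket.
one_draw = Fraction(1, 10)                    # one draw from a box of 10 tickets
ten_draws = 1 - (1 - Fraction(1, 100)) ** 10  # P(at least one win), 10 draws from 100, with replacement
expected_wins_ten = 10 * Fraction(1, 100)     # expected number of wins over the 10 draws

print(f"single draw: {float(one_draw):.4f}")
print(f"ten draws:   {float(ten_draws):.4f}  (expected wins {float(expected_wins_ten):.1f})")
```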

In summary, then, there is evidence in the literature
supporting the theory of Kahneman and Tversky that subjects
employ the heuristics of availability and representativeness
when making judgments as to the likelihood of events. The
theoretical positions of Cohen, and of Kahneman and Tversky,
are summed up below.

"It is a fact of great interest that in the long
and intricate course of mental development, our
subjective probabilities tend, in many respects,
increasingly to approach objective or mathemati-
cally determined probabilities. This convergence
owes something to learning as well as to maturation.
Of no less significance is the fact that, in certain
respects, subjective and objective or mathematical
probabilities never converge. The subjective then
obeys laws of its own. From this we may infer that
consciousness is not merely 'a reflection of exter-
nal reality' but creates other realities from with-
in." (Cohen, 1960; 191)

"Perhaps the most general conclusion obtained
from numerous investigations, is that people do
not follow the principles of probability theory
in judging the likelihood of uncertain events.
This conclusion is hardly surprising because
many of the laws of chance are neither intuitively
apparent nor easy to apply. Less obvious, however,
is the fact that the deviations of subjective
probability seem reliable, systematic, and difficult
to eliminate. Apparently, people replace the laws
of chance by heuristics, which sometimes yield
reasonable estimates and quite often do not."
(Kahneman and Tversky, 1972; 430-431)

The quote by Cohen suggests a developmental theory of
the probability concept. Literature related to the
development of the probability concept in children will be
discussed in section three. The current concern is an overview
of the representational theories of human judgment. This
is a large area of research within which the
studies of Kahneman and Tversky are embedded.

Models of Human Judgment and Decision Making

In the introductory chapter to a symposium on repre-
sentations of human judgment, Allen Newell (1968) discusses
scientific and motivational questions about judgment, and
provides an overview of the branches of research in the
area of representing human judgment. The scientific ques-
tions that researchers ask include: What information is
being used to make the judgment? How does the judgment
depend upon the input information? What is the process
that takes place between input information and the judgment
which is the output? What other variables besides the input
information affect the judgment or decision?

From a motivational standpoint, research on the judgmental
process and simulations of it are pursued in order
to find out: Why do human beings fail to make optimal
judgments? Can machines simulate the human judgmental process,
equalling or perhaps even surpassing the accuracy of human
judges? What are the ways in which humans simplify complex
tasks, thus rendering them amenable to a simpler analysis,
in order to make a decision?

The research on subjective probability that has been
surveyed in section one of this chapter is very much in the
tradition of the research on the questions listed above.

In particular, the studies of Kahneman and Tversky on the
uses of heuristics are concerned with the process that occurs
between the inputs and outputs of human judgments. Kahneman
and Tversky are primarily concerned with the process or
strategies that humans employ to simplify the complex task
environment in situations of uncertainty. Models of human
judgment are, therefore, of central importance to the work
of Kahneman and Tversky, and so also to this study.

Newell (1968) mentions that there are four major strands
of research that attempt to model the process and products of
human judgment. These are mathematical models of the formal
judgmental process, models of the task environment, informa-
tion processing models, and models that view judgment as
embedded in a larger problem. The first three of these
strands are concerned with isolating and studying judgment
in and of itself, and are related to the area of
subjective probability. The fourth strand, which investigates
judgment embedded as a step in a larger process, will
not be discussed in this paper. The interested reader is
referred to Feigenbaum and Lederberg (1968).

The mathematical models of human judgment are concerned
with simulating the judgmental law. They are, therefore,
primarily product-oriented models and tend to be only
secondarily concerned with the process that a human being
uses to make a judgment. A mathematization is attempted in
order to calculate optimal decisions based upon input cues.
According to Newell (1968), the general ingredients for
these formal models consist of a state of the environment,
x; action possibilities for the judge, given by a; the
outcome p that results from the environment x and the
judge's action a (p is a function of x and a, p = p(x,a));
a utility function v which measures or assigns a value
to each outcome p; and a payoff function w of x and a.
The payoff resulting from a particular action in a particular
state is formally represented as w(x,a) = v[p(x,a)]. The
usefulness of such a general model is heavily
dependent upon some valid and reasonable method of mathema-
tizing x and a. In general, it is not even clear what
x and a are, for the state of the environment may be un-
certain, and the action options available to the judge are
hard to estimate, and may even vary from judge to judge.
Thus more specific mathematizations of pieces of this
general model must be attempted.
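
As a minimal sketch of Newell's general ingredients, the Python below encodes a toy decision with two environment states and two actions. The states, actions, outcome mapping p, utility function v, and the probabilities over states are all hypothetical, chosen only for illustration. Choosing the action with the highest expected payoff is one common way of handling an uncertain state x; it is assumed here and is not taken from Newell.

```python
# Toy instance of Newell's (1968) ingredients: state x, action a,
# outcome p(x, a), utility v, and payoff w(x, a) = v[p(x, a)].
# The states, actions, outcomes, and utilities are hypothetical.

STATES = ["rain", "dry"]          # possible states of the environment, x
ACTIONS = ["umbrella", "none"]    # action possibilities for the judge, a

UTILITY = {"dry but burdened": 0.6, "soaked": 0.0, "dry and free": 1.0}

def p(x: str, a: str) -> str:
    """Outcome that results from environment state x and action a."""
    if a == "umbrella":
        return "dry but burdened"
    return "soaked" if x == "rain" else "dry and free"

def v(outcome: str) -> float:
    """Utility function: assigns a value to each outcome."""
    return UTILITY[outcome]

def w(x: str, a: str) -> float:
    """Payoff function of x and a, defined as w(x, a) = v[p(x, a)]."""
    return v(p(x, a))

def best_action(state_probs: dict) -> str:
    """Assumed decision rule (not from Newell): when x is uncertain,
    choose the action with the largest expected payoff."""
    return max(ACTIONS, key=lambda a: sum(q * w(x, a) for x, q in state_probs.items()))

# Payoff table w(x, a) for every state and action.
for x in STATES:
    print(x, {a: w(x, a) for a in ACTIONS})

print(best_action({"rain": 0.5, "dry": 0.5}))  # umbrella (0.6 vs 0.5)
print(best_action({"rain": 0.2, "dry": 0.8}))  # none (0.8 vs 0.6)
```

The arithmetic is trivial once x, a, p, and v are fixed; the difficulty the text describes lies in deciding what x and a are for a real judge.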

Hoffman (1960, 1968) has used the techniques of linear
regression to obtain weights for the cues utilized by human
judges and to develop regression equations which will
predict their decisions. Hoffman's general model has its
theoretical roots in the correlational model of Egon
Brunswik's probabilistic functionalism (Brunswik, 1947).
In one of his studies Hoffman (1968) gave college subjects
(the judges) information for nine predictor variables from
student profiles. The predictor variables included such
items as a high school counselor's rating, study habits,
amount of parents' education, emotional stability, anxiety
level, and so forth. Subjects then used the predictor
variable information to make a judgment of a student's
intelligence. Weights were assigned to the predictor variables and
a regression equation was computed for each judge with the
judgment being the dependent variable. The simplest possible
regression model is the best fitting hyperplane obtained by
the method of least squares. In computing a multiple cor-
relation coefficient to measure the precision with which a
linear combination of the predictor variables can account
for a subject's judgments, Hoffman found that 64-81% of the
variability in the judges' decisions could be accounted for
by a simple linear regression model.

The question then arises: Do human judges actually
process information according to a linear model, or is the
regression model just an accurate simulation of the judgmental
product? Hoffman declares that even though the
linear model is very accurate in predicting judgment, he
is unwilling to accept the claim that the actual process
of human judgment is a simple linear combination of weighted
cues. He indicates that there is some evidence (Hammond,
Hursch, and Todd, 1964; Rorer and Slovic, 1966) that humans
combine information configurally, that is, in a non-linear
manner. Hoffman prefers to look upon the regression model
of human judgment as a paramorphic representation of the
product of a judgment rather than as an isomorphic copy of
the judgmental process. By "paramorphic" Hoffman means that
although regression equations may perform like a human judge,
they do not necessarily describe the actual processing of
the information in the human brain.
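
Hoffman's procedure can be sketched as follows: fit one least-squares equation per judge and report R^2, the proportion of the judge's rating variance accounted for by the linear combination of cues (his 64-81% corresponds to R^2 between .64 and .81). The data below are simulated; the nine weights and the interaction term are hypothetical and stand in for a judge who combines two cues configurally. Because the fitted linear equation can still account for most of the variance of such a judge, a high R^2 alone cannot show that the judge's process is linear, which is the sense in which the model is paramorphic.

```python
import numpy as np

def fit_judge_policy(cues, judgments):
    """Least-squares linear model of one judge's ratings.

    cues: array of shape (n_profiles, n_cues); judgments: shape (n_profiles,).
    Returns (coefficients, intercept first; R^2, the squared multiple correlation).
    """
    X = np.column_stack([np.ones(len(cues)), cues])      # intercept + cue columns
    coef, *_ = np.linalg.lstsq(X, judgments, rcond=None)  # best-fitting hyperplane
    residuals = judgments - X @ coef
    centered = judgments - judgments.mean()
    r2 = 1.0 - (residuals @ residuals) / (centered @ centered)
    return coef, float(r2)

rng = np.random.default_rng(0)
cues = rng.normal(size=(200, 9))   # 200 hypothetical profiles, 9 cues each
weights = np.array([0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0])  # hypothetical
# Hypothetical judge: weighted cues, plus a configural (interaction) term
# between the first two cues, plus unsystematic noise.
judgments = (cues @ weights
             + 0.5 * cues[:, 0] * cues[:, 1]
             + rng.normal(scale=0.5, size=200))

coef, r2 = fit_judge_policy(cues, judgments)
print(f"proportion of variance accounted for by the linear model: {r2:.2f}")
```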

Edwards (1968) is a spokesman for those researchers
who model human judgments by calculating the outputs of
Bayes' theorem and comparing them to human judgments. Pro-
ponents of the Bayesian representation of human judgment
are primarily concerned with how people revise their judg-
ments in light of new evidence. Probabilities for condi-
tional events can be formally computed from Bayes' theorem,
which says that if A and B are events, then the probability
of event A occurring given that event B did occur,
P(A|B), is given by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0.$$

Bayes' theorem allows one to calculate a revised probability
in a sample space that has been restricted or narrowed down
by the introduction of new evidence. Studies conducted by
Edwards (1968), and Peterson, Du Charme, and Edwards (1968),
have led them to believe that human beings do, in fact, make
decisions that are in accord with the theorem of Bayes.
However, the results of human judgments are generally much
more conservative than would be predicted by Bayes' theorem.
Edwards placed 700 red poker chips and 300 blue poker
chips in one bookbag, and the opposite distribution, 300
red and 700 blue, in another. He then drew a sample of 12
chips from one bag, chosen at random. The probability
that the sample comes from one specific bag is .5 before a
subject is given any information as to the composition of
the sample. Subjects were then told that the sample consisted
of 8 red and 4 blue chips, and were asked to estimate
the probability that the sample was drawn from the predominantly
red bag in light of the new information. The mean
estimates for subjects fell between .7 and .8. The actual
probability, calculated from Bayes' theorem, that the sample
was drawn from the predominantly red bag is .97. Edwards
(1968) cites considerable evidence for the hypothesis that
human beings do, in fact, process information according to
Bayes' theorem, and do so conservatively. A study by Phillips,
Hays, and Edwards (1966) gives strong evidence that
human judges are conservative Bayesians.
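
A worked version of Edwards' calculation shows where the figure .97 comes from. Let R denote the predominantly red bag, B the predominantly blue bag, and D the observed sample of 8 red and 4 blue chips. The derivation assumes, as is usual for this task, that the 12 draws can be treated as independent, with P(red) = .7 from R and .3 from B (sampling with replacement, or from bags large enough that 12 draws barely change their composition).

```latex
\begin{aligned}
P(R) &= P(B) = 0.5,\\
\frac{P(R \mid D)}{P(B \mid D)}
  &= \frac{P(D \mid R)\,P(R)}{P(D \mid B)\,P(B)}
   = \frac{\binom{12}{8}(0.7)^8(0.3)^4}{\binom{12}{8}(0.3)^8(0.7)^4}
   = \left(\frac{7}{3}\right)^{8-4}
   = \frac{2401}{81} \approx 29.6,\\
P(R \mid D) &= \frac{2401}{2401 + 81} = \frac{2401}{2482} \approx 0.967.
\end{aligned}
```

Only the difference between the numbers of red and blue chips (here 8 - 4 = 4) enters the posterior odds. The subjects' mean estimates of .7 to .8 correspond to odds of only about 2.3:1 to 4:1, which is the conservatism described above.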

The phenomenon of conservatism is somewhat distressing
to the Bayesians, since it prevents human judges from
performing optimally on judgmental tasks. Considerable
effort has been devoted to the problem of overcoming
conservatism in human judgment in studies by Peterson et al.
(1968) and by Wheeler and Beach (1968). Their findings
suggest that the phenomenon of conservatism is very diffi-
cult to overcome, and that only long and involved training
procedures appeared to have any effect upon human subjects'
performance on probability revision tasks. The Bayesians
cannot agree among themselves as to whether conservatism
results because humans misaggregate data, or misperceive
the impact of data, or whether conservatism is an artifact of
the human judgmental process.

Kahneman and Tversky (1974) claim that the reason
Bayesians have difficulty fitting their model to the process
of human judgment is that humans are not conservative Bayesians
at all. They claim that, whereas the Bayesian model may pre-
dict fairly accurate products of human judgment, the model
does not capture the essential characteristics of the judg-
mental process. Since sample size has no effect upon sub-
jective probability distributions (see section one of this
chapter), human judges must be using the proportion of suc-
cesses in a binomial experiment as a representative charac-
teristic which will predict the actual distribution. Thus,
for Kahneman and Tversky, human judges apply the
representativeness heuristic in the experiments conducted by the
Bayesians, and do not process information by applying Bayes'
theorem.
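
Kahneman and Tversky's argument turns on the fact that the Bayesian posterior depends on sample size while the sample proportion does not. The sketch below assumes, as for Edwards' task, independent draws with P(red) = .7 from the red bag and .3 from the blue bag, and compares samples that all contain 2/3 red chips but differ in size; the particular sample sizes are hypothetical. A judge who treats the sample proportion as the representative characteristic would give all three samples the same estimate, whereas Bayes' theorem does not.

```python
from fractions import Fraction

def posterior_red_bag(reds: int, blues: int, p_red: Fraction = Fraction(7, 10)) -> Fraction:
    """P(predominantly red bag | sample), with equal priors on the two bags.

    Assumes independent draws: red has probability p_red from the red bag
    and 1 - p_red from the blue bag. Binomial coefficients cancel, so the
    posterior odds are (p_red / (1 - p_red)) ** (reds - blues).
    """
    odds = (p_red / (1 - p_red)) ** (reds - blues)
    return odds / (1 + odds)

for reds, blues in [(4, 2), (8, 4), (16, 8)]:
    post = posterior_red_bag(reds, blues)
    print(f"{reds} red, {blues} blue: proportion red = {Fraction(reds, reds + blues)}, "
          f"posterior = {float(post):.3f}")
```

All three samples are two-thirds red, yet the posterior rises from about .845 to .967 to .999 as the sample grows.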

The linear regression model and the Bayesian model
are the main attempts at mathematization of the human
judgmental process. Each of these models has shown some success
in predicting human decisions, but neither of them is widely
accepted as having accurately depicted the process that
actually takes place in the human mind when judgments are
rendered. A thorough review of the literature comparing
and contrasting these policy capturing models, regression
and Bayesian, has been done by Slovic and Lichtenstein (1971).

The actual process that occurs during a judgment is
more carefully investigated by models of the task environ-
ment and by information processing models, than by formal
mathematization of human judgment. Simon and Newell (1971)
describe the elements of an information processing model of
problem solving, judgment, and decision making. The model
is concerned with a problem solver confronted by a task that
is objectively defined in terms of a task environment. The
task environment is the state of the problem at any given
moment in the problem-solving process. The problem solver
defines the problem (and constantly redefines it) in terms
of operations that constitute what Simon and Newell call the
problem space. Thus, for example, in chess, the task environment
is the state of the board in between moves and the
problem space consists of those operations which constitute
permissible moves by the players. The assumptions that
information processing models make about human problem
solvers include: the existence of a short-term memory to
temporarily store bits of input information; the existence
of a long-term memory bank of facts and strategies which
can be brought to bear upon a problem; and the assumption
that human problem solving occurs in an essentially serial
manner. Simon and Newell believe that human information
processors proceed step by step in solving a problem or
making a decision, rather than carrying out several parallel
procedures simultaneously.
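
To make this vocabulary concrete, the sketch below casts a small puzzle in Simon and Newell's terms: each state plays the role of the task environment at a given moment, the legal operations define the problem space, and states are examined one at a time, in keeping with the serial-processing assumption. The water-jug puzzle and the breadth-first search strategy are illustrative choices and are not taken from Simon and Newell.

```python
from collections import deque

CAPACITY = (4, 3)   # hypothetical puzzle: a 4-litre jug A and a 3-litre jug B
GOAL = 2            # goal: exactly 2 litres in jug A

def operators(state):
    """Problem space: the legal moves available from a given state."""
    a, b = state
    cap_a, cap_b = CAPACITY
    pour_ab = min(a, cap_b - b)    # amount that can be poured from A into B
    pour_ba = min(b, cap_a - a)    # amount that can be poured from B into A
    return {
        "fill A": (cap_a, b), "fill B": (a, cap_b),
        "empty A": (0, b), "empty B": (a, 0),
        "pour A->B": (a - pour_ab, b + pour_ab),
        "pour B->A": (a + pour_ba, b - pour_ba),
    }

def solve(start=(0, 0)):
    """Serial breadth-first search: one state is examined at a time."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state[0] == GOAL:
            return path, state
        for move, nxt in operators(state).items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [move]))
    return None

path, final = solve()
print(final, path)
```

A human solver would not search exhaustively in this way; the point of the sketch is only the separation between states, operators, and a serial control process.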

Models that deal specifically with the nature of the
task environment attempt to analyze the task itself to see
if the structure of the task dictates the strategies that
are possible for making a judgment. The work of Chapman
and Chapman (1967) on categorizing personality on the basis
of facial components, and the extensive investigation of
Adrian DeGroot involving mid-game position in chess (DeGroot,
1965), as well as Newell and Simon's own work on chess and
cryptarithmetic problems (Newell and Simon, 1972), provide
examples of attempts by researchers to model the task envir-
onment.

Models of the task environment typically do not attempt
to describe the entire process of human judgment. They focus
on one element of this process. Information processing
models, by contrast, try to simulate not only the task
environment and the information that gets used, but also
the representations of the environment developed by the
decision-maker and how the human decision-maker actually
processes these representations. A judgmental law is
generated by the information structure of the task
environment, and any mathematization representing the
process of the judgment is only developed afterwards.
Often the model culminates in a computer program which
attempts to simulate the actual judgmental process that
a human judge uses.

Newell, Simon, and Shaw (1958), and Newell and Simon
(1972), have simulated the task environment, the problem
space, the process used by a human judge, and finally written
a computer program to carry out the processes of proving the-
orems in logic, playing chess, and solving cryptarithmetic
problems. Clarkson (1962) tape recorded stock brokers while
they were thinking out loud and making decisions about
investments. He then analyzed the protocols that he had taped,
identified crucial elements that stock brokers used in making
decisions including a wealth of information on past stock
performances, and wrote a computer program that simulated the
process that the stockbrokers go through. Predictions made
by the program were tested against those made by stockbrokers
over a six-month period, and the program predictions were in
agreement with those of the stockbrokers more than 90% of
the time.

Kleinmuntz (1968) studied the diagnostic decisions made
by clinical psychologists in interpreting personality
profiles. The psychologists were tape recorded while thinking
out loud, protocols were analyzed, and a computer program
was written to simulate the optimum human judgment in
analyzing the profiles.

Shulman and Elstein (1974) describe a four step infor-
mation processing model which they claim is used by physi-
cians in doing a diagnostic work up of a patient. The model
includes cue acquisition, hypothesis generation, cue
interpretation, and hypothesis evaluation. This paper by Shulman
and Elstein includes a comprehensive review of the literature
of problem solving models in the information processing
tradition, and compares the research in information processing
to the aforementioned research on judgment by the Bayesian
and regression model builders.

In comparing these two branches of research, Shulman
and Elstein point out that the information processors are
primarily concerned with isomorphic models of the judgmental
process that humans actually go through, while the Bayesian
and regression schools deal with paramorphic simulations that
have outputs similar to those made by human judges. Bayesian
and regression investigators distrust introspective techniques,
while process tracers rely heavily upon introspective
techniques, such as thinking out loud, in order to represent the
process of human decision making. The information processing
models are concerned first with understanding and explaining
the judgmental process, and only secondarily with accurate
prediction and control. On the other hand, prediction and
control are the primary concerns in the "policy capturing"
research of the Bayesian and regression models. Shulman
and Elstein suggest that the two areas of research have
much to offer each other, and that they complement each
other's weak points.

This study is primarily concerned with several heuris-
tics that human information processors use when making judg-
ments. The availability heuristic and the representativeness
heuristic (Kahneman and Tversky, 1974) appear to be part of
the problem space. That is, they are examples of operations
that judges might use in dealing with a task environment of
uncertainty. There is evidence that human judges misuse
these operations when making estimates for the likelihood
of events (see section one of this chapter). Difficulties
encountered in overcoming the misuse of the representativeness
and availability heuristics appear to be deep-seated and
dependent upon the development of the probability concept.
The next section, therefore, will contain a discussion of
the development of the probability concept in young children
and adolescents.

The Development of the Probability Concept in Young Children
and Adolescents

In this section, literature from the fields of psychology
and mathematics education that deals with the development of
probability concepts in both elementary and secondary
school-age children will be cited. The section begins with Piaget's
study of the probability concept in children, ages 3-15.

Although this study is concerned with misconceptions that
college age students have about probability, the develop-
ment of probability concepts in younger subjects is also
important to the study. The roots of misconceptions that
adults have about probability can sometimes be traced back
to earlier experiences, or lack of experiences, with prob-
abilistic concepts.

The work that Piaget and Inhelder reported in their book
(Piaget and Inhelder, 1951) is the source of much of the research in
the development of the probability concept in young chil-
dren. Piaget presents clinical evidence from interviews
with children and concludes that the learning of probability
concepts proceeds in stages, in accord with his theory of
the development of thought in children. There are three
stages in Piaget's theory of the development of the prob-
ability concept in children.

In the first stage, generally characteristic of chil-
dren under seven years of age, the child is unable to dis-
tinguish between the necessary and the possible. In this
stage, "uncertainty" means only unpredictability of events
in the near future. The child does not possess a concept
of logical uncertainty, and so does not understand the true
nature of a random mixture. Piaget found that children in
this first stage of development tried to superimpose an
order or discover a pattern amid the chaos of a random
mixture.

Two behaviors that Piaget observed in children in the
first stage are worth noting in connection with the present
study.

In the first place, if a subject was shown instances
of events A and B, and if A appeared more frequently
than B, the subject would tend to bet on B because it had
been skipped too often. This type of behavior, sometimes
referred to as the "gambler's fallacy", exemplifies a sub-
ject's use of the representativeness heuristic, in the
language of Kahneman and Tversky (1972). A truly representative
sequence of instances of A's and B's should not
favor one or the other (provided, of course, that P(A) =
P(B)).
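
A short simulation shows why betting on B after a run of A's is a fallacy when trials are independent and P(A) = P(B) = 1/2. The coin-flip model, the run length of three, and the random seed are assumptions chosen for illustration: among all trials that immediately follow three A's in a row, B still occurs about half the time.

```python
import random

def freq_b_after_run(n_trials: int = 200_000, run: int = 3, seed: int = 1) -> float:
    """Relative frequency of B on trials that immediately follow `run` A's
    in a row, for independent trials with P(A) = P(B) = 1/2."""
    rng = random.Random(seed)
    seq = [rng.choice("AB") for _ in range(n_trials)]
    followers = [seq[i] for i in range(run, n_trials)
                 if all(s == "A" for s in seq[i - run:i])]
    return followers.count("B") / len(followers)

print(f"relative frequency of B after three A's: {freq_b_after_run():.3f}")
```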

In the second place, Piaget's subjects tended to pre-
dict those events which had been observed most frequently,
with total disregard for the population distribution. This
type of behavior is characteristic of the availability
heuristic (Tversky and Kahneman, 1973), wherein the likelihood of
events is judged by the ease with which instances can be recalled
or constructed.

The author does not want to imply that Piaget's subjects
were actually employing the representativeness heuristic or
availability heuristic in their responses. The use of these
heuristics, as described by Kahneman and Tversky, would re-
quire the subject to possess logical operations which are
not observable in Piaget's preoperational children. However,
the fact that behaviors characteristic of these two heuristics
exist in small children, for whatever reason, and that
these behaviors are manifestly still observable in mature
adults (Cohen and Hansel, 1956; Kahneman and Tversky, 1972),
indicates that they might present formidable obstacles to
learning the correct theoretical rule.

In the second stage of the development of the proba-
bility concept (up to about 14 years), Piaget claims that
a child recognizes the distinction between the necessary
and the possible, but has no systematic approach to gener-
ating a list of "the possibles". Thus, a child in this
stage lacks the ability to list the sample space for a prob-
ability experiment. The second stage child does not possess
the formal operations which are crucial in systematizing a
combinatorial analysis.

In the third stage, the child begins to develop a com-
binatorial analysis, understands probability as the limit
of relative frequency (law of large numbers), and can deal
with the probability of isolated instances as a function of
the whole distribution.
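
The two abilities Piaget assigns to the third stage can be illustrated side by side: a systematic listing of the sample space and the approach of relative frequency toward the probability computed from that listing. Tossing three fair coins and the event "exactly two heads" are hypothetical examples, not Piaget's tasks.

```python
import itertools
import random
from fractions import Fraction

# Systematic combinatorial listing of the sample space for three coin tosses.
sample_space = list(itertools.product("HT", repeat=3))   # 8 equally likely outcomes
event = [o for o in sample_space if o.count("H") == 2]    # exactly two heads
exact = Fraction(len(event), len(sample_space))           # 3/8

# Relative frequency in repeated experiments tends toward the exact value
# as the number of trials grows (law of large numbers).
rng = random.Random(0)
for n in (10, 1_000, 100_000):
    hits = sum(1 for _ in range(n)
               if [rng.choice("HT") for _ in range(3)].count("H") == 2)
    print(n, hits / n, "exact:", float(exact))
```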

Piaget, and Kahneman and Tversky, give evidence for the
same types of behavior in subjects from vastly different age
groups. In the language of Kahneman and Tversky, Piaget's
subjects (children) exhibit behavior that is characteristic
of the representativeness and availability heuristics when
they are asked to predict outcomes. In the language of
Piaget, Kahneman and Tversky's subjects (mostly college
students) exhibit behavior that is characteristic of the
first and second stages of cognitive development, prior
to the acquisition of formal operations.

Piaget's interview technique requires a high degree
of verbalization from the subjects. Some studies have been
conducted to see if very young children indicate an under-
standing of some probability concepts when their decisions
are made in a non-verbal format. Davis (1965), and Yost,
Siegel, and Andrews (1962) present evidence for the existence
of some concepts of probability in children age 3
and 4. The children were permitted to determine probability
or frequency by utilizing a non-verbal decision process.
Yost et al. claim that the amount of reinforcement in a
probability learning experiment with four year olds had a
significant effect upon the accuracy of the children's pre-
dictions.

Smock and Belovicz (1968) claim that the children in
Yost's experiment really learned about reinforcement, and
not about probability. They present substantial evidence
that subjects of junior high age have a very poor conception
of the laws of probability. Smock's subjects could not
consistently generate correct sample spaces, and did not
recognize or utilize the concept of independence when pre-
dicting outcomes.

Cohen and Hansel (1956) identify four stages that children
go through in the development of the idea of a probability
distribution. At first there is just a "glimmering
belief" that the numbers in a distribution will really vary.
This corresponds somewhat to recognizing the distinction
between the necessary and the possible in Piaget's theory.
Secondly, a child feels that the category of exactly equal
proportions will occur most often, that is, that every prob-
ability distribution is a uniform distribution. In the
third stage, likelihoods are assigned to outcomes based upon
their similar structure. For example, the outcome one blue
and four yellow beads is judged as likely to occur as the
outcome one yellow and four blue beads, regardless of the
population composition. In this stage the child applies
the principle of symmetry universally. Finally, Cohen and
Hansel claim, a child is able to assign a greater probability
to the event "one blue and three yellow beads" than to the
event "four blue beads" in a 50-50 distribution. Cohen
and Hansel attribute the stages of mental development both
to maturation and physical experience, and say that a child
is ordinarily in the fourth stage of development around the
age of 15. This theory is very much in accord with that of
Piaget.
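
The binomial probabilities behind Cohen and Hansel's stages can be computed directly. The sketch assumes independent draws (or a population large enough for this to be a good approximation); the 70-30 population in the second comparison is a hypothetical example chosen to show where the symmetry principle of the third stage breaks down.

```python
from fractions import Fraction
from math import comb

def prob_blues(n: int, k: int, p_blue: Fraction) -> Fraction:
    """Probability of exactly k blue beads in n independent draws."""
    return comb(n, k) * p_blue**k * (1 - p_blue)**(n - k)

half = Fraction(1, 2)
# Fourth stage: four draws from a 50-50 population.
print(prob_blues(4, 1, half), prob_blues(4, 4, half))   # 1/4 1/16

# Third-stage symmetry: in five draws, "1 blue, 4 yellow" and "4 blue, 1 yellow"
# are equally likely only when the population is 50-50.
for p in (half, Fraction(7, 10)):
    print(p, prob_blues(5, 1, p), prob_blues(5, 4, p))
```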

Cohen and Hansel, Stevenson and Zigler (1958), Messick
and Solley (1957), and Kass (1964), have examined the development
of the ability in children to "match" their guesses
to the actual distribution for binomial outcomes in a serial
learning task. Generally, studies on serial learning present
the subject with a box containing two lights. The
subject guesses which light will come on, and then the
light is shown. The frequency for the light is preset at
80:20 or 70:30 or 50:50, or whatever the researcher desires.
Results of such research performed with adult subjects show
that the proportion of guesses asymptotically approaches
the preset proportion for the lights. Messick and Solley
(1957), and Stevenson et al. (1958) report that children
approach the asymptotic response level in about the same
way that adults do. Cohen and Hansel (1956) agree with
these results, but found a tendency for children age 6 or
7 to simply alternate their guesses. About age 8, Cohen
reports, children begin to be influenced by the previous
outcomes.
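
For a preset frequency such as 80:20, it is instructive to compare matching with always guessing the more frequent light. This comparison is an addition and is not part of the studies cited: assuming light onsets are independent of one another and of the subject's guesses, a subject who matches (guesses the frequent light 80% of the time) is expected to be correct less often than one who always predicts the frequent light.

```python
def expected_accuracy(p_frequent: float, p_guess_frequent: float) -> float:
    """Expected proportion of correct guesses in a two-light task, assuming
    guesses are made independently of the (independent) light onsets."""
    q = 1 - p_frequent
    return p_guess_frequent * p_frequent + (1 - p_guess_frequent) * q

for p in (0.8, 0.7, 0.5):
    matching = expected_accuracy(p, p)      # guess proportions match the lights
    maximizing = expected_accuracy(p, 1.0)  # always guess the frequent light
    print(f"{p:.0%} light: matching {matching:.2f}, maximizing {maximizing:.2f}")
```

At 80:20, matching yields an expected .68 correct against .80 for maximizing; the two strategies coincide only at 50:50.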

Kass found that boys preferred real gambling distri-
butions more than girls in binomial probability learning
tasks. While girls would rather have a certain chance at
a payoff, boys preferred payoff odds of 2:1 or 7:1 against
them in gambling situations (Kass, 1964). Kass' study is
one of the few studies that found any sex differences in
the development of the probability concept in children.

Carlson (1969) and Hoemann and Ross (1971) found that
the development of probabilistic reasoning increased with
age, and generally followed Piaget's stages. Hoemann supports
Smock (1968) in criticizing the studies of Yost and Davis
(mentioned above) which attributed to very young children a
greater understanding of probability than that claimed by
Piaget.

There appears to be a good deal of support and agree-
ment in the literature that the development of the proba-
bility concept in children does proceed in stages in accord
with the theory of Piaget. However, there is considerable
disagreement among investigators as to which probability
concepts are actually known by children, and at what age
levels.

As a result of Piaget's theory of development and the
controversy surrounding the level of probability concept
attainment by children at various ages, there has been some
relatively recent research by mathematics educators in the
area of probability learning. These studies have also been
spurred by the suggestions of the College Entrance Examina-
tion Board (1959) and the Cambridge Conference on School
Mathematics (1963) which encouraged the inclusion of topics
from probability and statistics in the elementary and secon-
dary schools. Most of the studies in mathematics education
are either feasibility studies undertaken to determine the
teachability of probability and statistics in the elementary
or secondary schools, or experimental and correlational
studies which attempt to measure the effects of teaching a
unit on probability.

Studies by Leake (1962), Doherty (1965), Mullenex (1969),
Leffin (1971), and Jones (1974) have investigated children's
understanding of probability concepts prior to any formal
instruction.

Leake found that seventh, eighth, and ninth grade stu-
dents had some understanding of sample space, probability
of a simple event, and probability of the union of two dis-
joint events (mutually exclusive events). Mental age and
achievement both correlated significantly with understand-
ing of probability. Leake recommends the inclusion of prob-
ability topics in these grade levels based upon his results.

Doherty (1965) carried out a similar study with fourth,
fifth, and sixth graders. An investigation of children's
understanding of independent events was added to the three
concepts of sample space, simple probability, and mutually
exclusive events of Leake's study. Doherty found that chil-
dren in grades 4-6 possess considerable familiarity with
these concepts prior to formal instruction. As in the Leake
study, age, mental age, and achievement were found to be
significantly related to the level of understanding of prob-
ability concepts. Doherty interprets her results as indi-
cative of the feasibility of teaching probability in the
elementary school. She recommends that topics from proba-
bility be included in elementary school curricula, and that
teacher training programs make provisions for informing
prospective elementary teachers about probability topics
that would be suitable for elementary school children.

Mullenex (1969) investigated the relationships between
understanding of probability in grades 3-6, and the variables
of sex, age, grade level, and skill in other school subjects.

His test was based upon the questions that Piaget asked
children in interviews. Multiple linear regression
techniques indicated a tendency for arithmetic computational
skills and reading skills to be relevant predictors of
performance on probability measures. Mullenex found suf-
ficient evidence for the understanding of probability in
children to warrant inclusion of probability topics in
grades 3-6.

In a study of probability concepts possessed by chil-
dren in grades 4-7 prior to formal instruction, Leffin
(1971) reports that children have considerable knowledge
of the concepts of finite sample space, probability of a
simple event, and quantification of probability. I.Q.,
sex, and grade level were all found to be significantly re-
lated to the understanding of probability. I.Q. was found
to be the most accurate predictor of performance on proba-
bility tests. In analyzing the children's errors, Leffin
mentions that the concept of combinations was very diffi-
cult for them to comprehend or to use. When Leffin's sub-
jects could list all the outcomes in a sample space that
counted combinations, 92% of them could not use the infor-
mation from the sample space to calculate a probability.
This evidence appears to support Piaget's position that
children of this age are in the stage of concrete operations.
Leffin's subjects could successfully handle probability in
simple situations like drawing balls out of a box given the
number of balls of each color that are in the box. However,
the more complicated combinatorially generated sample spaces
were not understood by these children. This finding caused
Leffin to speculate on how early children can be taught a
systematic method of counting. He recommends taped
interviews and the use of manipulatives with children in order
to obtain more information about children's readiness to
learn counting principles.
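
Leffin's finding concerns the step from a listed sample space to a probability. As a hypothetical example (not one of Leffin's items), suppose two balls are drawn together from a box holding 3 red and 2 white balls. Listing the unordered pairs of distinguishable balls gives 10 equally likely outcomes, and the probability of two reds is the share of those outcomes that qualify. Listing only the color patterns (red-red, red-white, white-white) would give outcomes that are not equally likely, which is one way the step can go wrong.

```python
from fractions import Fraction
from itertools import combinations

box = ["R1", "R2", "R3", "W1", "W2"]             # label balls so they are distinguishable
sample_space = list(combinations(box, 2))        # unordered pairs: C(5, 2) = 10
both_red = [pair for pair in sample_space if all(b.startswith("R") for b in pair)]
p_both_red = Fraction(len(both_red), len(sample_space))

print(len(sample_space), both_red, p_both_red)   # 10 outcomes, 3 of them red-red
```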

Jones (1974) used taped interviews with first, second,
and third graders, and embodiments of set and measure to
investigate the status of five concepts of probability
among early elementary school children. The embodiments
were spinners with equal and unequal area divisions, and
jars containing discrete objects. Interviews were taped
in order to gain insight into the errors made by the
subjects. The concepts were sample space; comparison (P1)
of the probability of two events within a fixed sample space;
comparison (P2) of the probability of a given event across
three sample spaces with the number of total outcomes held
constant (a/n, b/n, c/n); identification (P3) of a uniform
probability distribution; and comparison (P4) of one event
across three sample spaces in which the frequency of that
event was constant but the total number of outcomes was
varied. Jones found evidence in support of the
children's understanding of P2, P4, and of sample space.
He suggests that for primary children, an apparent
understanding of probability in one situation does not guarantee
that understanding will be evidenced in another situation. There
is also further evidence in Jones' study that I.Q. predicts
the extent of the development of probabilistic thinking in
young children, in accord with the findings of Leake,
Doherty, and Leffin. The use of embodiments seemed to help
the children understand probability although Jones reports
that the use of manipulatives to perform an experiment
sometimes interfered with the children's ability to list the
outcomes of a sample space. Color biases and individual
preferences prevented some children from making accurate
responses to questions involving the spinners.

The studies listed above universally recommended that
topics in probability be included in pre-secondary mathe-
matics curricula, grades 1-9. Attempts at developing ma-
terials in probability and testing the effectiveness of
these materials have been made by Shepler (1970), Shepler
and Romberg (1973), Gipson (1971), White (1974), McKinley
(1960), and Shulte (1968).

Shepler (1970) developed a unit on probability dealing
with sample spaces of one, two, and three dimensions, and
necessary counting techniques. The unit was taught to a
class of 25 specially selected sixth graders of above aver-
age ability. The unit was taught using a mastery learning
model that incorporated self-correcting exercises, specific
prescriptions to diagnose and remedy errors, extra help
sessions, and extra group instruction when mastery was not
satisfactorily attained by a large majority of the class.

Objectives included counting outcomes, probability of a
simple event, probability of a compound event, equally
likely vs. unequally likely probability models, and estimating
the probability of an event from data in an experiment.
A criterion level of 90% correct by 90% of the
students was set for mastery of the objectives. All the
behavioral objectives were mastered at this level by
the students except those dealing with counting the number
of outcomes and estimating probability from data. Shepler's
results agree with those of Leffin (1971), and suggest that
sixth graders do not yet possess the formal operations that
Piaget claims are necessary to count all the outcomes systematically.
A follow-up study (Shepler and Romberg, 1973)
indicated that after four weeks the subjects were able to
retain most of what they had acquired at the mastery level.

Studies were carried out by Gipson (1972) to determine
what materials would be appropriate for introducing probability
concepts to third graders. In one study children
received instruction in small groups, and in another instruction
was individualized. The instructional sequence dealt
with the concept of sample space and the probability of a
simple event. Audio and video tapes of the subjects were
made to gain deeper insight into the process through which
children learn about probability concepts. Gipson, like
Shepler, reports that the children had difficulty estimating
probability from the results of an experiment.

White (1974) compared pre- and post-test results and
found that seventh and eighth grade subjects demonstrated
significant increases in achievement of probability concepts.
Achievement in probability was correlated significantly with
concept attainment, computational ability, and reading ability
in White's study.

McKinley (1960) and Shulte (1968) developed and tested
units on probability for secondary school students. McKinley
reports that intelligence, language skills, reading compre-
hension, and math achievement all correlate significantly
with achievement on a unit in probability taught to 12th
graders. Shulte tested the effects of a unit on probability
on attitude, computational skill, understanding mathematics
symbols, and the ability to use formulas. No significant
effects upon attitude or computational skill were found,
but the probability unit did have a significant effect upon
the students' ability to use formulas and interpret mathematical
symbols.

In addition to the studies concerned with the feasibility
of teaching probability at the elementary and secondary
levels, which we have discussed above, there have been
several comparative studies of the relative effectiveness
of two or more approaches to teaching probability at various
levels. Comparative studies have been undertaken at the
elementary level by McLeod (1972), at the secondary level
by Geeslin (1974) and Meyer (1975), and at the collegiate
level by Barz (1970), Austin (1974), and Kipp (1975).

Three treatments in a unit on probability were administered
to second and fourth grade children by McLeod (1972).
The treatments were laboratory experience, a teacher demonstration,
and a control in which no probability was
taught. The unit on probability covered the law of large
numbers, prediction of a set of outcomes from an experiment
involving repeated trials, and uses of probabilistic terms
such as "certain", "impossible", "likely", and "unlikely".
McLeod found no differences among the three treatments in
probability achievement.

Moyer (1975) examined the effects of a unit in proba-
bility upon arithmetic computation skills, reasoning ability,
and attitudes. The experimental group showed no significant
improvement in these three areas over a control group which
received no probability instruction. Meyer reports that
the experimental group did, however, learn a great deal about
probability.

In an attempt to compare the content structure of prob-
ability with secondary school students' cognitive structure,
Geeslin (1974) prepared a programmed text covering several
concepts in probability. Students were allowed to work
through the programmed material at their own pace. A con-
trol group worked through a programmed text on an unrelated
mathematical topic. After ten days the groups were tested
on their ability to solve probability problems, and a representation
of each group's cognitive structure with respect
to probability was made and compared to a theoretical structure
for probability content. Geeslin found close correspondence
between the experimental group's cognitive structure
and the structure of the probability concepts. Although
the two structures corresponded closely, Geeslin warns
that the learning of the structure of probability and the
ability to actually solve problems in probability may develop
independently of each other.

Kipp (1975) investigated the effects of integrating
topics from probability with those of elementary algebra
in an experiment with college students. She compared ex-
perimental and control groups on achievement, retention,
and attitude. Greater retention and improved attitude to-
wards mathematics were found in the groups receiving the
algebra integrated with probability. Kipp recommends that
experimentation be introduced before college students are
taught probability formally. She suggests that college
students should encounter physical models of both uniform
and non-uniform probability distributions.

The studies of Barz (1970) and Austin (1974) are closely
related to the present study in that they compare several
methods of teaching probability to college students. Barz
taught two different courses in probability to liberal arts
majors and to elementary education majors. The groups were
broken up into an x population with three years or less
of high school mathematics and a y population with more
than three years of high school mathematics. Each of these
groups was then broken up into two parts. One part received
a course in probability that was set-theoretic based and the
other part a probability course that actively involved the
students and presented probability from an historical perspective.
The y population of liberal arts majors was the
only group in which the historical-practical-involvement
approach yielded significant differences over the set-theoretic
based course. Barz noted tendencies in the data
to favor the historical-active-involvement course in all
the groups.

Three methods of teaching probability and statistics,
symbolic, pictorial, and manipulative-pictorial, were compared
for their effects upon the achievement of probability concepts
by college students in a study by Austin (1974). The
subjects were freshmen and sophomores who were not majoring
in mathematics or sciences. The manipulative-pictorial treatment
used the results of student-performed experiments to
introduce and motivate the development of mathematical models
in the study of probability. The pictorial treatment replaced
the actual experiments of the manipulative approach with
graphs, pictures, and diagrams of experimental data. In the
symbolic treatment, the same concepts and material were covered,
except that the use of experimental data in the form
of graphs or diagrams was deleted. Comparisons on computation,
comprehension, application, and analysis measures indicated
that the pictorial and manipulative-pictorial treatments
resulted in significantly higher achievement scores than the
symbolic treatment. The pictorial and manipulative-pictorial
treatments yielded the same results. Austin concludes that
college students can apparently give up the use of
manipulatives, but not the use of graphs, pictures, and
diagrams representing vicarious experiments, in learning
about probability.

The review of the literature has, thus far, dealt with
subjective probability, models of human judgment, and the
development of the probability concept in children and young
adults. Literature from the areas of psychology and mathe-
matics education has been discussed in connection to these
three fields of research. This study is concerned with
misconceptions that college students have about probability.
Very little research has been done in this area. A study
was conducted by Smock and Belovicz (1968) with college
students and junior high school students. The major purpose
of the paper was to investigate whether junior high school
students have a capacity for understanding probability.
Smock and Belovicz's results contradict those research results
which say that children have considerable intuitive
knowledge of probability (Yost et al., 1962; Davis, 1965;
Leake, 1962; Doherty, 1965; Mullenex, 1969; Leffin, 1971).
The results in Smock and Belovicz's paper warn educators who
would implement probability in the elementary schools that
they should not assume too much. While children in the
Smock study did indicate some knowledge of probability in
very simple situations, the extent and generalizability of
their knowledge was found to be very limited.

A secondary purpose of the Smock and Belovicz study
was to use data obtained from college students' responses
to probability items in order to construct similar items
for the junior high subjects. An analysis of the college
students' responses indicated that the college subjects
had not acquired the rules or strategies of responding in
probabilistic situations that would correspond to the concepts
underlying probability. Smock and Belovicz conclude
that probabilistic learning, which might be expected to
exist among college students on the basis of their experience,
has not taken place. They recommend further study
of the processes underlying the concepts of probability
among college students.

The Pilot Study

In the winter quarter of 1976, a pilot study was con-
ducted by the author to investigate college students' mis-
conceptions of probability. The subjects for the study
were 54 college students enrolled in a finite mathematics
course at Michigan State University. The specific purposes
of the pilot study were:

1. To ascertain the degree to which college students
use the availability and representativeness heuristics to
give estimates for the probability of events.

2. To develop and teach an experimental activity-based
course in elementary probability and statistics.

3. To test the effectiveness of the experimental
activity-based course in helping college students to overcome
reliance upon the availability and representativeness
heuristics when making estimates for the likelihood of
events.

4. To compare the effectiveness of the experimental
course in overcoming reliance upon the availability and
representativeness heuristics to that of a lecture-based
course in finite mathematics.

The subjects had pre-registered into sections of finite
mathematics and were grouped accordingly. One section was
selected as an experimental group to receive the activity-based
course and another was selected as a control group.
The control group was given a lecture course in finite mathematics.
The text used was Finite Mathematics by Weiss and
Yoseloff (1975). An outline of the topics covered
in the lecture-based course is listed in Appendix C of this
study.

Personal background data was gathered on the subjects,
and it indicated that most of the subjects were second-term
freshmen. There were two juniors and one senior in the
experimental group, and several upperclassmen in the control
group. The subjects were, for the most part, business,
horticulture, agriculture, or biological science majors. A
few of the freshmen had not yet declared a major.

The finite mathematics course was created at Michigan
State University for the purpose of providing an alternative
to College Algebra and Trigonometry II for students in
business, agriculture, or biological sciences who would
not continue on to calculus. All of the subjects in this
study had completed College Algebra and Trigonometry I.

The experimental group had 24 subjects and the con-
trol group 30. On the first day of class, the nature of
the experimental course was explained by the experimenter
to the students. The subjects were told that the course
would be activity-based and that they would work in small
groups on experiments and problems. If they felt uncomfortable
about learning mathematics on their own or in small
groups, the students were encouraged to transfer into a
regular lecture section of finite mathematics. No one transferred.

On the second day of the term, a pretest constructed
by the experimenter was administered to both the experimental
and the control groups. The pretest was designed to
give information about the subjects' reliance upon the availability
and representativeness heuristics in estimating the
likelihood of probabilistic events prior to any formal training
in probability. Personal background data indicated
that only a very few of the subjects had had any course work
on probability, and that it was usually only a few days in
high school. The pretest also contained items to give information
about the subjects' knowledge of some probability
concepts, such as simple probability, one-, two-, and
three-dimensional sample spaces, combinations, permutations,
and expected value.

The pretest questions consisted of several items that
were used by Kahneman and Tversky in their research (1972,
1973), several items from the National Assessment of Edu-
cational Progress, and items constructed by the experimenter.
In order to gain insight into the reasoning process used by
subjects to make their responses, the subjects were asked
to supply a reason for their responses to some of the items
which dealt with availability and representativeness. A
revised version of the pretest can be found in Appendix D
of this study. An analysis of responses on the pretest
items indicated the following:

1. The subjects relied heavily upon the heuristics
of availability and representativeness in making
their responses. Results identical to those of
Kahneman and Tversky were obtained in all but
one item (item 3 on the revised pretest). Reasons
given by subjects for their responses support
the contention that the heuristics of
availability and representativeness influenced
their decisions.

2. There was no significant difference between the
responses of the experimental group and the
control group on any of the pretest items. The
degree of reliance upon availability and rep-
resentativeness was the same for both groups
prior to a course on probability. There was no
difference between the two groups in knowledge
of probability concepts prior to formal course
work in probability.

The results of the pretest, therefore, supported the
hypothesis of Kahneman and Tversky that combinatorially
naive college students rely upon the heuristics of representativeness
and availability in estimating probability.
The pretest results also indicate that there is no reason
to believe that the two groups were not drawn from
the same population of college students.

In addition to the pretest, taped interviews with 12
subjects randomly selected from the experimental group were
carried out during the first two weeks of the quarter. The
subjects were asked questions similar to those that dealt
with representativeness and availability on the pretest,
and were asked to think out loud as they responded. Analysis
of the protocols from these interviews yielded results in
agreement with the written pretest. The subjects employed
the representativeness and availability heuristics in order
to decode complex probabilistic situations and make a judgment
for the likelihood of events. The subjects who were
taped often verbalized processes that were highly indicative
of availability and representativeness. "There are more
X's available to choose from." "This sequence of heads and
tails is not random." "I can draw more paths in this grid
than in this one because each row has more X's."
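The remarks quoted above concern two kinds of items. A minimal Python sketch of the normative reasoning follows. It rests on assumptions the thesis does not state here: a path is taken to pick exactly one X from each row, and the toss sequences and row sizes are hypothetical rather than the actual pretest items. Every specific sequence of six fair-coin tosses has the same probability, however "random" it looks. The number of paths is the product of the row sizes, not the number of X's available in a row.

```python
from fractions import Fraction
from math import prod


def sequence_probability(sequence: str) -> Fraction:
    """Probability of one specific ordered sequence of fair-coin tosses."""
    return Fraction(1, 2) ** len(sequence)


def count_paths(xs_per_row: list[int]) -> int:
    """Paths choosing one X from each row: multiply the choices row by row."""
    return prod(xs_per_row)


# Hypothetical sequences: one "looks random", one does not.
for seq in ["HTHHTH", "HHHHHH"]:
    print(seq, sequence_probability(seq))  # both print 1/64

# Hypothetical grids: 3 rows of 8 X's versus 9 rows of 2 X's.
print(count_paths([8] * 3), count_paths([2] * 9))  # 512 512
```

In the second case, the grid whose rows contain more X's does not yield more paths. This is the kind of judgment the availability heuristic tends to distort.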

Following the pretest, course work was begun on the
experimental or control materials. Each class met 5 days
a week for a 50-minute period.

The content of the experimental and the control courses
in probability was nearly identical for the first 4 1/2
weeks of the quarter. Both courses covered counting principles,
simple probability, applications of counting principles
to probability, independent events, and uniform and
non-uniform finite probability models in the first 4-5
weeks of the term. The control group spent a little more
time on conditional probability. The experimental class
spent a little more time on calculating probability of
sequential events where it was necessary to multiply the
probabilities of successive independent events.

During the second half of the 9 1/2 week quarter, the
content of the two courses differed. The control group
studied linear programming and the simplex algorithm, and
then studied expected value and game theory. The experimental
group studied game theory and expected value, and
then spent time on elementary statistics. The statistics
included measures of central tendency and variability, the
binomial distribution, and an introduction to the chi-squared
procedure. In addition, several periods in the
experimental course were devoted to the misuse of statistics.
The subjects read How to Lie with Statistics by Huff (1954),
and then reported on instances of misuse of statistics that
they found in newspapers, magazines, textbooks, and on television.
The fundamental difference in content was that the
experimental course spent time on some elementary statistical
concepts, while the control course covered linear programming.
The subjects in the experimental class were also given several
problems from Mosteller's Fifty Challenging Problems in
Probability (1962) to work on over a period of several
weeks in the second half of the term.

The main differences between the two courses were the
method of presentation of the topics, the materials and
texts that were used in the course, the sequence in which
the topics were presented, and the requirements of working
in small groups and keeping a log.

The experimental group worked through a set of 9 in-class
activities that were developed by the experimenter.
These activities were carried out by small groups of four
students each. There were six small groups in the experimental
class. The members of the groups were interchanged
after each activity so that every student had an opportunity
to work in a group with every other student during the term.
A copy of the revised versions of these activities that
were used in the main study can be found in Appendix B. A
detailed description of each of the activities is contained
at the beginning of Chapter 3 in a discussion of the main
study, and will not be repeated here.

At the end of the quarter a posttest was administered
to both groups to assess the degree to which the two groups
relied upon the availability and representativeness heuristics
after exposure to a course in probability. A revised version
of the posttest can be found in Appendix D of this
study. The posttest was similar to the pretest, except
that it was shorter and did not include many of the items
on simple probability concepts. The items dealing with
availability and representativeness were the same as on
the pretest. A method of scoring the responses to each
question was devised. The subjects were asked to give
reasons for their responses, and their responses and reasons
were graded on the following scale:

Points   Response
  3      Correct reasoning and correct answer
  2      Correct answer with a good start on the correct
         reasoning, but reasoning was incomplete
  1      Correct answer, but no reason supplied, or
         incorrect reasoning
  0      Incorrect answer or no response

A total test score was calculated for each subject, and
ranks were assigned on the basis of total test score. In
addition, an availability score and a representativeness
score were calculated for each subject. A low availability
score indicated that a subject was employing the availability
heuristic on those items that dealt with availability. Sim-
ilarly, a low representativeness score indicated reliance
upon the representativeness heuristic instead of application
of correct probability principles.
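The scoring procedure can be sketched in Python as follows, on stated assumptions. The pilot posttest is described below as having four availability items and eleven items in all on the two scales, so the sketch uses four availability items and seven representativeness items. The item labels are hypothetical placeholders, not the thesis's item numbers. Each item receives 0 to 3 points from the scale above, and the points are summed per subscale.

```python
# Hypothetical item labels; only the counts (4 + 7 = 11) follow the text.
AVAILABILITY_ITEMS = ("A1", "A2", "A3", "A4")
REPRESENTATIVENESS_ITEMS = ("R1", "R2", "R3", "R4", "R5", "R6", "R7")


def score_subject(item_points: dict[str, int]) -> dict[str, int]:
    """Sum rubric points (0-3 per item) into subscale and total scores."""
    for item, points in item_points.items():
        if points not in (0, 1, 2, 3):
            raise ValueError(f"{item}: rubric points must be 0-3, got {points}")
    availability = sum(item_points.get(i, 0) for i in AVAILABILITY_ITEMS)
    representativeness = sum(item_points.get(i, 0) for i in REPRESENTATIVENESS_ITEMS)
    return {
        "availability": availability,
        "representativeness": representativeness,
        "total": sum(item_points.values()),  # includes any non-subscale items
    }


# Hypothetical subject; missing items count as 0 (no response).
subject = {"A1": 3, "A2": 0, "A3": 1, "A4": 2, "R1": 3, "R2": 1}
print(score_subject(subject))
# {'availability': 6, 'representativeness': 4, 'total': 10}
```

A low subscale score would then flag reliance on the corresponding heuristic, as described above.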

The Mann-Whitney U-test was used to analyze the data
and compare the two groups on the total test score and on
the two subscales. Siegel (1956) mentions that this nonparametric
test is appropriate for use in experimental
designs when the experimenter does not wish to make the
assumptions of a t-test. The only assumptions of the Mann-Whitney
U-test are that the two groups are independent of
each other and that the test scores represent a distribution
which has underlying continuity. The experimenter has
no reason to believe that either of these assumptions was
violated.
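The following Python sketch shows how the U statistic can be computed from the ranks described above. The thesis does not say whether exact tables or the large-sample approximation were used. This version uses the normal approximation with the usual correction for tied scores, as given in nonparametric texts such as Siegel (1956). The scores in the usage line are hypothetical.

```python
import math


def midranks(values):
    """Rank values 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, z, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    tie_sizes = {}
    for v in combined:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all scores are tied; U carries no information")
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, z, p_two_sided


# Hypothetical total test scores for two small groups.
print(mann_whitney_u([30, 27, 33, 25, 31], [22, 18, 25, 20, 24]))
```

For groups as small as this example, exact tables of U are preferable to the normal approximation. For groups the size of those in the pilot study, the approximation is the customary choice.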

The following hypotheses were tested using the Mann-Whitney U-test.

1. There is no significant difference between the
two groups on total test scores.

2. There is no significant difference between the
two groups on availability scores.

3. There is no significant difference between the
two groups on representativeness scores.

Cronbach's α-coefficient of reliability was calculated
for the availability subscale and the representativeness
subscale. The coefficient represents how well the scores
from one administration of a test are indicative of the
universe of scores. The α-coefficients for posttest scores
were .70 on the representativeness subscale and .48 on the
availability subscale. The α-coefficient for availability
is low, but there are only four items on the availability
subscale. Results on a four-item test are more likely to be
influenced by chance or guessing than results on a longer test.
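For a subscale of $k$ items with item variances $\sigma_i^2$ and total-score variance $\sigma_X^2$, the standard form of Cronbach's coefficient is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$. With small $k$, a few inconsistent items move α a great deal, which fits the remark above about the four-item availability subscale. A minimal Python sketch follows; the rubric scores are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per subject, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variance_sum = sum(pvariance(column) for column in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical rubric points for five subjects on a four-item subscale.
subscale = [
    [3, 2, 3, 3],
    [1, 0, 1, 2],
    [2, 2, 3, 1],
    [0, 1, 0, 0],
    [3, 3, 2, 3],
]
print(round(cronbach_alpha(subscale), 2))
```

Population variances are used throughout. The choice does not matter as long as it is consistent, because the ratio of variances is unchanged.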

All three hypotheses were rejected at the .001 level
of significance. Significant differences were found between
the experimental and control groups on total test score,
and on both subscales.

Comparisons of pretest and posttest responses were
made separately on each item within the groups. Chi-squared
statistics were calculated for each question comparing pre-
test and posttest responses of each group. There was a
tendency for both groups to improve on most items on the
posttest. This tendency was much stronger in the experi-
mental group than in the control group. Chi-squares between
the two groups on each posttest item indicated that there
was a tendency for the experimental group to do better than
the control group on most items. The chi-square statistics
were significant at the .01 level on 7 test items, and at
the .05 level on one test item of the combined 11 items on
the representativeness and availability scales. The fre-
quency of correct responses was higher in the experimental
group on each of these 8 items where significant differences
occurred.
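The item-by-item comparison between groups can be sketched in Python as a 2 x 2 table of correct versus incorrect responses. The thesis does not report the tables themselves, so the counts below are hypothetical; only the group sizes, 24 and 30, follow the pilot study. Pearson's statistic for such a table has one degree of freedom, with standard critical values of 3.841 at the .05 level and 6.635 at the .01 level. The text does not say whether Yates' continuity correction was applied, so this sketch omits it.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for the table
               correct  incorrect
    group 1       a         b
    group 2       c         d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


CRITICAL_DF1 = {0.05: 3.841, 0.01: 6.635}  # chi-square critical values, 1 df

# Hypothetical: experimental 20 of 24 correct, control 10 of 30 correct.
stat = chi_square_2x2(20, 4, 10, 20)
print(stat)  # 13.5
print({level: stat > crit for level, crit in CRITICAL_DF1.items()})
```

The shortcut formula used here is algebraically equal to summing (observed - expected)² / expected over the four cells.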

The results of the pilot study encouraged the experi-
menter to follow the pilot study with a main study in which
revised versions of the activities and instruments were used.
Items on the pretest and posttest were reworded and rese-
quenced to avoid some confusion that arose during the admin-
istration of the initial versions.

Item-to-scale correlations were calculated for each
item on each of the two subscales of the posttest, and item-deleted
reliability α-coefficients were calculated for each
item within each subscale. As a result, the question
If heads has come up ten times in a row
on a fair coin, and you could win $10 by
guessing the result of the next toss, what
would you guess?

was deleted from the representativeness scale in the main
study. This item had a negative or near-zero correlation
with every other item on the representativeness subscale.
Furthermore, the α-coefficient for the representativeness
subscale rose from .63 to .70 when this item was deleted.
Apparently there is more to this question than the expected
representativeness of a 50-50 sample of heads and tails.
While 14 of the 49 subjects said that tails should come up
because "things should even out", 12 said that they would
"stick with a winner". The fairness of the coin bothered
some subjects even though they were told the coin was fair.
It is also possible that the context of a "gambling" situation
complicates the question so that the decision made
by the subject is not based solely on representativeness,
but also upon superstition.

As a result of teaching the experimental materials in
the pilot study, each activity was completely rewritten and
enlarged. Many extra questions and problems were added to
the final version of the activities, which are in Appendix
B. A set of notes to the instructor was developed for each
activity to assist other instructors in identifying probable
trouble spots in each activity, to suggest additional
topics for inclusion during each activity, and to point out
places where brief mini-lectures might help to introduce
new concepts or to summarize the results of an activity.
A complete outline of the course and a day-by-day list of
plans were written based upon the experience gained from
teaching the pilot materials. This outline is contained
in Appendix A.

The results of the pilot study on the experimental
course were encouraging as to the possibility of teaching
a college course in introductory probability and statistics
by an activity-based method with these materials. The ap-
parent success of the experimental group in overcoming some
reliance upon availability and representativeness has been
noted above. The results should be interpreted cautiously,
because the subjects within each group are certainly not
independent. The Mann-Whitney test used the individual
subjects as the unit of analysis. This nonparametric analog
of a t-test does not require each observation within a
group to be independent. One can conclude on the basis of
the results of the Mann-Whitney test that the control group
and the experimental group scores on the posttest were not
drawn from the same population, with only a .001 probability
that this conclusion is incorrect.

In addition to the results of the hypothesis testing,
support for the success of the activity-based course was
obtained from student evaluation forms written up at the
end of the course. (The form can be found in Appendix D.)
It is no overstatement to say that nearly every student in
the experimental course felt that it was the best mathematics
course he/she had ever taken. The students expressed
an initial hesitancy about working in groups, and about
not being told "how to do it" when working on problems.
However, after several weeks of working in groups and having
success in working things out for themselves, their initial
fears subsided. Several evaluations indicated that "the
instructor has not taught us anything, but has made it easy
for us to teach ourselves". It should be pointed out that
the learning was very highly guided by question sequences
in the experimental activities. The students were indeed
teaching each other, but in a very controlled context.

As a result of the pilot study, two experimental sec-
tions were taught the following term, and compared to two
control sections of finite mathematics. The main study
utilized the revised pretest and posttest, and the rewritten
activities. The design and results of this study are presented
in the next three chapters.

CHAPTER III
A DESCRIPTION OF THE DESIGN OF THE STUDY

In this chapter a detailed description of and rationale
for the experimental activity-based course is presented. In
addition, the course in finite mathematics which was taught
to the control groups is discussed. The two courses are
compared on the basis of content and teaching method. This
chapter also includes a description of the subjects, the
procedure of the study, a statement of the hypotheses tested,
a section on the instruments used to test the hypotheses,
and a section on the method of data analysis.

The Experimental Course

The experimental course was constructed by the author
to provide an alternative approach to the course in finite
mathematics (Mathematics 110) at Michigan State University.
The experimental course covered much of the same content
ordinarily treated in the finite mathematics course, such
as probability, expected value, and simple game theory.
However, some topics from elementary statistics were integrated
into the course, and the materials used and teaching
method employed were very different from those normally
utilized in the Mathematics 110 course. The primary purpose
of the experimental course, as described in chapter one,
was to provide a learning environment and learning experiences
which might enable college students to overcome their
reliance upon the heuristics of availability and representativeness
when making estimates for the likelihood of events.

In this section the material, the content, and the
teaching method used in the experimental course will be
discussed. The role of the instructor and the role of the
students will also be presented.

A series of nine activities in probability, combinatorics,
game theory, expected value, and some elementary statistics
was developed by the experimenter. These activities formed
the foundation for the content of the experimental course
in finite mathematics. Each activity is accompanied by a
set of notes to the instructor. These notes contain suggestions
to the instructor for his role during the activity.
The notes indicate problems that are likely to come up as
the students do the activities, suggest procedures for overcoming
some of the trouble spots, and provide possible directions
for pursuing the activities in more depth. A complete
set of the activities with notes to the instructor
can be found in Appendix B. A brief description of each
activity will be given here.

The first three activities are concerned with simple probability models, both uniform and non-uniform. Coins, tacks, and dice are thrown, the outcomes are recorded, and experimental probabilities for events are calculated based upon the relative frequency of the outcomes. Then the student is asked to make a theoretical model of the experiment. The model involves listing all the possible outcomes from the experiment and assigning probabilities to those outcomes. Comparisons are then made among the guesses for the likelihood of events made by the students prior to carrying out the experiments, the experimental probabilities, and the probabilities based upon the theoretical model. The theoretical distribution is graphed against the experimental distribution, and the two are compared. The students are asked to list the assumptions and limitations of their experiments and the assumptions of their mathematical model. The data gathered from the first three activities is used in later activities and problem sets.
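The comparison between relative frequencies and a theoretical model described above can be sketched as follows. This is a minimal illustration and not the course's own material. It assumes that the coin activity tossed six fair coins per trial, which is inferred from the 64 outcomes mentioned for that activity. The trial count, seed, and function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

NUM_COINS = 6  # assumption: 2**6 = 64 outcomes, matching the coin activity


def theoretical_distribution():
    """Uniform model: list all 64 equally likely outcomes and count heads in each."""
    outcomes = list(product("HT", repeat=NUM_COINS))
    heads = Counter(outcome.count("H") for outcome in outcomes)
    return {k: heads[k] / len(outcomes) for k in range(NUM_COINS + 1)}


def experimental_distribution(trials, seed=0):
    """Relative frequency of each number of heads over simulated tosses."""
    rng = random.Random(seed)
    heads = Counter(
        sum(rng.random() < 0.5 for _ in range(NUM_COINS)) for _ in range(trials)
    )
    return {k: heads[k] / trials for k in range(NUM_COINS + 1)}


if __name__ == "__main__":
    theory = theoretical_distribution()
    observed = experimental_distribution(trials=100)
    for k in range(NUM_COINS + 1):
        print(f"{k} heads: observed {observed[k]:.2f}  model {theory[k]:.3f}")
```

Printing the two columns side by side reproduces in miniature the comparison the students graphed.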

The fourth activity provides an introduction to counting principles. Difficulties encountered in listing all the outcomes from the coin activity (64 outcomes) and the dice activity (216 outcomes) are used to motivate the investigation of a more systematic way to count the outcomes of an experiment. The concepts of permutation and combination are introduced via a sequence of spelling problems. Students are asked to list all the distinct "words" that can be spelled using all of the letters in G A K, E Z A K L, L Z A K L, L Z A L L, and so forth, where any arrangement spells a word. Then they are asked to count the number of "redundant" spellings of each word that can occur when listing words with repeated letters. The students are subsequently led via a sequence of questions to the model

$$\frac{\#\ \text{arrangements}}{\#\ \text{redundancies per word}}$$

This model can be used to count the number of distinct "words" that can be formed.
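The spelling model (# arrangements divided by # redundancies per word) can be checked against brute-force listing. The sketch below is a hedged illustration and not taken from the activity sheets. It assumes that the redundancies per word equal the product of the factorials of each repeated letter's count. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def distinct_words_by_listing(letters):
    """Brute force: generate every arrangement and keep only the distinct ones."""
    return len(set(permutations(letters)))


def distinct_words_by_model(letters):
    """Model: # arrangements / # redundancies per word."""
    arrangements = factorial(len(letters))
    # Each letter repeated r times can be permuted among itself in r! ways
    # without changing the word, so those spellings are redundant.
    redundancies = prod(factorial(r) for r in Counter(letters).values())
    return arrangements // redundancies


if __name__ == "__main__":
    word = "LZALL"
    print(distinct_words_by_listing(word), distinct_words_by_model(word))  # 20 20
```

Listing by hand quickly becomes impractical, and that difficulty is the motivation the activity relies on. The model gives the same count immediately.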

The concept of a combination is isolated as a special case of these spelling problems, when there are only two distinct letters from which to choose. The words "combination" and "permutation" are not used at all during the course, either in the activities or by the instructor. The sequential counting principle forms the basis for every counting problem. Thus, the formulas developed for the spelling problems and the binomial coefficient in activity 4 are treated as special cases of the sequential counting principle. Activity 4 concludes with a series of complicated counting problems and applies the results to the calculation of the probability of some complex events involving cards. The questions are designed to facilitate the discovery of the solutions by the students. These first four activities took about four weeks to complete.
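With only two distinct letters, the spelling model reduces to the binomial coefficient. That coefficient then feeds directly into card probabilities through the sequential counting principle. The card problem below (exactly two aces in a five-card hand) is a hypothetical example chosen for illustration. It is not claimed to be one of the problems in activity 4.

```python
from math import comb, factorial


def two_letter_words(n, k):
    """Words of length n built from k copies of one letter and n - k of another."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def prob_exactly_two_aces():
    """Hypothetical card problem: P(exactly two aces in a five-card hand)."""
    # Sequential counting: choose which 2 of the 4 aces, then 3 of the 48 other cards.
    favorable = comb(4, 2) * comb(48, 3)
    return favorable / comb(52, 5)


if __name__ == "__main__":
    print(two_letter_words(5, 2), comb(5, 2))  # both count the arrangements of AABBB
    print(f"P(exactly two aces) = {prob_exactly_two_aces():.4f}")
```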

The second half of the course concerns applications of probability to game theory, expected value, and some simple inferential statistics. A set of notes on game theory and another on expected value were written by the author and distributed to the class. These notes can be found in Appendix B. Prior to a formal discussion of game theory, the students were asked to play several two-person games and to record the outcomes. In activity 5 they were asked to suggest what they thought were the best strategies for each of the players in each of these two-person games. Of the games that were played, some had "mixed" and some had "strictly determined" optimal theoretical strategies. After activity 5 was completed, the instructor gave a minilecture on two-person games in which optimal strategies for 2 x 2 games were thoroughly discussed.
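A game is strictly determined when its payoff matrix has a saddle point. A saddle point is an entry that is the smallest in its row and the largest in its column. The following minimal sketch checks for one. The payoff convention (entries are the row player's winnings) and the example matrices are assumptions made for illustration.

```python
def saddle_point(payoffs):
    """Return (row, col, value) of a saddle point, or None if the game is mixed.

    payoffs[i][j] is what the row player wins when row i meets column j.
    """
    for i, row in enumerate(payoffs):
        for j, value in enumerate(row):
            column = [r[j] for r in payoffs]
            # Smallest in its row and largest in its column: neither player gains by switching.
            if value == min(row) and value == max(column):
                return i, j, value
    return None


if __name__ == "__main__":
    print(saddle_point([[4, 2], [3, 1]]))    # (0, 1, 2): strictly determined
    print(saddle_point([[3, -1], [-2, 1]]))  # None: optimal play is mixed
```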

Activity 6 introduces expected value. In this case, a lecture on expected value preceded the activity. A method of calculating the expected value for any 2 x 2 two-person game was presented. The method, called the method of oddments, was taken from The Compleat Strategyst (Williams, 1954). Many of the examples that appear in the notes on game theory and in activities 5 and 6 were based upon problems from Williams's book.
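Below is a hedged sketch of the method of oddments as it is commonly stated for a 2 x 2 game with no saddle point. Each row's oddment is the absolute difference of the entries in the other row, and each column's oddment is the absolute difference of the entries in the other column. Strategies are taken in proportion to the oddments. The function name and the example game are hypothetical. The method is valid only when the game has no saddle point.

```python
from fractions import Fraction


def method_of_oddments(payoffs):
    """Optimal mixed strategies and value for a 2 x 2 game WITHOUT a saddle point.

    Returns (p, q, value): p = P(row player uses row 1),
    q = P(column player uses column 1), value = expected payoff to the row player.
    """
    (a, b), (c, d) = payoffs
    # Row oddments come from the opposite row; column oddments from the opposite column.
    row1_odd, row2_odd = abs(c - d), abs(a - b)
    col1_odd, col2_odd = abs(b - d), abs(a - c)
    p = Fraction(row1_odd, row1_odd + row2_odd)
    q = Fraction(col1_odd, col1_odd + col2_odd)
    # With optimal p, the row player's expected payoff is the same against either
    # column, so evaluating it against column 1 gives the value of the game.
    value = p * a + (1 - p) * c
    return p, q, value


if __name__ == "__main__":
    print(method_of_oddments([[3, -1], [-2, 1]]))
    # (Fraction(3, 7), Fraction(2, 7), Fraction(1, 7))
```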

Activities 7 and 8 involve the effects of sample size upon measures of central tendency, such as the median and mean, and upon measures of variability, such as the standard deviation. In activity 7, the students are asked to guess the number of cards that should be turned over from the top of a well-shuffled deck of cards in order to assure at least a 50% chance of getting an ace. The experiment is then carried out for samples of size 10, 20, and 100. The median number of cards necessary to get an ace is calculated for the samples of size 10, 20, and 100. Finally, the theoretical number of cards necessary to assure a probability of 1/2 of getting an ace is calculated. Comparisons are made among guesses, experimental medians, and the theoretical number of cards. This activity was suggested by an example in Probability with Statistical Applications (Mosteller, Rourke, and Thomas, 1969). In activity 8, the students calculate means and standard deviations for samples of random two-digit numbers. Samples of size 5 and of size 25 are used to obtain data on the behavior of means and standard deviations. The effects of sample size on the range of observed means and standard deviations are treated in activity 8 by means of a series of questions and problems. Data from the dice experiment is also used to generate samples of various sizes in order to observe the behavior of means and standard deviations.
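The next sketch covers the computations behind activities 7 and 8. It finds the theoretical number of cards from P(at least one ace in n cards) = 1 - C(48, n)/C(52, n) and simulates experimental medians. It also shows how the spread of sample means depends on sample size. The assumptions are a standard 52-card deck, two-digit numbers drawn uniformly from 00 to 99, and 50 samples per size. The seed and function names are hypothetical.

```python
import random
import statistics
from math import comb


def prob_ace_within(n):
    """P(at least one ace among the top n cards of a well-shuffled deck)."""
    return 1 - comb(48, n) / comb(52, n)


def theoretical_cards_needed(target=0.5):
    """Smallest n whose chance of at least one ace reaches the target."""
    n = 1
    while prob_ace_within(n) < target:
        n += 1
    return n


def cards_to_first_ace(rng):
    """Shuffle a deck and report the position of the first ace (1-based)."""
    deck = ["ace"] * 4 + ["other"] * 48
    rng.shuffle(deck)
    return deck.index("ace") + 1


def experimental_median(sample_size, rng):
    """Activity 7: median number of cards to the first ace over repeated deals."""
    return statistics.median([cards_to_first_ace(rng) for _ in range(sample_size)])


def spread_of_sample_means(sample_size, rng, num_samples=50):
    """Activity 8: range (max - min) of the means of many samples of one size."""
    means = [
        statistics.mean([rng.randint(0, 99) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]
    return max(means) - min(means)


if __name__ == "__main__":
    rng = random.Random(1976)
    print("theoretical cards needed:", theoretical_cards_needed())  # 9
    for size in (10, 20, 100):
        print(f"median over {size} deals:", experimental_median(size, rng))
    for size in (5, 25):
        print(f"range of means, samples of {size}:", spread_of_sample_means(size, rng))
```

With eight cards the chance of an ace is just under 1/2 (about 0.499), so nine cards are needed. Samples of 25 will usually show a narrower range of means than samples of 5, although any single run can vary.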

The ninth activity is less structured than the previous eight. It presents the students with a challenge and allows them to decide for themselves what the direction of the activity will be. The challenge is presented in the form of the statement: "Pulse rates go up when taken by a member of the opposite sex." The problem is to design and carry out an experiment which will test the truth of this statement. The challenge was suggested by Dr. William Fitzgerald of Michigan State University.

The nine activities were supplemented by several texts and by homework problem sets. The texts used in the course were Statistics by Example: Exploring Data and Weighing Chances (Mosteller et al., 1973), Fifty Challenging Problems in Probability (Mosteller, 1962), and How to Lie with Statistics (Huff, 1954). Six of the chapters in the two Statistics by Example books were assigned as homework. Problems were selected from these chapters to be written up and handed in. The problem sets assigned from Statistics by Example included work on regression, counting circular arrangements, and estimating wildlife population by the capture-recapture method (Exploring Data: Sets 4, 10, and 12 by Shulte, Cohen, and Chatterjee, respectively). The sets from Weighing Chances dealt with random digits and simulation, the binomial distribution, and the chi-square procedure (Sets 2, 4, and 6 by Carlson, Link and Brown, and Carlson, respectively).
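Two of the problem-set topics named above reduce to short formulas: capture-recapture estimation and the chi-square procedure. The sketch below uses the standard Lincoln-Petersen estimator and the Pearson chi-square statistic. We do not know the exact forms used in Statistics by Example, so treat these as assumptions. The die counts are hypothetical, and 11.07 is the usual 5% critical value for 5 degrees of freedom.

```python
def lincoln_petersen(marked, second_sample, recaptured):
    """Capture-recapture estimate of population size N.

    Assumes the tagged fraction of the second sample matches the tagged
    fraction of the population: recaptured / second_sample = marked / N.
    """
    if recaptured == 0:
        raise ValueError("no tagged animals recaptured; the estimate is undefined")
    return marked * second_sample / recaptured


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (observed - expected)**2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    print(lincoln_petersen(marked=100, second_sample=50, recaptured=10))  # 500.0
    rolls = [8, 12, 9, 11, 6, 14]  # hypothetical counts for 60 rolls of one die
    stat = chi_square_statistic(rolls, [10] * 6)
    # Six faces give 5 degrees of freedom; the 5% critical value is about 11.07.
    print(round(stat, 2), "reject fairness" if stat > 11.07 else "no evidence against fairness")
```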

Five problem sets constructed by the author were also assigned in the experimental course. The problems concerned sampling with and without replacement, game theory, application of the chi-square procedure, and a generalization of the coin experiment. Several problems from Mosteller's book Fifty Challenging Problems in Probability were assigned. Among these were the Birthday Problem (#31), the probability that the roller wins at craps (#9), the Flippant Juror (#3), and the Three-Cornered Duel (#20).
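Two of the Mosteller problems listed above have short exact solutions that a calculator or a few lines of code can confirm. The sketch uses exact fractions and assumes 365 equally likely birthdays with leap years ignored, plus the usual rules of craps. It is an illustration, not Mosteller's own derivation.

```python
from fractions import Fraction
from itertools import product


def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), all days equally likely."""
    p_all_different = Fraction(1)
    for k in range(n):
        p_all_different *= Fraction(days - k, days)
    return 1 - p_all_different


def smallest_group_for_even_odds():
    """Smallest group size whose chance of a shared birthday exceeds 1/2."""
    n = 1
    while prob_shared_birthday(n) <= Fraction(1, 2):
        n += 1
    return n


def craps_win_probability():
    """Exact probability that the roller wins at craps."""
    ways = {}
    for a, b in product(range(1, 7), repeat=2):
        ways[a + b] = ways.get(a + b, 0) + 1
    p = {total: Fraction(count, 36) for total, count in ways.items()}
    win = p[7] + p[11]  # a natural on the first roll
    for point in (4, 5, 6, 8, 9, 10):
        # Once a point is set, only the point (win) or a 7 (loss) ends the game.
        win += p[point] * p[point] / (p[point] + p[7])
    return win


if __name__ == "__main__":
    print(smallest_group_for_even_odds())  # 23
    print(craps_win_probability())         # 244/495
```

The craps answer, 244/495 (about 0.493), shows that the roller is at a slight disadvantage.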

Students in the experimental course were required to write up and hand in all in-class activities and homework assignments, and to keep a log that contained all the notes, activities, and assignments from the course. The log counted for one-half of the grade in the course. In addition to the activities and problem sets, the students were also required to find ten examples of misuses of statistics in newspapers, magazines, textbooks, or other sources, to write a critique for each misuse they found, and to enter the critiques in their logs. Huff's book How to Lie with Statistics provided the rationale for the analysis of misuses of statistics. The other half of the grade for the experimental course was determined on the basis of three in-class tests. In each of these tests the students were allowed to use the texts for the course and their logs. The use of hand-held calculators was strongly encouraged at all times during the course, including on examinations. It was, in fact, essential to use a calculator on the in-class tests.

The content, materials, and requirements of the experimental course have been described above. We now turn to a description of the teaching method used in the course.

The students in the experimental course worked together in class on the activities in small groups of four or five people. The data for each activity was generated, organized, and analyzed by the group. The students were strongly encouraged to cooperate with one another to solve problems as a group rather than individually, to share ideas with one another, and to help all the members of their group understand the concepts in each activity. The groups were changed after every in-class activity so that everyone had a chance to work with everyone else during the course.

Thus, the members of the group taught each other as they interacted while working on an activity in class. In order to facilitate the small-group work, the class was conducted in the mathematics laboratory at Michigan State University. The mathematics laboratory contains enough small tables and chairs to accommodate six small groups of four students each. The instructor in the experimental course provided feedback to the groups and to the individual students on their progress on the activities and homework assignments. During an activity, the instructor circulated among the groups, clarifying questions and assisting groups who had stalled on a particular problem. Sometimes this assistance took the form of a series of questions put to the groups by the instructor. The questions were intended to take the group back to a concept which they already knew, and then, step by step, lead them up to the source of their original question. Thus, all hints on activities that were given by the instructor were of an indirect nature. The technique of answering a question with another question was used to encourage the groups to work out the problems for themselves, and to keep the investigation on each activity as open-ended as possible. The activities often contained questions or problems which had several possible solutions. The responsibility for determining the direction in which a particular activity would go was left to each group. The instructor only intervened if a group was not working in a direction which was in accord with the goals of the activity.

Outside of class, the instructor's job consisted of reading all the activities and problem sets that were entered in the log by each student. The activities and problem sets were collected one at a time upon completion and graded by the instructor for completeness and correctness. If anything was not complete or correct, the difficulty was pointed out to the student and the student was asked to make the necessary revisions. Every assignment was circulated back and forth between instructor and student until it was complete and correct. The assignments mentioned above included 9 in-class activities, 6 problem sets from Statistics by Example, 5 problem sets on probability devised by the author, and 10 written critiques of the misuses of statistics. Students and instructors thus constantly exchanged information on the progress of each activity and each problem set.

The role of the instructor in the experimental course, as outlined above, is somewhat like that of a judge, a diagnostician, a devil's advocate, and a critic. There are several instances where the notes to the instructor suggest that a short lecture be given on a topic, such as the sequential counting principle, game theory, or expected value. However, for the most part, the instructor's role is that of a resource person and an evaluator.

A complete day-by-day outline of the activities and assignments for the experimental class can be found in Appendix A. A report on what happened in the groups during the conduct of each activity, and on some of the successes and difficulties encountered by the students on the problem assignments, will be presented in chapter 4, along with the results of the statistical analysis of this study.

A Description of the Control Course

 

The text that was used for the course in finite mathematics (Mathematics 110) at Michigan State University is Finite Mathematics by Weiss and Yoseloff (1975). This text was used in all sections of Mathematics 110 that were taught in the spring quarter of 1976 except for the two sections that received the experimental course. A complete outline and description of the topics that were covered in the course taught from Weiss and Yoseloff can be found in Appendix C of this study.

The course in finite mathematics began with counting principles and probability. The sequential counting principle; permutations; combinations; mathematical models of sample spaces; probability of a simple event; probability of unions, intersections, and complements; uniform and non-uniform sample spaces; applications of combinatorial counting techniques to probability; the binomial distribution; expected value; and conditional probability were all covered in the first four weeks of the term in Mathematics 110 in the control classes.

The next three to four weeks of the course were concerned with linear programing. Linear programing was first introduced in two dimensions from a geometric point of view. Solutions were obtained by testing extreme points of the intersection set of a system of linear inequalities. Following the geometric introduction to linear programing, the simplex algorithm was taught in order to handle general linear programing problems in higher dimensions. Operations on matrices and row transformations on matrices were discussed in order that they might be applied to the simplex algorithm.
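The geometric approach described above can be sketched in code. It intersects the boundary lines of the constraints pairwise, keeps the feasible intersections (the extreme points), and evaluates the objective at each one. This is a minimal illustration that assumes a bounded, non-empty feasible region in two variables. The example problem and function names are hypothetical, and the sketch is not the simplex algorithm.

```python
from fractions import Fraction
from itertools import combinations


def extreme_points(constraints):
    """Corner points of {(x, y) : a*x + b*y <= c for every (a, b, c)}."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines have no single intersection
        # Cramer's rule for a1*x + b1*y = c1, a2*x + b2*y = c2.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in constraints):
            points.add((x, y))
    return points


def maximize(objective, constraints):
    """Test every extreme point and return (best_point, best_value)."""
    px, py = objective
    best = max(extreme_points(constraints), key=lambda p: px * p[0] + py * p[1])
    return best, px * best[0] + py * best[1]


if __name__ == "__main__":
    # Hypothetical problem: maximize 2x + 3y subject to
    # x + y <= 4, x + 3y <= 6, x >= 0, y >= 0 (written as -x <= 0, -y <= 0).
    region = [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)]
    print(maximize((2, 3), region))  # optimum 9 at (3, 1)
```

Testing every pair of boundary lines becomes impractical as the number of variables and constraints grows, and that is where the simplex algorithm takes over.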

The course concluded with a two-week unit on game theory. Two-person two-by-two games were completely characterized. The simplex algorithm was used to obtain optimal strategies for n x n games.

The method of teaching used in the control classes was lecture, to a class of approximately 30 students. The instructor's primary role in the course was to prepare and deliver a daily lecture on material from Weiss and Yoseloff's Finite Mathematics.

Comparison of the Experimental and Control Courses

The experimental course integrated some topics from elementary statistics into the study of probability. Measures of central tendency and variability, the effects of sample size on statistical parameter estimates, the chi-square procedure, simulation of experiments by means of random numbers, and examples of the misuses of statistics were covered in the experimental sections. The control sections covered linear programing instead of the elementary statistics. Combinatorial counting techniques, simple probability models, expected value, and game theory were discussed at length in both courses. Table 3.1 indicates the order of the topics presented in each course and the approximate length of time spent on the topics in each class.

TABLE 3.1

ORDER OF TOPICS IN THE EXPERIMENTAL
AND CONTROL COURSES

Order | Experimental Classes  | Control Classes
------|-----------------------|------------------------------------------
1st   | Probability           | Counting Techniques
2nd   | Counting Techniques   | Probability
3rd   | Game Theory (2 weeks) | Linear Programing and Matrices (3 weeks)
4th   | Statistics (3 weeks)  | Game Theory (2 weeks)

In the experimental classes, Probability and Counting Techniques together took about 4 1/2 weeks.

 

The materials used in the two courses were obviously different. The control classes used the text Finite Mathematics (Weiss and Yoseloff, 1975). The experimental classes performed the nine activities, used notes and problem sets that were written by the experimenter, and used Statistics by Example, How to Lie with Statistics, and Fifty Challenging Problems in Probability as texts and references.

 

The most striking difference between the two courses
was in the teaching method. In the control classes, the
role of the instructor was primarily as lecturer. The lec-
turer interprets and conveys large quantities of mathematical
information and concepts. The role of the students in the
control classes was primarily to receive and process the
information conveyed by the instructor. The roles of in-
structor and student in the control course are typical of
most of the teaching that presently occurs in undergraduate
mathematics courses at Michigan State University.

In the experimental classes, the students assumed a
much more active role and were responsible for teaching
the material to each other. The instructor acted primarily
as a guide, a counselor, and a resource person. In the con-
trol classes, the instructor presented the formulas for
counting techniques and developed models of sample spaces
and simple probability experiments for the students. In
the experimental class, the students isolated their own
formulas and built their own probability models. The first
attempts at these formulas or models were sometimes inade-
quate. However, concepts in probability were constantly
refined by each successive activity, and so the models could
constantly be revised. The students in the experimental
class were required to revise all written assignments until
they were complete and until they had satisfactorily
resolved all the problems.

Hand-held calculators were used in the experimental
classes to obtain immediate numerical values for complex
probability problems and combinatorial expressions. The
calculators were not used in the control classes as an
integral part of the course.

Rationale for the Experimental Course

 

The experimental course was intended to help students
become better intuitive statisticians. Specifically, it
was hoped that the style of this course would help students
to overcome their reliance upon the heuristics of availa-
bility and representativeness. In order to reach this goal,
small group work, experiments, guessing, model building, the
use of hand-held calculators, and the role of the instructor
as diagnostician were all incorporated into the experimental
course. Each of these components was considered by the
experimenter to be essential to the process of replacing
subjective probability intuitions with statistical probability
models.

Activities 1-3 were constructed to contend with the problem of representativeness by confronting students with the inaccuracy of their own guesses for the likelihood of events, and subsequently having them build their own theoretical models of the coin, tack, and dice experiments. The problems on counting principles in activity 4 asked students first to guess, then to list outcomes long-hand, and finally to isolate counting principles. This slow introduction to counting techniques was devised to develop the alternative of actually counting the outcomes instead of relying upon the heuristic of availability to get an estimate for the likelihood of events. In situations where it was impossible to count all the instances of an event, the importance of obtaining a large enough unbiased sample of the outcomes was emphasized. Activities 7 and 8, and the problem sets from Statistics by Example on wildlife population and on simulation by means of random digits, all dealt specifically with sampling techniques and the effects of sample size on means and variability. It was hoped that considerable exposure to the effects of sample size might reduce the widespread belief in the "law of small numbers" (Tversky and Kahneman, 1971).

The use of calculators was considered to be an essential component in the development of probabilistic intuition. Subjective probability estimates and empirical probability results from experiments could be instantaneously compared to theoretical probability values once the model for an experiment had been developed by the students. The results could be graphed, and students could begin to contend with the problem of why their estimates were often so far off from the real probability of an event. The constant feedback on the accuracy of their guesses was intended to help make the students more cautious, or perhaps even more accurate, when they estimated the likelihood of events.

These sections have described and compared the experimental and control courses that were taught in the spring term of 1976 at Michigan State University and used in this study. The next sections describe the design of the study itself.

Subjects

The subjects in this study were 85 undergraduate students who had enrolled in a course in finite mathematics in the spring quarter of 1976 at Michigan State University. The subjects filled out a personal background form at the beginning of the quarter. It collected information about major field, previous high school and college mathematics courses, and exposure to probability and statistics prior to the course in finite mathematics. Eighty of the subjects in the study, 48 men and 32 women, completed and handed in the form, which can be found in Appendix D of this study. Among these 80 subjects there were 51 freshmen, 12 sophomores, 9 juniors, and 8 seniors.

Most of the subjects were business or accounting majors, or were majoring in some branch of agriculture, horticulture, or natural resources. The breakdown of the subjects according to major was as follows: business or some branch of business, 28; accounting, 17; agriculture or horticulture, 7; parks and natural resources, 7; animal husbandry, 4; food systems, 2; communications, 1. The remaining 14 subjects had not yet declared a major at the time of the study.

In response to the question on high school mathematics
courses, 5 subjects indicated that they had five high school
mathematics courses, 13 had four courses, 37 had three
courses, 20 had two courses, 4 had one course, and 1 sub-
ject had no high school mathematics courses. The mean num-
ber of high school courses taken by these subjects was 2.75.
Most of the subjects indicated that they had taken one year
of high school algebra (2 courses), and one-half year of
geometry (1 course).

The results of the question on previous college courses
taken at Michigan State University showed that 78 subjects
had successfully completed the prerequisite course in col-
lege algebra, mathematics 108. Furthermore, a substantial
number of the subjects took one or more remedial courses
in high school algebra prior to attempting mathematics 108.
It was found that 51 of the subjects took mathematics 082-104,
which is essentially a review of high school algebra II level
material. Of these 51, 14 also took mathematics 081-103,
which introduces and reviews the material that corresponds
to high school algebra I.

The majority of the subjects involved in the study indicated that they had had no previous course work that dealt with probability or statistics in any way. Only 21 subjects mentioned that they had taken a course which involved any probability or statistics. Of these 21, 14 indicated that their previous experience was limited to one or two weeks in a high school business course or precalculus course, or to a very brief exposure in a college genetics course. Only 7 subjects among the 80 in the study had prior formal course work in probability or statistics.

Procedure

 

In the spring quarter of 1976, students registered for seven sections of finite mathematics, mathematics 110, at Michigan State University. Four of these sections were randomly selected for this study. The mathematics 110 course was offered at 1:50 p.m. and at 3:00 p.m. during the spring quarter. The four sections were randomly assigned to either the experimental activity-based course in elementary probability and statistics, or the finite mathematics course based upon the Weiss and Yoseloff text (1975). One section of each treatment for each time slot was included in the study. The two sections of finite mathematics were designated as control groups (C1 and C2), and the activity-based sections were designated as experimental groups (E1 and E2). Information concerning the subjects in each group can be found in Tables 3.2, 3.3, and 3.4.

TABLE 3.2

NUMBER AND SEX OF SUBJECTS
WITHIN EACH GROUP

Group | Male | Female | N
------|------|--------|---
C1    | 14   | 12     | 26
C2    | 9    | 5      | 14
E1    | 11   | 9      | 20
E2    | 14   | 6      | 20
TOTAL | 48   | 32     | 80

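TABLE 3.3

CLASS LEVEL AND MAJOR FIELD

Group | # Freshmen | # Upperclassmen | Business Majors | Accounting Majors | Other Majors*
------|------------|-----------------|-----------------|-------------------|--------------
C1    | 19         | 6               | 7               | 9                 | 9
C2    | 9          | 5               | 3               | 3                 | 8
E1    | 9          | 11              | 8               | 4                 | 8
E2    | 14         | 7               | 10              | 1                 | 10
TOTAL | 51         | 29              | 28              | 17                | 35

*Includes no preference for a major.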

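TABLE 3.4

PREVIOUS MATHEMATICS COURSE WORK

Group | Aver. # High School Math Courses | # Took Math 081-103 | # Took Math 082-104 | # Previous Prob. or Stat.
------|----------------------------------|---------------------|---------------------|--------------------------
C1    | 2.81                             | 1                   | 14                  | 2
C2    | 3.08                             | 2                   | 6                   | 1
E1    | 2.45                             | 5                   | 14                  | 2
E2    | 2.73                             | 1                   | 17                  | 2
TOTAL | 2.75                             | 9                   | 51                  | 7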

 

 

Tables 3.2, 3.3, and 3.4 above indicate that the four groups did not differ much in the characteristics of their subjects. The majority of the subjects in each group are freshmen, except for group E1, which has 9 freshmen and 11 upperclassmen. There was a predominance of business and accounting majors in all the groups. The average number of high school mathematics courses is about the same in all the groups except C2, in which it is slightly higher. Each group contained a substantial number of subjects who took the remedial mathematics course, mathematics 082-104, at Michigan State University. Only 6 of the 14 subjects in C2 took 082, but this group also had the highest average number of high school mathematics courses. At most two subjects in each group said that they had a course in probability or statistics prior to the study. Two of the seven subjects who had previous experience with probability were repeating mathematics 110. The sections labeled C1 and E2 were taught at 1:50 p.m., and those labeled C2 and E1 were taught at 3:00 p.m. The classes met 5 days a week for a 50-minute period.

Table 3.2 indicates that the sample sizes of the two control groups were 26 and 14, respectively. The two experimental groups each had 20 subjects. The imbalance in the control group sample sizes resulted from the way students had pre-enrolled into sections at registration. The students had favored enrollment at the 1:50 period rather than the 3:00 o'clock period. The author had no control over the pre-registration process, as pre-registration section assignments were made by a computer. During the subsequent registration process, not enough students added mathematics 110 to fill the 3:00 o'clock sections.

Four different instructors each taught one of the sec-
tions. The experimenter taught section E1 of the study.
Section E2, the other experimental section, was taught by
Al Stickney, an instructor at Michigan State University.
The two instructors of the control groups taught the topics from Weiss and Yoseloff's Finite Mathematics that are outlined in Appendix C. Robert Bentley, an instructor, taught section C1. John Novak, a graduate assistant, taught C2. The two instructors in the experimental groups taught the activity-based course that is outlined in Appendices A and B. Both of these courses have been discussed in detail in previous sections of this chapter.

On the first day of the term, the personal background
forms described above were distributed in all four sections.
The material and methodology of the experimental course,
including the course requirements, the necessity for access
to hand calculators, and the log notebook containing all the
assignments of the course were carefully explained to the
experimental classes. The students who had enrolled in the
two experimental sections had no idea prior to the first
day of class that they would receive a different course.
After the explanation of the experimental course, students
were given the option of dropping the course and enrolling
in a regular section of mathematics 110. One subject from
each experimental group dropped the course.

A pretest instrument, contained in Appendix D, was
devised by the author and administered to the control classes
on the first day, and to the experimental classes on the
second day. The description of the course and requirements
took up the entire first day in the experimental sections.
The subjects were not told that they were involved in an
experiment. They were only told by their instructor that
he wished to gather information about their knowledge of
some probability concepts prior to the course. The pretest
instrument measured knowledge of some probability concepts
and the use of the availability and representativeness
heuristics prior to formal course work in probability. A
detailed description of the pretest is presented in the
next section of this chapter.

During the last week of the quarter, the groups were
tested for knowledge of certain probability concepts and
for their reliance upon the heuristics of availability and
representativeness after formal course work in probability.
The posttest, presented in Appendix D, contained the same
questions on availability and representativeness as the
pretest. The questions on probability concepts were limited
on the posttest to those which were concerned with complex
situations. Some very simple questions on the pretest that
dealt with relative frequency and uniform probability models,
and with elementary counting problems, were deleted from the
posttest. The results of the pilot study had indicated that
nearly every subject got the simple questions correct, so
these questions were not included because they were not
yielding any information. A description of the posttest can
also be found in the next section of this chapter.

There were 80 subjects in both the pretest and the posttest samples. Of these 80 subjects, 75 took both the pretest and the posttest measures, 39 in the two experimental groups and 36 in the two control groups. The control and experimental samples each had 40 subjects on each measure.

Measures

The pretest and posttest instruments used in this study were constructed by the experimenter. Copies of the instruments are in Appendix D. Three subscales, a probability concepts subscale (P), an availability subscale (A), and a representativeness subscale (R), were contained in each of the two instruments. The items on the tests were compiled from several questions used by Kahneman and Tversky (1972, 1973), several items from the instrument used in the National Assessment of Educational Progress, and items constructed by the experimenter.

The items used in the availability scale (A) and the representativeness scale (R) were the same on both the pretest and the posttest. The availability scale consisted of four items: questions #2a, 2b, 16, and 19 on the pretest and questions #7, 8, 9, and 10 on the posttest. The questions were labeled A1-A4, respectively. A1-A3 were constructed by Kahneman and Tversky (1973) and A4 was constructed by the author.

There were six items on the representativeness subscale. They were questions #17ii, 17iii, 17iv, 17v, 18, and 13 on the pretest and questions #1ii, 1iii, 1iv, 1v, 3, and 4 on the posttest. They are labeled R1-R6, respectively. R2, R3, and R6 were constructed by Kahneman and Tversky (1972), while R1, R4, and R5 were constructed by the author.

The probability scale (P) contained 12 items on the pretest and 5 items on the posttest. The pretest contained some questions about elementary probability and counting concepts. These items are labeled P6-P14 and are, respectively, pretest questions #3, 4a, 4b, 5, 6, 7, 9, 10, 12, and 16. These items were included on the pretest to find out how much knowledge of simple probability, sample space, and counting principles existed among the subjects prior to formal course work in probability.

There were three probability items that appeared on both the pretest and posttest. They were pretest #4c, 4d, and 8 and posttest #12a, 12b, and 14. These items were labeled P1-P3, respectively. The posttest also included two items on probability that did not appear on the pretest. These were posttest questions #11 and 13, labeled P4 and P5, respectively.
There were several items on each of the instruments
which were not part of any subscale. Two questions on counting
the number of paths in a grid with two or three rows
were included on each instrument. They are labeled N3 and
N4 (N for "not on a scale"). These items were included
to see if the subjects understood the definition of a path
prior to answering the questions on availability. A path in
a grid of symbols was defined as a sequence of line segments
intersecting one and only one symbol in each row of the grid.
The definition was explained orally by the instructor of
each group before the administration of each of the test
instruments. The instructor drew several grids on a chalkboard
and emphasized that paths could "zig-zag" a great deal.
A question on estimating the number of paths in a 6 x 6 grid
(N5) was on the pretest.
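Because a path passes through exactly one symbol in each row, the number of paths is the product of the row lengths; for r rows of c symbols each it is c^r. The Python sketch below checks that closed form against brute-force enumeration. It assumes every row holds the same number of symbols and that any symbol may be joined to any symbol in the next row (the "zig-zag" allowance); the function names are illustrative only. Under these assumptions the 6 x 6 grid of item N5 has 6^6 = 46,656 paths.

```python
from itertools import product


def count_paths_closed_form(rows: int, cols: int) -> int:
    """Paths through a grid of `rows` rows with `cols` symbols per row."""
    # One independent choice of symbol in each row: cols multiplied rows times.
    return cols ** rows


def count_paths_brute_force(rows: int, cols: int) -> int:
    """Enumerate every path as a tuple of column indices, one index per row."""
    return sum(1 for _ in product(range(cols), repeat=rows))


for rows, cols in [(2, 3), (3, 2), (6, 6)]:
    closed = count_paths_closed_form(rows, cols)
    assert closed == count_paths_brute_force(rows, cols)
    print(f"{rows} rows of {cols} symbols: {closed} paths")
```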

Both tests also contained off-scale items on the
gambler's fallacy (N2) and the Birthday Problem (N1).

The subjects were asked to give a reason for their
answers to every item on the availability and representativeness
subscales, and for their answers to the more complicated
probability questions. Each item received a score
of 0, 1, 2, or 3 based upon the response and the reason
given for the response. A "0" was the worst and a "3" the
best possible score on a particular item. The method for
assigning points to each item is discussed in chapter four
in the analysis of the results of this study.

Hypotheses
The following hypotheses were tested in this study:

1. There is no significant difference on the
total test score (T) between the groups
taught the experimental activity-based
course (E1 and E2) and the groups taught
the lecture-based course in finite
mathematics (C1 and C2) on either the pretest
or on the posttest.

2. There is no significant difference on the
probability concepts subscale (P) between
the groups taught the experimental activity-based
course (E1 and E2) and the groups
taught the lecture-based course in finite
mathematics (C1 and C2) on either the
pretest or on the posttest.

3. There is no significant difference on the
availability subscale (A) between the groups
taught the experimental activity-based
course (E1 and E2) and the groups taught
the lecture-based course in finite
mathematics (C1 and C2) on either the pretest
or on the posttest.

4. There is no significant difference on the
representativeness subscale (R) between
the groups taught the experimental activity-based
course (E1 and E2) and the groups
taught the lecture-based course in finite
mathematics (C1 and C2) on either the
pretest or on the posttest.

Method of Analysis

The pretest data was analyzed by using t-tests on the
four scale means of the pooled experimental groups (E1 ∪ E2)
and pooled control groups (C1 ∪ C2), with the individual
subjects as the unit of analysis. The assumptions of the
t-test model are that the two populations are normally and
independently distributed with equal variances. The author
had no reason to suspect that the assumption of independence
of the individual subjects was violated prior to the course
in finite mathematics. Therefore, the individual subjects
were used as the unit of analysis on pretest data. There
was also no reason to suspect violation of the assumptions
of normality and equal variances. Thus, the four hypotheses
were tested using t-tests with a level of rejection set at
α = .05.
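For readers who want the arithmetic behind the pretest comparison, here is a minimal Python sketch of the equal-variance (pooled) two-sample t statistic under the assumptions just listed. The scores are invented for illustration and are not data from this study; SciPy users could obtain the same statistic from `scipy.stats.ttest_ind` with `equal_var=True`.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming normal, independent samples with equal variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, n_a + n_b - 2


# Hypothetical scale scores, NOT data from this study.
experimental = [7, 9, 6, 8, 10, 7]
control = [6, 7, 5, 8, 6, 6]
t, df = pooled_t(experimental, control)
print(f"t = {t:.3f} on {df} df")
```

With these invented scores t is about 2.04 on 10 degrees of freedom, below the two-tailed critical value of about 2.228 at α = .05, so the null hypothesis would not be rejected.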

The posttest data was analyzed by using t-tests on the
scale means with the class sections as the unit of analysis.
The class sections were the largest independent units after
the courses had been taught.

There was no reason to suspect violation of the assumptions
of normality, independence, and equal variances with
the class section as the unit of analysis. Therefore, the
four hypotheses were tested using t-tests with a level of
rejection set at α = .05.

Differences between the two groups in mean gain scores
on the four scales were also compared using t-tests with
the class section as the unit of analysis. The hypotheses
were tested at the .05 level.
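When the class section is the unit of analysis, each treatment contributes only two observations (E1 and E2 versus C1 and C2), so a pooled t-test on section means or section mean gains has 2 + 2 - 2 = 2 degrees of freedom. The sketch below, using invented (pretest, posttest) pairs rather than the study's data, shows how individual scores could be collapsed to one mean gain per section before such a test.

```python
from statistics import mean

# Hypothetical (pretest, posttest) pairs per section, NOT data from this study.
sections = {
    "E1": [(10, 16), (12, 17), (9, 15)],
    "E2": [(11, 15), (8, 14), (13, 18)],
    "C1": [(10, 12), (12, 13), (9, 12)],
    "C2": [(11, 13), (10, 11), (12, 15)],
}


def section_gain(pairs):
    """Mean gain (posttest minus pretest) for one class section."""
    return mean(post - pre for pre, post in pairs)


gains = {name: section_gain(pairs) for name, pairs in sections.items()}
experimental_units = [gains["E1"], gains["E2"]]
control_units = [gains["C1"], gains["C2"]]
# With the section as the unit there are two observations per group,
# so a pooled-variance t-test has 2 + 2 - 2 = 2 degrees of freedom.
df = len(experimental_units) + len(control_units) - 2
print(gains, "df =", df)
```

With 2 degrees of freedom the two-tailed critical t at α = .05 is about 4.303, so a section-level comparison needs a large and consistent difference to reach significance.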

A reliability coefficient for the posttest was calculated
from posttest scores using Cronbach's α coefficient.
Descriptive statistics for each item on the pre- and posttest
instruments were compiled and compared.
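For k items, Cronbach's α is k/(k - 1) × (1 - Σ item variances / variance of total scores). A minimal Python version is sketched below; the 0-3 item scores are hypothetical, and this is not a reproduction of the SPSS routine used in the study. Population variances are used throughout; switching consistently to n - 1 denominators leaves the ratio, and hence α, unchanged.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores (0-3) of five respondents on three items.
ROWS = [[3, 2, 3], [2, 2, 1], [1, 0, 1], [3, 3, 2], [0, 1, 0]]
print(round(cronbach_alpha(ROWS), 3))
```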

The analysis of the raw data was performed by a CDC
6500 computer at Michigan State University using SPSS
(Statistical Package for the Social Sciences, Nie et al.,
1975) packaged programs for the data analysis and output.
The results of the analysis of the data are reported in
section two of chapter four.

Summary

In the spring quarter of 1976, four classes of finite
mathematics were randomly selected and randomly assigned
to one of two courses in finite mathematics. Two classes
were assigned to an experimental activity-based course and
two to a lecture-based course in finite mathematics. A
total of 85 subjects were involved in the study. The groups
were pretested and posttested for knowledge of probability
concepts and for reliance upon the heuristics of availability
and representativeness in estimating the likelihood of
events. The pretest and posttest instruments contained
items devised by the author and items used by Kahneman and
Tversky (1972, 1973).

This chapter has discussed the nature of the experimental
course and the control course in detail. Differences
and similarities between the courses in both content and
teaching methodology have been elucidated. A description
of the subjects, the procedure of the experiment, the hypotheses
tested, and the method of data analysis has been set
forth.

In the next chapter, descriptive and statistical results
of the study are reported.

CHAPTER IV

ANALYSIS OF THE RESULTS OF THE STUDY

Introduction

The analysis of the results is presented in two parts.
Part one contains a report on the experimental activity-based
course.

The second part presents the results of the statistical
analysis of the pretest and posttest measures. The results
of the hypothesis testing and descriptive statistics on the
four classes can be found in part two.

Part I: Report on the Experimental Course

The day-by-day occurrences within experimental group
E1 were recorded in a log kept by the experimenter. This
section reports on the observations of the experimenter
made during the experimental course. The discussion is
presented in three parts: the activities, student critiques
of some misuses of statistics, and the course evaluation
forms filled out by both experimental groups.

The Activities

 

Activity 1. Activity one begins by asking the groups
to guess the probability of getting various numbers of heads
in tossing 6 coins. The groups performed the experiment
48 times, recording the number of heads. The students were
slow to get started on this first activity. They spent a
long time reading the activity, and appeared hesitant to
begin working on the experiment. After approximately 15
minutes the groups began tossing coins and recording the
outcomes. Experimental probabilities for the outcomes 6
heads, 5 heads, ..., 1 head, 0 heads were calculated from the
data using the relative frequency model.

The groups had difficulty setting up the mathematical
model for the experiment because they could not agree among
themselves how to list the outcomes. Some students felt
that only the number of heads should determine "an outcome".
For these students there were seven outcomes, from 0 heads
up to 6 heads. Others felt that the result of each separate
flip changed the outcome. For this latter group of
students, the position of the heads among the six coins
changed the outcome. The first three coins might be heads,
or the second, third, and fifth coins might be heads. The
issue was debated in the small groups.

The instructor suggested to the groups that they might
consider assigning probabilities to the number of heads
using each of their two approaches to the model. The first
approach to the model was abandoned, for it assigned a
probability of 1/7 to each of the outcomes. The experimental
data had indicated that it was improbable that the outcomes
"6 heads" and "3 heads" were equally likely to occur. In
fact, in 192 tosses, the outcome 6 heads occurred only once
in the pooled small-group experimental data.

The second approach to the model was adopted. It soon
became apparent to the students that there was a large
number of outcomes to list. The first attempts to list the
outcomes as sequences of six heads and tails failed because
the groups had not yet developed a systematic way of enumerating
the outcomes. Gradually, a systematic approach to
listing the outcomes developed in each group. The groups
discovered that if they held the values of some of the coins
fixed while changing the others, the list of outcomes became
much more manageable.

When the 64 outcomes in the model had been listed,
theoretical probabilities for the number of heads were calculated.
Many students were surprised that their guesses
for the probabilities were so far off. Over half the students
had guessed that the probability that three heads
would occur was at least 1/2. The probability of three
heads based on their mathematical model was only 20/64.
They were also surprised that the probability of 6 heads
was so small. Only a few of the students had estimated the
probability of 6 heads to be below 10%.
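The second (ordered) model can be checked mechanically: listing all 2^6 = 64 head/tail sequences and grouping them by number of heads reproduces the 20/64 the students found for three heads, and gives 1/64 (about 1.6%) for six heads. A short Python sketch, assuming fair and independent coins as in the model:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered outcome of tossing 6 distinguishable coins: 2**6 = 64 sequences.
outcomes = list(product("HT", repeat=6))
heads_counts = Counter(seq.count("H") for seq in outcomes)

# Uniform model on sequences: each of the 64 has probability 1/64.
probabilities = {k: Fraction(heads_counts[k], len(outcomes)) for k in range(7)}

for k in range(6, -1, -1):
    print(k, "heads:", heads_counts[k], "/ 64 =", probabilities[k])
```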

In their written reports on activity one, almost all
the students mistook the assumptions of the experiment for
the assumptions of the model. The assumptions of the experiment
were that the coins were fair and that there was
uniformity in the tossing procedure in the groups. These
assumptions were also listed as assumptions of the mathematical
model. Only two students in 20 wrote that the
model assumed that all 64 outcomes were equally likely to
occur, and that the coins were independent of each other.
The students did not have a clear conception of what a
mathematical model was during the first activity. It was
not until much later, after the first four or five activities,
that the necessity for determining the assumptions
of the model itself and the limitations placed by those
assumptions became apparent to the students in the experimental
group.

Activity 2. In the second activity, a model for tossing
three tacks, listing the outcomes, and assigning probabilities
was developed. The groups first had to find an
estimate for the probability P(U) that a tack lands point
up. The range of values for P(U) obtained by 20 students
each tossing a tack 72 times was from .48 to .76. As a result
of the wide range of outcomes for P(U), a discussion
arose concerning the factors that may have affected the
outcomes: the way the tack was dropped, the height it was
dropped from, and the surface on which it landed. The class
decided to rerun the experiment on estimating P(U) and to
attempt to control for as many nuisance variables as possible.
Some of the students stood a textbook on end and pushed a
tack, sitting point upright, off the top edge of the book.
Thus the groups could control for height, uniformity in
dropping procedures, and landing surface. The range for
P(U) on the rerun experiment from the top of the book was
from .52 to .71, with a cluster of values around .60. Other
students controlled for nuisance variables by pushing tacks
off the table top onto the floor. The values for P(U)
from the higher distance off the table were mostly between
.5 and .6. Values for P(U) were determined by averaging
the results for each procedure. It was decided that P(U)
was about 2/3 if the tack was pushed off the book onto the
table, and about .55 if the tack was pushed off the table
onto the floor. In any case, there was agreement among
the subjects that, judging from the experiment, the outcomes
point up, U, and point down, D (on its side), were not
equally likely to occur.

However, when the subjects constructed a mathematical
model for the experiment, the 8 outcomes for tossing three
tacks were each assigned a probability of 1/8. This overwhelming
tendency to see the tack model as a uniform probability
model persisted even with evidence from the second
part of the experiment, in which three tacks were tossed
and the outcomes were recorded as ordered triples. The
data indicated that the outcomes were not equally likely
to occur, since UUU, UDU, and UUD occurred much more
often than DDD, DDU, or DUD. The subjects tended to indicate
in their logs that there was probably something
wrong with the tacks. "Theoretically, U and D should be
equally likely, even though experimentally they were not"
was written in several logs. The feeling among the students
that every probability model was really a uniform model
was difficult to overcome. Manifestations of this belief
persisted throughout the experimental course. Even in
game theory in activities 5 and 6 the subjects tended to
say that in a two-choice two-person game, each choice should
be played 50% of the time, regardless of the payoffs.

The instructor assisted the groups in discovering a
model for the non-uniform case by means of questions.

Instructor: "Suppose that three tacks were tossed
on the table 1200 times. You have
decided that P(U) = 2/3. In how
many of those 1200 tosses would you
expect to find the first tack land
upright?"

Student: "800, because that's 2/3 of 1200."

Instructor: "Now, of those 800, in how many would
you expect to see the second tack
land down?"

Student: "2/3 of the 800."

In this manner, the model of multiplying probabilities for
independent outcomes was slowly elicited from the groups.
The written responses to activity 2 were cycled back and
forth between the instructor and most of the students several
times before all the errors in applying uniform probability
model properties to the non-uniform tack situation
were cleared up.
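The multiplication rule the instructor elicited can be written out for all eight ordered triples. The Python sketch below assumes the three tacks land independently and uses the class's table estimate P(U) = 2/3; it yields P(UUU) = 8/27 and P(DDD) = 1/27 rather than the uniform 1/8 the students first assigned.

```python
from fractions import Fraction
from itertools import product

P_UP = Fraction(2, 3)  # class estimate for tacks pushed off a book onto a table


def outcome_probability(outcome: str, p_up: Fraction) -> Fraction:
    """Multiply the probabilities of independent tacks: U has p_up, D has 1 - p_up."""
    prob = Fraction(1)
    for face in outcome:
        prob *= p_up if face == "U" else 1 - p_up
    return prob


model = {}
for faces in product("UD", repeat=3):
    outcome = "".join(faces)
    model[outcome] = outcome_probability(outcome, P_UP)

for outcome, prob in model.items():
    print(outcome, prob)
```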

The theories for P(U) = 2/3 and P(U) = .55 for
table and floor, respectively, were tested by the chi-square
procedure later in the course. The observed frequencies
for the 8 outcomes from tossing three tacks were tested
for goodness of fit with each theory, depending on whether
the groups had tossed the tacks on the floor or on the
table. Results tended to support P(U) = 2/3 for the
table, and to contradict P(U) = .55 for the floor.
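A sketch of the goodness-of-fit computation for three tacks follows. The observed counts are hypothetical (the groups' actual frequencies are not reported here), and the 7 degrees of freedom (8 cells minus 1) assume P(U) was fixed in advance from the earlier single-tack experiment rather than estimated from the same triples.

```python
from itertools import product

CRITICAL_7DF_05 = 14.067  # chi-square critical value, 7 df, alpha = .05


def expected_counts(p_up: float, n_tosses: int) -> dict:
    """Expected frequency of each ordered triple under independent tacks."""
    counts = {}
    for faces in product("UD", repeat=3):
        prob = 1.0
        for face in faces:
            prob *= p_up if face == "U" else 1 - p_up
        counts["".join(faces)] = n_tosses * prob
    return counts


def chi_square(observed: dict, expected: dict) -> float:
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over all cells."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)


# Hypothetical observed frequencies for 270 tosses of three tacks, NOT the class's data.
observed = {"UUU": 82, "UUD": 41, "UDU": 38, "DUU": 42,
            "UDD": 22, "DUD": 19, "DDU": 17, "DDD": 9}
expected = expected_counts(2 / 3, sum(observed.values()))
stat = chi_square(observed, expected)
print(f"chi-square = {stat:.3f}; reject at .05? {stat > CRITICAL_7DF_05}")
```

For these invented counts the statistic is about 1.08, well below the .05 critical value of 14.07, so they would be consistent with P(U) = 2/3.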

Activity 3. This activity on modeling the outcomes
for tossing three dice was similar to the coin and tack
experiments. The difficulties with "equally likely vs.
unequally likely outcomes" or "the best way to model the
outcomes" (as ordered triples or as the sum of the three
faces) that appeared in activities one and two were not
as predominant in activity three. The subjects were not
happy about having to list 216 outcomes in order to find
theoretical probabilities. However, many of them discovered
patterns during the process of making the list, or noticed
the symmetry of the frequency distribution. This simplified
the job of listing the outcomes. At the conclusion of this
tedious listing process, the subjects were demanding counting
principles that would help them list the outcomes for
an experiment.
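Enumerating the 216 ordered triples and tallying their sums shows the symmetry the students noticed: a total s occurs exactly as often as 21 - s, because replacing each face x by 7 - x maps the triples for one total onto the triples for the other. A Python sketch, assuming fair, distinguishable dice:

```python
from collections import Counter
from itertools import product

rolls = list(product(range(1, 7), repeat=3))  # 216 ordered outcomes
sum_counts = Counter(sum(r) for r in rolls)

for total in range(3, 19):
    print(total, sum_counts[total], "/ 216")
```

The distribution peaks at 27 ways each for totals 10 and 11.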

Activity 4. The fourth activity was constructed to
lead subjects to discover several counting principles. The
instructor gave a brief talk on counting at the start of
the activity. The students were led to a point where they
could state the sequential counting principle in their own
words.

The outcomes for the first few problems on activity
4 were listed longhand. Very gradually students began to
see that the sequential counting principle would help them
list the outcomes for the number of words that could be
spelled from the letters L Z A K E. It was 5 x 4 x 3 x 2 x
1 = 120, using each letter once. It took the groups a long
time to discover what to do when some of the letters occurred
more than once. If the letters were L Z A K L, or
L Z A L L, the first conjecture made in each group was that
only 1/2 (respectively 1/3) of the 120 possibilities would
actually be distinct. The instructor encouraged the subjects
to list the outcomes. When only 20 outcomes for words
from L Z A L L could be found, the search for an alternative
approach was begun. The groups each discovered that
although the first L could account for three redundancies
per word, the second L still accounted for two more redundancies
per word. Thus they first divided the total
number of 120 arrangements of five letters by 3, and then
reduced the remaining 40 by one-half. With the help of
examples, this long process eventually produced the formula
n!/(K1! K2! ... Kp!), where n is the total number of letters
in the word and Ki is the number of repetitions of the
ith letter. The subjects were elated when they discovered
this formula. The classroom was filled with triumphant
smiles.
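The formula n!/(K1! K2! ... Kp!) can be checked against a brute-force count of distinct rearrangements for the three letter sets discussed above. The Python helper names below are illustrative only.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def distinct_arrangements(word: str) -> int:
    """n! / (K1! K2! ... Kp!), where Ki counts repeats of the i-th distinct letter."""
    return factorial(len(word)) // prod(factorial(k) for k in Counter(word).values())


def brute_force(word: str) -> int:
    """Count distinct orderings by generating all n! permutations and deduplicating."""
    return len(set(permutations(word)))


for word in ("LZAKE", "LZAKL", "LZALL"):
    print(word, distinct_arrangements(word), brute_force(word))
```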

The concept of a combination was isolated as a special
case of this formula, with only two distinct letters to
choose from. The first step made by the groups towards a
general solution for "combinations" consisted of writing
(5 x 4 x 3) ÷ (# redundancies), for problems such as finding
the number of groups of three runners who could finish in
the top three slots in a field of 5 runners. This gradually
changed to (5 x 4 x 3)/3!, and finally to 5!/(3! x 2!).
The process of counting the number of groups of x people
that could be chosen from a group of y people was seen
to be equivalent to the process of counting the number of
distinct words that could be spelled with x C's and
(y - x) N's, where C stands for "chosen" and N for "not
chosen". The equivalence among the two-letter spelling
problem, the number of subsets of size x from a set of size y,
the binomial coefficient, and Pascal's triangle was pointed
out by the instructor later in the course.
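The equivalence described above can be illustrated by counting distinct words made of x C's and (y - x) N's and comparing the result with Python's `math.comb` and with a row of Pascal's triangle built by adding adjacent entries. For the runners example (y = 5, x = 3) all three give 10. The helper names are illustrative.

```python
from itertools import permutations
from math import comb


def words_with_chosen(x: int, y: int) -> int:
    """Distinct strings of x C's ("chosen") and (y - x) N's ("not chosen")."""
    return len(set(permutations("C" * x + "N" * (y - x))))


def pascal_row(y: int) -> list:
    """Row y of Pascal's triangle, each entry the sum of the two above it."""
    row = [1]
    for _ in range(y):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row


print(words_with_chosen(3, 5), comb(5, 3), pascal_row(5))
```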

Activity four provided a more efficient and systematic
approach to counting the outcomes for flipping 6 coins
(activity one) or tossing three dice (activity three). The
students in the experimental course were quite pleased after
they had worked through activity four. They indicated that
they felt they had learned a great deal even though at times
it had been very frustrating for them.

Activity 5. In this activity three games were played
by pairs of students in order to provide an introduction to
two-person game theory. In the first game one or two
fingers were thrown by the players. Player one received
payoffs of $10 or $30 when different numbers of fingers
were shown. Player two received $20 whenever there was a
match. After playing the game 20 times, results indicated
that there was a tendency for player two, the matcher, to
win. The subjects generally attributed the success of the
winner of the game to his ability to "psyche out" the other
player and guess what the other player would show. No one
indicated that they thought the game was rigged. The instructor
suggested that the game be simulated to make it
difficult to pick up a pattern in the opponent's choices.
Every pair of subjects simulated an equally likely game,
making their choices on a 50:50 schedule with coins. It
did not occur to the subjects that perhaps a 50:50 schedule
was not in the best interests of both players. In fact, if
player one does play a 50:50 schedule and player two, "the
huckster", catches on, player two has an expected value of
$5.00 per play if he always plays the finger on which the
worst he can do is to lose $10.
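The $5.00 figure can be verified with a short expected-value calculation. The text does not say which mismatch pays player one $10 and which pays $30, so the payoff table below assumes one assignment; swapping it simply swaps which finger is player two's better choice.

```python
from fractions import Fraction

# Payoff to player two, keyed by (player_one_fingers, player_two_fingers).
# Which mismatch pays player one $10 and which pays $30 is an ASSUMPTION.
PAYOFF_TO_TWO = {
    (1, 1): 20,
    (2, 2): 20,
    (1, 2): -10,  # assumed: player one collects $10
    (2, 1): -30,  # assumed: player one collects $30
}


def expected_value_for_two(p_one_shows_one: Fraction, two_shows: int) -> Fraction:
    """Player two's expected payoff per play for a fixed choice against player one's mix."""
    return (p_one_shows_one * PAYOFF_TO_TWO[(1, two_shows)]
            + (1 - p_one_shows_one) * PAYOFF_TO_TWO[(2, two_shows)])


half = Fraction(1, 2)
for choice in (1, 2):
    print(f"player two always shows {choice}: EV = {expected_value_for_two(half, choice)}")
```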

The advantages of carefully alternating among the choices
became more apparent to the subjects when they played game
2. This 4 x 4 game had black and red cards as entries in the
payoff matrix. There were so many more choices in this game
than in the first game that the students began to develop
strategies for playing the game. Rows or columns with too
many of the opponent's entries were disdained or altogether
avoided. There was a tendency to pick the "safer" rows
which had two cards of each color. High payoff cards like
9's and 10's that were embedded in a row that otherwise
contained all opponent's cards were only occasionally gambled
upon. The beginnings of naive "mixed strategies" were used
by the students in this game.

The last game was a 4 x 4 game that contained a saddle
point. Pairs of students decided upon a strategy in this
game that resulted in choosing the saddle point in 5 of the
8 pairs. Each of the other three pairs picked one of the
two coordinates of the saddle point.
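A saddle point is an entry that is the smallest in its row and the largest in its column, so neither player gains by moving away from it alone. The 4 x 4 matrix of the class game is not reproduced in this section, so the Python sketch below uses a hypothetical matrix of payoffs to the row player (rows and columns are numbered from 0).

```python
def saddle_points(matrix):
    """Return (row, col) cells that are the minimum of their row and the maximum of their column.

    Entries are payoffs to the row player, who wants them large.
    """
    points = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [r[j] for r in matrix]
            if value == min(row) and value == max(column):
                points.append((i, j))
    return points


# Hypothetical 4 x 4 payoff matrix, NOT the game used in the course.
GAME = [
    [4, -2, 1, 3],
    [5, 2, 3, 6],
    [-1, 0, -3, 2],
    [2, 1, -2, 4],
]
print(saddle_points(GAME))
```

Here the only saddle point is row 1, column 1 (value 2): the largest row minimum equals the smallest column maximum.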

This activity was conducted prior to any formal instruction
on game theory or strategies for two-person games.
At the end of the activity the subjects were already displaying
some intuition for both "mixed" strategies and "pure"
strategies in their choices.

Activity 6. This activity on expected value consisted
primarily of working out solutions to problems and games in
order to calculate the long-run payoff. A lecture on expected
value concerning the method of calculating the payoff
for a 2 x 2 game was given by the instructor. The optimal
strategy for playing a 2 x 2 game was simulated with coins
and played 25 times. The subjects were surprised at how
close the mean payoff for 25 plays came to the theoretical
payoff calculated from the optimal mixed strategy. The
2 x 2 game had a theoretical payoff of 4.5, while the range
of 10 means for 25 plays of the game with coins was 4.2 to
4.7. Another surprise in this activity occurred in the
Carnival game that has two dice in a cage. The students
were asked to guess which of the outcomes had the highest
expected value, and then to calculate the expected values
of all the outcomes. Bets paid even money on 8 (or 6),
two-to-one on 9 (or 5), four-to-one on 10 (or 4), six-to-one
on 11 (or 3), and ten-to-one on 12 (or 2). The house
won everything on 7. Most of the students felt that 8 (or
6) was the best bet because it had the greatest probability
of occurring. It turned out that 10 (or 4) was the bet
that minimized the gambler's losses on this game.
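The expected values can be computed exactly. The sketch assumes a one-unit bet on a single total that pays the stated odds when that total appears and is lost otherwise (including on 7). Under that reading the bet on 10 (or 4) loses 7/12 of a unit per play, the smallest loss, while the bet on 8 (or 6) loses 13/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Number of ways each total can appear among the 36 equally likely rolls of two dice.
WAYS = Counter(a + b for a, b in product(range(1, 7), repeat=2))
# Odds paid on a winning one-unit bet, as described in the text (7 is the house's number).
ODDS = {8: 1, 6: 1, 9: 2, 5: 2, 10: 4, 4: 4, 11: 6, 3: 6, 12: 10, 2: 10}


def expected_value(total: int) -> Fraction:
    """Expected gain on a 1-unit bet: win ODDS units with probability p, else lose the unit."""
    p = Fraction(WAYS[total], 36)
    return ODDS[total] * p - (1 - p)


for total in sorted(ODDS):
    ev = expected_value(total)
    print(total, ev, round(float(ev), 3))

best = max(ODDS, key=expected_value)
print("least costly bet:", best)
```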

Activity 7. This was the first of two activities on
the effects of sample size upon the variation of measures
of central tendency and variability. The subjects guessed
the number of cards that they would have to turn over to
have at least a 50% chance of getting at least one ace.
Guesses were somewhat high, mostly from 12 to 15 cards. Only
one guess of 26 cards was made. The guesses indicated that
subjects were much more aware of the deceptive nature of
the probability of disjunctive events than they had been
at the beginning of the course. A pretest question had
asked for an estimate of the number of people needed so
that there would be a 50% chance that at least two people
had the same birthday. Of 80 subjects, 62 responded
that it would take 183 people or more. Twenty subjects
responded that it would take exactly 183 (see the next
section). The tendency to use 50% as a representative
multiplier of the total population had almost disappeared
in experimental group E1 by the time activity 7 was
done.
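For comparison with the pretest responses, the standard calculation assumes 365 equally likely birthdays (ignoring leap years) and multiplies the probabilities that each successive person avoids all earlier birthdays. The smallest group with at least a 50% chance of a shared birthday is 23 people, far fewer than 183.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) >= 0.5)
print(smallest, round(prob_shared_birthday(smallest), 4))
```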

Experimental data on the number of cards necessary to
obtain an ace was gathered for sample sizes of 10, 20, and
100. The median was used by the subjects as an estimate
for the number of cards necessary to have at least a 50%
chance of getting an ace. The medians for sample size 10
ranged from 4 to 13. The medians for sample size 20 ranged
from 4 to 9. The median for sample size 100 was 7. The
true value was calculated to be 9.
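The theoretical value of 9 follows from P(no ace among the first n cards) = C(48, n)/C(52, n) for a well-shuffled deck. The Python sketch finds the smallest n for which the chance of at least one ace reaches 50%: about 0.499 for 8 cards and 0.544 for 9.

```python
from math import comb


def prob_at_least_one_ace(n_cards: int) -> float:
    """P(at least one ace among the top n_cards of a well-shuffled 52-card deck)."""
    return 1 - comb(48, n_cards) / comb(52, n_cards)


needed = next(n for n in range(1, 53) if prob_at_least_one_ace(n) >= 0.5)
print(needed, round(prob_at_least_one_ace(needed - 1), 4), round(prob_at_least_one_ace(needed), 4))
```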

The experimenter has reason to believe that several
of the unusually low median estimates for 10 trials were the
result of very poor shuffling. The subjects did not take
time to carefully reshuffle the cards between each trial.
It surprised some subjects that the medians for sample size
of 20 did not narrow down to the theoretical value better
than the observed 4 to 9 range. It is likely that poor
shuffling was partly responsible for this range of medians
for sample sizes of 20. Mosteller (1961) reports means
ranging between 9.25 and 11.75 for five samples of 20 trials.
Mosteller was counting the card on which the first ace appeared,
so his theoretical value was 10. A machine did the
shuffling in Mosteller's experiment. The subjects discussed
how bias could be introduced into a sample by improper
or careless sampling techniques.

Activity 8. Means and standard deviations for sets
of two-digit numbers of various sample sizes were calculated
in this activity. The samples of size 5 yielded means from
34 to 71, while samples of size 25 had means from 43 to 52.
The standard deviations calculated for samples of size 5
or 25 indicated a similar "narrowing" of the range of observations
in the larger samples. The subjects concluded
that measures of central tendency and variability are rather
unstable for small samples, and may not be very accurate
indicators of the true population parameters.
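The narrowing the class observed can be reproduced by simulation. The section does not say how the two-digit numbers were generated, so the sketch assumes they are drawn uniformly from 10 to 99; the spread of sample means should then shrink by roughly a factor of sqrt(5) ≈ 2.2 when the sample size grows from 5 to 25.

```python
import random
from statistics import mean, pstdev


def spread_of_sample_means(sample_size: int, replications: int, rng: random.Random) -> float:
    """Standard deviation of the means of many samples of two-digit numbers (10-99)."""
    means = [mean(rng.randint(10, 99) for _ in range(sample_size))
             for _ in range(replications)]
    return pstdev(means)


rng = random.Random(1976)  # fixed seed so the run is repeatable
sd_small = spread_of_sample_means(5, 1000, rng)
sd_large = spread_of_sample_means(25, 1000, rng)
print(f"spread of means, n = 5: {sd_small:.2f}   n = 25: {sd_large:.2f}")
```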

Activity 9. The students were presented with a challenge
which was much less structured than the first eight
activities. The problem was to design and carry out an
experiment to test the truth of the statement "Pulse rates
go up when taken by a member of the opposite sex". The
design of the experiment was set up by the experimental
class during an open class discussion. This activity was
different from the previous ones in that it was not handled
in the small group format.

The activity began with a brainstorming session. Suggestions
for parts of the design and the experiment were
given while the instructor acted as a secretary and recorded
all suggestions on a chalkboard. Experimental class E1
decided to carry out the experiment on themselves. Pulses
were taken on the temple or neck in order to maximize the
chance of raising the pulse rates. Each person in the class
took his (her) own pulse first. The pulse-by-self outcome
was used as a basis for comparison with pulse rates found
by members of the same sex or members of the opposite sex.

Each "subject" had his (her) pulse taken by two members
of the class of the same sex and by two members of the opposite
sex. The pulses were recorded for a 30-second interval
and a 60-second interval on each of the five trials. The
sample in the class contained 11 males and 9 females.

The data was organized into 2 x 2 contingency tables of
the forms

A)                # up    # not up
    males
    females

B)                # up    # not up
    same sex
    opp. sex

Form A was set up for the two cases where members of the
same sex or members of the opposite sex took the pulse.
Form B was set up for the two cases where males or females
took the pulse rates.

Decisions concerning the design of the experiment or
the experimental procedure were all made by the students
themselves. After the data had been collected and organized,
the instructor made up a series of questions to help the
students analyze the data. The question sheet can be found
in Appendix B following the notes to the instructor on
activity nine.

All contingency tables were set up for both the 30-second
data and the 60-second data. Chi-square statistics
were calculated for all contingency tables to test the independence
of males vs. females or same-sex vs. opposite-sex
with respect to raising the pulse rates. The calculations
were carried out for both the 30-second and 60-second data.
No significant differences were found on any of the contingency
tables. Several students tested the 30-second data
against the 60-second data and wrote the results in their
logs. No differences were found between the pulse rates
for the two time intervals.
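For a 2 x 2 table with cells a, b in the first row and c, d in the second, the Pearson chi-square statistic (without continuity correction) equals N(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)] and has 1 degree of freedom. The counts in the Python sketch are hypothetical, not the class's pulse data.

```python
CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = .05


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts, NOT the class's data:
#                 # up   # not up
#   same sex         9         11
#   opposite sex    12          8
stat = chi_square_2x2(9, 11, 12, 8)
print(f"chi-square = {stat:.3f}; significant at .05? {stat > CRITICAL_1DF_05}")
```

Here the statistic is about 0.90, below the .05 critical value of 3.841, which would be read as no significant association.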

The last question on the work sheet for activity nine
asked the students to write up a critique of their experiment.
This activity was performed at the end of the course after
the students had already analyzed many articles in newspapers
and magazines for misuses of statistics. Most of their logs
included the following sorts of suggestions and criticisms
of the pulse-rate experiment.

1. We all knew each other and that may have biased
the results. It would have been interesting to
have done this activity at the beginning of the
course to see if there were any differences.

2. It would be better to have one fixed person of
each sex take everyone's pulse. This should be
done by a very handsome man and a very beautiful
woman and they each should be an expert at taking
pulses. We are not very good at taking pulses
and this may have biased the results.

3. Knowing who is taking your pulse might affect
the pulse rate. Thus the design of this experiment
might not really help to answer the
original question. The subjects should be
blindfolded so that any bias that might occur
from knowing the pulse-taker could be controlled.

The subjects in the experimental class realized that their
experimental design and procedure admitted many sources of
bias due to uncontrolled nuisance variables.

These observations on activities 1-9 were made in
experimental class E1. Conversations between the experimenter
and the instructor of experimental class E2 indicated
that similar things happened in the other experimental
class on all of the activities. The design of the experiment
for pulse rates was exactly the same in E2, except that the
30-second data was not gathered.

Misuses of Statistics

 

Each student in the two experimental classes was required
to read How to Lie with Statistics (Huff, 1954), and
to write at least ten short critiques of misuses of statistics
that they found. Articles were taken from newspapers
and analyzed for correct or incorrect uses of statistics.
Examples of misleading graphs, inflated percentages, biased
samples, insufficient sample size, and verbose quantitative
descriptions with little or no foundation in fact were
ferreted out by the students.

Many examples of the use of statistics to mislead
the consumer were found in advertisements. "Nine out of
ten trucks made by the company since 1972 are
still on the road". Students wrote: "Did this company sell
the same number of trucks every year? Were most of their trucks
made and sold in the last two years?" There were many
criticisms of reports on products that claimed "a statistical
test showed that product A was better than ...".
The students pointed out: "Who says that this is better?
If a statistical test was performed, why not state and publish
the results? What is meant by the word 'better'? Better
in what way?"

Examples of graphs that were conveniently chopped off
to make a certain point, or graphs that were unclearly labeled
with sliding scales on either the ordinate or the
abscissa, or even graphs that had no labels or units of
measure at all, were included in every student's log. Many
of the misleading graphs were found in prominent weekly
magazines, such as Time or Newsweek.

Misuses of percentages were mentioned by practically
every student. Many times percentages were used to mask
small sample sizes. The percentage of cost increase or of
profit was found to be conveniently inflated or deflated by
merely changing the denominator units. One student reported
that a recent tuition hike at his university was purported
to be a 13% increase when in fact it was a 22% increase.

The university calculated the percentage of increase by
I/(C + I), where I was the increase in cost per credit hour
and C was the cost before the tuition hike. The student
calculated the increase by I/C.

Several examples of statistics that were based upon
a biased sample were analyzed. One student found an article
that appeared on page one of the largest selling newspaper
in a major city. The article mentioned that 93% of the
people that were interviewed were against cross-district
busing to achieve racial integration in the schools. A
continuation of the article on page 14 mentioned that 70%
of the people that were interviewed did not have children
in the schools. Another student criticized the charts that
are published by bookstores and record stores that include
the "top ten" books or records for the week. These figures
are often based upon sales to a particular class of people
with special interests. Yet the lists appear to claim to
represent all age groups, interests, and races.

Several students found articles that misused the word "average" in one way or another. A recent sports article in a major newspaper for a large metropolitan area claimed that the average salary of a player in the National Basketball Association was $110,000. One skeptical student gathered data on all the N.B.A. players' 1975-76 salaries. He concluded that the average salary for a starting player in the N.B.A. was around $90,000 to $110,000, depending on the team. However, most of the players received much lower salaries: second- and third-string players earned only around $20,000 to $30,000.
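The gap between an "average" salary and what most players earn is the usual gap between a mean and a median in a skewed distribution. The following Python sketch uses a hypothetical nine-player roster (not actual N.B.A. data) to show how a few large salaries pull the mean well above the typical player.

```python
from statistics import mean, median

# Hypothetical salaries for a nine-player roster (not actual N.B.A. data).
salaries = [300_000, 110_000, 100_000, 90_000, 30_000, 25_000, 25_000, 20_000, 20_000]

avg = mean(salaries)    # pulled upward by the few large salaries
mid = median(salaries)  # the salary of the "middle" player
below_avg = sum(s < avg for s in salaries)

print(f"mean = {avg}, median = {mid}, players below the mean = {below_avg} of {len(salaries)}")
```

Here the mean is $80,000 while the median is $30,000, and five of the nine players earn less than the "average".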

In another case of misuse of averages, a report on suicide in the Sunday magazine section of this same newspaper was found to be inaccurate. The report claimed that the suicide rate in the United States had gone up 10% in the last few years. A student who had access to an almanac with all the population statistics for the past 10 years reported that the suicide rate had gone down 2% over the last few years. The magazine article did not take the population growth of the country into account in its calculations.
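Ignoring population growth confuses a change in a raw count with a change in a per-capita rate. A minimal Python sketch, using hypothetical figures (not the article's or the almanac's), shows how a 10% rise in the count shrinks once the count is divided by a growing population; the helper names are illustrative.

```python
def rate_per_100k(count, population):
    """Events per 100,000 people."""
    return 100_000 * count / population


def pct_change(before, after):
    return 100 * (after - before) / before


# Hypothetical figures: the raw count rises 10% while the population grows 5%.
count_before, count_after = 20_000, 22_000
pop_before, pop_after = 200_000_000, 210_000_000

count_change = pct_change(count_before, count_after)
rate_change = pct_change(rate_per_100k(count_before, pop_before),
                         rate_per_100k(count_after, pop_after))
print(f"count: +{count_change:.1f}%, rate per 100,000: +{rate_change:.1f}%")
```

With these numbers the count rises 10% but the rate rises only about 4.8%; faster population growth would shrink the change in the rate further.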

These are only a few of the misuses of statistics found by the students in the experimental classes. By the time the pulse-rate activity (activity nine) was carried out, these students had been sensitized to misuses of statistics, and they were able to pick out nearly every flaw in the design and procedure of the pulse experiment that they had set up.

Experimental Course Evaluation Forms

At the end of the experimental course the students in
the experimental sections were asked to respond to a ques-
tionnaire. The students were asked to comment on working
in groups, the activities, the log that they kept, the texts,
what they liked about the course and what they disliked.

The questions used to gather this information can be found at the end of Appendix D. The responses generally were in the form of a letter to the instructor. The students had the option of signing their responses, or of keeping their identity secret by not signing them or by typing their responses.

There was general agreement among the students that the log kept by each student was essential to the course. It provided a study guide, a reference book, and a tremendous sense of accomplishment for the subjects in the experimental course. Everyone also agreed that working in groups was an excellent way to learn mathematics. Interaction and cooperation in solving mathematics problems were a new experience for these students. Their comments on the evaluation forms indicated that they thoroughly enjoyed working in small groups. Several students did mention that a few of their group members tended to rely on other people's work and not contribute much. However, most of the students were very active and cooperated well in the groups.

The activities performed in class were called "relevant to everyday life". Several evaluations mentioned that the activities really helped to "prove" the theory that was being learned in the course.

Reactions to the Statistics by Example (Mosteller et al., 1973) texts were mixed. Some students felt that these texts were very helpful and had applications to their major field of study. Other students felt that the books were hard to read, confusing, and, at times, poorly written. The students had a great deal of difficulty understanding the chapters on the chi-square procedure and on the binomial distribution in the Weighing Chances volume.

 

On the other hand, How to Lie with Statistics (Huff, 1954) was generally considered to be a high point of the course. Most of the students indicated that Huff's book and their own critiques of misuses of statistics had made them much more aware of numbers, and of the deceptive manipulations that could be performed with numbers in order to publish slanted statistical information.

Overall attitude towards the experimental course was very positive. Almost every evaluation indicated that the students had thoroughly enjoyed the class. Several students wrote that they were "amazed to think they had enjoyed a mathematics class". Initial frustration at not having the answers, the rules, or the "formulas" provided for them by the instructor had disappeared for most of the students by the end of the course.

Part II: Analysis of the Statistical Results

 

Introduction

 

The statistical analysis is presented in three sections. Section one contains the results of the hypothesis testing and compares the experimental and control groups on pretest, posttest, and mean gain scores. Section two contains an analysis of each individual item on the pretest and posttest. The third section contains scale-to-scale, item-to-item, and item-to-scale correlation matrices.

Comparisons Between the Experimental and Control Groups on
the Four Scales

In this section the results of the hypothesis testing on pretest, posttest, and mean gain scores are reported. Comparisons between the experimental and control groups were made on the total test score and on the subscales (probability, availability, and representativeness) using t-tests with α = .05 as the level for hypothesis rejection. The results of the pretest comparisons are reported in tables 4.1 and 4.1A-4.1D. The results of the posttest comparisons are reported in tables 4.2 and 4.2A-4.2D. The results of the comparisons in mean gain scores on the availability and representativeness subscales are reported in tables 4.3, 4.3A, and 4.3B.

Notation

In the analysis of results, C1 and C2 stand for the two control groups, and E1 and E2 stand for the two experimental groups. The total test score is designated by TOTAL. The three subscales are designated by PROB. (probability), AVAIL. (availability), and REP. (representativeness).

Reliability

Cronbach's coefficient α was calculated for the posttest total score and was found to be .70. The coefficient estimates the proportion of the variance in the test scores that is due to non-error variance (Cronbach, 1970).
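Coefficient α has a standard closed form, α = k/(k − 1) · (1 − Σσᵢ²/σₓ²), where k is the number of items, σᵢ² is the variance of item i, and σₓ² is the variance of the total score. The thesis does not describe how the coefficient was computed, so this Python sketch only illustrates that formula on a small made-up score matrix (rows are subjects, columns are items); it is not a recomputation of the .70 reported above, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix: one row per subject, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Sample variance of each item across subjects.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores for three subjects on two items.
toy_scores = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(toy_scores), 3))
```

Because the same (sample) variance is used for items and totals, the ratio, and therefore α, does not depend on whether n or n − 1 is used as the divisor.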


TABLE 4.1

SCALE MEANS AND STANDARD DEVIATIONS FOR THE FOUR GROUPS ON THE PRETEST

(Points on each scale in parentheses)

                 TOTAL (81)     PROB. (42)     AVAIL. (12)    REP. (18)
GROUP      N     Mean    s      Mean    s      Mean    s      Mean    s
C1         26    32.62   9.31   22.70   3.96   1.81    1.44   4.27    4.23
C2         14    34.64   9.23   22.86   5.67   2.43    1.55   2.71    2.55
E1         20    33.50   9.24   22.15   4.43   1.75    1.52   3.95    4.37
E2         20    37.20   8.18   23.30   3.33   1.85    1.35   5.30    4.45
C1 ∪ C2    40    33.33   9.21   22.75   4.56   2.02    1.49   3.72    3.77
E1 ∪ E2    40    35.25   8.81   22.73   3.90   1.80    1.41   4.62    4.41
GRAND      80    34.24   9.02   22.74   4.22   1.91    1.45   4.17    4.10

TABLE 4.1 A

t-TEST RESULTS FOR PRETEST
SCALE TOTAL

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    33.32   9.22   1.00      78     .318
E1 ∪ E2    35.35   8.18
t is not significant

TABLE 4.1 B

t-TEST RESULTS FOR PRETEST
SCALE PROBABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    22.75   4.56   -.03      78     .98
E1 ∪ E2    22.72   3.91
t is not significant

TABLE 4.1 C

t-TEST RESULTS FOR PRETEST
SCALE AVAILABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.02    1.49   -.69      78     .49
E1 ∪ E2    1.80    1.42
t is not significant

TABLE 4.1 D

t-TEST RESULTS FOR PRETEST
SCALE REPRESENTATIVENESS

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    3.72    3.77   .98       78     .33
E1 ∪ E2    4.62    4.41
t is not significant

The results of t-tests on the pretest data (TABLES 4.1 A - 4.1 D) indicated that there were no significant differences between the experimental groups and the control groups on the total test score or on any of the three subscales prior to the course in finite mathematics.
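The pretest t-values are consistent with an ordinary pooled two-sample t-test on the 40 control and 40 experimental subjects (d.f. = 40 + 40 − 2 = 78). As a check, this Python sketch computes t from the summary statistics in Table 4.1 D and reproduces the tabled t = .98 for the representativeness scale; the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean_c, sd_c, n_c, mean_e, sd_e, n_e):
    """Two-sample t statistic with pooled variance, computed from summary statistics."""
    df = n_c + n_e - 2
    pooled_var = ((n_c - 1) * sd_c**2 + (n_e - 1) * sd_e**2) / df
    std_error = math.sqrt(pooled_var * (1 / n_c + 1 / n_e))
    return (mean_e - mean_c) / std_error, df


# Representativeness scale, pretest (Table 4.1 D): 40 control vs 40 experimental subjects.
t, df = pooled_t(3.72, 3.77, 40, 4.62, 4.41, 40)
print(f"t = {t:.2f}, d.f. = {df}")
```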


TABLE 4.2

SCALE MEANS AND STANDARD DEVIATIONS FOR THE FOUR GROUPS ON THE POSTTEST

(Points on each scale in parentheses)

                 TOTAL* (57)    PROB.* (15)    AVAIL. (12)    REP. (18)
GROUP      N     Mean    s      Mean    s      Mean    s      Mean    s
C1         26    19.70   8.24   9.04    3.65   2.15    1.91   5.96    3.48
C2         14    30.57   7.64   12.86   2.48   3.50    2.28   7.57    4.34
E1         20    34.60   6.98   12.65   2.85   4.30    1.83   10.50   4.66
E2         20    37.35   10.13  11.85   3.34   4.05    2.72   12.10   4.05
C1 ∪ C2    40    23.50   9.52   10.37   3.74   2.62    2.29   6.52    3.83
E1 ∪ E2    40    35.98   8.70   12.25   3.09   4.17    2.30   11.30   4.39
GRAND      80    29.74   11.02  11.31   3.54   3.40    2.33   8.91    4.74

*This scale was shorter on the posttest than on the pretest.

TABLE 4.2 A

t-TEST RESULTS FOR POSTTEST
SCALE TOTAL

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    25.13   7.70   1.93      2      .19
E1 ∪ E2    35.98   1.94
t is not significant

TABLE 4.2 B

t-TEST RESULTS FOR POSTTEST
SCALE PROBABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    10.95   2.70   .67       2      .57
E1 ∪ E2    12.25   .57
t is not significant

TABLE 4.2 C

t-TEST RESULTS FOR POSTTEST
SCALE AVAILABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.83    .95    1.97      2      .19
E1 ∪ E2    4.17    .18
t is not significant

TABLE 4.2 D

t-TEST RESULTS FOR POSTTEST
SCALE REPRESENTATIVENESS

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    6.77    1.13   3.99      2      .05
E1 ∪ E2    11.30   1.13
t is significant, P(t ≥ 3.99) < .05

The results of the t-tests on the posttest data (TABLES 4.2 A - 4.2 D) indicated that there was a significant difference between the experimental and control groups on the representativeness scale, in favor of the experimental groups. There was a tendency for the experimental groups to achieve higher scores on the total test scale and on the availability scale, although these differences were not significant. There was no significant difference between the experimental and control groups on the posttest probability scale.
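Unlike the pretest comparisons (d.f. = 78), the posttest tables report d.f. = 2. The tabled means and standard deviations are consistent with a t-test in which each class section contributes one observation (its section mean), so that n = 2 per condition and d.f. = 2 + 2 − 2 = 2; for example, the C1 ∪ C2 mean of 25.13 in Table 4.2 A is the average of the section means 19.70 and 30.57. This Python sketch makes that assumption explicit. It recomputes t from the section means in Table 4.2 and uses the closed-form two-tailed p-value of the t distribution with 2 degrees of freedom, p = 1 − |t|/√(t² + 2). The function names are illustrative.

```python
import math
from statistics import mean, variance


def section_t(control_means, experimental_means):
    """Pooled two-sample t-test treating each class-section mean as one observation."""
    n_c, n_e = len(control_means), len(experimental_means)
    df = n_c + n_e - 2
    pooled_var = ((n_c - 1) * variance(control_means)
                  + (n_e - 1) * variance(experimental_means)) / df
    std_error = math.sqrt(pooled_var * (1 / n_c + 1 / n_e))
    return (mean(experimental_means) - mean(control_means)) / std_error, df


def two_tailed_p_df2(t):
    """Exact two-tailed p-value for Student's t with 2 degrees of freedom."""
    return 1 - abs(t) / math.sqrt(t * t + 2)


# Posttest section means from Table 4.2: (C1, C2) versus (E1, E2).
t_total, df = section_t([19.70, 30.57], [34.60, 37.35])
t_rep, _ = section_t([5.96, 7.57], [10.50, 12.10])
p_total = two_tailed_p_df2(t_total)
print(f"TOTAL: t = {t_total:.2f}, p = {p_total:.2f}; REP: t = {t_rep:.2f}; d.f. = {df}")
```

Under this reading, each condition contributes only two observations, which is why a large difference in means (about 10.8 points on TOTAL) still does not reach significance.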

Next, it was of interest to compare the mean gain scores of the experimental and control groups on the availability and representativeness scales. The results are reported below in tables 4.3, 4.3 A, and 4.3 B.

TABLE 4.3

MEAN GAIN SCORES ON THE AVAILABILITY AND REPRESENTATIVENESS SCALES

GROUP    AVAILABILITY (x̄2 - x̄1)*    REPRESENTATIVENESS (x̄2 - x̄1)*
C1       .34                         1.69
C2       1.07                        4.86
E1       2.55                        6.55
E2       2.20                        6.80

* x̄2 is the posttest group mean on the scale; x̄1 is the pretest group mean on the scale.

TABLE 4.3 A

t-TEST RESULTS ON PRE-POST GAIN SCORES ON THE AVAILABILITY SCALE

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .71    .52    8.35      2      .01
E1 ∪ E2    2.38   .25
t is significant, P(t ≥ 8.35) < .01

TABLE 4.3 B

t-TEST RESULTS ON PRE-POST GAIN SCORES ON THE REPRESENTATIVENESS SCALE

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    3.28   2.24   4.30      2      .03
E1 ∪ E2    6.68   .18
t is significant, P(t ≥ 4.30) < .03

Significant differences were found on the mean gain scores between the experimental and control groups on both the availability subscale and the representativeness subscale. The differences indicated significantly higher pretest-to-posttest mean gain scores for the experimental groups.

Individual Item Statistics

Descriptive statistics are reported for each item on
the pretest and posttest in this section. The statistics
include: the method of assigning points to each item; a
distribution of the responses on each item for each class
section; means and standard deviations on each item for
each class section; and the grand mean and standard devia-
tion for the entire sample of subjects for each item.

The section first presents the representativeness subscale items (R1-R6). The availability subscale items (A1-A4) follow. Next, the probability items that were on the posttest are presented (P1-P5). Several items that did not appear on any scale (N1-N5) are also discussed; these off-scale items are interspersed among related on-scale items. Finally, elementary probability items that were included on the pretest but not incorporated into the posttest are analyzed. These probability items are labeled P6-P14.

Each table containing a distribution of responses on an item gives the method of assigning 3, 2, 1, or 0 points on that item in the column headings. All items that appeared on both the pretest and the posttest are reported in "double" tables, with the pretest results at the top and the posttest results at the bottom. There were 26 subjects in group C1, 14 in C2, 20 in E1, and 20 in E2. When the row frequencies for a group do not add up to the proper number on an item, it is because several subjects in that group left the item blank. A brief discussion of the results follows the statistical tables on each item.

(R1) Which of the following sequences is more likely to occur for having two children?

a) B G     b) G G     c) about the same chance

(note: coins were used on the pretest versions of R1, R2, R3, and R4)
TABLE 4.4 A

GROUP RESULTS ON ITEM R1

Pretest
GROUP    3-same with      1-same with   0-B G   0-G G   Mean   S.D.
         correct reason   no reason
C1       10               1             14      0       1.27   1.51
C2       5                0             8       0       1.07   1.49
E1       8                1             10      0       1.10   1.45
E2       11               3             5       1       1.65   1.42
TOTAL    34               5             37      1       1.29   1.46

Posttest
C1       15               0             11      0       1.85   1.49
C2       8                0             6       0       2.36   1.28
E1       16               0             4       0       2.40   1.23
E2       19               0             1       0       2.85   .67
TOTAL    58               0             22      0       2.32   1.26

TABLE 4.4 B

t-TEST ON POSTTEST ITEM R1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.10   .36    1.54      2      .26
E1 ∪ E2    2.63   .32
t is not significant

This item was constructed by the experimenter to test the use of the representativeness heuristic for a short sequence of events. The subjects who chose B G said that this outcome was more likely to occur "because B and G have an equal chance of happening and so the odds favor one of each". These subjects evidently felt that the outcome B G was representative of the 50:50 odds and so would be more likely to occur than the outcome G G. There is evidence (TABLE 4.4 A) that these college subjects used the representativeness heuristic to estimate the likelihood of this short sequence of events on the pretest.

There was a tendency for the experimental groups to do better on this item on the posttest, although the t-test (TABLE 4.4 B) was not statistically significant. All four groups improved on this item from pretest to posttest.

(R2) Which of the following sequences is more likely to occur for having six children?

a) B G G B G B     b) B B B B G B     c) about the same chance

TABLE 4.5 A

GROUP RESULTS ON ITEM R2

Pretest
GROUP    3-same with      1-same but         0-a)          0-b)          Mean   S.D.
         correct reason   incorrect reason   B G G B G B   B B B B G B
C1       7                1                  16            0             .85    1.35
C2       1                1                  11            0             .29    .83
E1       4                3                  10            2             .75    1.21
E2       5                2                  13            0             .85    1.31
TOTAL    17               7                  50            2             .72    1.22

Posttest
C1       10               1                  15            0             1.19   1.47
C2       7                0                  7             0             1.50   1.55
E1       10               2                  7             1             1.60   1.47
E2       14               0                  5             1             2.10   1.41
TOTAL    41               3                  34            2             1.57   1.48

TABLE 4.5 B

t-TEST ON POSTTEST ITEM R2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.34   .21    1.72      2      .23
E1 ∪ E2    1.85   .35
t is not significant

Those subjects who chose B G G B G B indicated that B B B B G B was less likely to occur because of the long string of B's. Fifty subjects responded in this manner on the pretest. This result supports the hypothesis of Kahneman and Tversky (1972) that combinatorially naive subjects believe that B B B B G B is not representative of the distribution of boys and girls. Reliance upon representativeness appeared to be lower on the posttest: about half the subjects correctly said that both sequences were equally likely to occur. The subjects who answered the item correctly on the posttest either mentioned that each successive child was independent of previous children, or calculated the probability of each of the sequences to be 1/64. There was a tendency for the experimental groups to score higher on this item, although the t-test was not significant.
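The 1/64 calculation mentioned above can be sketched in Python, assuming (as the items do) that births are independent and that P(boy) = 1/2. The function name is illustrative; it multiplies the probabilities of the individual births in one specific order, so every order of the same length comes out equally likely.

```python
from fractions import Fraction


def sequence_probability(sequence, p_boy=Fraction(1, 2)):
    """Probability of one specific birth order, assuming independent births."""
    prob = Fraction(1)
    for child in sequence:
        prob *= p_boy if child == "B" else 1 - p_boy
    return prob


# Orders from items R1, R2, and R3.
for seq in ["BG", "GG", "BGGBGB", "BBBBGB", "BBBGGG"]:
    print(seq, sequence_probability(seq))
```

The two-child orders each have probability 1/4, and each six-child order has probability 1/64, however "random" or "patterned" it looks.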

(R3) Which sequence is more likely to occur for having six children?

a) B G G B G B     b) B B B G G G     c) about the same chance

TABLE 4.6 A

GROUP RESULTS ON ITEM R3

[Table data illegible in the source scan.]

TABLE 4.6 B

t-TEST ON POSTTEST ITEM R3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.29   .30    2.17      2      .16
E1 ∪ E2    2.00   .35
t is not significant

Kahneman and Tversky (1972) found that a majority of their college subjects judged B G G B G B to be more likely, because B B B G G G was not "random" enough to be representative of the process of having children. The format of their question differed from that of the multiple-choice item in this study: Kahneman and Tversky asked their subjects to estimate the percentage of families in which the six children were born in one or the other of the two orders, whereas item R3 gives the subjects the option of picking "the same chance". The results differed from those of Kahneman and Tversky: many more subjects chose "the same chance" than chose B G G B G B. However, an analysis of the reasons given for the responses to this item indicated that the subjects were still relying upon the representativeness heuristic. About 1/2 of the subjects who chose "the same chance" on the pretest did so because "there are the same number of H's and T's" in both sequences (coins were used on the pretest version), and about 1/3 of the "same chance" responders on the posttest reasoned in the same manner. Subjects evidently felt that each of these sequences was equally representative of the 50:50 distribution of boys and girls among children.

Gains in all four groups were made on this item from
pretest to posttest. There was a strong tendency for the
experimental groups to do better on this item on the post-

test, although the t-test was not significant.

(R4) What is the probability that in six children there will be three boys and three girls?

TABLE 4.7 A

GROUP RESULTS ON ITEM R4

Pretest
GROUP    3-20/64      1-1/64   0-1/2    0-other     Mean   S.D.
         (30-33%)              (50%)    responses
C1       0            0        12       9           .00    .00
C2       0            1        11       1           .07    .27
E1       1            0        11       6           .15    .67
E2       1            1        12       2           .15    .49
TOTAL    2            2        46       18          .08    .43

Posttest
C1       0            3        9        14          .11    .33
C2       0            0        10       4           .07    .27
E1       6            9        3        2           1.25   1.07
E2       2            10       6        2           .80    .90
TOTAL    8            22       28       22          .56    .87

TABLE 4.7 B

t-TEST ON POSTTEST ITEM R4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .09    .03    4.12      2      .05
E1 ∪ E2    1.02   .32
t is significant, P(t ≥ 4.12) < .05

The pretest results on this item indicate that a substantial number of subjects felt that the probability of 3 boys and 3 girls was 1/2 (see TABLE 4.7 A). This outcome appears to be representative of the distribution of boys and girls, one-half of each. Several subjects even claimed that this outcome "had to happen" or "had 100% chance of occurring".

On the posttest many experimental subjects switched their answer from 1/2 to either 1/64 or 20/64, while the control subjects stayed with 1/2. There was a significant difference between the experimental and control groups on this item (see TABLE 4.7 B). The notation "6C3" or "6P3" was used by many subjects in the control groups. This notation was usually written down with a "?" along with a guess for the probability, often 1/2. The control subjects were evidently not sure how to use the formula to calculate the probability.
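Item R4 asks for the probability of a class of birth orders rather than of one specific order. That is why the keyed answer is 20/64 rather than 1/64 (one particular order) or 1/2. This Python sketch, assuming independent births with P(boy) = 1/2, computes the probability with the counting formula C(6, 3)/2⁶ and checks it by listing all 64 orders.

```python
from fractions import Fraction
from itertools import product
from math import comb

n = 6
# Counting formula: choose which 3 of the 6 births are boys.
by_formula = Fraction(comb(n, 3), 2 ** n)

# Brute-force check: list all 2**6 equally likely birth orders.
orders = list(product("BG", repeat=n))
by_enumeration = Fraction(sum(order.count("B") == 3 for order in orders), len(orders))

print(by_formula, by_enumeration)
```

Both lines print 5/16 (that is, 20/64, about 31%). This is the "30-33%" band credited in Table 4.7 A.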


(R5) Which is more likely to occur?

a) Pulling one red ball from a jar containing 10 red
balls and 90 white balls.

b) Pulling four red balls in a row from a jar con-
taining 50 red balls and 50 white balls.

TABLE 4.8 A

GROUP RESULTS ON ITEM R5

Pretest
GROUP    3-1/10 > 1/16   2-1/9 > 1/16   1-guessed   0-b) 4 red   Mean   S.D.
C1       3               6              3           13           .84    1.00
C2       0               1              3           10           .36    .63
E1       1               4              2           12           .65    .99
E2       4               1              4           10           .90    1.21
TOTAL    8               12             12          45           .72    1.01

Posttest
C1       1               5              8           11           .80    .90
C2       1               3              4           6            .85    .94
E1       7               4              3           6            1.60   1.27
E2       10              5              3           2            2.15   1.04
TOTAL    19              17             18          25           1.35   1.17

TABLE 4.8 B

t-TEST ON POSTTEST ITEM R5

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .83    .03    3.78      2      .06
E1 ∪ E2    1.87   .39
t is significant, P(t ≥ 3.78) < .06

There was a strong tendency on the pretest for the subjects to extend the probability of getting a red ball in one pull from the 50:50 distribution to all four pulls. Subjects felt that the chance of getting four red balls in a row was 1/2, and generally were unaware of the multiplication principle for independent events during the pretest. The probability of getting one red ball on one pull from the 50:50 distribution was apparently considered by the subjects to be "representative" of the probability of getting several red balls in a row.

The experimental groups improved substantially on this item on the posttest. There was a significant difference between the experimental and control groups (see TABLE 4.8 B).
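The scoring key (1/10 > 1/16) treats the four pulls as independent, as if each ball were replaced before the next pull. The item wording does not say whether balls are replaced, so this Python sketch computes both cases with exact fractions. In both cases option a) is the more likely one.

```python
from fractions import Fraction

# Option a): one red ball from a jar of 10 red and 90 white.
one_red_from_10_of_100 = Fraction(10, 100)

# Option b) with replacement (independent pulls), as assumed by the 1/16 scoring key.
four_red_with_replacement = Fraction(1, 2) ** 4

# Option b) without replacement, in case the balls are not returned to the jar.
four_red_without_replacement = Fraction(50 * 49 * 48 * 47, 100 * 99 * 98 * 97)

print(one_red_from_10_of_100, four_red_with_replacement,
      round(float(four_red_without_replacement), 4))
```

Without replacement the chance of four reds drops slightly further, to about .059, so the conclusion does not depend on the replacement assumption.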


(R6) The chance that a baby is born a boy is about 1/2. Over the course of an entire year, would there be more days when at least 60% of the babies born were boys in:

a) a large hospital     b) a small hospital     c) makes no difference

TABLE 4.9 A

GROUP RESULTS ON ITEM R6

Pretest
GROUP    3-Small   0-Large   0-Same   Mean   S.D.
C1       3         6         17       .34
C2       3         3         8        .64    1.28
E1       4         4         12       .60    1.23
E2       7         2         11       1.05   1.47
TOTAL    17        15        48       .64    1.23

Posttest
C1       8         2         15       .92    1.41
C2       6         1         7        1.29   1.54
E1       14        1         5        2.10   1.41
E2       13        0         7        1.95   1.47
TOTAL    41        4         35       1.53   1.51

TABLE 4.9 B

t-TEST ON POSTTEST ITEM R6

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.10   .26    4.69      2      .04
E1 ∪ E2    2.02   .10
t is significant, P(t ≥ 4.69) < .04

Kahneman and Tversky claim that subjects tend to believe that both hospitals should have birth rates that are equally representative of the population proportion of boys to girls, so that sample size should make no difference. The pretest results support this claim, as 48 subjects picked "makes no difference".

Reliance upon the representativeness heuristic diminished substantially on the posttest, particularly in the experimental groups. There was a significant difference (P(t ≥ 4.69) < .04) between the experimental groups and the control groups on this item. The subjects from the experimental groups who chose "the small hospital" wrote that there was "more chance for a higher proportion of boys in a smaller hospital because the sample size was smaller".
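The experimental subjects' reasoning can be made concrete with the binomial distribution. The item gives no hospital sizes, so the daily birth counts below (15 and 45) are hypothetical. This Python sketch, assuming independent births with P(boy) = 1/2, computes the chance that at least 60% of a day's births are boys.

```python
from math import comb


def prob_at_least_60_percent_boys(n, p_boy=0.5):
    """P(at least 60% of n independent births are boys) under a binomial model."""
    k_min = (3 * n + 4) // 5  # smallest integer k with k >= 0.6 * n, in exact integer arithmetic
    return sum(comb(n, k) * p_boy**k * (1 - p_boy) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical daily birth counts for a small and a large hospital.
small, large = 15, 45
print(round(prob_at_least_60_percent_boys(small), 3),
      round(prob_at_least_60_percent_boys(large), 3))
```

With these sizes, such a day is roughly three times as likely in the small hospital (about .30 versus about .12), so over a year the small hospital records more of them.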

The following item, N1, was asked in two different
versions. The pretest version is presented first, followed

by the posttest version.


(N1-pretest) A man bets you one dollar that at least two people at a party you are attending have the same birthday. How many people would have to be at the party so that the man has at least a 50% chance of winning the bet?

TABLE 4.10 A

GROUP RESULTS ON PRETEST ITEM N1

GROUP    3-20 to 30   2-15 to 19   0-183   0-365   0-730   0-other   Mean   S.D.
C1       4            3            4       2       3       10        .65    1.16
C2       0            0            5       3       1       5         .00    .00
E1       0            3            5       3       5       4         .20    .62
E2       2            1            6       5       3       3         .30    .92
TOTAL    6            7            20      13      12      22        .34    .89

Many subjects used "50%" as a cue: they either halved or doubled the number of days in a year. Five subjects gave estimates in the thousands. In general, the subjects gave very inaccurate estimates. They did not seem to be aware of the pigeonhole principle: many of them gave estimates in which there were more people than there are days in a year, even though (ignoring February 29) 366 people already guarantee a shared birthday.

(N1-posttest) People at a carnival pick one number from 1 to 100. If two people match, they win a prize. How many people would have to be playing the game in order that there be at least a 50% chance that there would be winners? Give your best estimate.

TABLE 4.10 B

GROUP RESULTS ON POSTTEST ITEM N1

GROUP    3-9 to 16   2-17 to 20   1-21 to 30   0-50   0-100   0-200   0-other   Mean   S.D.
C1       1           1            1            7      4       3       9         .23    .71
C2       2           0            5            4      1       2       0         .79    1.05
E1       6           2            7            3      0       0       2         1.45   1.19
E2       14          1            5            0      0       0       0         2.45   .89
TOTAL    23          4            18           14     5       5       11        1.19   1.27

TABLE 4.10 C

t-TEST ON POSTTEST ITEM N1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .50    .39    2.52      2      .13
E1 ∪ E2    1.95   .71
t is not significant

There was a strong tendency for the experimental groups
to estimate the number of people necessary (13) more accu-
rately than the control groups. There was little change in
the responses of the subjects in the control groups from

pretest to posttest.
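Both versions of N1 are "birthday problems". This Python sketch finds the smallest group for which a match has at least a 50% chance. It assumes that every day of a 365-day year (or every number from 1 to 100) is equally likely and that choices are independent; the function name is illustrative. It multiplies the chances that each new person avoids everyone before them.

```python
def smallest_group_for_even_odds(choices):
    """Smallest number of people for which P(at least one match) >= 1/2,
    when each person picks independently and uniformly from `choices` values."""
    p_all_different = 1.0
    people = 0
    while True:
        people += 1
        # The new person must avoid the (people - 1) values already taken.
        p_all_different *= (choices - (people - 1)) / choices
        if 1 - p_all_different >= 0.5:
            return people


print(smallest_group_for_even_odds(365), smallest_group_for_even_odds(100))
```

The answers, 23 for birthdays and 13 for numbers from 1 to 100, fall inside the credited ranges (20 to 30 and 9 to 16). They are far below the 183 or 365 that many subjects gave, which confuses a 50% chance with certainty.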


 

 

 

 

 

(N2) A fair coin is flipped and comes up heads 10 times in a row. If you could win $10 on a $1 bet by guessing the next toss, what would you guess and why?

TABLE 4.11 A

GROUP RESULTS ON ITEM N2

Pretest
GROUP    3-no diff.   2-hedged on   1-heads, go   0-tails   Mean   S.D.
                      no diff.      with winner
C1       6            1             6             12        .96    1.25
C2       2            0             3             9         .64    1.08
E1       0            1             2             17        .20    .52
E2       2            1             4             13        .60    .99
TOTAL    10           3             15            51        .62    1.04

Posttest
C1       2            1             8             15        .61    .90
C2       2            1             6             5         1.00   .95
E1       1            5             4             10        .85    1.27
E2       7            6             0             7         1.65   1.04
TOTAL    12           13            18            37        1.00   1.17

TABLE 4.11 B

t-TEST ON POSTTEST ITEM N2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .80    .27    1.00      2      .42
E1 ∪ E2    1.25   .57
t is not significant

On the pretest a majority of the subjects fell for the gambler's fallacy. Their reasons for picking tails on the next toss included: "the odds are 1/2, but not for 11 times in a row"; "the probability of a head diminishes each time"; "the law of averages has been broken"; and "tails is due!". Fewer subjects fell for the gambler's fallacy on the posttest. Experimental class E2 scored better than the other three classes on this item, but there were no significant differences on the posttest between the experimental and control groups. Control group C1 appears to have regressed slightly from pretest to posttest on this item. There may be interference on this item caused by learning to multiply the probabilities of successive independent events: subjects may be focusing on the entire sequence of tosses as if it were an event that had not yet occurred, rather than considering the single independent toss.
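The confusion in remarks like "the odds are 1/2, but not for 11 times in a row" can be checked by brute force. This Python sketch lists all 2¹¹ equally likely sequences of eleven fair-coin tosses, keeps those that begin with ten heads, and asks how often the eleventh toss is also a head.

```python
from fractions import Fraction
from itertools import product

# All 2**11 equally likely sequences of 11 fair-coin tosses.
sequences = list(product("HT", repeat=11))
ten_heads_first = [s for s in sequences if s[:10] == ("H",) * 10]
heads_next = [s for s in ten_heads_first if s[10] == "H"]

print(len(ten_heads_first), Fraction(len(heads_next), len(ten_heads_first)))
```

Eleven heads in a row has probability (1/2)¹¹ before any coin is tossed. Once ten heads have already occurred, however, only two sequences remain, one ending in H and one in T, so the next toss is still 1/2 either way.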


 

 

 

 

 

(N3) How many paths are in this grid?

(pretest version)     (posttest version)
X O X X               X O X X O
X X O                 X X O X

TABLE 4.12 A

GROUP RESULTS ON ITEM N3

Pretest
GROUP    3-12 paths   2-11 paths   1-10 paths   0-other   Mean   S.D.
C1       10           1            3            12        1.35   1.41
C2       12           0            0            2         2.57   1.08
E1       15           1            0            4         2.35   1.23
E2       18           0            0            2         2.70   .92
TOTAL    55           2            3            20        2.15   1.31

Posttest
GROUP    3-20 paths   2-19 paths   1-28 paths   0-other   Mean   S.D.
C1       10           1            0            15        1.23   1.48
C2       11           0            1            2         2.42   1.16
E1       17           0            1            2         2.60   .99
E2       15           1            0            4         2.35   1.23
TOTAL    53           2            2            23        2.06   1.36

TABLE 4.12 B

t-TEST ON POSTTEST ITEM N3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.83   .85    1.05      2      .40
E1 ∪ E2    2.48   .18
t is not significant

There was little change in the results of this item from pretest to posttest, even though all classes had covered the sequential counting principle. Control group C1 did not do as well as the other three groups on this item. Most of the responses in the "other" column listed 8 paths on the pretest and 11 on the posttest. These responses may have occurred because some subjects forgot to count the diagonal paths that skip over one or more columns. The definition of a path was carefully explained in all sections prior to both tests, and examples of the different possibilities for the paths were drawn for the subjects and were visible during the tests. There was no significant difference between the experimental groups and the control groups on item N3.


 

 

 

 

 

(N4) How many paths are in this grid?

(pretest version)     (posttest version)
X X X                 X X X
X O                   X O X O
O X X O               X X

TABLE 4.13 A

GROUP RESULTS ON ITEM N4

Pretest
GROUP    3-24 paths   2-22 paths   1-28 paths   0-other   Mean   S.D.
C1       5            0            2            19        .65    1.20
C2       11           0            0            3         2.36   1.28
E1       12           0            2            6         1.90   1.41
E2       14           3            0            3         2.40   1.10
TOTAL    42           3            4            31        1.70   1.43

Posttest
GROUP    3-24 paths   2-22 paths   1-26 paths   0-other   Mean   S.D.
C1       7            0            2            17        .88    1.33
C2       11           0            1            2         2.42   1.16
E1       16           0            0            4         2.25   1.33
E2       17           0            0            3         2.55   1.10
TOTAL    51           0            3            26        1.91   1.42

TABLE 4.13 B

t-TEST ON POSTTEST ITEM N4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.66   1.09   .95       2      .44
E1 ∪ E2    2.40   .21
t is not significant

As with item N3, the groups made only slight gains on the posttest version of item N4. The C1 control group did not do as well on this item as the other three groups. It was more difficult to "draw" all the paths in item N4 than in item N3. The results on N3 and N4 on the posttest were nearly identical. Those who got both items correct on the posttest used the sequential counting principle. The t-test found no significant difference between the experimental groups and the control groups on this item.
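The keyed answers for items N3 and N4 (12 and 20 paths for the two versions of N3, 24 for both versions of N4) match the sequential counting principle if a path is read as choosing one symbol in each row, top to bottom, with any jump between columns allowed. Under that reading, the count is the product of the row lengths. The reading is an assumption, since the classroom definition of a path is not reproduced here. In this Python sketch the grid strings transcribe the four versions above, and `count_paths` is an illustrative name.

```python
from math import prod


def count_paths(grid_rows):
    """Number of top-to-bottom paths that pick one symbol in every row
    (sequential counting principle: multiply the row lengths)."""
    return prod(len(row.split()) for row in grid_rows)


n3_pre = ["X O X X", "X X O"]
n3_post = ["X O X X O", "X X O X"]
n4_pre = ["X X X", "X O", "O X X O"]
n4_post = ["X X X", "X O X O", "X X"]
print([count_paths(g) for g in (n3_pre, n3_post, n4_pre, n4_post)])
```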


(A1) Consider the grids below.

Grid A                Grid B
X X X X X X X X       X X
X X X X X X X X       X X
X X X X X X X X       X X
                      X X
                      X X
                      X X
                      X X
                      X X
                      X X

Are there:
a) more paths in A
b) more paths in B
c) about the same number in A and B

 

 

TABLE 4.14 A

GROUP RESULTS ON ITEM A1

[Table data illegible in the source scan.]

TABLE 4.14 B

t-TEST ON POSTTEST ITEM A1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.11   .55    1.58      2      .25
E1 ∪ E2    1.73   .04
t is not significant

According to Kahneman and Tversky (1973), subjects should favor grid A over grid B because there appear to be more paths "available" in grid A. This was indeed the case on the pretest, where 53 of the 80 subjects favored grid A. The reasons given for the choice of grid A included "there are more X's in grid A" and "it is easier to draw a path in grid A". The pretest results support Kahneman and Tversky's contention that the availability heuristic is employed by subjects who are unable to count all the outcomes.

There was still a tendency to favor grid A on the posttest, even though the subjects had been exposed to counting techniques. However, the tendency was much weaker, as only 29 subjects chose grid A.

There was a tendency for the experimental groups to do better on this item. Nineteen of the 40 subjects in the experimental groups calculated 512 paths in each grid, while only 9 of the 40 control subjects successfully computed the result. This tendency was borne out by the results of the t-test (TABLE 4.14 B), although the test did not find a significant difference.
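The 512 figure can be confirmed by brute force. This Python sketch assumes that grid A has 3 rows of 8 X's and grid B has 9 rows of 2 X's, and that a path picks one X in each row. Instead of multiplying row lengths, it lists every path explicitly with `itertools.product`. This makes visible that the short, wide grid and the tall, narrow grid have the same number of paths (8³ = 2⁹ = 512), even though paths in grid A are easier to "see".

```python
from itertools import product


def enumerate_paths(n_rows, row_width):
    """List every path as a tuple giving the chosen column in each row."""
    return list(product(range(row_width), repeat=n_rows))


paths_a = enumerate_paths(n_rows=3, row_width=8)  # Grid A: 3 rows of 8 X's
paths_b = enumerate_paths(n_rows=9, row_width=2)  # Grid B: 9 rows of 2 X's
print(len(paths_a), len(paths_b))
```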


(A2) Consider the grid below. Which type of path is more likely to occur?

X X O X X
X X X O X
O X X X X
X X X X O
X O X X X

a) a path that hits 4 X's and 1 O
b) a path that hits 5 X's

(The pretest version used a 6 × 6 grid; see the next item, N5.)

 

 

TABLE 4.15 A

GROUP RESULTS ON ITEM A2

[Table data illegible in the source scan.]

TABLE 4.15 B

t-TEST ON POSTTEST ITEM A2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .70    .02    -9.40     2      .01
E1 ∪ E2    .60    .00
t is significant, P(t ≤ -9.40) < .01

The results on this item contradict the results of
Kahneman and Tversky (1973) who fOund that a majority of
their college subjects claimed that there were more paths
that hit all X's. A majority of the subjects in this
study chose paths that hit 5X and 10 (in a 6 x6 grid)
on the pretest. The two kinds of paths were about equally
favored on the posttest.

Kahneman and Tversky claim that subjects favor paths that hit all X's because there are "more X's available". An analysis of the reasons given for the responses to this item showed that availability was being used by the subjects, but not in the manner claimed by Kahneman and Tversky. Thirty-two subjects on the pretest chose 4 X's and 1 O to be more likely because "there was an O available". Sixteen subjects on the pretest chose 4 X's and 1 O because "the probability of getting an X at each level is 4/5". The former reasoning indicates the use of the availability heuristic, while the latter suggests that subjects felt 4 X's and 1 O was representative of the probability of getting an X in a single row. On the posttest, similar reasons were given for the choice 4 X's and 1 O. However, only 13 were indicative of "availability of O" on the posttest, while 26 suggested the "representativeness" of 4/5. It appears that the reasons given for the response 4 X's and 1 O tended to switch from depending upon availability on the pretest to depending upon representativeness on the posttest.

Those subjects (26 on pretest and 32 on posttest) who
chose "5X's" did so because there were more X's, as Kahneman
and Tversky claimed. The subjects who made this choice did
seem to rely upon availability.

In summary, there was no clear-cut case for favoring either of the two responses "4 X's and 1 O" or "5 X's". Furthermore, it is possible that a mixture of availability and representativeness was used in the responses to this item. There were more instances of the use of availability than of representativeness among the responses.

Table 4.15 A indicates that practically no one answered this item correctly by applying counting techniques. The significance of the t-value simply indicates that more control subjects scored 1 point for choosing "4 X's and 1 O". That answer is correct, but the reasons given for it were based upon availability or representativeness, not upon counting principles. The significance on this item is therefore an artifact of the scoring procedure.
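The correct response to A2 can be checked by counting paths directly. A minimal sketch, assuming (as in A1) that a path takes exactly one cell from each row, top to bottom, and that every row holds exactly one O, as in the grid shown for this item: a path that hits exactly k O's is fixed by choosing which k rows supply an O and then choosing the cells. The helper name `paths_with_k_os` is hypothetical.

```python
from math import comb


def paths_with_k_os(rows: int, x_per_row: int, o_per_row: int, k: int) -> int:
    """Count one-cell-per-row paths that hit exactly k O's.

    Choose which k rows contribute an O, pick an O in each of those rows,
    and pick an X in each of the remaining rows.
    """
    return comb(rows, k) * (o_per_row ** k) * (x_per_row ** (rows - k))


# Posttest grid: 5 rows, each with 4 X's and 1 O.
four_x_one_o = paths_with_k_os(rows=5, x_per_row=4, o_per_row=1, k=1)  # 5 * 4**4
five_x = paths_with_k_os(rows=5, x_per_row=4, o_per_row=1, k=0)        # 4**5

# Pretest grid: 6 rows, each with 5 X's and 1 O.
five_x_one_o = paths_with_k_os(rows=6, x_per_row=5, o_per_row=1, k=1)  # 6 * 5**5
six_x = paths_with_k_os(rows=6, x_per_row=5, o_per_row=1, k=0)         # 5**6

print(four_x_one_o, five_x)   # 1280 1024
print(five_x_one_o, six_x)    # 18750 15625
```

On both versions the paths with exactly one O outnumber the all-X paths, which is consistent with scoring "4 X's and 1 O" as the correct response.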


 

(N5) Give your best estimate for the number of paths in the grid below.

X X O X X X
X X X O X X
O X X X X X
X X X X O X
X O X X X X
X X X X X O
TABLE 4.16
GROUP RESULTS ON PRETEST ITEM N5

GROUP     3: 6^6   2: 6^5   1: 6^4   0: way off (often 36)   Mean   S.D.
C1        3        1        2        20                      .53    1.07
C2        5        0        0        9                       1.07   1.49
E1        5        3        1        11                      1.10   1.33
E2        4        1        2        13                      .80    1.20
TOTAL     17       5        5        53                      .84    1.25

 

 

Most of the incorrect responses to this item said there were 36 possible paths. Many subjects who had successfully used a multiplication principle on items N3 and N4, in which there were only 2 or 3 rows in the grid, failed to generalize the multiplication to this 6 × 6 grid. It was somewhat surprising that 17 subjects answered N5 successfully, and yet no one could use similar counting techniques on item A2 to estimate the number of paths that hit 5 X's. This item was not included on the posttest.

(A3) A man must select committees from a group of 10
people. Would there be:

 

 

a) more distinct possible committees of 8
b) more distinct possible committees of 2
c) about the same number of committees of 8 as
committees of 2
TABLE 4.17 A
GROUP RESULTS ON ITEM A3

GROUP     3: the same,     1: the same,   0: comm.   0: comm.   Mean   S.D.
          correct reason   a guess        of 8       of 2
Pretest
C1        1                2              3          19         .19    .63
C2        1                5              1          6          .57    .85
E1        0                4              2          12         .50    .81
E2        1                5              1          10         .40    .75
TOTAL     3                16             7          47         .31    .67
Posttest
C1        6                0              9          11         .69    1.29
C2        4                1              3          5          .93    1.38
E1        7                3              4          6          1.35   1.42
E2        5                2              4          8          .85    1.30
TOTAL     22               6              20         30         .93    1.34

 

 


TABLE 4.17 B

t-TEST ON POSTTEST ITEM A3

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   .81    .17    1.05      2      .40
E1 ∪ E2   1.10   .35
t is not significant

 

 

 

Only 3 of the 80 subjects correctly calculated 45 com-
mittees for each type of committee on the pretest, and these
3 had taken a probability course prior to the experiment.
There was an overwhelming tendency on the pretest for the
subjects to choose "committees of 2". Remnants of this ten-
dency persisted on the posttest. According to Kahneman and
Tversky (1973), subjects rely upon availability on this
item and tend to believe there are more committees of 2
since instances of committees of two are easier to construct.
The results in TABLE 4.17 A strongly support this contention
of Kahneman and Tversky.

Quite a few subjects had learned how to calculate the number of committees by the time the posttest was administered. It is somewhat surprising that there were not more than 22 subjects who calculated that $\binom{10}{2} = \binom{10}{8}$. The subjects were generally unaware of the fact that choosing a committee of 8 was the same as choosing a "non-committee" of 2. There was not a significant difference between the two groups on item A3.
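The symmetry that most subjects missed can be checked by brute force. This sketch uses Python's `math.comb` for $\binom{10}{2}$ and $\binom{10}{8}$, then lists every committee of 8 drawn from ten hypothetical people labelled 0 to 9 and confirms that each one corresponds to exactly one pair of people left out.

```python
from itertools import combinations
from math import comb

people = range(10)  # ten hypothetical people, labelled 0..9

committees_of_2 = comb(10, 2)
committees_of_8 = comb(10, 8)

# Each committee of 8 is determined by the 2 people left out of it
# (its "non-committee"), so the two kinds of committees pair off one-to-one.
left_out = {frozenset(set(people) - set(c)) for c in combinations(people, 8)}

print(committees_of_2, committees_of_8, len(left_out))  # 45 45 45
```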


 

 

 

(A4) A jar contains 9 red balls, 4 blue balls, and 3 green
balls. Which would be more likely to occur?
a) Pulling at least one green ball in two tries (with
replacement)
b) Pulling two red balls in a row (with replacement)
(Note: The distribution of balls on the pretest was
8 red, 4 blue, and 3 green, and "at least one
blue" was compared to "two red".)
TABLE 4.18 A
GROUP RESULTS ON ITEM A4

GROUP     3: a) with      2: noted      1: a)     0: b)    Mean   S.D.
          correct calc.   disjunction   guessed   2 reds
Pretest
C1        0               0             8         16       .35    .56
C2        0               0             6         8        .50    .52
E1        0               0             8         11       .40    .50
E2        0               0             11        7        .55    .51
TOTAL     0               0             33        42       .44    .52
Posttest
C1        0               0             1         21       .03    .20
C2        0               0             5         8        .36    .50
E1        2               1             7         11       .70    .98
E2        2               3             6         9        .85    1.04
TOTAL     4               4             19        49       .46    .81

 

 


TABLE 4.18 B

t-TEST ON POSTTEST ITEM A4

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   .20    .22    3.28      2      .08
E1 ∪ E2   .78    .10
t is significant: P(t ≥ 3.28) < .08

 

 

 

It was not expected that the subjects would be able to answer this problem successfully prior to a course in probability. The subjects favored the outcome "2 reds" on the pretest. Reasons given for this on the pretest included "two pulls makes it twice as likely" and "there are more reds". On the posttest, after a course in probability, the subjects still favored "2 reds", claiming that 9/16 > 3/16. There were more red balls available, and the subjects appeared to have relied heavily upon the availability heuristic in their responses to this item. Eight subjects in the experimental groups recognized the disjunctive nature of the event "at least one green". No one in either of the control groups recognized the disjunction. The results of the t-test are significant at the .1 level; the experimental groups performed better on this item.
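The two events in A4 can be compared exactly with rational arithmetic. A minimal sketch, assuming the two draws are independent because each ball is replaced; the helper names are hypothetical. The disjunctive event is computed through its complement, $1 - P(\text{no green on either draw})$, rather than by comparing single-draw probabilities such as 9/16 and 3/16.

```python
from fractions import Fraction


def p_at_least_one(favorable: int, total: int, draws: int = 2) -> Fraction:
    """P(at least one favorable ball in `draws` draws with replacement)."""
    p_miss = Fraction(total - favorable, total)
    return 1 - p_miss ** draws


def p_all(favorable: int, total: int, draws: int = 2) -> Fraction:
    """P(every draw is favorable), draws with replacement."""
    return Fraction(favorable, total) ** draws


# Posttest jar: 9 red, 4 blue, 3 green (16 balls).
post_green = p_at_least_one(3, 16)   # 1 - (13/16)**2
post_red = p_all(9, 16)              # (9/16)**2

# Pretest jar: 8 red, 4 blue, 3 green (15 balls).
pre_blue = p_at_least_one(4, 15)     # 1 - (11/15)**2
pre_red = p_all(8, 15)               # (8/15)**2

print(post_green, post_red)  # 87/256 81/256
print(pre_blue, pre_red)     # 104/225 64/225
```

Under these assumptions "at least one green" (about .34) is slightly more likely than "two reds" (about .32); on the pretest version the margin for "at least one blue" is wider.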


 

 

 

 

(P1) List the outcomes for tossing three coins.
TABLE 4.19 A
GROUP RESULTS FOR ITEM P1

GROUP     3: 8       2: 7       1: 4       0: other   Mean   S.D.
          outcomes   outcomes   outcomes
Pretest
C1        6          5          12         3          1.54   .99
C2        4          0          9          1          1.50   1.02
E1        3          2          14         1          1.35   .81
E2        2          1          15         2          1.15   .74
TOTAL     15         8          50         7          1.39   .89
Posttest
C1        24         0          1          1          2.80   .69
C2        14         0          0          0          3.00   .00
E1        19         0          0          1          2.85   .67
E2        18         0          0          2          2.70   .92
TOTAL     75         0          1          4          2.82   .69

TABLE 4.19 B
t-TEST ON POSTTEST ITEM P1

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   2.90   1.36   -1.06     2      .401
E1 ∪ E2   2.78   1.06
t is not significant

 


It is evident that on the pretest most of the subjects listed the four outcomes "3 heads", "2 heads, 1 tail", "1 head, 2 tails", and "3 tails" for the sample space. The results of the next item, P2, suggested that the subjects also felt that these four outcomes were equally likely to occur (see TABLE 4.19 A). Almost everyone listed the eight outcomes correctly on the posttest. Only one subject in 80 persisted in listing four outcomes for the sample space. There was no significant difference between the experimental and the control groups on this item.
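Listing the sample space for three coins is a direct enumeration. The sketch below treats the coins as distinguishable, which is what makes the 8 ordered outcomes equally likely, and then counts the outcomes with exactly two heads; this is the step that the four-outcome model on the pretest skips.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing three distinguishable coins: 2**3 ordered outcomes.
outcomes = list(product("HT", repeat=3))

two_heads_one_tail = [o for o in outcomes if o.count("H") == 2]
p = Fraction(len(two_heads_one_tail), len(outcomes))

print(len(outcomes), ["".join(o) for o in two_heads_one_tail], p)
# 8 ['HHT', 'HTH', 'THH'] 3/8
```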


 

 

 

 

(P2) What is the probability of getting 2 heads and 1 tail
in tossing 3 coins?
TABLE 4.20 A
GROUP RESULTS FOR ITEM P2

GROUP     3: 3/8   2: rel. freq.   1: 1/4   0: other   Mean   S.D.
Pretest
C1        2        3               10       11         .84    .92
C2        2        0               8        4          1.00   .96
E1        0        2               13       5          .85    .59
E2        1        0               11       8          .65    .75
TOTAL     5        5               42       28         .82    .80
Posttest
C1        17       0               2        7          2.03   1.37
C2        13       0               0        1          2.79   .80
E1        18       0               1        1          2.75   .79
E2        16       0               1        3          2.35   1.18
TOTAL     64       0               4        12         2.42   1.13

TABLE 4.20 B
t-TEST ON POSTTEST ITEM P2

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   2.41   .53    .33       2      .78
E1 ∪ E2   2.55   .28
t is not significant

 


Most of the subjects listed 1/4 on the pretest for the probability of getting 2 heads and 1 tail. This was the result of listing the four outcomes "0 heads" through "3 heads" on item P1 and assuming that the four outcomes were equally likely. Several subjects listed only 7 of the 8 outcomes on P1, and then correctly used the sample space to assign a probability to the outcome "2 heads and 1 tail". Although 15 subjects correctly listed the 8 outcomes for tossing 3 coins on item P1 on the pretest, only 5 of these 15 used the sample space correctly to assign a probability of 3/8 to "2 heads and 1 tail".

The posttest results indicated that most of the subjects had learned how to use the sample space of 8 outcomes to assign a probability of 3/8 to "2 heads and 1 tail". There were, however, 11 subjects on the posttest who correctly listed the outcomes (on P1) but did not apply the sample space when calculating a probability on P2. The control and experimental groups performed about the same on this item.
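Each t-test table reports 2 degrees of freedom, which is what a two-sample test gives when the two control sections are compared with the two experimental sections (2 + 2 - 2 = 2), the four class sections being the units of analysis. The sketch below assumes a pooled-variance two-sample t-test on the section means; under that assumption it reproduces the t-value and significance in TABLE 4.20 B from the posttest section means in TABLE 4.20 A. With exactly 2 degrees of freedom the t distribution has the closed-form CDF $F(t) = \frac{1}{2} + \frac{t}{2\sqrt{2 + t^2}}$, so no statistics package is needed. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Pooled two-sample t statistic (group2 minus group1) and its d.f.

    Each list holds one value per class section (the unit of analysis).
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group2) - mean(group1)) / se, df


def two_sided_p_df2(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom."""
    return 1 - abs(t) / math.sqrt(2 + t * t)


# Posttest section means on item P2 (TABLE 4.20 A).
control = [2.03, 2.79]       # C1, C2
experimental = [2.75, 2.35]  # E1, E2

t, df = pooled_t(control, experimental)
print(round(t, 2), df, round(two_sided_p_df2(t), 2))  # 0.33 2 0.78
```

Applied to the posttest means in TABLE 4.17 A (C1 .69, C2 .93, E1 1.35, E2 .85), the same function gives t ≈ 1.05, matching TABLE 4.17 B.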


The pretest version of the following item was slightly different from the posttest version. The pretest version is presented first.

(P3-pretest) For four games you have the following chances of gaining points:

Game A: 20% chance of winning 15 points
Game B: 40% chance of winning 10 points
Game C: 10% chance of winning 25 points
Game D: 50% chance of winning 5 points

If you play the game many times, you would be most likely to gain the greatest number of points in:

a) Game A   b) Game B   c) Game C   d) Game D

TABLE 4.21 A

GROUP RESULTS ON PRETEST ITEM P3

GROUP     3: Game B   0: Game A   0: Game C   0: Game D   Mean   S.D.
C1        13          1           0           11          1.50   1.53
C2        10          0           2           2           2.14   1.40
E1        10          1           2           6           1.50   1.54
E2        12          2           2           4           1.80   1.50
TOTAL     45          4           6           23          1.69   1.50

 

 


(P3-posttest) For three games you have the following chances of winning points:

Game 1: 50% chance of winning 8 points
Game 2: 20% chance of winning 20 points
Game 3: 30% chance of winning 15 points

TABLE 4.21 B

GROUP RESULTS ON POSTTEST ITEM P3

GROUP     3: Game 3   0: Game 1   0: Game 2   Mean   S.D.
C1        12          12          1           1.38   1.53
C2        11          2           1           2.36   1.28
E1        15          5           0           2.40   1.23
E2        15          2           2           2.25   1.33
TOTAL     53          21          4           2.03   1.41

TABLE 4.21 C
t-TEST ON POSTTEST ITEM P3

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   1.87   .69    .92       2      .45
E1 ∪ E2   2.32   .10
t is not significant

 

 

 

There was a tendency for the subjects to choose the game with the highest probability of winning and to neglect the payoffs on both the pretest and posttest measures. A majority of the subjects did, however, get the item correct on each instrument. Control group C1 did not do as well on this item as the other three groups. This item is an NAEP item on which 47% of the 17-year-old population and 23% of the adult population responded correctly. The proportion of correct responses for these college students was 56% on the pretest and 69% on the posttest. There was no significant difference between the control and experimental groups on this item.
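Item P3 is an expected value problem: over many plays, the average gain per play is the probability of winning times the payoff. A minimal sketch, assuming a play that is not won gains 0 points; the dictionary layout and the function name `expected_points` are illustrative only.

```python
from fractions import Fraction

# (probability of winning, points won) for each game in item P3.
pretest_games = {
    "A": (Fraction(20, 100), 15),
    "B": (Fraction(40, 100), 10),
    "C": (Fraction(10, 100), 25),
    "D": (Fraction(50, 100), 5),
}
posttest_games = {
    "1": (Fraction(50, 100), 8),
    "2": (Fraction(20, 100), 20),
    "3": (Fraction(30, 100), 15),
}


def expected_points(games):
    """Expected points per play: probability of winning times payoff."""
    return {name: p * points for name, (p, points) in games.items()}


for games in (pretest_games, posttest_games):
    ev = expected_points(games)
    best = max(ev, key=ev.get)
    print({name: float(v) for name, v in ev.items()}, "best:", best)
# {'A': 3.0, 'B': 4.0, 'C': 2.5, 'D': 2.5} best: B
# {'1': 4.0, '2': 4.0, '3': 4.5} best: 3
```

Game B on the pretest and Game 3 on the posttest have the largest expected gain even though neither offers the highest chance of winning, which is exactly the payoff information the subjects tended to neglect.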


The following two probability items were included on the posttest but not on the pretest. Both were constructed by the experimenter.

 

 

 

(P4) What is the probability that the sum of the faces
will be 5 when a pair of dice are rolled?
TABLE 4.22 A
GROUP RESULTS ON POSTTEST ITEM P4

GROUP     3: 4/36   1: (illegible)   0: 1/11   0: other   Mean   S.D.
C1        15        1                2         8          1.77   1.49
C2        13        0                0         1          2.79   .80
E1        12        0                0         8          1.80   1.50
E2        13        1                0         6          2.00   1.41
TOTAL     53        2                2         23         2.01   1.40

TABLE 4.22 B
t-TEST ON POSTTEST ITEM P4

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   2.28   .72    -.73      2      .54
E1 ∪ E2   1.90   .14
t is not significant

 

Only 2 subjects assumed an equally likely model for the 11 outcomes after a course in probability. Most of the incorrect responses in the "other" column were "1/6" or "$(1/6)^2 = 1/36$". These responses indicated that some subjects may have misread the question, thinking that there was only one die, or that two fives had to show. It is somewhat surprising that 27 subjects actually got this item wrong after taking a course in probability.
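The uniform model for two dice can be checked by enumerating the 36 ordered outcomes. The sketch assumes fair, distinguishable dice and contrasts the correct count with the "equally likely sums" model behind the 1/11 response.

```python
from fractions import Fraction
from itertools import product

# Two distinguishable fair dice: 36 equally likely ordered outcomes.
rolls = list(product(range(1, 7), repeat=2))
sum_five = [r for r in rolls if sum(r) == 5]

print(sum_five)                             # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(Fraction(len(sum_five), len(rolls)))  # 1/9, i.e. 4/36

# The incorrect model treats the 11 possible totals (2 through 12) as
# equally likely, which would give 1/11.
sums = {sum(r) for r in rolls}
print(len(sums), Fraction(1, len(sums)))    # 11 1/11
```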


 

 

 

 

(P5) The probability that it rains in Seattle on a given
day is 2/3. The probability that Bill forgets his
umbrella on any given day is 1/4. What is the prob-
ability that it rains and Bill forgets his umbrella?

TABLE 4.23 A

GROUP RESULTS ON POSTTEST ITEM P5

GROUP     3: 1/6   0: 2/3 + 1/4   0: other   Mean   S.D.
C1        9        9              8          1.04   1.46
C2        9        2              3          1.93   1.49
E1        20       0              0          3.00   0.00
E2        17       1              2          2.55   1.10
TOTAL     55       12             13         2.06   1.40

TABLE 4.23 B

t-TEST ON POSTTEST ITEM P5

GROUP     Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2   1.48   .63    2.59      2      .12
E1 ∪ E2   2.78   .31
t is not significant

 

 

 

There was a tendency for the experimental groups to perform better on this item, although the result was not significant at the .05 level. A number of students in the control groups attempted to set up the problem using $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where A is the event "it rains" and B is the event "bring umbrella". These students evidently confused the concepts of "mutually exclusive" and "independent" events.

The results of items P4 and P5 are quite different. The experimental and control groups performed about the same on P4, while the experimental groups did considerably better on P5 than the control groups. Item P4 dealt with a uniform probability model for two dice. On the other hand, P5 was concerned with independent events occurring in sequence. The experience of developing a multiplicative model for the independent tacks in activity 2 may have been responsible for the greater success of the experimental groups on this item.
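Item P5 calls for the multiplication rule, which assumes (the item leaves this implicit) that rain and Bill's forgetfulness are independent. The sketch contrasts that product with the addition-rule expression some control subjects wrote, which answers a different question: the probability that at least one of the two events occurs.

```python
from fractions import Fraction

p_rain = Fraction(2, 3)
p_forget = Fraction(1, 4)

# Assuming independence, P(rain AND forget) is the product.
p_both = p_rain * p_forget

# The addition rule gives P(rain OR forget), not the joint probability.
p_either = p_rain + p_forget - p_both

print(p_both, p_either)  # 1/6 3/4
```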


The following elementary items on probability and sample space (P6 through P14) were included only on the pretest to obtain information about the subjects' conceptions of probability prior to formal course work.

(P6) A jar contains 4 blue, 6 red, and 3 white marbles.
If you draw one marble from the jar, it is most
likely that you will:

a) get a blue marble b) get a red marble

c) have about the same chance of getting a red
or a blue marble

 

TABLE 4.23
GROUP RESULTS ON PRETEST ITEM P6

GROUP     3: red   0: blue   0: same   Mean   S.D.
C1        24       0         2         2.76   .81
C2        13       0         1         2.78   .80
E1        19       0         1         2.85   .67
E2        19       0         1         2.85   .67
TOTAL     75       0         5         2.81   .73

 

 

This item was easily handled by almost all the subjects.
Those who chose "the same chance" gave as a reason for their
answer that all three colors had the same chance of being
drawn. These subjects did not pay attention to the relative

frequency of each color in the sample.


(P7) a) A fair die is rolled. What is the probability
of getting a 3?

b) A fair coin is tossed. What is the probability
of getting a head?

 

TABLE 4.24
GROUP RESULTS ON PRETEST ITEM P7

GROUP     3: both right   1: b) right   Mean   S.D.
C1        23              3             2.76   .65
C2        13              1             2.85   .53
E1        18              2             2.80   .61
E2        18              2             2.80   .61
TOTAL     72              8             2.80   .60

 

 

It was expected that almost everyone would answer this item correctly. The item was included to find out the different ways that the subjects would express probability prior to a formal course. For part a), the responses included "1/6", "1 out of 6", "1 in 6", "1 to 6", "1:5", and "one in six rolls". The responses to part b) were similar, except that "50%" was added. There was a great deal of variation among the subjects in the language they used to express probability. The 8 subjects who got part a) incorrect misread the question and thought there were two dice.


(P8) You are playing a game in which you are blindfolded
and draw cards out of a box. If you draw a card
that has an X on it, you win the game. In the
boxes below, would you be more likely to win if you:

a) draw from box A b) draw from box B
c) makes no difference

 

 

 

 

 

 

 

 

 

A O X O X X B X X O
X O O X O O X O
TABLE 4.25
GROUP RESULTS ON PRETEST ITEM P8

GROUP     3: no diff.   0: Box A   0: Box B   Mean   S.D.
C1        22            2          2          2.54   1.10
C2        10            2          2          2.14   1.40
E1        17            3          0          2.55   1.10
E2        17            1          2          2.55   1.10
TOTAL     66            8          6          2.48   1.15

 

 

Those subjects who chose box A wrote that there were more X's in A, so there would be a better chance of getting an X. Those subjects who chose B wrote that since there were "less O's in box B", the chances of losing would be less in B. The reasons given for either type of incorrect response indicate that the "availability" of the X's was the deciding factor rather than the relative frequency of X's within a given box. The correct answers to this question were given by subjects who claimed that the ratio was 50:50 in either case.


Leffin (1971) used this same item in his investigation
of the concept of probability possessed by elementary school
children in grades 4 through 7. His results indicated a
tendency among the children to pick box A, the box with the
most X's. This tendency decreased as grade level increased
(from 58% in grade 4 to 36% in grade 7). Only 8% of the
college students in the present study chose box A. Appar-
ently older subjects are less susceptible to the influence

of the total number of X's.


 

(P9) Three friends agree to change the order in which
they go through the lunch line each day. In how
many possible ways can they arrange themselves?

TABLE 4.26
GROUP RESULTS ON PRETEST ITEM P9
GROUP 3-6 ways O-other Mean S.D.
C1 21 5 2.46 1.14
C2 12 2 2.57 1.08
E1 16 4 2.40 1.23
E2 20 0 3.00 0.00
TOTAL 69 11 2.60 1.01

 

 

Most of the 69 subjects who correctly answered "6 ways" actually listed the 6 possible arrangements, using names or symbols to identify the three friends. Of the 11 incorrect responses, 5 said there were $3^2 = 9$ distinct arrangements. This item appeared on the mathematics inventory administered by the National Assessment of Educational Progress (NAEP). The results found by NAEP showed that 47% of the 17-year-old population and 28.3% of the adult population responded correctly. Of the college students in this study, 86% got the item correct prior to a formal course in probability.
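The listing strategy used by most subjects on P9 is what `itertools.permutations` automates. The three names below are hypothetical placeholders for the friends in the item.

```python
from itertools import permutations
from math import factorial

friends = ["Ann", "Bob", "Cy"]  # hypothetical names for the three friends

orders = list(permutations(friends))
for order in orders:
    print(" -> ".join(order))

# Three choices for first place, two for second, one for last: 3! orders.
print(len(orders), factorial(3))  # 6 6
```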


 

 

 

 

(P10) At the start of a party game, eight red, six green, four blue, and two white slips of paper were thoroughly mixed in a bowl. The chance that the first slip drawn at random will be WHITE are given by which of the following:

a) $\frac{2}{8+6+4}$   b) $\frac{1}{8+6+4+2}$   c) $\frac{1}{8+6+4+1}$   d) $\frac{2}{8+6+4+2}$

TABLE 4.27
GROUP RESULTS ON PRETEST ITEM P10

GROUP     3: d)   0: a)   0: b)   0: c)   Mean   S.D.
C1        17      1       8       0       1.96   1.45
C2        12      0       2       0       2.57   1.08
E1        12      1       4       3       1.65   1.53
E2        13      2       3       1       1.95   1.47
TOTAL     54      4       17      4       1.99   1.43

 

 

A substantial number (17) of the subjects chose response b) on this item. Reasons listed for this response indicated that "there is one chance out of the total number of papers". Those subjects who chose b) apparently focused on the word "first" rather than on the number of slips of white paper. The word "first" may have suggested that there was only one chance at the draw.

Results of the NAEP analysis showed that 30.8% of the 17-year-old population and 28.5% of the adult population responded correctly on this item. Of the college students in this study, 68% responded correctly.


(P11) A committee of two people is to be chosen from among
Bill, Sally, Joe, and Beth. List all possible com-
mittees of two from this group.

TABLE 4.28
GROUP RESULTS ON PRETEST ITEM P11

GROUP     3: 6 pairs   1: 12 pairs   0: other   Mean   S.D.
C1        24           0             2          2.77   .81
C2        13           0             1          2.79   .80
E1        18           1             1          2.75   .78
E2        18           0             2          2.70   .92
TOTAL     73           1             6          2.75   .82

 

 

Nearly every subject correctly listed the 6 pairs. Only one subject counted the outcomes as ordered pairs and obtained 12 committees of two people. The remaining errors were $4^2 = 16$ (one), $2^3 = 8$ (three), and 10 (one).
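The difference between committees and ordered pairs on P11 is the difference between combinations and permutations. A short check using the four names from the item:

```python
from itertools import combinations, permutations

people = ["Bill", "Sally", "Joe", "Beth"]

committees = list(combinations(people, 2))     # unordered pairs
ordered_pairs = list(permutations(people, 2))  # order matters

print(len(committees), len(ordered_pairs))  # 6 12
for pair in committees:
    print(pair)
```

Each committee appears twice among the ordered pairs (once in each order), which is how the one subject who counted ordered pairs arrived at 12.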


(P12) There are 162 games in a baseball season. The
manager of the team always bats his pitcher last.
He has eight other players to assign to a batting
order. Are there enough games in one season to
try all possible batting orders for the other
eight players? If not, how many seasons would it
take?

TABLE 4.29
GROUP RESULTS ON PRETEST ITEM P12

GROUP     3: 250 or     2: 100 to     1: 4 to 10    0: yes, or    Mean   S.D.
          8!/162        300 seasons   seasons       1-2 seasons
          seasons
C1        1             3             3             19            .50    .86
C2        1             1             2             10            .50    .94
E1        0             3             3             14            .45    .76
E2        0             1             3             16            .40    .82
TOTAL     2             8             11            59            .46    .83

 

 

Unlike items P9 and P11, the sample space for this question could not be listed. The item was included to see how many subjects knew the sequential counting principle prior to the course in probability. Only 2 students in 80 could do this problem correctly. Eight other subjects gave a reasonable estimate for the number of seasons necessary, somewhere between 100 and 300. Three-fourths of these college students believed either that one season contained enough games to admit all possible batting orders, or that two or three seasons would suffice. There was an overwhelming tendency to underestimate the number of possible arrangements of the 8 players. Prior to the course in probability, the subjects apparently had very little intuition for the magnitude of combinatorial expressions. This is consistent with the results of the taped interviews in the pilot study: both $8 \times 7 \times 6 \times 5 \times \cdots \times 2 \times 1$ and $1 \times 2 \times 3 \times \cdots \times 7 \times 8$ were greatly underestimated in the interviews. Similar results have also been obtained by Kahneman and Tversky (1973).
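The scale of P12 is easy to underestimate. Under the item's own setup (the pitcher fixed in the last spot, eight players to order), there are 8! batting orders. The sketch computes how many 162-game seasons they would require and confirms that the descending and ascending products mentioned above are the same number.

```python
from math import ceil, factorial, prod

games_per_season = 162
orders = factorial(8)  # batting orders for the eight non-pitchers

# The descending and ascending products are the same number.
assert prod(range(8, 0, -1)) == prod(range(1, 9)) == orders

seasons_needed = ceil(orders / games_per_season)
print(orders, seasons_needed)  # 40320 249
```

The result, 8!/162 ≈ 248.9, or about 249 seasons, agrees with the "250 or 8!/162 seasons" response credited with 3 points.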


(P13) List an event that is certain to occur. List an
event that is impossible to occur.

TABLE 4.30

GROUP RESULTS ON PRETEST ITEM P13

GROUP     3: logically   2: one logical   1: (illegible)   0: (illegible)   Mean   S.D.
          correct        event            events           events
C1        1              0                6                19               .46    .70
C2        1              2                4                7                .71    .99
E1        0              1                6                13               .40    .60
E2        2              2                7                9                .95    .94
TOTAL     4              5                23               48               .61    .82

 

 

All 3 possible points were given only when the "events" listed by the subject were "logically certain" and "logically impossible". Examples of acceptable "certain" events were: "pulling a red ball from a jar containing all red balls", "pulling either a red or a black card from a deck of cards", and "getting either a head or a tail when flipping a coin". Examples of acceptable "impossible" events were: "pulling a black ball from a jar of all red balls" and "rolling a 7 with one ordinary die".

The list of "impossible" events given by the subjects included: I become a millionaire; Nixon re-elected as president; an elephant in a refrigerator; everyone graduates from Michigan State with a 4.0 G.P.A.; landing on the sun; living forever; George Wallace getting the Democratic nomination for president; a man jumps 50 feet vertically; and peace. The list of "certain" events contained similar responses, only these were very likely to occur.

TABLE 4.30 indicates that the subjects may have had a great deal of imagination but had a poor concept of "certain" and "impossible" in the probability sense prior to course work in probability.


(P14) You are playing a game with two other people. One
person picks a number between 1 and 10 and the other
two try to guess it. The person who guesses closer
to the number wins the game.

a) If you have the first choice, what would you pick?
b) If the first person picked seven, what would you pick?

TABLE 4.31
GROUP RESULTS ON PRETEST ITEM P14

GROUP     3: 5 or 6 on a),   2: (illegible)   1: 5 on a)   0: other   Mean   S.D.
          6 on b)                                          response
C1        19                 3                2            2          2.5    .95
C2        7                  1                2            4          1.5    1.28
E1        15                 2                2            1          2.1    1.25
E2        17                 1                1            1          2.5    .95
TOTAL     58                 7                7            8          2.22   1.14

 

 

Subjects received 3 points only if they responded "6" or "5" on part a), and "6" on part b). These responses optimized their chances of winning the game. Seven subjects chose "5" for part b) also. They failed to realize that, with guesses of 5 and 7, a chosen number of 6 would result in a tie game. The reason most frequently given for a choice of "5" in part a) was "its in the middle". The reason most frequently given for the response "6" in part b) was "then I'll have all the numbers below 6". Several students responded "3" to part b) in order to "balance off the 7". Table 4.31 shows that most of the subjects could find the best strategy for this game prior to a formal course in probability. The item was suggested by Professors John Masterson and Bruce Mitchell.
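The strategy in P14 can be checked by brute force. The item does not say how the number is chosen, so the sketch assumes the secret number is an integer from 1 to 10, each equally likely, and counts a tie (both guesses equally close) as half a win. `win_share` and `best_reply` are hypothetical helper names.

```python
from fractions import Fraction

TARGETS = range(1, 11)  # assumed: the secret number is an integer 1..10, uniform


def win_share(mine: int, theirs: int) -> Fraction:
    """My expected share of wins; an equal-distance tie counts as half."""
    total = Fraction(0)
    for target in TARGETS:
        d_mine, d_theirs = abs(target - mine), abs(target - theirs)
        if d_mine < d_theirs:
            total += 1
        elif d_mine == d_theirs:
            total += Fraction(1, 2)
    return total / len(TARGETS)


def best_reply(theirs: int) -> int:
    """Second guesser's best choice, given the first guess."""
    return max((g for g in TARGETS if g != theirs), key=lambda g: win_share(g, theirs))


# Part b): the first person has picked 7.
print(best_reply(7), win_share(6, 7), win_share(5, 7))  # 6 3/5 11/20

# Part a): value of each opening guess against an opponent who replies best.
opening = {g: win_share(g, best_reply(g)) for g in TARGETS}
best_openings = [g for g in TARGETS if opening[g] == max(opening.values())]
print(best_openings)  # [5, 6]
```

Under these assumptions 6 is the best reply to 7 (5 gives up half of the target-6 outcome as a tie), and an opening guess of 5 or 6 secures half the wins, matching the responses awarded 3 points.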


Correlation Matrices

This section contains scale-to-scale, item-to-item, and item-to-scale correlation matrices. Pearson product-moment correlations were calculated. The significance level of each coefficient is listed beneath it. Relationships that were significant at the .05 level are singled out in the discussion following each matrix.
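The significance levels in these tables can be recovered from r and N with the usual test statistic $t = r\sqrt{N-2}/\sqrt{1-r^2}$, which has N - 2 degrees of freedom. The tables do not state whether the tests are one- or two-tailed; the reported values appear consistent with one-tailed tests (for example, r = .188 with N = 80 is listed at .047 in TABLE 4.37), so the sketch below assumes one-tailed p-values. To stay within the Python standard library it integrates the t density numerically with Simpson's rule; the function names are hypothetical.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0 from a sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= |t|): Simpson's rule on [0, |t|], then symmetry about 0."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 0.5 - s * h / 3


# A1 with A2 in TABLE 4.37: r = .188, N = 80 (reported sig = .047).
p = one_tailed_p(t_from_r(0.188, 80), df=78)
print(round(p, 3))
```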

TABLE 4.32

SCALE-TO-SCALE CORRELATION MATRIX
FROM POSTTEST DATA

SCALE     TOTAL        PROB.        AVAIL.       REP.
TOTAL     1.000
PROB.     r = .730     1.000
          sig = .001
AVAIL.    r = .648     r = .385     1.000
          sig = .001   sig = .001
REP.      r = .775     r = .307     r = .323     1.000
          sig = .001   sig = .003   sig = .002
(N = 80)

 

 

Significant relationships were found between all pairs of subscale scores on the posttest.

TABLE 4.33

AVAILABILITY ITEM-TO-SCALE CORRELATION
MATRIX FROM POSTTEST DATA

SCALE     TOTAL        PROB.        AVAIL.       REP.
ITEM
A1        r = .618     r = .442     r = .754     r = .247
          sig = .001   sig = .001   sig = .001   sig = .013
A2        r = .101     r = .004     r = .357     r = .055
          sig = .187   sig = .486   sig = .001   sig = .314
A3        r = .240     r = .116     r = .602     r = .092
          sig = .016   sig = .153   sig = .001   sig = .207
A4        r = .351     r = .170     r = .343     r = .327
          sig = .001   sig = .066   sig = .001   sig = .002
(N = 80)

 

 

The availability items all correlated significantly (p < .001) with the availability subscale score. Items A2, A3, and A4 were not related to the probability subscale score. Items A2, A3, and A4 dealt with paths, the symmetry of the combinations formula, and disjunctive events, respectively.

TABLE 4.34

REPRESENTATIVENESS ITEM-TO-SCALE CORRELATION
MATRIX FROM POSTTEST DATA

SCALE     TOTAL        PROB.        AVAIL.       REP.
ITEM
R1        r = .400     r = .065     r = .131     r = .618
          sig = .001   sig = .284   sig = .122   sig = .001
R2        r = .498     r = .078     r = .127     r = .802
          sig = .001   sig = .244   sig = .131   sig = .001
R3        r = .548     r = .127     r = .156     r = .832
          sig = .001   sig = .130   sig = .084   sig = .001
R4        r = .448     r = .197     r = .337     r = .491
          sig = .001   sig = .040   sig = .001   sig = .001
R5        r = .535     r = .407     r = .338     r = .432
          sig = .001   sig = .001   sig = .001   sig = .001
R6        r = .424     r = .300     r = .212     r = .417
          sig = .001   sig = .003   sig = .030   sig = .001
(N = 80)
 

All of the representativeness items correlated significantly (p < .001) with the representativeness subscale score. Items R1, R2, and R3 did not have a significant relationship with the probability subscale scores. These three items dealt with the relative likelihood of various sequences of heads and tails or of boys and girls.


TABLE 4.35

PROBABILITY ITEM-TO-SCALE CORRELATION
MATRIX FROM POSTTEST DATA

SCALE     TOTAL        PROB.        AVAIL.       REP.
ITEM
P1        r = .365     r = .594     r = .332     r = .088
          sig = .001   sig = .001   sig = .001   sig = .219
P2        r = .107     r = .350     r = .060     r = -.036
          sig = .172   sig = .001   sig = .300   sig = .377
P3        r = .394     r = .667     r = .184     r = .070
          sig = .001   sig = .001   sig = .051   sig = .267
P4        r = .607     r = .589     r = .291     r = .405
          sig = .001   sig = .001   sig = .004   sig = .001
P5        r = .497     r = .570     r = .189     r = .276
          sig = .001   sig = .001   sig = .047   sig = .007
(N = 80)

 

 

All the probability items had a significant relationship (p < .001) with the probability subscale score. Items P1, P2, and P3 do not have a significant relationship with the representativeness subscale score. P2 does not have a significant relationship with the availability subscale score. P1 asks for a list of the outcomes for tossing 3 coins, P2 asks for the probability of 2 heads and 1 tail in three tosses, and P3 is an expected value problem.

TABLE 4.36

REPRESENTATIVENESS SCALE ITEM-TO-ITEM CORRELATION
MATRIX FROM POSTTEST DATA

(Table entries illegible in the scanned original.)

Representativeness variables R1, R2, R3, and R4 were significantly related to each other. These items dealt with sequences of independent events. Variables R5 and R6 were significantly related to each other; R5 concerned conjunctive events and R6 concerned sample size. Variable R4 was significantly related to all the other representativeness variables. R4 asks for an estimate of the probability that there will be 3 boys and 3 girls in a family with six children.
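The probability asked for in R4 follows from the binomial distribution. A minimal sketch, assuming each child is equally likely to be a boy or a girl, independently of the others: a 3-3 split, although it looks the most "representative", occurs in only 20 of the 64 equally likely birth sequences.

```python
from fractions import Fraction
from math import comb

children = 6
# Assumes each birth is a boy or a girl with probability 1/2, independently.
p_three_boys = Fraction(comb(children, 3), 2 ** children)

print(p_three_boys, float(p_three_boys))  # 5/16 0.3125
```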


TABLE 4.37

AVAILABILITY SCALE ITEM-TO-ITEM CORRELATION
MATRIX FROM POSTTEST DATA

ITEM      A1           A2           A3           A4
A1        1.000
A2        r = .188     1.000
          sig = .047
A3        r = .139     r = -.039    1.000
          sig = .109   sig = .364
A4        r = .123     r = -.049    r = -.101    1.000
(N = 80)

 

 

The only significant relationship between availability variables occurred between A1 and A2, the two questions dealing with paths.


TABLE 4.38

AVAILABILITY AND REPRESENTATIVENESS
INTER-ITEM CORRELATION MATRIX
FROM POSTTEST DATA

ITEM      A1           A2           A3           A4
R1        r = .101     r = .078     r = -.048    r = .198
          sig = .185   sig = .246   sig = .337   sig = .039
R2        r = .116     r = .053     r = .012     r = .102
          sig = .152   sig = .320   sig = .458   sig = .183
R3        r = .148     r = .028     r = .051     r = .088
          sig = .095   sig = .403   sig = .326   sig = .219
R4        r = .278     r = -.116    r = .193     r = .308
          sig = .006   sig = .152   sig = .043   sig = .003
R5        r = .208     r = .046     r = .183     r = .308
          sig = .032   sig = .343   sig = .052   sig = .003
R6        r = .115     r = .052     r = .067     r = .280
          sig = .155   sig = .325   sig = .278   sig = .006
(N = 80)

 

 

There were few significant relationships between the availability variables and the representativeness variables. Item A4 (on disjunctive events) tended to have significant relationships with the representativeness variables (R1, R4, R5, R6). Representativeness items R4 and R5 tended to have significant relationships with the availability variables (A1, A3, and A4).

The results of the correlation investigations in Tables 4.32 through 4.38 suggest that neither availability nor representativeness items were necessarily related to probability items. Relationships of items on the two heuristics scales with the probability subscale were generally weak or nonexistent. It may be possible, therefore, for a person to be able to solve probability problems and yet still not use probability theory in situations that are susceptible to either availability or representativeness.

CHAPTER V

SUMMARY, CONCLUSIONS, AND DISCUSSION

Summary

This study investigated college students' use of "availability" and "representativeness" in estimating the likelihood of events. Considerable evidence was found in the literature to support the contention that people do not estimate probabilities in accordance with the theoretical laws of probability. Instead, the evidence suggests that people do rely upon the heuristic principles of availability and representativeness when they estimate the likelihood of events.

The subjects involved in the study were 85 undergraduate
students who had enrolled in a finite mathematics course at
Michigan State University. Four class sections were ran-
domly chosen and two each were randomly assigned to either
the activity-based course (experimental) developed by the
author, or a lecture-based course (control). The subjects
were pretested and posttested on instruments devised by the
experimenter. The instruments contained a probability subscale, an availability subscale, and a representativeness subscale. Cronbach's coefficient α was calculated for the posttest; the reliability coefficient was .70.
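For reference, Cronbach's coefficient α compares the sum of the item variances with the variance of the total score: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$ for k items. The item-level posttest data are not reproduced here, so the sketch below runs on hypothetical scores for five subjects on four items; it illustrates the formula only and does not reproduce the .70 reported for the posttest.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's coefficient alpha.

    `scores` is a list of subjects, each a list of item scores of equal length.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_vars = [variance([s[i] for s in scores]) for i in range(k)]
    total_var = variance([sum(s) for s in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-3 item scores for five subjects on four items.
example = [
    [3, 3, 2, 3],
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [1, 0, 1, 1],
    [3, 2, 3, 3],
]
print(round(cronbach_alpha(example), 2))  # 0.94
```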

The events that took place in one of the experimental
sections were recorded in a daily log by the experimenter.
The students in the experimental sections worked on in-class
activities in small groups. The activities were written
by the experimenter and involved probability and statistics
concepts. The log was kept by the experimenter to gather
additional information concerning the way in which college
students learn probability while working in small groups.

A complete report of the observations made in the class can
be found in part one of chapter four.

The classes were compared on the posttest on the total
test score and on the three subscales. The data were
analyzed by t-tests (α = .05) performed on the total scale and
the three subscales. A significant difference was found
between the groups receiving the lecture-based course and
the activity-based course on the representativeness
subscale. The experimental groups scored significantly higher
on the representativeness subscale. There was a tendency
for the experimental groups to score higher on the availa-
bility subscale but the difference was not significant at
the .05 level. The experimental groups attained significantly
higher mean gain scores than the lecture-based groups
on the total test score. Analysis of the pretest showed
that there were no differences between the groups on any
of the subscales prior to a formal course in probability.
The experimenter concluded that the experimental
activity-based course appeared to have been more successful
than the lecture-based course in helping college students
to overcome their reliance upon availability and
representativeness.

Limitations

 

A personal background inventory indicated that most of
the subjects in the study did not have strong high school
mathematics backgrounds. Over 75% of the subjects responded
that they had had only a year of high school algebra and 1/2
year of high school geometry. Furthermore, 65% of the sub-
jects indicated that they had taken at least one remedial
mathematics course in college in order to raise the level
of their competence in high school algebra before they took
the finite mathematics course. Any conclusions or general-
izations made from this study should be limited to the popu-
lation of students with similar backgrounds.

The study was also limited by the fact that there were
only four independent units of analysis upon which to base
the posttest comparisons, namely, the four class sections.
Individual differences in teaching style among the four
instructors must also be considered in interpreting the
results of this study.

Results of the Hypothesis Testing

Hypothesis 1. There will be no difference between the
activity-based and the lecture-based sections on the total
test scale.

 

Hypothesis 2. There will be no difference between the
activity-based and the lecture-based sections on the proba-
bility subscale.

 

Hypothesis 3. There will be no difference between the
activity-based and the lecture-based sections on the avail-
ability subscale.

 

Hypothesis 4. There will be no difference between the
activity-based and the lecture-based sections on the repre-
sentativeness subscale.

 

All four hypotheses were tested both on the pretest
and posttest scores. Hypotheses 3 and 4 were also tested
by comparing mean gain scores from pretest to posttest.
T-tests were used to test the hypotheses, with the level of
rejection set at α = .05.
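The text does not say which form of the t-test was used. As a hedged illustration only, the sketch below assumes the pooled-variance two-sample t statistic and shows how mean gain scores (posttest minus pretest) enter it. The function names and all scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def gain_scores(pre, post):
    """Gain for each subject: posttest score minus pretest score."""
    return [after - before for before, after in zip(pre, post)]

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical pretest/posttest scores for two tiny groups.
gains_a = gain_scores([3, 4, 2], [6, 8, 7])   # [3, 4, 5]
gains_b = gain_scores([4, 3, 3], [4, 4, 5])   # [0, 1, 2]
t, df = pooled_t(gains_a, gains_b)
print(round(t, 3), df)                        # 3.674 4
```

With α = .05 (two-tailed), |t| is compared with the tabled critical value for df degrees of freedom; for df = 4 that value is 2.776, so these hypothetical groups would differ significantly.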

The comparisons from the pretest indicated that there
were no significant differences between the sections on any
of the subscales prior to a course in probability.

The posttest comparisons led to the rejection of
hypothesis 4. The activity-based sections scored significantly
higher than the lecture-based sections on the representativeness
subscale. The activity-based sections also attained
significantly higher mean gain scores on both the
representativeness and availability subscales. There was a
tendency for the experimental sections to score higher on the
posttest availability subscale than the lecture-based sections.
The difference was not, however, significant at the .05 level.
No significant differences between the sections on either
total test score or the probability subscale were found on
the posttest. Therefore, hypotheses one and two were not
rejected.

Conclusions and Discussion

 

The results of the hypothesis testing suggest that the
experimental activity-based course was apparently more suc-
cessful than the lecture-based course in helping college
students overcome their reliance upon representativeness
and availability to estimate the likelihood of events. The
level of probability concept attainment was not significantly
different in the two courses. It appears that the learning
of probability concepts may not be sufficient to overcome
reliance upon the heuristics of availability and represent-
ativeness. Course methodology may be an important factor in
overcoming reliance upon the heuristics.

These conclusions are supported by the analysis of
individual test items and the item-to-item and item-to-scale
correlations. The activity-based sections scored signifi-
cantly higher than the lecture-based sections on 4 of the
10 items on the heuristics subscales, and tended to score
higher on 4 others. Correlations between the probability
subscale and items on the two heuristics subscales were
generally weak or nonexistent. A detailed discussion of
the analysis of individual test items and of the correla-
tion matrices can be found in chapter four.

The observations made by the experimenter during one
of the activity-based classes have been reported in detail
in chapter four. It appears that college students can learn
to discover sound elementary probability models and formulas
while working on probability experiments in small groups.
Furthermore, the effects of sample size upon measures of
central tendency and variability may be learned by students
working in small groups on activities such as those developed
in this study. Making guesses for the probability of events
and checking the guesses with a hand calculator seems to help
make college students more cautious about probability and
more aware of some of their own misconceptions about
probability. Small group work, keeping a log of all the class
work and investigating misuses of statistics all appear to
have a positive effect upon college students' attitudes
towards mathematics.

There were differences observed between the two control
groups C1 and C2 on the posttest total score. The mean
score in C1 was 19.70, while the mean score in C2 was
30.57. There were also differences between C1 and C2 on
the three subscale scores, with C2 scoring higher on each
subscale. There does not appear to be a difference between
the posttest scores of the two experimental groups E1 and
E2 on any of the subscales. The mean gain scores of E1
and E2 on the availability and representativeness
subscales are nearly identical (see Table 4.3). On the other
hand, the mean gain scores of C2 are higher than those
of C1 on both of the heuristics subscales. Table 4.1
indicates that there were no apparent differences among any
of the four groups on the pretest measures.

There appears to be uniformity in the posttest results
across the experimental groups and non-uniformity across
the control groups.

There are several possible explanations for the differ-
ence in performance between C1 and C2. The difference
could be due to sampling variability. The study only had
four independent groups, two within each treatment, and so
it is not possible to say whether the scores of C1 are
low relative to the population of all such classes that are
taught finite mathematics in a lecture format. The differ-
ence could be the result of variability in teaching styles.

It is also possible that class size had an effect on
the learning process: C1 had N = 26, while C2 had
N = 14. Olson (1971) has done extensive research on the
factors that influence "quality education" and has found that
class size has a dramatic effect upon the teaching-learning
process. Over a period of seven years, Olson developed an
index called "Indicators of Quality". Individualization,
interpersonal regard, group activity, and creativity were
the four indicators isolated by Olson. Trained observers
obtained data from classroom observations, and the data were
converted into quantitative scores. Olson's investigations
took place in 18,528 classrooms in 112 suburban school
districts of 11 metropolitan regions in the United States.
Olson found seven significant predictors of his four
indicators. The top three predictors of quality education were
teaching style, subject matter, and class size. For
secondary school classes, which comprised about half of his
sample, Olson noted a significant drop in the quality of
education when classes of size 1-15 were compared with classes
of size 16-40.

Although Olson's research was done in secondary schools,
it is possible that his results on class size may have some
implications for college courses. The subjects in this
study were mostly freshmen and the lecture format used in
the control classes was similar to much of the teaching that
occurs in secondary school mathematics courses. In any case,
the higher scoring control group C2 enjoyed the advantage
of class size N = 14.

The results of this study on the pretest generally sup-
port the results of Kahneman and Tversky (1972, 1973) which
claim that combinatorially naive college students rely on
the availability and representativeness heuristics to esti-
mate probability. In chapter one it was noted that Kahneman
and Tversky were skeptical about the possibility of helping
people to overcome their reliance upon availability and
representativeness. The results on the posttest in this
study suggest that the manner in which college students
learn probability may make a difference in their use of
availability or representativeness. The activity-based
course was significantly better at overcoming the use of
representativeness than the lecture-based course, and also
tended to surpass the lecture course in overcoming the use
of availability.

It is impossible to isolate the specific factors of
the experimental course which may have helped more students
in the experimental sections to use probability theory rather
than availability or representativeness to answer the post-
test items. However, the extensive study of Olson (1970,
1971) cited above found that teaching style was the best
predictor of quality education. The teaching styles that
were high scoring quality predictors were small group work,
individual work, discussion, laboratory work, and student
report and demonstration. The styles that were low scoring
predictors were lecture, question-and-answer, and movies.
Every one of the high scoring predictors was included in the
methodology of the experimental activity-based course in the
present study. It is very likely that differences between
the experimental and control groups in overcoming reliance
upon availability and representativeness were primarily a
result of different classroom methodologies.

Implications for Future Research

This study was limited in its method of analysis by
the fact that there were only four class sections. A rep-
etition of the experiment with many more classes involved
would allow for a more appropriate method of data analysis
such as an analysis of variance or an analysis of covariance
with the pretest as covariate.

Taped interviews were used to gather information about
college students' reliance upon the heuristics of availabil-
ity and representativeness prior to formal course work in
the pilot study to this thesis. It might be interesting to
tape periodic interviews with students who were taking the
experimental course to find out where changes in their use
of availability and representativeness occur and what
activities or class-related experiences might account for the
changes. These interviews might also help to explore the
reasons why some students still rely on representativeness
and availability even after the activity-based course. Per-
haps such students have not learned the probability concepts
in the activities. Perhaps they have learned the probability
concepts in the activities and yet still rely upon availa-
bility and representativeness when they are in a situation
outside of the classroom.

A study on retention is necessary to determine whether
the positive effects of the activity-based course on
overcoming reliance upon availability and representativeness
are maintained over a period of time.

Studies involving populations other than college
students in a finite mathematics course should be carried out
to determine the extent of the use of availability and rep-
resentativeness. In particular, the posttest instrument
in this study could be administered to undergraduate mathe-
matics majors, or to graduate students in mathematics, or
even to faculty members in a university mathematics
department. Such studies might elucidate the pervasiveness of
the use of the heuristics of availability and representativeness.

Finally, it should be recalled that availability and
representativeness are heuristics which are used to simplify
complex decisions and judgments. This study has investi-
gated the uses of these two heuristics in a very special and
somewhat artificial context: a college class in probability.
Perhaps a more important question is: To what degree do the
availability and representativeness heuristics affect the
decision making process of professional people who must make
judgments based upon information that may be only partially
valid, or based upon probabilistic cues? In particular,
doctors diagnose disease based upon symptoms which come in
the form of probabilistic cues. Investors in securities,
court judges, airplane pilots, administrators in business
and education and classroom teachers also sometimes have to
make judgments and decisions based upon uncertain information.

The literature review in chapter two has cited evidence that
people do not always make optimal decisions and judgments
in probabilistic situations.

Most of the studies referenced were carried out in
laboratory or experimental environments. What do people
do in real life situations? What are the practical con-
sequences of their judgments and decisions?

Clarkson (1962) researched the decision making processes
of a trust investor and gathered information on the investor's
underlying "policy". The cues and strategies used by the
investor as he made decisions involving the allocation of
investment funds were modeled by Clarkson. There is currently
a program of research in medical education at Michigan State
University which is investigating the decision making process
of experienced doctors as they perform a diagnostic work-up
on a patient. This research is being conducted by the Office
of Medical Education, Research, and Development (OMERAD) in
order to obtain information for the training of medical
students.

The author has been unable to find examples of research
that have looked specifically at the possible effects of
availability or representativeness upon the judgments of
people while they are performing their professional tasks.
The author strongly recommends that research be undertaken
to attempt to identify instances of the uses of availability
and representativeness among professional people while they
are on the job, and to determine the possible practical
consequences of the uses of the heuristics.

BIBLIOGRAPHY


Austin, J.D., An experimental study of effects of three
instructional methods in basic probability and
statistics. Journal of Research in Mathematics
Education, 1973, 5, 3, 146-154.

Barz, T.J., A Study of Two Ways of Presenting Probability
and Statistics at the College Level. Unpublished
doctoral dissertation. Columbia, 1970.

Bell, M., Mathematical Uses and Models in our Everyday
World. Studies in Mathematics, volume XX. School
Mathematics Study Group: 1972.

Brunswik, E., Representative design and probabilistic
theory in a functional psychology. Psychological
Review, 1955, 62, 193-217.

Brunswik, E., Systematic and Representative Design of
Psychological Experiments. Berkeley: University of
California Press, (2nd Ed.), 1956.

Cambridge Conference on School Mathematics. Goals for
School Mathematics. Boston: Houghton-Mifflin, 1963.

Carlson, J.S., Children's probability judgments as related
to age, intelligence, socio-economic level, and sex.
Human Development, 1969, 13, 2, 192-203.

Carnap, R., What is probability? Scientific American, 1953,
189, 128-138.

Chapman, J.C., and Chapman, J.P., Genesis of popular but
erroneous psychodiagnostic observations. Journal of
Abnormal Psychology, 1967, 72, 193-204.

Clarkson, G.P.E., Portfolio Selection: A Simulation of
Trust Investments. Englewood Cliffs: Prentice-Hall,
1962.

Cohen, J., and Hansel, M., Risk and Gambling. New York:
Philosophical Library Incorporated, 1956.

Cohen, J., Subjective probability. Scientific American,
1957, 197, 128-138.

Cohen, J., Chance, Skill, and Luck: The Psychology of
Guessing and Gambling. Baltimore: Penguin Books,
1960.

College Entrance Examination Board. Commission on
Mathematics. Introductory Probability and Statistical
Inference for Secondary Schools: An Experimental
Course. New York, 1959.

Cronbach, L.J., Essentials of Psychological Testing.
New York: Harper and Row, (3rd Ed.), 1970.

Davis, C.M., Development of the probability concept in
children. Child Development, 1965, 36, 779-788.

DeGroot, A.D., Thought and Choice in Chess. The Hague:
Mouton, 1965.

Doherty, J., Level of Four Concepts of Probability Possessed
by Children of the Fourth, Fifth, and Sixth Grade Before
Formal Instruction. Unpublished doctoral dissertation.
Missouri, 1965.

Edwards, W., Conservatism in human information processing.
In: Formal Representations of Human Judgment. Ed. by
B. Kleinmuntz. New York: Wiley, 1968.

Feigenbaum, E.A., and Lederberg, J., Mechanization of
inductive inference in organic chemistry. In: Formal
Representations of Human Judgment. Ed. by B. Kleinmuntz.
New York: Wiley, 1968.

Fitzgerald, W., The role of mathematics in a comprehensive
problem solving curriculum in secondary schools.
School Science and Mathematics, 1975, 1, 39-47.

Freudenthal, H., Why to teach mathematics so as to be useful.
Educational Studies in Mathematics, 1968, 1, 1, 3-8.

Geeslin, W., An Analysis of Content Structure and Cognitive
Structure in Context of a Probability Unit. ERIC
Document ED 090 036.

Gipson, J.H., Teaching Probability in Elementary School:
An Experimental Study. Unpublished doctoral
dissertation. Illinois, 1971.

Hammond, K.R., Hursch, C.J., and Todd, F.J., Analyzing the
components of clinical inference. Psychological
Review, 1964, 71, 438-456.

Hoeman, H.W., and Ross, B.M., Children's understanding of
probability concepts. Child Development, 1971, 42,
221-236.

Hoffman, P.J., The paramorphic representation of clinical
judgment. Psychological Bulletin, 1960, 57, 116-131.

Hoffman, P.J., Cue-consistency and configurality in human
judgment. In: Formal Representations of Human Judgment.
Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Howell, W.C., Intuitive counting and tagging in memory.
Journal of Experimental Psychology, 1970, 85, 2,
210-215.

Huff, D., How to Lie with Statistics. London: W.W. Norton,
1954.

Jarvick, M.E., Probability learning and a negative recency
effect in serial anticipation of alternative symbols.
Journal of Experimental Psychology, 1951, 41, 291-297.

Jones, G.A., The Performances of First, Second, and Third
Grade Children on Five Concepts of Probability and the
Effects of Grade, I.Q., and Embodiments on Their
Performances. Unpublished doctoral dissertation.
Indiana, 1974.

Kahneman, D., and Tversky, A., Subjective probability: A
judgment of representativeness. Cognitive Psychology,
1972, 3, 3, 430-454.

Kahneman, D., and Tversky, A., On the psychology of
prediction. Psychological Review, 1973, 80, 4, 237-251.

Kass, N., Risk and decision-making as a function of age,
sex, and probability preference. Child Development,
1964, 35, 577-582.

Kipp, W.E., An Investigation of the Effects of Integrating
Topics of Elementary Algebra with Those of Elementary
Probability within a Unit of Mathematics Prepared for
College Basic Mathematics Students. Unpublished
doctoral dissertation. Florida State, 1975.

Klamkin, M., On the teaching of mathematics so as to be
useful. Educational Studies in Mathematics, 1968, 1,
126-160.

Klamkin, M., On the ideal role of an industrial
mathematician and its educational implications. The
American Mathematical Monthly, 1971, 78, 1, 53-76.

Kleinmuntz, B., The processing of clinical information by
man and machine. In: Formal Representations of Human
Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Komorita, S.S., Factors which influence subjective
probability. Journal of Experimental Psychology, 1959,
58, 386-389.

Leake, L., The Status of Three Concepts of Probability in
Children of the Seventh, Eighth, and Ninth Grades.
Unpublished doctoral dissertation. Wisconsin, 1962.

Leffin, W.W., A Study of Three Concepts of Probability
Possessed by Children in Grades Four-Seven. ERIC
Document ED 070 657.

McKinley, J.E., Relationship Between Selected Factors and
Achievement in a Unit on Probability and Statistics
for Twelfth Grade Students. Unpublished doctoral
dissertation. Pittsburgh, 1960.

McLeod, G.R., An Experiment in the Teaching of Selected
Concepts of Probability to Elementary School Children.
Unpublished doctoral dissertation. Stanford, 1971.

Messick, S.J., and Solley, C.M., Probability learning in
children: Some exploratory studies. Journal of Genetic
Psychology, 1957, 29, 23-32.

Mosteller, F., Fifty Challenging Problems in Probability
with Solutions. Reading: Addison-Wesley, 1962.

Mosteller, F., Rourke, R.E.K., and Thomas, G.G., Probability
with Statistical Applications. Reading: Addison-Wesley,
(2nd Ed.), 1970.

Mosteller, F., Kruskal, W.H., Link, R.F., Peiters, R.S., and
Rising, G.R., Statistics: A Guide to the Unknown. Ed.
by Judith M. Tanur. San Francisco: Holden-Day, 1972.

Mosteller, F., Kruskal, W.H., Link, R.F., Peiters, R.S., and
Rising, G.R., Statistics by Example. Reading: Addison-
Wesley, 1973.

Moyer, R.E., Effects of a Unit on Probability on Ninth Grade
General Mathematics Students' Arithmetic Computation
Skills, Reasoning, and Attitudes. Unpublished doctoral
dissertation. Illinois, 1974.

Mullenex, J.L., A Study of the Understanding of Probability
Concepts by Selected Elementary School Children.
Unpublished doctoral dissertation. Virginia, 1968.

National Council of Teachers of Mathematics. The Place of
Mathematics in Secondary Education. Fifteenth Yearbook
of the NCTM, 1940.

Newell, A., Shaw, J.C., and Simon, H.A., Elements of a theory
of human problem solving. Psychological Review, 1958,
65, 151-166.

Newell, A., Judgment and its representation: An introduction.
In: Formal Representations of Human Judgment. Ed. by
B. Kleinmuntz. New York: Wiley, 1968.

Newell, A., and Simon, H., Human Problem Solving. Englewood
Cliffs: Prentice-Hall, 1972.

Nie, N.H., Hull, C.H., Jenkins, J.C., Steinbrunne, K., and
Bent, D.H., Statistical Package for the Social Sciences.
New York: McGraw-Hill, (2nd Ed.), 1975.

Olson, M.N., Ways to achieve quality in school classrooms:
Some definitive answers. Phi Delta Kappan, 1971, 53,
1, 63-65.

Page, D.A., Probability. In: The Growth of Mathematical
Ideas Grades K-12. Twenty-fourth Yearbook of the NCTM,
1959, 229-271.

Peterson, G.R., Du Charme, W.M., and Edwards, W., Sampling
distributions and probability revision. Journal of
Experimental Psychology, 1968, 76, 236-243.

Phillips, L.D., Hays, W.L., and Edwards, W., Conservatism
in complex probabilistic inference. IEEE Transactions
on Human Factors in Electronics, 1966, 7, 7-18.

Piaget, J., and Inhelder, B., La Genèse de l'Idée de Hasard
chez l'Enfant. Presses Universitaires de France, 1951.

Piaget, J., and Inhelder, B., The Origin of the Idea of Chance
in Children. Translated by Leake, Burnell, and Fishbein.
London: W.W. Norton and Company Incorporated, 1975.

Piel, E.J., and Truxal, J.S., Man and His Technology. New
York: McGraw-Hill, 1973.

Pollak, H.O., On some problems of teaching applications of
mathematics. Educational Studies in Mathematics, 1968,
1, 1, 24-30.

Rorer, L.G., and Slovic, P., The measurement of changes
in judgmental strategy. American Psychologist, 1966,
21, 641-642.

Schwab, J.J., The practical: A language for curriculum.
School Review, 1969, 78, 1-23.

Shepler, J., Parts of a systems approach to the development
of a unit in probability and statistics for the
elementary school. Journal of Research in Mathematics
Education, 1970, 1, 4, 197-205.

Shepler, J., and Romberg, T., Retention of probability
concepts: A pilot study into the effects of mastery
learning with sixth grade students. Journal of Research in
Mathematics Education, 1973, 5, 1, 26-32.

Shulman, L.S., The psychology of school subjects: A premature
obituary? Journal of Research in Science Teaching, 1974,
11, 4, 319-339.

Shulman, L.S., and Elstein, A., Studies of problem solving,
judgment, and decision making: Implications for
educational research. In: Review of Research in Education.
Ed. by F.N. Kerlinger. Baltimore: Peacock and Company,
1975.

[...] to Students and Teachers of a Ninth Grade General
Mathematics Class. Unpublished doctoral dissertation.
Michigan, 1968.

Simon, H.A., and Newell, A., Human problem solving: The
state of the theory in 1970. American Psychologist,
1971, 26, 145-159.

Slovic, P., and Lichtenstein, S., Comparison of Bayesian
and regression approaches to the study of information
processing in judgment. Organizational Behavior and
Human Performance, 1971, 6, 649-744.

Smock, C., and Belovicz, G., Understanding of Concepts of
Probability Theory by Junior High School Children.
Final Report. ERIC Document ED 020 147, 1968.

Stevenson, H.W., and Zigler, E.E., Probability learning in
children. Journal of Experimental Psychology, 1958,
56, 185-192.

Thompson, M., Models, Problems, and Applications of
Mathematics. Unpublished pre-conference paper for a
conference on topical resource books for mathematics
teachers. Eugene, Oregon, 1974.

Tune, G.S., Response preferences. Psychological Bulletin,
1964, 61, 4, 286-302.

Tversky, A., and Kahneman, D., Belief in the law of small
numbers. Psychological Bulletin, 1971, 76, 2, 105-110.

Tversky, A., and Kahneman, D., Availability: A heuristic
for judging frequency and probability. Cognitive
Psychology, 1973, 5, 207-232.

Tversky, A., and Kahneman, D., Judgment under uncertainty:
Heuristics and biases. Science, 1974, 185, 4157,
1124-1131.

Vlek, C.A.J., Multiple probability learning. Acta
Psychologica, 1970, ;§, 207-232.

Weiss, N.A., and Yoseloff, M.L., Finite Mathematics. New
York: Worth, 1975.

Wheeler, G., and Beach, L.R., Subjective sampling
distributions and conservatism. Organizational Behavior
and Human Performance, 1968, 3, 36-46.

White, C.W., A Study of the Ability of First and Eighth
Grade Students to Learn Basic Concepts of Probability
and the Relationship Between Achievement in Probability
and Selected Factors. Unpublished doctoral dissertation.
Pittsburgh, 1974.

Wilks, S.S., Teaching statistical inference in elementary
mathematics courses. The American Mathematical Monthly,
1958, 65, 143-153.

Williams, J.D., The Compleat Strategyst. New York:
McGraw-Hill, 1954.

Yost, P., Siegal, A., and Andrews, J., Nonverbal probability
judgments by young children. Child Development, 1962,
33, 769-780.

APPENDICES

APPENDIX A

OUTLINE OF DAILY PLAN
FOR THE EXPERIMENTAL COURSE

2 Days - Introduction and administration of pretest.

2-3 Days -

Activity 1.

Assignments: Activity 1 written up. S.B.E. Set #4,
Exploring Data, pp. 27-31. Write up
and hand in p. 29 #1-3, and p. 30,31,
#4.

About 4 days -

Activity 2 on tacks.

Go over any problems that came up with the Activity 1
write up when handing back. Towards the end of the
week, see if anyone has come up with any articles
showing the misuses of statistics yet. Suggest a day
for in-class discussion of any articles found.
Assignments: Activity 2 written up. First Homework
assignment on generalization of coin
activity.

About 5 days -

Activity 3.

Go over Activity 2 problems when handing back. Discuss
the equally likely model from the coin experiment vs.
the unequally likely model from the tack experiment.
Discuss "multiplying" probabilities (i.e., independent
events). One day in class will be spent on a mini-lecture
introducing the sequential counting principle.
Another day in class will be spent playing the game of
picking 5 numbers from 1-12 and betting with the class
that at least two are the same, and then calculating
the probability that this will happen (they should
know about multiplying probabilities by this time).
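The in-class game can be checked with the multiplication rule for independent events. The sketch below assumes that each of the 5 numbers is chosen independently and with equal probability from 1-12 (repeats allowed); the function name is hypothetical.

```python
from fractions import Fraction

def prob_at_least_two_same(picks, values):
    """P(at least two of `picks` independent, equally likely choices
    from `values` possible numbers are the same)."""
    p_all_different = Fraction(1)
    for already_used in range(picks):
        # the next choice must avoid every number chosen so far
        p_all_different *= Fraction(values - already_used, values)
    return 1 - p_all_different

p = prob_at_least_two_same(5, 12)
print(p, float(p))  # prints 89/144 followed by its decimal value (about 0.618)
```

Under these assumptions a bet that at least two of the numbers match wins about 62% of the time.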

Go over Activity 3 in class if there are any big
problems in the write-ups. Go over Sets #4 and #10 from
S.B.E. when handing back.

Assignments: write up of Activity 3 is due about 2
days after finishing the in-class part.
Assign S.B.E., Set #10, Exploring Data,
pp. 87-90. Write up and hand in, pp. 89,
90, #1-3. (Note: have them draw dia-
grams of the outcomes in #2 and #3.)
Assign the second homework assignment,
probability problems, after having talked
about multiplying probabilities and having
played the number 1-12 game in class. The
assignment will be on a separate sheet,
and will consist of the Birthmonth, Birthday,
Flippant Juror problems from Fifty
Challenging Problems, and of the problems
on pulling two colored balls. They should
have about 2-3 days to work on these at
home. Solutions to the Flippant Juror
and Birthday Problem can be presented in
class by individual students.
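Two of the assigned problems from Fifty Challenging Problems can be checked exactly in a few lines. The sketch below assumes 365 equally likely birthdays (leap days ignored) and, for the Flippant Juror, the usual statement of the problem: two jurors are each independently correct with probability p, the third flips a fair coin, and the majority decides. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def birthday_match(n, days=365):
    """P(at least two of n people share a birthday)."""
    p_distinct = Fraction(1)
    for i in range(n):
        p_distinct *= Fraction(days - i, days)
    return 1 - p_distinct

def flippant_jury(p):
    """P(a majority of the three-person jury decides correctly)."""
    total = Fraction(0)
    for a, b, c in product([True, False], repeat=3):
        # jurors a and b are right with probability p; c is a coin flip
        weight = (p if a else 1 - p) * (p if b else 1 - p) * Fraction(1, 2)
        if a + b + c >= 2:          # at least two jurors are right
            total += weight
    return total

smallest = next(n for n in range(1, 366) if birthday_match(n) > Fraction(1, 2))
print(smallest)                        # 23
print(flippant_jury(Fraction(7, 10)))  # 7/10
```

The jury result equals p for every p, since p^2 + 2p(1 - p)(1/2) = p: a three-person jury with one coin-flipper does no better than a single juror.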

About 7 days -

Activity 4 on counting principles.

Go over the second homework (Birthday set).

Spend some time discussing their articles on misuses
of statistics. Have them present individual solutions
of the "harder" problems of Activity 4.

Assignments: Activity 4 written up. (Or you can wait
to grade this one when they hand in their
logs at mid-term.) Assign S.B.E. Weighing
Chances, Set #2 about random digits.
Write up and hand in pp. 12, 13, #1; p. 14,
#1,2; p. 20, #1-5.

About 2 days -

Pull together any loose ends on problem set #2B or
Activity 4. Review for first in-class test.

One day -

First in-class test.

For the second part of the course, try to set some time
aside each week to discuss any misuses of statistics that
the students may have found. It may help to set aside some
specific time slot for this during each week.

1 Day -

Go over the test. Clear up any problems that may still
exist in Activity 4. Outline the rest of the term's
work. (This will include a brief introduction to game
theory and the use of probability in calculating the
expected value of a game; an introduction to inferential
statistics (as distinguished from descriptive
statistics that they have had up until now) via the
chi-squared test, and the importance of probability
in the decision making process of inferential statistics;
and some work on the effect of sample size upon
the stability of means and variability measures. At
the end of the course there will be an open ended
activity that involves setting up a controlled experiment
(or uncontrolled if they decide to go that way!),
gathering some data, analyzing the data, and making a
decision based upon the statistical information that
they have accumulated. The experiment will then be
criticized for validity and strength using the experience
that they have obtained from analyzing the articles on
the misuses of statistics.)
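The effect of sample size on the stability of means can be demonstrated with a short simulation. The sketch below is a hypothetical illustration, not one of the course activities: it rolls a fair die, averages samples of several sizes, and reports how much those averages vary from sample to sample. The function name, seed, and number of trials are assumptions chosen for the example.

```python
import random
from statistics import mean, pstdev

def spread_of_sample_means(sample_size, trials=2000, seed=1976):
    """Standard deviation of `trials` sample means of fair-die rolls."""
    rng = random.Random(seed)          # fixed seed for repeatable results
    means = [mean(rng.randint(1, 6) for _ in range(sample_size))
             for _ in range(trials)]
    return pstdev(means)

for n in (5, 20, 80):
    print(n, round(spread_of_sample_means(n), 3))
```

The printed spreads should come out close to 1.708 divided by the square root of n (about 0.76, 0.38, and 0.19), so quadrupling the sample size roughly halves the variability of the mean.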

4 Days -

Activity 5. After they do Activity 5, discuss the re-
sults that they have come up with for their strategies.
Then, give a mini-lecture on the mini-max theory for
strictly determined games with a saddle point, working
through several examples. Continue your lecture on
game theory by introducing games in which mixed strat-
egies are necessary, and explain the method of elements,
a way to calculate the best mixed strategy for a 2 x 2
two person game. (See Williams' The Compleat Strategyst,
as well as the notes that will be handed out on game
theory.)
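The saddle-point check and the 2 x 2 mixed-strategy calculation can be sketched as follows. This uses the standard equalizing-payoff formulas for a zero-sum game with integer payoffs; it is not necessarily the method of elements exactly as presented in the course notes, and the function name is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Solve the zero-sum game whose payoffs to the row player are
        [[a, b],
         [c, d]].
    Returns (P(row 1), P(column 1), value of the game)."""
    maximin = max(min(a, b), min(c, d))   # row player's guaranteed floor
    minimax = min(max(a, c), max(b, d))   # column player's guaranteed ceiling
    if maximin == minimax:
        # Saddle point: each player has a pure optimal strategy.
        p = 1 if min(a, b) >= min(c, d) else 0
        q = 1 if max(a, c) <= max(b, d) else 0
        return Fraction(p), Fraction(q), Fraction(maximin)
    # No saddle point: choose mixtures that make the opponent indifferent.
    denom = a - b - c + d
    p = Fraction(d - c, denom)
    q = Fraction(d - b, denom)
    value = Fraction(a * d - b * c, denom)
    return p, q, value

print(solve_2x2(1, -1, -1, 1))  # matching pennies: p = 1/2, q = 1/2, value 0
print(solve_2x2(3, -1, -2, 4))  # p = 3/5, q = 1/2, value 1
```

Checking for a saddle point first matters: the mixed-strategy formulas apply only when no saddle point exists.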

Set aside a day to answer questions on the problem
set dealing with the binomial distribution, "Classifying
Pebbles", assigned below.

Assignments: Read S.B.E. Set #11 in Exploring Data,
"Testing Beer Tasters", pp. 91-97. This
is a good example of a misuse of statistics.
Project 2, p. 96, can be done for fun and
extra credit by anyone who is interested.

Read Set #4 in Weighing Chances, "Classifying
Pebbles", pp. 33-43. Write up and hand in:
p. 40, #1; p. 41, #2, 3, 5, 6, 9; and
p. 43, #15a, b, c. (Note: It is likely
that the students will have questions on
S.B.E. #4 on the binomial distribution.
The formula at the top of page 40 appears
rather suddenly, and sometimes the students
do not see what it has to do with the
previous text. There may be a tendency to
leave the binomial coefficients off when
applying this formula to problem 15. Nip it
in the bud.)


Assign Problem Set #3 (the five games)
when you have finished lecturing on best
strategies for playing a 2 x 2 game.
Activity 5 can be handed in with this
problem set.
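The point about the binomial coefficients can be illustrated with a short Python sketch of the usual binomial formula. We assume the formula on page 40 of the S.B.E. text has this standard form; the text's own notation may differ, and the function name is hypothetical.

```python
from math import comb

def binomial_probability(n, k, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    # comb(n, k) counts the different orders in which the k successes can occur;
    # it is the coefficient students tend to leave off.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 6 fair coin flips: 20 orders, each with probability (1/2)**6.
three_heads = binomial_probability(6, 3, 0.5)
```

Dropping the coefficient would give 1/64 = 0.015625 instead of 20/64 = 0.3125.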

About 5 Days -

Activity 6. (About three days) Before they do Activity
6, give a mini-lecture on expected value. Include among
your examples the game that they will simulate 25
times in Activity 6 to see how close the "long run
average payoff" comes to the theoretical expected value.
Talk about the chi-squared problem set from S.B.E. (see
assignment). They are likely to have difficulty
understanding the table on page 58, difficulty
calculating degrees of freedom (the text does not
explain this well), and difficulty understanding how
to calculate the expected frequencies from a grid of
observed frequencies, which the text does not explain
clearly either.
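For the instructor's reference, the usual way of getting expected frequencies from a grid (two-way table) of observed frequencies can be sketched in Python. This assumes the chi-squared test of independence for an r x c table; the S.B.E. treatment may differ in detail, the function names are hypothetical, and the observed grid below is made up for illustration, not data from the course.

```python
def expected_frequencies(observed):
    """Expected counts under independence: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared(observed, expected):
    """Sum of (O - E)**2 / E over every cell of the grid."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

def degrees_of_freedom(observed):
    """(rows - 1) x (columns - 1) for an r x c table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

observed = [[20, 30], [30, 20]]          # hypothetical counts
expected = expected_frequencies(observed)
stat = chi_squared(observed, expected)
df = degrees_of_freedom(observed)
```

Every expected count here is 25.0, so each cell contributes (5 ** 2) / 25 = 1 and the statistic is 4.0 with 1 degree of freedom.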

Spend time on misuses of statistics as usual. Explain
the game of craps to them before assigning the problem
of calculating the probability that the roller wins at
craps.

Assignments: Read S.B.E. Set #6 in Weighing Chances on
the chi-squared procedure. Write up and
hand in: p. 56, #1; p. 58, #2; pp. 63-64,
#1-3. Hand in Activity 6 written up.
Assign the problem of calculating the
probability that the roller wins at craps,
due in about a week.
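A Python sketch for checking answers to the craps problem, assuming the standard pass-line rules: the shooter wins with 7 or 11 on the first roll, loses with 2, 3, or 12, and otherwise must repeat that total (the point) before rolling a 7. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Probability of each total when two fair dice are rolled.
TOTALS = {}
for a, b in product(range(1, 7), repeat=2):
    TOTALS[a + b] = TOTALS.get(a + b, 0) + Fraction(1, 36)

def craps_win_probability():
    """Exact probability that the shooter wins a single pass-line round."""
    win = TOTALS[7] + TOTALS[11]          # a natural on the first roll
    for point in (4, 5, 6, 8, 9, 10):
        p = TOTALS[point]
        # Once a point is set, only the point or a 7 ends the round.
        win += p * p / (p + TOTALS[7])
    return win
```

`craps_win_probability()` returns 244/495, a little under one half (about 0.493).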

About 5 Days -

Activities 7 and 8. (About two days each) After you
hand back the set on chi-square, give them the problem
of testing their "theory" for the tack experiment: that
is, use the theoretical probabilities for "ups" and
"downs" that they calculated in Activity 2 to get
expected frequencies for each of the eight outcomes in
throwing these tacks, and use the chi-square test to
compare these with the observed frequencies (in 63
tosses) to test the goodness of fit between their
theory and what really happened. (There are 7 degrees
of freedom.) If you have data from Activity 2 that
involve two (or more) theories for the tacks, depending
on how the tacks were dropped (i.e., on the floor or on
a table top), then have them test each theory. If you
don't have such data and time permits, have them throw
three tacks against the wall, record the outcomes, and
test these against an equally likely model. In general,
let discussion run as long as they are interested in
this experiment. The calculations can be made at home,
and the results can be discussed the next day in
class. Spend some time going over any misuses of
statistics that they may have found.
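The goodness-of-fit computation described above can be sketched in Python. This sketch tests the equally likely model for three tacks (each of the eight U/D triples with probability 1/8); a group's own theory would replace those probabilities. The function name is hypothetical, and the observed counts are made up, chosen only to add up to 63 tosses.

```python
from itertools import product

def chi_squared_gof(observed, probabilities):
    """Chi-squared goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed.values())
    stat = 0.0
    for outcome, p in probabilities.items():
        expected = n * p                        # expected frequency under the model
        stat += (observed.get(outcome, 0) - expected) ** 2 / expected
    return stat, len(probabilities) - 1         # 8 outcomes -> 7 degrees of freedom

# Equally likely model for three tacks thrown against the wall.
model = {"".join(t): 1 / 8 for t in product("UD", repeat=3)}

# Hypothetical counts for 63 throws (not data from the course).
observed = {"UUU": 5, "UUD": 9, "UDU": 8, "UDD": 10,
            "DUU": 7, "DUD": 9, "DDU": 8, "DDD": 7}

stat, df = chi_squared_gof(observed, model)
```

For these made-up counts the statistic is 15/7, about 2.14, with 7 degrees of freedom, well below the 5% critical value of about 14.07, so they would give no reason to reject the model.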

Assignments: Read S.B.E. Set #12 in Exploring Data
on Estimating the Size of Wildlife
Populations. Due: pp. 101-102, #1, 2, 3,
and 6. The write-up of Activities 7 and
8 is due whenever it is convenient.

 

 

About 5 Days -

Finish any discussion of the tack-theory-testing that
may still not be completed. Go over any problems or
questions before the test, paying particular attention
to any questions on the binomial distribution or on
the chi-squared procedure that may still bother students.
If time permits, discuss the misuses of statistics that
they may have found, or perhaps have a student present
the solution to the craps problem, if anyone has gotten
it.

Second in-class test. (one day)

Go over test. If time permits, introduce Activity 9.
(one day)

Introduce the pulse problem to them: "Pulse rates go up
when taken by a member of the opposite sex." This is
the beginning of Activity 9, an open-ended problem that
could go in many different directions in different
classes. Good luck!

About 4-5 Days -

Activity 9 on pulse rates. Whenever it is convenient
during the last two weeks, spend a day on the
three-cornered duel, and have them play with their
calculators and the series that evolves in the one case.

The posttest will be administered on one day. Announce
this well in advance, say a week to ten days, and
emphasize that the test is for your information as to how
much they know about certain concepts, and will in no
way count towards their grade. Emphasize the importance
of their attending that day.

 

APPENDIX B

ACTIVITIES, PROBLEMS, AND NOTES TO THE INSTRUCTOR.

NOTES ON GAME THEORY AND EXPECTED VALUE

Activity 1.

Before doing this activity, write down your best guess for
each of the following:

If you flip six coins, what is the probability that you
will get:

a) six heads
b) five heads
c) four heads
d) three heads

Perform this activity in groups of six.

1. Flip six coins. Record the number of heads. Repeat
this experiment 48 times in your group. Arrange the
data in a 4 x 12 grid. Use your data to answer each of
these questions.

a) What is the probability of getting 6 heads?
5 heads? 4 heads? three heads? two heads? one
head? no heads?

b) What is the probability of getting at least one
head? At least two heads?

2. a) Make a list of all the possible outcomes for
flipping six coins.

b) Develop a mathematical model to find the
theoretical probability for the outcomes of flipping
six coins.

c) What is the theoretical probability for getting
at least one head? At least two heads?

d) What are the assumptions of your mathematical
model?

3. Compare the experimental probabilities from part 1 with
the theoretical probabilities in part 2 above. How well
do they agree? Make a graph to compare the experimental
(observed) results in 48 flips for 0 heads, 1 head, ...,
6 heads with the theoretical ones (how many times they
should happen). Plot the two graphs on the same set of
coordinate axes. Where is there close agreement between
the two graphs? Where is there not close agreement? Why
do you suppose that this happens?

4. What assumptions have you made in your experiment when
flipping the coins? What suggestions do you have to
improve the experiment?

5. List any other comments, questions, observations, or
reactions that you might have to this activity.
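One possible mathematical model for question 2 can be checked with a short Python sketch that lists all 2**6 = 64 outcomes. It assumes fair coins that land independently (the kind of assumption question 2d asks students to state); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins=6):
    """Theoretical P(k heads) for n fair coins, by listing all 2**n equally likely outcomes."""
    outcomes = list(product("HT", repeat=n_coins))
    counts = [0] * (n_coins + 1)
    for outcome in outcomes:
        counts[outcome.count("H")] += 1       # tally outcomes by number of heads
    return [Fraction(c, len(outcomes)) for c in counts]

dist = heads_distribution()                   # index k = number of heads
at_least_one = 1 - dist[0]
at_least_two = 1 - dist[0] - dist[1]
expected_in_48 = [48 * p for p in dist]       # theoretical frequencies for question 3
```

`dist` is 1/64, 6/64, 15/64, 20/64, 15/64, 6/64, 1/64 for 0 through 6 heads; `at_least_one` is 63/64 and `at_least_two` is 57/64. Multiplying by 48 gives the theoretical frequencies for the graph in question 3: 0.75, 4.5, 11.25, 15, 11.25, 4.5, 0.75.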

Note to the instructor for Activity 1.

Before the Activity: Activity 1 is intended as an introduc-
tion to equally likely outcomes, and is meant to be tapped
later on as an example of a binomial experiment and utilized
in sampling variation. (Thus the 4 x 12 grid!)

Before doing this activity, ask them for definitions of
"what is probability," soliciting as many different
responses as you can get. Then give several examples
using, say, coins, dice, or cards (something that involves
equally likely outcomes and is simple) and ask them what
the probability of several things is. Then ask them
which, if any, of the definitions help calculate
probability, or describe accurately how they obtained the
probability. The goal is to get them to isolate the
relative frequency model

$$P = \frac{\#\ \text{favorable outcomes}}{\text{total}\ \#\ \text{outcomes}}$$

During the Activity: As a general rule, do not answer their
questions outright. Try to solicit answers from someone
else in their group. If this fails, try asking them
questions that bear on their question and for which they
do know the answer, in an attempt to build back up to
their original question. The goal is for them to do
the mathematics and solve the problems themselves, and for
the instructor to act as a "mathematical physician and
clinician".

After the Activity: After they have performed the experiment
and gathered the data, and answered several questions about
it, put all their results from the coin tosses on the board
and pool their results. (Try to get them to suggest this!)
They could use the pooled results to calculate more exper-
imental probabilities, compare these to both their own
experimental probabilities, and later on to the mathematical
model probabilities and graph all three on the same chart
in question 3. Thus, you could get into the effects of
sample size in an informal way, right off.

 


Activity 2. Thumbtacks - A pointed affair.

 

Part 1

a) Before doing this activity, write down your best
guess for the probability that your tack will land
upright when dropped.

b) Devise some uniform way of dropping your tack.
Toss the tack 72 times. Arrange your data in a
6 x 12 array, with U to indicate upright and D
to indicate down.

c) Based upon the data you have collected, calculate
the probability that the tack lands upright; that
it lands down.

Part 2 Do this part of the activity in groups of 4. Make
sure your group includes a person with each of the
three colored tacks. Use the probabilities calculated in
part 1 to list the probability that the red tack
lands up, that the silver tack lands up, and that
the gold tack lands up.

You will toss the three colored tacks together and
record the results. Before performing this experiment,
write down your best guess for each of the
following:

a) The probability that all the tacks land up.

b) The probability that no tack lands up (what's
another way to say this?)

c) The probability that at least one tack lands up
(what's another way to say this?)

d) The probability that the red tack lands up.

e) The probability that 2 tacks land up and one lands
down.

Devise some uniform way of dropping the three tacks and
perform the experiment 63 times. Record your data in a
7 x 9 array by listing the results as triples; e.g., UDD
could stand for red tack up, gold tack down, silver tack down.

Use the data you have gathered to calculate the exper-
imental probabilities for the events in a-e listed above.
Compare these calculations to your guesses. Any surprises?
What assumptions have you made in doing the experiment?
What suggestions do you have to improve this experiment?

Part 3

a) Develop a mathematical model to assign theoretical
probabilities to the outcomes of this experiment.
First, list all possible outcomes for the experiment,
then devise a way of assigning a probability to each
outcome.

b) Use your data from part 2 to determine experimental
probabilities for each of the outcomes listed above
in part a. Compare these experimental probabilities
with the theoretical probabilities for the outcomes
given by your model. Make a graph to compare your
experimental probabilities with the theoretical
probabilities. How well do the graphs agree?

c) What assumptions have you made in your mathematical
model? Does the model for the tack experiment
differ in any way from the model for the coin
experiment? If so, how? Is there any similarity
between the two experiments?

d) List any other comments, questions, observations,
or reactions that you might have to this activity.
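A Python sketch of one such model, assuming the three tacks land independently so that probabilities multiply (the multiplicative principle suggested in the instructor's notes). The function name is hypothetical, and the probabilities of landing up used here are placeholders, not class data; outcomes are written in the order red, gold, silver, as in part 2.

```python
from itertools import product

def tack_outcome_probabilities(p_up):
    """Probability of every up/down triple, assuming the tacks land independently.

    p_up maps each tack's colour to its probability of landing up (U).
    """
    colours = list(p_up)                         # order used when writing outcomes
    probs = {}
    for faces in product("UD", repeat=len(colours)):
        p = 1.0
        for colour, face in zip(colours, faces):
            # Multiplicative principle: multiply the separate probabilities.
            p *= p_up[colour] if face == "U" else 1 - p_up[colour]
        probs["".join(faces)] = p
    return probs

# Placeholder probabilities of landing up (replace with the class's part 1 results).
probs = tack_outcome_probabilities({"red": 0.6, "gold": 0.5, "silver": 0.4})
two_up = sum(p for outcome, p in probs.items() if outcome.count("U") == 2)
```

With these placeholder values, P(all up) = 0.12, P(at least one up) = 1 - P(DDD) = 0.88, and P(exactly two up) = 0.38.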

"This branch of mathematics (probability) is the only one,
I believe, in which good writers frequently get results
entirely erroneous."

-- Charles Sanders Peirce

Notes to Instructor for Activity 2.

 

1. Post results by tack color after part 1. If there
is (as there probably will be!) wide dispersion in the
results, see if they can give some reasons for it.
They may be willing to redo part 1 controlling
for some factors (e.g., height, surface, way of
dropping).

Post results of part 2 by group and outcome. Before
they start writing up part 3 (which can be done
outside class), attempt to get some kind of "average"
probability for U and D as predictors for the
theoretical model. Help them list the outcomes,
as in Activity 1. (Try to talk about the coin
problem with 1 coin, 2 coins, 3 coins, ..., 6 coins
as possible motivation for a multiplicative
principle in the model they are about to make.)

They may or may not be interested in tossing
the tacks against the wall to see if the results
come up close to 50-50. If they are tired of
tack tossing, this part can be saved until later,
when we do the chi-squared statistic. They can
then test the various theoretical models they have
using $\chi^2$ and their experimental data.

Activity 3. Do this activity in groups of 4. Before
performing this activity, write down guesses for
each of the following:

a) the most likely sum for throwing 3 dice, and the
probability of that sum

b) the least likely sum for throwing 3 dice, and the
probability of that sum

c) the probability that you get a 7, and the
probability that you get an odd number.

1. Toss the dice 84 times in some uniform way. Arrange
the outcomes in a 7 x 12 array, recording both the
triple and the sum of the faces.

For example:

3 5 4
R W G
12

shows that the red die was a 3, the white die a
5, and the green die a 4, and that the sum of the
faces was 12. Record the frequency of each sum,
and make a histogram (see page 5 of SBE Exploring
Data) that will indicate the number of times that
each sum occurred. (This can be done at home.)

 

2. Use the data you have collected to calculate
experimental probabilities for each of the following:

a) Each of the possible sums of the faces of the three
dice. Compare your guesses for most likely and
least likely outcomes to the experimental most likely
and least likely ones. How did you do?

b) The probability that the sum was odd.

c) The probability that the red die came up a 3.

d) The probability that you got a 6 on one die and
something other than a 6 on the other two dice.

e) The probability that at least one die had a 6.

f) What assumptions have you made in doing this
experiment?

3. a) Develop a mathematical model to describe the
experiment and to get theoretical probabilities for the
outcomes. How many different outcomes are there?
How many different ways can each sum occur, e.g.,
how many ways are there to make a 12? List the
number of ways that each outcome can occur.

b) Use your information to calculate the theoretical
probabilities that a 3, 4, ..., 18 occurs.

c) Superimpose a histogram for the theoretical
probabilities on top of a histogram for the
experimental probabilities. How well do the two
compare? Any surprises? Why do you suppose the
"surprise" happened?

d) Calculate theoretical probabilities for the events
listed above in parts 2b, 2c, 2d, and 2e.

4. Is the mathematical model for this activity more
similar to the model for the tacks or to the model for
the coins? How so?
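The model in question 3 can be checked by brute force in Python, assuming fair dice that land independently so that all 216 ordered triples (red, white, green) are equally likely. The helper name `prob` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

triples = list(product(range(1, 7), repeat=3))   # 216 equally likely (red, white, green) triples
ways = Counter(sum(t) for t in triples)          # number of ways to make each sum 3..18

def prob(event):
    """Fraction of the 216 triples for which event(triple) is true."""
    return Fraction(sum(1 for t in triples if event(t)), len(triples))

p_seven = prob(lambda t: sum(t) == 7)
p_odd = prob(lambda t: sum(t) % 2 == 1)
p_red_three = prob(lambda t: t[0] == 3)          # first coordinate is the red die
p_exactly_one_six = prob(lambda t: t.count(6) == 1)
p_at_least_one_six = prob(lambda t: 6 in t)
```

For example, `ways[10]` and `ways[11]` are both 27, the most of any sum, while 3 and 18 can each be made only one way; `p_seven` is 15/216, `p_odd` is 1/2, `p_exactly_one_six` is 75/216, and `p_at_least_one_six` is 91/216.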

Statistics are no substitute for judgement.
-- Henry Clay

Notes to Instructor for Activity 3.

1. Post results by group and sum after the experiment
is done. Pool results. Suggest a third histogram,
with class totals, to be superimposed on the group
total and theoretical histograms. Talk about most
likely and least likely outcomes.

This activity may draw discussion on "how" to
list outcomes, i.e., as 16 sums or as 216 triples. The
triples yield more information and can be used to
calculate probabilities of the sums; the equally
likely model applies to the triples.

The general partition problem is hard, and unsolved in
part! It may bear mentioning (i.e., there is no simple
formula that yields the number of partitions of a
given number).

Activity 4. Counting Outcomes and Counting Redundancies:
Tools for calculating probabilities.

The sequential counting principle will help you get started
on these problems.

1. Slobbobic Spellink.

 

The country of Lower Slobbobia uses the Roman
alphabet (26 letters) in its written language.
However, in Slobbobian, any arrangement of the letters
makes a word.

How many different words can be written in
Slobbobian using the letters in each case below?

 

a) G A K (List them)

b) P Z U B (List them) How about Z Z U B?
c) E Z A K L (Just say how many)

d) L Z A K L (List them)

e) L Z A L L (List them)

f) In parts d) and e), some spellings are redundant,
that is, yield the same word over again.

How many redundant spellings of the single word
L Z A L K (from part d) are there?

How many redundant spellings of the word L Z A L L
(from part e) are there? Give a reason for your answer.

So, what percentage, or fraction, of the total number
of possible arrangements of the five letters
L Z A K L will be redundant?

Same question for the letters L Z A L L.

2. How many different words in Slobbobian can be written
from each of the following:

a) J Z E K E K
b) J Z U Z U
c) H T H T T
d) M I S S I S S I P P I

3. Generalizations

a) If you have N distinct letters, how many different
words (in Slobbobian of course!) can you write?

b) Now, suppose some of those N letters are identical
(as happens above.) How would you modify your answer
to part a) so that you count only distinct words?

c) Can you suggest a general rule for this kind of
spelling problem?

d) Using H for heads, T for tails, and your general
rule from part c), count the number of ways you can
get 6 heads; 5 heads, 1 tail; 4 heads, 2 tails; ...;
1 head, 5 tails; and no heads, when flipping six
coins. (Do this at home; you did part of it in 2c
already.)

4. Suppose that six people are running a race.

a) How many different ways could there be a first
place, then a second place, then a third place
finish, that is, how many different one - two -
three finishings are there?

b) Suppose we are only interested in whether or not
a runner finishes in the top three. That is, we
are concerned about who the first three runners
are, but we don't care about the order in which they
finished.

How many different groups of three people could
cross the finish line first? (Hint: How many times
was each group of three counted in part a) above?
That is, how many redundancies are there for each
group?)

c) How many different groups of 4 people could cross
the finish line first? Of 5 people? Of 2 people? Of 1
person? Of 6 people? Where have you seen these
numbers before?

5. Write up answers to question 4(a-c) when there are 5
people running the race, and when there are 4 people
running the race. (This can be done outside of class,
provided that you are able to answer question 6 at this
point.)

6. a) Can you give a general rule for the number of ways
of choosing a subset of x persons (or things) from
a set of y persons (or things)?

b) How many different groups of 12 can be chosen from
a group of 25?

c) How many different ways can you get 3 heads if you
toss 8 coins?
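For checking answers, the redundancy idea (count every ordering, then divide out the orderings of identical letters) can be sketched in Python. The function name is hypothetical; `math.comb` and `math.perm` are standard-library functions (Python 3.8 or later). Since the notes ask instructors not to use the words "combinations" and "permutations" with students, this is meant for the instructor's own use.

```python
from collections import Counter
from math import comb, factorial, perm

def distinct_words(letters):
    """Number of different 'words' from the letters, dividing out redundant spellings."""
    total = factorial(len(letters))          # every ordering, as if all letters differed
    for repeats in Counter(letters).values():
        total //= factorial(repeats)         # identical letters can be shuffled among themselves
    return total

# Race of six runners: ordered one-two-three finishes, then unordered groups of three.
ordered_top_three = perm(6, 3)                       # 6 * 5 * 4
groups_of_three = ordered_top_three // factorial(3)  # each group was counted 3! times
```

`distinct_words("LZAKL")` gives 60 and `distinct_words("LZALL")` gives 20; `ordered_top_three` is 120 and `groups_of_three` is 20, which agrees with `comb(6, 3)`.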

II Counting Challenges

 

Using the sequential counting principle, as well as
what you have learned about redundancies, you can get a
good start on these problems.


1. How many Michigan license plates can be made?

2. How many double dip Baskin and Robbins ice cream cones
can be made (if all flavors are in)?

Does your answer depend on anything? Is there another
possible answer?

3. How many different pizza toppings are possible if you
had cheese, mushrooms, pepperoni, onions, and sausage
to work with?

4. An ordinary deck of 52 cards has 4 suits with 13
denominations in each suit (ace, 2, 3, ..., king).

a) How many different pairs of jacks are there?
How many different triples of jacks?

b) How many possible pairs can be made in the whole
deck? How many possible triples?

5. a) How many different ways can you get 5 cards of a
given suit? (Called a flush)

b) How many different flushes would there be in the
whole deck?

c) How many different ways can you get a hand of five
cards that goes 4-5-6-7-8? (8 high straight!)

d) How many different ways can a straight start, i.e.,
how many different lowest cards can a straight have?

e) Use c and d to count the number of possible
different straights.

6. a) What is the probability that you get dealt a pair
in a 5 card poker game? (Hint: What is the prob-
ability of not getting a pair?)

b) How many different 5 card poker hands are there?

c) Using part b), and your answer to 5b) and 5e) above,
find the probability that you get dealt a flush.
A straight. Have you made any assumptions in your
answers for the probabilities of a straight and a
flush?
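The card counts can be checked with a Python sketch. It assumes an ace may play either low (A-2-3-4-5) or high (10-J-Q-K-A), so a straight has 10 possible lowest cards, and it counts straight flushes among both the flushes and the straights; these are exactly the kinds of assumptions question 6c asks about. For 6a it computes the chance that at least two cards share a denomination, which is one reading of "dealt a pair".

```python
from math import comb

hands = comb(52, 5)                  # 6b: all five-card hands
pairs_of_jacks = comb(4, 2)          # 4a: choose 2 of the 4 jacks
triples_of_jacks = comb(4, 3)
pairs_in_deck = 13 * pairs_of_jacks  # 4b: any of the 13 denominations
triples_in_deck = 13 * triples_of_jacks
flushes = 4 * comb(13, 5)            # 5a-5b: pick a suit, then 5 of its 13 cards
straights = 10 * 4**5                # 5c-5e: 10 possible lowest cards, any suit for each card
no_pair = comb(13, 5) * 4**5         # five different denominations, any suits
p_flush = flushes / hands
p_straight = straights / hands
p_some_pair = 1 - no_pair / hands    # 6a, using the complement as the hint suggests
```

Under these assumptions `hands` is 2,598,960, `flushes` is 5,148, `straights` is 10,240, `p_flush` is about 0.0020, `p_straight` about 0.0039, and `p_some_pair` about 0.493.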

Notes to Instructors on Activity 4.

 

1. Before doing Activity 4, give a mini-lecture with
several examples that lead into the sequential
counting principle, such as:

How many roads are there from N.Y. to L.A. if
there are three from N.Y. to St. Louis and 4
from St. Louis to L.A.? How many telephone
numbers are possible? How many on campus? And
so on!

The first objective of Activity 4 is to get students
to isolate the principle of dividing out by
redundancies in counting problems. Asking such
questions as "For a fixed word (of N letters), how
many ways can you get the same word?" and then "What
fraction of the total number of possible words would
be redundant?" may help them.

The special case of this principle which is usually
called "combinations" is developed in question 4,
leading up to a general rule. Please do not use
the words "combinations" or "permutations" in re-
ferring to any of the problems in this activity.
The sequential counting principle will suffice to
approach all these problems, and then the concept
of redundancies will help to keep from double
(triple etc.) counting the outcomes. Try to get
the students to develop their own rules. If they
get an incorrect one suggest an example that will
lead them to a contradiction. This process is slow
but helps students to avoid misusing formulas in-
stead of analyzing the problem.

Some of the problems in Counting Challenges will
require time outside of class. Copious hints may
be needed on these problems.

The students can do the first part, 1a) to 1f), at
home after your mini-lecture (in the interest of saving
class time). In fact, they'll need to spend time
every day outside of class on this activity. Encourage
them to do so.

Some of the harder poker hands such as 2 pair, 3
of a kind, full house, could be given as extra
challenges.

Activity 5. Introduction to game theory.

1. You are playing a game with a friend in a bar. Both
of you show either one or two fingers.

If you show one finger and your friend shows two, you
win $10. If you show two fingers and he shows one
finger, you win $30. If you both show the same number
of fingers - both one or both two - your friend wins
$20.

Play this game with a partner 20 times. Keep a record
of the payoffs and who is getting them. Switch roles
and play it another 20 times, recording the payoffs
as above. As you play the game, try to figure out
what's the best thing for you to do, i.e., what your
best strategy is. After you have played it both ways,
write down what you think the best strategy is for
each player.

2. Play this game with a partner.

Take the cards from 2 to 9 in a black suit and in a
red suit out of a deck, so you'll have 16 cards in all.
Shuffle and arrange them in a 4 x 4 grid. Let one person
be black, the other be red.

Black picks a row and red picks a column, in secret of
course! Whatever card is in the row and column that
were picked is the payoff. A red 6 gives 6 points
to red; a black 8 gives 8 points to black.

Play the game about 20 times with a partner, keeping
a running tally. Can you figure out a best strategy
for each player? If you can't figure out a "best" one,
can you suggest several good possibilities for each
player?

If you got all black cards in a row, or all red cards
in a column, what happens? If this does happen to you,
reshuffle and try again. Draw a picture of the game
you play for your log.

CHALLENGE QUESTION (for outside of class work). What
is the probability that you do get all blacks in a row
or all reds in a column when you set up your grid?

3. A husband and wife, Jack and Beth, are mountain climbing
in the Rockies. Beth likes trails and campsites that
are at high altitudes; Jack likes low altitudes. The
area of the mountains they are interested in exploring
is criss-crossed by a network of trails, four running
north-south and four running east-west. The campers
have agreed to camp at a junction of two of these trails.
The intersections of the N-S and E-W trails are at
various altitudes, so to make it as fair as possible,
the couple decides that Beth will choose a north-south
trail and Jack an east-west trail, and they will
camp at the intersection. Of course, Beth would like
to be as high up as possible, and Jack as low down as
possible. Below is a matrix of the four choices for
each camper, with the number in each slot representing
the altitude, in thousands of feet, at the intersection
of those trails.

Each camper gets only one choice. What should they
choose?

                 Jack
              1   2   3   4
          1   7   2   5   1
   Beth   2   2   2   3   4
          3   5   3   4   4
          4   3   2   1   6

Write down what you have concluded, and why you think
it should be done that way.
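The mini-max reasoning for this game can be sketched in Python. The sketch assumes Beth (rows) wants the largest altitude and Jack (columns) the smallest, with the altitudes read from the matrix above; the function name is hypothetical.

```python
def pure_strategy_solution(m):
    """Saddle point of a payoff matrix (row player maximizes, column player minimizes).

    Returns (row, column, value) numbered from 1, or None if there is no saddle point.
    """
    row_mins = [min(row) for row in m]             # worst case for each of Beth's trails
    col_maxes = [max(col) for col in zip(*m)]      # worst case for each of Jack's trails
    maximin, minimax = max(row_mins), min(col_maxes)
    if maximin != minimax:
        return None                                # no saddle point: a mixed strategy is needed
    return row_mins.index(maximin) + 1, col_maxes.index(minimax) + 1, maximin

altitudes = [      # rows: Beth's trail; columns: Jack's trail; thousands of feet
    [7, 2, 5, 1],
    [2, 2, 3, 4],
    [5, 3, 4, 4],
    [3, 2, 1, 6],
]
solution = pure_strategy_solution(altitudes)
```

`solution` is (3, 2, 3): Beth takes trail 3, Jack takes trail 2, and they camp at 3,000 feet; neither can do better by changing only their own choice. For the finger game of problem 1, `pure_strategy_solution([[-20, 10], [30, -20]])` returns None, which is why that game calls for mixed strategies.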

Notes to Instructor on Activity 5.

 

It is not important that the groups figure out the
best answer or the right answers for these problems. This
activity is intended to get them thinking about strategies,
playing some alternatives some of the time and then switch-
ing, or perhaps choosing one strategy and then sticking with
it.

After the activity has been completed, discuss the
various strategies that came up in the groups for each of
the three games.

The last game may be used as a lead in to a mini-lecture
on the mini-max pure strategy. The first game can be used
when you begin to talk about mixed strategies.

The Challenge Question is meant for outside of class.
It is intended as an extra problem for anyone who is
interested.

Activity 6. Expected value.

1. [Payoff matrix not legible in this copy.] The
oddments for this game have been calculated previously.

Use coins, or the table of random numbers, to simulate
the playing of this game. (Your simulation needs to
be 50-50 for the column choice, but 3 to 1 for the row
choice.) Write down how you simulated the game, play
the game 25 times using the simulation for a row player's
choice and a column player's choice, and list the payoff
each time. Find the mean of the 25 payoffs. How close
is your mean to the theoretical expected value of this
game? (This is what we mean by long run average
expected payoff.)

2. Carnivals often have a game called OVER and UNDER.
(This game was notorious a couple of years ago at church
festivals.) Two dice are rolled down a chute. Someone
playing the game can bet that the dice will show a
number OVER 7, UNDER 7, or they can bet on 7. OVER and
UNDER each pay even money, and 7 pays 4 times the bet.
Where is the best place to bet?

a) Calculate the expected value of a $1 bet on OVER.
How about for UNDER?

b) Calculate the expected value of a $1 bet on 7.

c) Is this game a fair game?

(Note: Recall what the outcomes are for rolling
two dice!)

3. Another Carnival game involves a cage with two dice in
it. Players bet on the number that comes up. The pay-
offs are as follows:

a) 8 (or 6) pays even money
b) 9 (or 5) pays two to one
c) 10 (or 4) pays four to one
d) 11 (or 3) pays six to one
e) 12 (or 2) pays ten to one.

If a 7 shows up, the house always wins every-
thing! Where is the best place to bet? Write
down what you think at first glance.

Calculate the expected value for a $1 bet on each
of 8, 9, 10, 11, and 12. (Keep in mind that if you bet
on 8, any other total loses when you calculate the
probability of losing!)

What is the best place to play? Did you guess it?
How do you feel about this game?

4. In the notes, the game "win $5 if you roll a 6, otherwise
lose $1.50" was discussed. The expected value for that
game was

$$\frac{5}{6}(-\$1.50) + \frac{1}{6}(\$5.00) = \frac{-\$2.50}{6} \approx -\$0.42$$

How could you change the payoffs to make it a fair game?

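5. Optimal strategies for playing each of the games below
have already been calculated in problem set #3. Use
them to find the expected value for each of the following
2 x 2 games:

$$\text{a)}\ \begin{bmatrix}20 & -10\\ -30 & 20\end{bmatrix}\qquad \text{b)}\ \begin{bmatrix}4 & 6\\ 5 & 2\end{bmatrix}\qquad \text{c)}\ \begin{bmatrix}3 & 1\\ 4 & 3\end{bmatrix}$$

$$\text{d)}\ \begin{bmatrix}60 & 100\\ 100 & 80\end{bmatrix}\qquad \text{e)}\ \begin{bmatrix}0 & -1\\ -3 & 0\end{bmatrix}$$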

6. In each of the following, set up the payoff matrix for
the 2 x 2 game described, calculate the best strategy
for each player, and find the expected value of the
game.

a) Fast Eddie and Slow Sam are deep into a TGIF
party at Lizard's. Fast Eddie says, "Sam,
let's play a game. We'll throw fingers, either
one or two. If we both throw one finger, you
buy me a beer. If we both throw two fingers,
you buy me two beers. If we don't match, you
just pay me a dime." Slow Sam knows that he
is probably at a disadvantage if he plays this
game, but he decides to try to figure out how
much advance compensation he should receive
from Eddie each time they play the game. Set
up the payoff matrix, determine the best
strategies for each player, determine the value of
the game, and then determine how much Fast Eddie
should cough up on each play to make the game
fair. (Determine your own price for a beer.)

b) After leaving the bar, Slow Sam tries to
remember whether or not today is his anniversary.
If it is, he should bring his wife
some flowers. He reasons, in his disabled
condition, as follows:

"If it is and I do bring them, I will
have at least 2 good days (no griping) to
look forward to. If it is and I don't bring
them, I will have at least 10 bad days
(constant griping) to look forward to. If it
isn't and I do bring the flowers, I will get
1 good day in. If it isn't and I don't bring
the flowers, nothing is lost and nothing is
gained!"

Set up the payoff matrix, calculate the strat-
egy and expected payoff to Slow Sam.
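The expected values asked for in problems 2 through 4 can be checked with a Python sketch. Two readings are assumptions: "7 pays 4 times the bet" is taken as 4 to 1 (a net win of $4), and a cage-game bet is on a single total, so every other total, 7 included, loses. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Probability of each total when two fair dice are rolled.
P = {}
for a, b in product(range(1, 7), repeat=2):
    P[a + b] = P.get(a + b, 0) + Fraction(1, 36)

def ev(p_win, win, lose=1):
    """Expected net gain: win `win` with probability p_win, otherwise lose `lose`."""
    return p_win * win - (1 - p_win) * lose

# Problem 2: OVER and UNDER pay even money; 7 is read here as paying 4 to 1.
over = ev(sum(P[t] for t in range(8, 13)), 1)
under = ev(sum(P[t] for t in range(2, 7)), 1)
seven = ev(P[7], 4)

# Problem 3: a $1 bet on one total at the stated odds.
cage = {t: ev(P[t], odds) for t, odds in [(8, 1), (9, 2), (10, 4), (11, 6), (12, 10)]}
best_cage_bet = max(cage, key=cage.get)

# Problem 4: win $5 on a 6 (one die), otherwise lose $1.50.
six_game = ev(Fraction(1, 6), 5, Fraction(3, 2))
# Win amount that makes the expected value zero, keeping the $1.50 loss.
fair_win = (1 - Fraction(1, 6)) * Fraction(3, 2) / Fraction(1, 6)
```

Under these readings, OVER, UNDER, and 7 each have expected value -1/6 (about 17 cents lost per dollar bet); in the cage game the least bad bet is 10 (or 4), at -7/12; the six game has expected value -5/12, about -$0.42; and raising the win to $7.50 would make it fair.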

Notes to Instructor on Activity 6.

1. Give a mini-lecture on expected value, for 2 x 2
games and in general, before the students do
Activity 6.

Cover several (or many!) examples thoroughly.

Post the results of each group for number 1 on
Activity 6. It may be interesting to find the
Grand Mean Payoff for the pooled results.

When they set up a payoff matrix, they may forget
our convention of letting payoffs be for the row
player, and end up dealing with the "transpose" of the
correct payoff matrix. This is particularly
interesting for game 6b. Try it both ways and see
what Slow Sam's decision would be.

Problem 5 can be done at home, and the results then
discussed in the groups.
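A Python sketch for problems 5 and 6, meant for comparing against student work. It assumes payoffs are to the row player (the convention mentioned above), integer or Fraction entries, a hypothetical beer price of 50 cents for 6a (payoffs in cents to Fast Eddie), and, for 6b, Slow Sam as the row player choosing against the unknown date. The function names are hypothetical.

```python
from fractions import Fraction

def expected_value(m, rows, cols):
    """Expected payoff to the row player: sum of p_i * q_j * m[i][j]."""
    return sum(p * q * m[i][j] for i, p in enumerate(rows) for j, q in enumerate(cols))

def solve_2x2(m):
    """Best strategies and value for a 2 x 2 zero-sum game (payoffs to the row player).

    Uses the saddle point when there is one, otherwise the method of oddments.
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                       # strictly determined: pure strategies
        r = 0 if min(a, b) == maximin else 1
        k = 0 if max(a, c) == minimax else 1
        rows, cols = (1 - r, r), (1 - k, k)
    else:
        r1, r2 = abs(c - d), abs(a - b)          # row oddments
        k1, k2 = abs(b - d), abs(a - c)          # column oddments
        rows = (Fraction(r1, r1 + r2), Fraction(r2, r1 + r2))
        cols = (Fraction(k1, k1 + k2), Fraction(k2, k1 + k2))
    return rows, cols, expected_value(m, rows, cols)

# 6a: rows = Eddie throws 1 or 2, columns = Sam throws 1 or 2; cents to Eddie, beer = 50 cents.
eddie = solve_2x2([[50, 10], [10, 100]])
# 6b: rows = Sam brings flowers or not, columns = anniversary or not; days to Sam.
sam = solve_2x2([[2, 1], [-10, 0]])
```

`eddie` gives both players the mix (9/13, 4/13) and a value of 490/13, about 38 cents, so at that beer price Eddie would pay about 38 cents per play to make the game fair. `sam` is ((1, 0), (0, 1), 1), a saddle point: bring the flowers. `solve_2x2([[20, -10], [-30, 20]])` gives game 5a a value of 5/4.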


Activity 7. Sample size.

 

How many cards would you have to turn over from the top of
a well-shuffled deck so that there was at least a 50% chance
that an ace was among them?

Put down a guess.

a) Carry out an experiment with a deck of cards.
Shuffle well each time. See how many cards you
have to turn over until the first ace. Do 10
trials of the experiment, and list the number of
cards turned over each time in a column.

Using your data from these trials, how many cards
would you have to turn over so that you hit an ace
half the time? What is the median number of cards
you turned over?

b) Now, do the experiment ten more times and list the
outcomes in a second column of 10, so that you now
have 20 trials. Using your data from these 20
trials, how many cards would you have to turn over
so that you hit an ace half the time?

What is the greatest number of cards you had to
turn over?

What is the least number of cards you had to
turn over?

What is the median number of cards turned over?

c) Post your results for a) and b) on the board. Make
a 10 x 10 grid of all the results of the other groups
and put it in your log.

Make a list of the answers for parts a) and b) from
the other groups' data. [Thus, you will have 5 (or
so) estimates for 10 trials, 5 (or so) estimates
for 20 trials, 5 (or so) highs and lows, and two sets
of 5 (or so) medians: one for trials of 10 and one
for trials of 20.]

d) Use probability to calculate the exact number of
cards that have to be turned over (from the top of
a well-shuffled deck) so that you have at least a
50% chance of getting an ace. (Hint: What is the
probability that the first card is not an ace?)

e) Compare your theoretical result from part d) to
the estimates for 10 trials and for 20 trials in
part c). How close were the estimates for 10 trials?
How close were the estimates for 20 trials? Are you
surprised by anything?

246

Use the data from the pooled 100 trials to get an es-
timate for the number of cards you would have to pull
off the top in order to get an ace about half the time.
How well does this estimate compare with the theoretical
number of cards calculated in part d))
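For instructors who want to check part d) and preview what the class data might look like, here is a minimal Python sketch. It follows the hint by multiplying the chances that each successive card is not an ace, then searches for the smallest number of cards that gives at least a 50% chance of seeing an ace. It also simulates the shuffling experiment. The function names and the seed are illustrative choices, not part of the activity.

```python
from fractions import Fraction
import random
import statistics


def prob_ace_in_top(n: int) -> Fraction:
    """Chance of at least one ace among the top n cards of a well-shuffled 52-card deck."""
    p_no_ace = Fraction(1)
    for i in range(n):
        # After i non-aces, 48 - i non-aces remain among the 52 - i unseen cards.
        p_no_ace *= Fraction(48 - i, 52 - i)
    return 1 - p_no_ace


def cards_for_even_chance() -> int:
    """Smallest n giving at least a 50% chance that an ace is among the top n cards."""
    n = 1
    while prob_ace_in_top(n) < Fraction(1, 2):
        n += 1
    return n


def first_ace_position(rng: random.Random) -> int:
    """One trial: shuffle, then count cards turned over up to and including the first ace."""
    deck = ["ace"] * 4 + ["other"] * 48
    rng.shuffle(deck)
    return deck.index("ace") + 1


n = cards_for_even_chance()
print(n, float(prob_ace_in_top(n - 1)), float(prob_ace_in_top(n)))

rng = random.Random(1976)  # arbitrary seed, so a rerun gives the same "class" data
for trials in (10, 20, 100):
    results = [first_ace_position(rng) for _ in range(trials)]
    print(trials, "trials: median", statistics.median(results))
```

With exact fractions, 8 cards give a chance just under one half (about 0.499), and 9 cards give about 0.544. The theoretical answer is therefore 9 cards. The simulated medians should wander around that value, and they should wander more for 10 trials than for 100.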

Notes to the Instructor for Activity 7.

This activity is intended to show the effects of varying
sample size in an experiment. With any luck, the medians
for samples of size 10 will be less stable than the medians for
samples of size 20. Also, the samples of 20 (and of course 100) should
give better estimates of the theoretical value calculated in
part d).

All results for each group should be posted on the
board so that questions c) and e) can be answered.

The data from this activity may be used later in
Activity 8, which deals with measures of spread, or variability,
and the effects of sample size on estimates of the variability.


Activity 8. Notes on measures of central tendency and
variability. Problems and activities are
included.

 

Measures of Central Tendency.

 

We have discussed these before. The median for a set
of numbers is the middle point of that set, i.e., the value
that half the scores lie above and half lie below. The mode
is the most frequent value of a set of numbers. The mean
is the arithmetic average of the numbers, i.e., the sum
divided by the number of entries.
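The three definitions above translate almost word for word into code. The short Python sketch below shows this. The data are hypothetical sums of three dice that were invented for illustration; they are not data from the study. The tie rule for the mode (the first value seen wins) is an arbitrary choice made in this sketch.

```python
from collections import Counter


def mean(values):
    """Arithmetic average: the sum divided by the number of entries."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; if several values tie, the first one seen is returned."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical sums from rolling three dice (not data from the study).
rolls = [10, 12, 9, 10, 14, 7, 10, 11]
print(mean(rolls), median(rolls), mode(rolls))  # 10.375 10.0 10
```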

1. Find the median and the mode from your data in the
dice experiment (Activity 3), and enter them in your log.
The numbers 3 through 18 are the possible outcomes.

2. For this problem, use your table of random numbers
(SBE Weighing Chances, p. 15). Consider lines 36-40 in
the chart. In line 36, find the mean of the first five
two-digit numbers. Repeat this for each of lines 37, 38,
39, and 40, picking the first five two-digit numbers and
averaging. List the five means you have calculated.
Each group will be assigned a 5 x 5 block of two-digit
numbers and will calculate the mean of that block. List the five means
from the blocks of 25 two-digit numbers.

 

3. What would you expect the mean (average) of the numbers
00 to 99 to be? How does your expectation compare with
what you found in number 2 above for the sets of five
two-digit numbers? How well do the means for the blocks of 25 numbers
compare to the expected mean? Which would you expect to
be better, the sets of five or the sets of 25, and why?
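A computer can repeat question 3 many times over. The Python sketch below uses Python's random number generator as a stand-in for the printed random-number table; this substitution is an assumption, since the activity uses the SBE table. It draws 1000 samples of 5 and 1000 samples of 25 two-digit numbers and compares how widely the sample means spread. The sample counts and the seed are arbitrary illustrative choices.

```python
import random
import statistics


def sample_means(sample_size, n_samples, rng):
    """Means of n_samples samples, each made of sample_size random two-digit numbers 00-99."""
    return [
        statistics.mean([rng.randint(0, 99) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(42)                 # stand-in for the printed random-number table
expected = statistics.mean(range(100))  # mean of 00, 01, ..., 99
means_5 = sample_means(5, 1000, rng)
means_25 = sample_means(25, 1000, rng)

# How far the sample means typically stray from their own centre: smaller = more stable.
print(expected)
print(round(statistics.pstdev(means_5), 1), round(statistics.pstdev(means_25), 1))
```

The expected mean is 49.5. The means of 25 numbers should cluster noticeably more tightly around it than the means of 5 numbers. In theory, the spread for samples of 25 is about 6, compared with about 13 for samples of 5.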

Measures of Variability

Suppose that you were told that the mean of a sample
of numbers was 30. What could you say about the numbers?
Not a heck of a lot! The numbers could be 120, 10, 1, 15,
and 4, because (120 + 10 + 1 + 15 + 4)/5 = 30. On the other hand,
they could be 32, 31, 30, 29, 28, or for that matter, 30,
30, 30, 30, and 30! In fact, the numbers were really 61,
44, 2, 9, and 34 all the time. (Didn't you know?!)

Perhaps you see the problem here. Just knowing the
mean of a set of numbers does not tell you very much about
the relative sizes of the numbers, or their frequency. That
is, the mean does not give any information about the dis-
persion of the set of numbers. What is needed is some mea-
sure of the expected "variability" of a set of numbers.

 


Attempt 1.

 

Since the mean is a measure of the center of a set of
numbers, it would seem to be a reasonable focal point from
which to consider variability in a sample of numbers. That
is, how much does the sample of numbers bounce around above
and below the mean? One possible way to measure this "boun-
cing around" would be to find the average of the differences
of the numbers from the mean. Suppose we try this on 61,
44, 2, 9, and 34. The mean is 30, so we get

$$\frac{(61-30) + (44-30) + (34-30) + (2-30) + (9-30)}{5}$$
4. a) What does this come out to?

b) Does this answer depend on the numbers we used?
Try another set of numbers.

c) How do you feel about this approach to variability?
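Question 4 can be checked on as many sets of numbers as you like with the Python sketch below. It computes Attempt 1's average of signed differences from the mean. It uses exact fractions, so that floating-point round-off cannot hide the pattern. The function name is illustrative.

```python
from fractions import Fraction


def average_deviation(values):
    """Attempt 1: average of the signed differences of the values from their mean."""
    m = Fraction(sum(values), len(values))
    return sum(Fraction(v) - m for v in values) / len(values)


print(average_deviation([61, 44, 2, 9, 34]))   # 0
print(average_deviation([120, 10, 1, 15, 4]))  # 0
print(average_deviation([3, 7, 8]))            # 0
```

The answer is always 0. The differences above the mean exactly cancel the differences below it, because the sum of the values equals n times the mean. This is why Attempt 1 cannot measure variability.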

Attempt 2.

 

Suppose we took just the positive differences of the
numbers from the mean. That is, take the absolute value of
each difference. We get:

$$\frac{|61-30| + |44-30| + |34-30| + |2-30| + |9-30|}{5}$$

Check what this comes out to. This approach to measuring
deviation from the mean is all right for some purposes. It
is called the mean absolute deviation for a sample of numbers.
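The formula for Attempt 2 can be written as a short Python function. This is a minimal sketch with an illustrative function name, applied to the same five numbers.

```python
from fractions import Fraction


def mean_absolute_deviation(values):
    """Attempt 2: average distance of the values from their mean, ignoring sign."""
    m = Fraction(sum(values), len(values))
    return sum(abs(v - m) for v in values) / len(values)


mad = mean_absolute_deviation([61, 44, 2, 9, 34])
print(mad, float(mad))  # 98/5 19.6
```

The absolute differences are 31, 14, 4, 28, and 21. Their sum is 98, so the mean absolute deviation is 98/5 = 19.6.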

Attempt 3.

Another possibility would be to generalize the distance
formula for two points in a plane. It is, in fact, this
approach to variability which helps statisticians estimate
the theoretical variability of large samples from the ob-
served variability actually calculated in a small sample.
Thus it is possible to estimate the variability in incomes
across the nation from a small sample of incomes (carefully
chosen to eliminate sources of bias), instead of attempting
the impossible task of tabulating every working person's
income.

The experimental value for the variability as defined
below turns out to be a better estimator of the theoretical
variability for a whole population than is the mean absolute
deviation of Attempt 2 above.

Recall the formula used to calculate the distance between
two points in the Cartesian plane. If the points have
coordinates (2, 5) and (8, -2), then the distance between them
is:

$$\sqrt{(2-8)^2 + (5-(-2))^2} = \sqrt{36 + 49} = \sqrt{85} \approx 9.22$$

Our approach to measuring variability will be a
generalization of the distance formula. We measure a type
of average "distance" of our set of numbers from the mean.
This distance is called the standard deviation of the numbers.

If the numbers are 61, 34, 44, 9, and 2, the standard
deviation of these numbers from the mean, 30 in this case, is

$$\sqrt{\frac{(61-30)^2 + (44-30)^2 + (34-30)^2 + (2-30)^2 + (9-30)^2}{5}}$$

Use a calculator to find the value here.

(Note: The standard deviation is a generalized distance.
Above we calculated the "distance" between the point (61, 44,
34, 2, 9) and the point (30, 30, 30, 30, 30). Notice that
these points are in 5-space (five dimensions), whereas you are
perhaps most familiar with the distance formula in two
dimensions.)
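The calculator work above can also be checked in Python. The sketch below computes the standard deviation with the divide-by-n definition used in these notes. It then shows that the same number is the 5-space distance from (61, 44, 34, 2, 9) to (30, 30, 30, 30, 30), divided by the square root of 5. The function name is illustrative. `math.dist` requires Python 3.8 or later. Python's `statistics.pstdev` uses the same divide-by-n definition, whereas `statistics.stdev` divides by n - 1.

```python
import math
import statistics


def standard_deviation(values):
    """Attempt 3: square root of the average squared difference from the mean (divide by n)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


data = [61, 44, 34, 2, 9]
sd = standard_deviation(data)

# The same number seen as a "distance" in 5-space, scaled by 1/sqrt(5).
center = [30] * len(data)
as_distance = math.dist(data, center) / math.sqrt(len(data))

print(round(sd, 2), round(as_distance, 2), round(statistics.pstdev(data), 2))
print(round(math.dist((2, 5), (8, -2)), 2))  # the plane example: sqrt(85)
```

The squared differences are 961, 196, 16, 784, and 441. Their sum is 2398, so the standard deviation is the square root of 479.6, about 21.90. The plane example gives about 9.22.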

5. You have calculated five means from samples of numbers in
each of rows 36-40 of your table of random digits
(number 2, above). Calculate the standard deviation
for each of these five sets of two-digit numbers (five
numbers in each set) and list them.

Calculate the standard deviation for the block of 25
two-digit numbers that was assigned to your group.
Make a list of these results from each of the groups.

You now should have two lists of standard deviations,
one for the sets of five two-digit numbers and one for the blocks
of 25 two-digit numbers. There are five entries on each
list.

6. Consider the two lists of (5) standard deviations. Can
you say anything about the behavior of the standard de-
viations for samples of five numbers, as compared to the
standard deviations for the samples of 25 numbers? Which
set of standard deviations would you think are the more
accurate indicators of the theoretical variability that
could be expected above and below the mean for the sample
of all two digit numbers? What does sample size have to
do with variability?

7. This can be done at home. Consider your results from
the dice experiment. The outcomes should be arranged

a) Pick out four samples of 7 and four samples of
12 and calculate their means and standard
deviations. Try to pick the ones that you think
will vary the most.

b) Which means and standard deviations would you
expect to be the more accurate indicators of their
theoretical counterparts, and why? Does your data
from part (a) verify your expectations?

Notes to the Instructor for Activity 8.

 

1. List the five means for samples of 5 and the five
means for samples of 25 on the board. (The
remaining blocks of 25 in rows 36-40 of the chart
of random digits give fairly useful results for
this activity. These would be good examples to
use when assigning a block to each group. Perhaps
you can find a set of blocks whose means behave
even more wildly.)

Try to get the students to point out the effect of sample
size on the instability of the mean (and of the
standard deviation later on). You might calculate
the mean and standard deviation of the entire set
of 125 two-digit numbers in rows 36-40, and post
them along with the class's results for samples
of 5 or 25. Perhaps some of the students can be induced
to do this. (The standard deviation is a long
process on a five-operation calculator.)

Part 7 should be done at home. It is not necessary
to take up class time with this excursion.


Notes to the Instructor for Activity 9.

It is difficult to know where a class will go after
the statement, "Pulse rates go up when taken by a member
of the opposite sex" is written on the board. The goal
of this activity is for the class, or individual group,
to design and carry out a study that will test the truth
of this statement. Thus, the participants need to devise
an experiment that includes (among other things):

a) A way of obtaining data

b) The gathering of the data

c) Analysis of the data

d) Conclusions based on results
e) Limitations of the study

It may happen that the entire class will want to try
this out immediately on each other. Now the result of this
will, of course, be chaos, and a very poor experimental
design. However, this may pay off in the long run because
the students can experience the flaws of their design, and
learn how to criticize an experiment. Then, if time permits,
they could design a more controlled experiment that would
eliminate the sources of bias that they encountered in their
haphazard approach.

It may happen that a preponderance of one sex or the
other makes it difficult to even consider carrying out the
experiment within the class. Then a class may be forced
outside its own boundaries to collect the data. They should
consider carefully a design that will eliminate any sources
of bias, including how the pulse rates are taken, by whom,
in what surroundings, after how much rest, etc.

In any case, the experiences they have gained from
critiquing misuses of statistics and from reading Huff's book
How to Lie with Statistics should help them find and eliminate
sources of bias. It may be easier for them to recognize the
sources of bias if they actually do the experiment badly at
first.

 

It may be possible to analyze the data using a $\chi^2$ test
of the frequencies of male (up/down) and female (up/down) pulse
rates to test the truth of the pulse rate statement. The
chi-square test does not assume anything about the sample
populations except independence (which, of course, you won't
have if they do it on themselves!), although it does need
reasonably large expected frequencies in each cell.

An important point to emphasize about the results is
that "trends" do not always warrant the strength of a
definite conclusion. The necessity of a good design that can be
statistically tested can be brought out in discussing the
data. The important question is: Is this result likely to
occur by chance in a random sample anyway?

Once the class has embarked on a particular course
of action, you will have to make up specific questions
for them to answer. The questions should focus on a)-e)
above. Have fun!

Activity 9

For each person in your sample, you have a pulse rate
taken by that person, two pulse rates taken by members of
the same sex, and two pulse rates taken by members of the
opposite sex. In addition you have an entry after 30 seconds,
and an entry after 60 seconds for all four pulse trials by
other people, as well as for your own pulse trial.

First, use the 60 second data, and set up the following
chart, with all the females at the top of the chart, and
all the males at the bottom. Skip a line between the males
and females.

(all females first)   PULSE BY SELF   PULSE BY SAME SEX (2)   PULSE BY OPP. SEX (2)
Ex. 1                 76              72, 79                  76, 89

When you have finished this, count up the total number
of times that a female pulse went up when taken by a member
of the opposite sex. In the example above, only the 89
counts, since the 76 was the same as the pulse taken by self.
Do the same thing for males.

Now repeat this process, but use the pulses taken
by the same sex. Do this for each sex, as above.

Set up the following charts.

(by opp. sex)    # Pulse rates up    # Pulse rates not up
Male
Female

(by same sex)    # Pulse rates up    # Pulse rates not up
Male
Female

(Optional)

If you have time and wish to do more, do the exact
same thing as above for the 30-second data.

1. Use the chart that you have set up for the influence
of the opposite sex on pulse rates of males and females to
see if there was any significant effect on the rates.
You have the observed frequencies of pulse rates that
went up or did not go up. Calculate the expected
frequencies from your two-by-two grid. Use the chi-square
test to see if there is a significant difference between
the expected frequencies and the observed frequencies.

What is your conclusion: do pulse rates go up when taken
by a member of the opposite sex?

2. Repeat these calculations for the second chart, in which
you have the observed frequencies for the pulse rates
when taken by the same sex. Calculate the chi-square
statistic for this case.

What is your conclusion: do pulse rates go up when taken
by a member of the same sex?
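The expected-frequency and chi-square arithmetic for questions 1 and 2 can be sketched in Python. In the sketch, each expected count is the row total times the column total divided by the grand total. The counts below are hypothetical and are for illustration only; they are not data from the study. The sketch applies no continuity correction, although some texts use Yates' correction for 2 x 2 tables. The 5% critical value for 1 degree of freedom, 3.841, is the usual chi-square table value.

```python
def chi_square_2x2(observed):
    """Chi-square statistic for a 2 x 2 table of observed counts (no continuity correction).

    observed[i][j]: row i = male/female, column j = pulse rate up / not up.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, count in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency for this cell
            expected_row.append(e)
            chi2 += (count - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected


CRITICAL_5_PERCENT_1_DF = 3.841  # chi-square table value, 1 degree of freedom

# Hypothetical counts, for illustration only: [up, not up]
by_opposite_sex = [[8, 7],    # male
                   [11, 4]]   # female
chi2, expected = chi_square_2x2(by_opposite_sex)
print(expected)        # [[9.5, 5.5], [9.5, 5.5]]
print(round(chi2, 3))  # 1.292
print("significant" if chi2 > CRITICAL_5_PERCENT_1_DF else "not significant at the 5% level")
```

For these made-up counts, the statistic of about 1.29 falls well below 3.841. A class with data like these would have no evidence that the opposite sex makes a difference.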

3. (Optional) In questions 1 and 2 above, you have used the pulse
rates taken over a sixty-second interval. If you wish
to do more, answer questions 1 and 2 above for the 30-second
pulse rate data. You will have to set up charts
similar to the charts for pulse rates taken by the opposite sex
and by the same sex, except that this time the frequencies entered
are observations from the 30-second data.

What are your conclusions: do pulse rates go up when
taken by a member of the opposite sex? Of the same sex?

Did you get a different result than the result from the
60-second data? If so, could you suggest some possible
reasons why you got a difference?

Write up a criticism of this experiment. Include in
your analysis suggestions for improving this experiment.
What uncontrolled factors may have influenced the results
of this experiment?


NOTES ON THE THEORY OF GAMES: DEFINITION OF TERMS AND
OPTIMAL GRAND STRATEGIES

The games that we will be considering are two-person
games. Each player has a fixed set of choices for each
play, and makes the play without knowing what his opponent
has done (or will do). The game will be played many
times, and the object of the game will be for each player
to try to maximize his gains (or, if necessary, minimize his
losses) in the long run over many plays. The outcomes from
the plays of the game are determined by the combined choices
of the two players involved, and so can be nicely represented
in an array, called a matrix.

For example, in game three of Activity 6, the payoff
to one of the players is $20 when both fingers match, and
if the fingers do not match, he pays out $10 in one case
and $30 in the other. We could set up a payoff
matrix from the point of view of the player who wins when
the fingers match as follows:

                           second player
               # fingers     1        2
first player       1        $20     -$10
                   2       -$30      $20

Thus, the entries in the matrix are the payoffs to the
row player. Some payoff matrices have all positive entries,
as, for example, the mountain-climbing friends and their
choice of a campsite elevation. The payoff matrix for this
game was, if you recall:

Elevations of roads
                 2nd player
               7   2   5   1
               2   2   3   4
1st player     5   3   4   4
               3   2   1   6

Sometimes it is clear that the player should always
make the same choice in a game in order to have the least
chance of getting burned. Sometimes the players have to
mix up the order or the frequency that they play certain
rows (columns) in order to have the best chance of winning.
The first case is called a game with a pure strategy, and
the second case a game with a mixed strategy. We will see
how to determine whether a game should be played with a
pure or mixed strategy, and if it falls into the mixed
strategy case, what the best mixed strategy should be.


Case 1: The search for a pure strategy.

Strategy for the row player. The row player makes the
assumption that his opponent is going to make the best
choice possible. Thus the row player asks, "What is the
worst that can happen to me in each row?" The row player
then picks the row that has the "best of the worst"
outcomes in it; that is, under this strategy the row player
picks the row that has the largest of the row minima.
Consider the following game.

                  Row min
   7   2   5        2
   4   3   8       (3)
   2   1   6        1

The row player would then pick row 2, since the
largest of the row minima is in row 2, a payoff of 3.

Strategy for the column player. Similarly, the column
player assumes that his opponent has made a choice that
will get him the most possible payoff. His strategy, under
this assumption, is to scan the columns for the largest
entries, and then choose the smallest one of these (the
smallest of the column maxima). For example, in the game above
the column maxima are:

   7   2   5
   4   3   8
   2   1   6
   7  (3)  8    <- col max

So the column player would pick column two under this
strategy.

Note!! This is a mathematical model for a strategy to play
a game that assumes that your opponent is not a
complete turkey. If the column player begins to
play the first column in the game above, in hopes
of limiting his opponent to a payoff of two units,
then he can be burned by the row player. The row
player switches to row 1 and walks off with a fat
7!

The game above is called a strictly determined game,
because the largest of the row minima, 3, is equal to the
smallest of the column maxima, 3. When this occurs, that entry
in the matrix is called a saddle point, and the strategy for
the game is pure. Each player can do best by always playing
that one choice, in this case, row 2 and column 2.

Case 2. Calculating the best mixed strategy when there
is no saddle point in the game.

 

Suppose that the entries in the game above were
slightly changed, and the game looked like this:

7 2 5
3 4 8
2 1 6

Now when we scan for the row minima and the column maxima,
we obtain the following:

                  Row min
   7   2   5        2
   3   4   8       (3)   <- largest
   2   1   6        1
   7  (4)  8    <- col max ((4) is the smallest)

If our players utilize the strategy that we outlined
above, the row player would again pick row 2, expecting a
payoff of 3 units. The column player would again pick
column 2, expecting that his opponent would win 4 units!
If they make these choices, the row player would indeed
win (much to his surprise and happiness) 4 units. Now what
goes wrong here? The column player should probably play
column 1, at least part of the time, in hopes of limiting
his opponent to the entry in the 2-1 slot of the matrix,
a payoff of 3. But he can't do it all the time, since the
row player will then switch to row 1 and clean up with a 7.

So the expected payoff is neither 3 nor 4 units,
but lies somewhere in between. (Note: this assumes that
the column guy is not so stupid as to ever pick column three
in the game!) The question, then, is how much of the time
should the column player play the first column, and how much
the second column? Recall that his object is to minimize
the amount that the row player will collect, a number that
is obviously at least 3, but hopefully can be kept under 4
in the long run.
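The scan described in Cases 1 and 2 can be sketched in Python. The sketch takes the largest row minimum and the smallest column maximum, and reports a saddle point when the two are equal. Payoffs are to the row player, as in these notes. The function name and the 1-based numbering of the result are illustrative choices.

```python
def pure_strategy_check(matrix):
    """Largest row minimum vs smallest column maximum (payoffs are to the row player).

    Returns (row, column, lower, upper, has_saddle) with 1-based row/column numbers.
    """
    row_mins = [min(row) for row in matrix]
    col_maxes = [max(col) for col in zip(*matrix)]
    best_row = max(range(len(row_mins)), key=lambda i: row_mins[i])
    best_col = min(range(len(col_maxes)), key=lambda j: col_maxes[j])
    lower, upper = row_mins[best_row], col_maxes[best_col]
    return best_row + 1, best_col + 1, lower, upper, lower == upper


case_1 = [[7, 2, 5], [4, 3, 8], [2, 1, 6]]
case_2 = [[7, 2, 5], [3, 4, 8], [2, 1, 6]]
print(pure_strategy_check(case_1))  # (2, 2, 3, 3, True): saddle point worth 3
print(pure_strategy_check(case_2))  # (2, 2, 3, 4, False): value lies between 3 and 4
```

The same check applied to the 4 x 4 campsite matrix finds a saddle point of 3, at row 3 and column 2.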


METHOD OF ODDMENTS

Consider the following game:

                Row min
   4   6          4
   7   5         (5)
   7  (6)   <- col max

If we check for a saddle point, we find that there isn't one.
The row player would expect a payoff of 5 (the larger row
minimum, in row two), and the column player would expect to
give up 6 (the smaller column maximum, in column two), if we
played the game according to the pure strategy above. There is
quite a difference between the expectations of the two players.
Surely the column player can do better than that! In order to
calculate the proportion of time that the column player should
switch to column one (he hopes to stick his opponent with only
a 4), or that the row player should switch to row one (he
hopes to win a few 6's), we use the method of oddments, an
algorithm for determining the best mixed strategy.

Column player's odds.

The column player finds the difference (called the oddment)
between the two row entries in each column. The oddment for
the first column represents the proportion of time the second
column should be played, and the oddment for the second column
represents the proportion of time that the first column should
be played. (It's just reversed!) For example, in our matrix
above, the column oddments are:

                 4   6
                 7   5
col. oddments    3   1

 

Thus, column two is assigned the oddment 3 (found under column
one), and column one is assigned the oddment 1 (found under
column two). (Remember, it's reversed.) This says that
the column player's best mixed strategy is to play the
columns in the ratio of 3:1 in favor of the second column.
Another way to say this is that the odds should be 3:1 in
favor of playing the second column, or that in the long run,
the column player should play column two 3 times for every
1 play of column one. The probability of playing column
one is 1/4, and the probability of playing column two is
3/4, if we convert odds into probabilities.

Row player odds.

 

The row player calculates his oddments in a similar manner.
He subtracts the entries in the two columns of each row. As
before, the oddments are reversed. So in our example above

   4   6  |  2
   7   5  |  2
             Row oddments

we calculate the oddment for each row to be 2, so the row
player should play the rows in the proportion 2:2, or,
in other words, he should play each row half of the time,
in order to counter the column player's best mixed strategy.
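The oddments recipe for a 2 x 2 game with no saddle point can be sketched in Python as follows. Take the differences within each row and down each column, swap them ("it's just reversed"), and convert the odds to probabilities. The function name is hypothetical. The guard against saddle points reflects the note above that a game with a saddle point should be played with a pure strategy instead.

```python
from fractions import Fraction


def oddments_2x2(game):
    """Method of oddments for a 2 x 2 game with no saddle point.

    Returns (row probabilities, column probabilities).
    """
    (a, b), (c, d) = game
    if max(min(a, b), min(c, d)) == min(max(a, c), max(b, d)):
        raise ValueError("game has a saddle point; use the pure strategy instead")
    row_odd = (abs(a - b), abs(c - d))  # difference across each row
    col_odd = (abs(a - c), abs(b - d))  # difference down each column
    # The oddments are reversed: each row (column) gets the OTHER one's oddment.
    rows = (Fraction(row_odd[1], sum(row_odd)), Fraction(row_odd[0], sum(row_odd)))
    cols = (Fraction(col_odd[1], sum(col_odd)), Fraction(col_odd[0], sum(col_odd)))
    return rows, cols


rows, cols = oddments_2x2([[4, 6], [7, 5]])
print(rows, cols)
```

For the example game, the row player's probabilities come out as 1/2 and 1/2, and the column player's as 1/4 for column one and 3/4 for column two. These match the hand calculation.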

These two strategies, pure and mixed, will enable you
to optimize your winnings in any two-by-two game played by
two players. If there is a saddle point, you have access
to the best strategy for any size (n by m) two-person
game. When there is not a saddle point in a larger game,
like the example on page two of these notes (it is 3 by 3),
then the best mixed strategy must be calculated in a more
complicated manner than straight application of the method
of oddments.

Problem Set #3

In each game below, calculate the best strategy for
each player. (A good rule of thumb is to always try to
find a saddle point first.)

   20  -10        4   6        3   1
  -30   20        5   2        4   3

   60  100        0  -1
  100   80       -3   0


NOTES ON EXPECTED VALUE

Long-Run Payoffs for Two-Person Games.

In the case of a two-person game that is strictly
determined, i.e., has a saddle point, the "long-run
payoff" for the game is obvious. Consider the game

                Row min
   3   6          3
   4   5         (4)
  (4)  6    <- col max

In this game, 4 is a saddle point, since the largest
of the row minima = 4 = the smallest of the column maxima. Thus,
the average long-run expected payoff (sometimes called
expected value) for this game is 4 to the row player (-4 to
the column player).

If the entries in the game above are slightly different,

                Row min
   3   6          3
   5   4          4
   5   6    <- col max

there is no saddle point. The row player would expect to
win 4 if he played the largest row minimum, and the column player
would expect to pay out 5 to the row player if he played
the smallest column maximum. So the expected value (long-run
average payoff) of this game is neither 4 nor 5, but somewhere
in between, because the players will use mixed strategies
and bounce back and forth, choosing the rows (columns)
according to the oddments.

To find the value of a two-person 2 x 2 game:

1. In the strictly determined case, the expected value is
exactly the value of the saddle point.

2. If the game is not strictly determined:
a) Find the oddments for each player.

b) Change the odds to probabilities so that you know
the probability with which the row player (column
player) plays each row (column).


c) To calculate the expected value (payoff), either

i) pick a row, multiply the payoffs in each slot
of that row by the respective probabilities that
the column player picks each entry in that row,
and add up,

or

ii) pick a column, multiply the payoffs in each
slot of that column by the respective
probabilities that the row player picks each entry in that
column, and add up.

For example, in the game

   3   6
   5   4

above, we have seen that there is no saddle point. Calculating
by the method of oddments, we see from

   3   6  |  3
   5   4  |  1
   2   2

that the row player's odds are 3:1 in favor of playing row 2,
and that the column player's odds are 2:2, indicating that
he should split his choices evenly. Converting to
probabilities, the row player plays row one 1/4 of the time
and row two 3/4 of the time. The column player plays each
column 1/2 of the time.

Thus if we pick the first row, [3 6], the expected value
can be calculated as follows:

$$\frac{1}{2}\cdot 3 + \frac{1}{2}\cdot 6 = \frac{3}{2} + \frac{6}{2} = \frac{9}{2} = 4\frac{1}{2}$$

Note that we multiply the probability that the column player
plays the first column, 1/2, by the payoff in the first slot, 3,
and likewise multiply the probability that he plays the second
column, 1/2, by the payoff in the second slot, 6, and then add.
Thus we are calculating an average payoff.

It would make no difference if you picked the second
row:

$$\frac{1}{2}\cdot 5 + \frac{1}{2}\cdot 4 = \frac{9}{2} = 4\frac{1}{2}$$

gives the same payoff.


We could just as easily have used a column, and the row
player's oddments within that column. Column one contains
3 and 5, and the payoff would be

$$\frac{1}{4}\cdot 3 + \frac{3}{4}\cdot 5 = \frac{3}{4} + \frac{15}{4} = \frac{18}{4} = 4\frac{1}{2}.$$

Here we used the row player's probabilities for choices
within a fixed column. Note: expected value is really
a weighted average of payoffs, weighted by the
probabilities 1/4 and 3/4 in this case.

 

The same result occurs if we pick the second column:

$$\frac{1}{4}\cdot 6 + \frac{3}{4}\cdot 4 = \frac{6}{4} + \frac{12}{4} = \frac{18}{4} = 4\frac{1}{2}.$$

So, using any row (or column) and the probabilities
of the choices occurring in each slot, it is easy to
calculate the expected value of a game. (It is a good idea
to do it in several of the four ways, so that you have a check
on your calculations and on the oddments you
calculated.)
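The four hand checks can be automated. The Python sketch below computes the expected payoff to the row player when both players use the mixed strategies found from the oddments. It also reproduces the row-by-row and column-by-column calculations above. Exact fractions keep the answers as 9/2 rather than a rounded decimal. The function name is illustrative.

```python
from fractions import Fraction


def game_value(game, row_probs, col_probs):
    """Expected payoff to the row player when both players use the given mixed strategies."""
    return sum(
        p * q * game[i][j]
        for i, p in enumerate(row_probs)
        for j, q in enumerate(col_probs)
    )


game = [[3, 6], [5, 4]]
row_probs = (Fraction(1, 4), Fraction(3, 4))  # from the row player's oddments
col_probs = (Fraction(1, 2), Fraction(1, 2))  # from the column player's oddments

# The hand checks from the notes: fix a row (or column) and weight by the other player's odds.
by_row = [sum(q * game[i][j] for j, q in enumerate(col_probs)) for i in range(2)]
by_col = [sum(p * game[i][j] for i, p in enumerate(row_probs)) for j in range(2)]
print(by_row, by_col, game_value(game, row_probs, col_probs))  # each works out to 9/2
```

Every check gives 9/2, or 4 1/2. The overall value is the same because each row average is already 9/2.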

Expected Value in General

So, the concept of expected value involves two things:
probabilities and payoffs. To calculate expected value, we
multiply the probability that each outcome occurs by the
payoff (could be positive or negative) for that outcome,
and then add them all up.

For example, suppose that you are playing a game in
which you win $5 if you roll a 6, and otherwise you lose $1.50.
If you are playing with one die, your probability of
winning is 1/6, and of losing is 5/6. Your payoffs are $5 and
-$1.50, respectively. Thus your expected value (long-run
expected winnings per game) for this game is:

$$\frac{1}{6}(\$5) + \frac{5}{6}(-\$1.50) = \frac{\$5 - \$7.50}{6} = \frac{-\$2.50}{6} \approx -\$0.42$$

Thus, this game is not in your favor.

In the long run, you will lose about 42¢, on the average,
for every time you play.

If a game has an expected value of 0, it is called a
fair game, because neither side has an advantage in the
long run. The game above is not a fair game. The payoffs
for a game can be changed so that a game which was not fair
becomes fair.
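The dice game can be worked exactly in Python. The sketch below also solves for the prize that would make the game fair. The function name is illustrative, and raising the prize is only one possible way to adjust the payoffs. Exact fractions show that the loss per play is exactly 5/12 of a dollar.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of probability * payoff over all outcomes (payoffs may be negative)."""
    return sum(p * payoff for p, payoff in outcomes)


p_win, p_lose = Fraction(1, 6), Fraction(5, 6)
lose = Fraction(-3, 2)  # lose $1.50
ev = expected_value([(p_win, Fraction(5)), (p_lose, lose)])
print(ev, float(ev))    # -5/12, about -0.4167 dollars per play

# Prize for rolling a 6 that would make the game fair (expected value 0):
fair_win = -p_lose * lose / p_win
print(fair_win)         # 15/2, i.e. $7.50
```

Raising the prize for a 6 from $5 to $7.50, while keeping the $1.50 loss, would make this a fair game.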

 


You can think of expected value as a weighted average
of payoffs. The payoffs are weighted by the probabilities
that those payoffs will occur. Adding up the weighted payoffs tells
you which way the scales tip, "fer you or agin' you", and
how hard they tip, or whether they actually balance off
(in the case of a fair game).


Problem Set #1

a) List all the outcomes for flipping 4 coins.

b) What is the probability that you get 0 heads? 1 head?
2 heads? 3 heads? 4 heads?

c) What is the probability of getting at least 2 heads?

d) How many outcomes are there for flipping 5 coins?
7 coins? 10 coins? N coins?

e) What is the probability that you get 3 heads when
flipping 5 coins? (List all such outcomes and use part d).)
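Listing outcomes is exactly what `itertools.product` does. The Python sketch below lists every head/tail sequence for n coins and counts the heads in each, which gives the answers to parts a) through e). The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_distribution(n_coins):
    """Probability of each number of heads when n_coins fair coins are flipped."""
    outcomes = list(product("HT", repeat=n_coins))  # all 2**n_coins equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in range(n_coins + 1)}


four = head_distribution(4)
print(four)                               # 0-4 heads: 1/16, 1/4, 3/8, 1/4, 1/16
print(sum(four[k] for k in range(2, 5)))  # at least 2 heads: 11/16
print(head_distribution(5)[3])            # 3 heads in 5 flips: 5/16
```

There are 2 x 2 x ... x 2 = 2^N outcomes for N coins. For 5 coins, 10 of the 32 outcomes have exactly 3 heads, which is where the 5/16 comes from.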

Problem Set #2A

Suppose that you have a container of colored balls, and
that there are 8 red balls, 4 green balls, and 3 blue balls
in the container, and that they are well mixed.

1. What is the probability that the first ball that you
draw is red? Is blue? Is green?

2. Give your best guess for the probabilities of each of
the following:

a) drawing two red balls in a row

b) drawing at least one blue ball in two tries

Which of these do you feel is more likely to occur?

3. List all possible sequences of two balls that could
occur, i.e., red-green, blue-red, etc. Assign a
probability to each of the sequences on your list in two
different cases:

i) in the case where you put the first ball back
in before drawing the second ball (with
replacement)

ii) in the case where you do not put the first
ball back in (without replacement)

Check to see that your probabilities
add to one in each case.

4. Find the probabilities for each of the following using
part 3:

a) The probability of getting two reds in a row.

b) The probability of getting at least one blue.

c) How did your guesses turn out for part 2 on
these?

d) The probability that no blues are drawn; the
probability that 2 blues are drawn; how are
the answers for these two related?

e) The probability that the second ball drawn
is green.

f) The probability that the second ball is green
given that the first ball was red.
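Part 3 can be checked by brute force in Python. The sketch below treats the 15 balls as distinct objects, lists every ordered draw of two balls with and without replacement, and adds up the probabilities for each color sequence. The names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# 8 red, 4 green and 3 blue balls, as in Problem Set #2A.
BALLS = ["red"] * 8 + ["green"] * 4 + ["blue"] * 3


def two_draws(replacement):
    """Probability of each ordered colour sequence (first ball, second ball)."""
    if replacement:
        pairs = list(product(BALLS, repeat=2))  # 15 x 15 equally likely ordered draws
    else:
        pairs = list(permutations(BALLS, 2))    # 15 x 14 equally likely ordered draws
    dist = {}
    for pair in pairs:
        dist[pair] = dist.get(pair, 0) + Fraction(1, len(pairs))
    return dist


for replacement in (True, False):
    dist = two_draws(replacement)
    two_reds = dist[("red", "red")]
    some_blue = sum(p for seq, p in dist.items() if "blue" in seq)
    print("with" if replacement else "without", "replacement:",
          len(dist), "sequences, total", sum(dist.values()),
          "| two reds", two_reds, "| at least one blue", some_blue)
```

The probability of two reds is 64/225 with replacement and 4/15 without replacement. The probability of at least one blue is 9/25 with replacement and 13/35 without replacement.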

Problem set #2B

5. a) What is the probability that if 5 people pick a
number from 1 to 12, at least two of them will pick
the same number?

(Hints: What is the probability that the first two
numbers are different? What is the probability
that the third number is not the same as either of
the first two? Now, what is the probability that
these two things happen back to back, i.e., that
none of the first three are the same? Continue
this process out to five numbers. You will
then be able to find the probability that
none of the five people have the same number.
How could you use this to find the
probability that at least two have the
same number?)

b) How many people do you need so that you have at
least a 50% chance that at least two have the same
birth month?

c) How many people do you need so that you have at
least a 50% chance that at least two have the same
birthday?
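The hint's step-by-step product can be carried out in Python. The sketch assumes that every month, or every day, is equally likely, and it ignores leap years by using 365 days. These assumptions are stated here, not in the problem. The function names are illustrative.

```python
from fractions import Fraction


def prob_some_match(people, choices):
    """Chance that at least two of `people` pick the same value, each picking uniformly from `choices`."""
    p_all_different = Fraction(1)
    for k in range(people):
        # the (k+1)-th person must avoid the k values already taken
        p_all_different *= Fraction(choices - k, choices)
    return 1 - p_all_different


def people_needed(choices):
    """Smallest group with at least a 50% chance of a shared value."""
    n = 1
    while prob_some_match(n, choices) < Fraction(1, 2):
        n += 1
    return n


print(float(prob_some_match(5, 12)))  # part a): about 0.618
print(people_needed(12))              # part b): 5 people for birth months
print(people_needed(365))             # part c): 23 people, ignoring leap years
```

Under these assumptions, five people are enough for a better-than-even chance of a shared birth month, and 23 people are enough for a shared birthday.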

6. The Flippant Juror.

On a certain three-man (person?) jury, each of two
jurors has probability Q of making the correct
decision in rendering a verdict. The third juror
flips a coin to make his decision.

Which would have the higher probability of making
the correct decision: this three-man jury, or a
jury of one person with probability Q of making
the correct decision?

(Hints: List all possible outcomes for the
three-person jury. You need to know the probability
that one of the jurors makes the wrong decision.
Assign probabilities to all the outcomes on your
list.)
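The hint's listing of outcomes can be done by machine. The Python sketch below makes two assumptions that the problem does not state outright: the jury decides by majority vote, and the jurors decide independently. It lists all eight right/wrong outcomes, weights each by its probability, and adds up the outcomes in which the majority is right. The name `q` stands for the problem's Q.

```python
from fractions import Fraction
from itertools import product


def jury_correct(q):
    """Chance a three-member jury reaches the correct verdict by majority vote.

    Two jurors are each correct with probability q; the third flips a fair coin.
    """
    p_correct = [q, q, Fraction(1, 2)]
    total = Fraction(0)
    # List every outcome (correct/wrong for each juror), as the hint suggests.
    for outcome in product([True, False], repeat=3):
        weight = Fraction(1)
        for is_correct, p in zip(outcome, p_correct):
            weight *= p if is_correct else 1 - p
        if sum(outcome) >= 2:  # majority is correct
            total += weight
    return total


for q in (Fraction(3, 5), Fraction(7, 10), Fraction(9, 10)):
    print(q, jury_correct(q))  # the two numbers on each line agree
```

Under these assumptions, the three-person jury does exactly as well as the single juror. Its probability of being correct is Q^2 + 2Q(1 - Q)(1/2) = Q, so the coin flipper adds nothing.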

7. The Three-Cornered Duel.

In a certain recent (terrible) Italian western, the
movie ends in a three-cornered duel. The participants
are called The Good, The Bad, and The Ugly. The Ugly
gets the first shot, and he has a 30% chance of hitting
his target. The Good gets the second shot, and he has
a 100% chance of hitting his target; he never misses!
The Bad gets the third shot, and he has a 50% chance
of hitting his target. The duel continues in this
circle until only one man is left. (See below.)

The Ugly has the first shot. What should he do?

[Diagram: Ugly (.3) -> Good (1.0) -> Bad (.5) -> Ugly; the arrows indicate the order in which the turns are taken.]

Problem set #4

Calculate the probability that the roller wins at
craps.
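The notes do not spell out the rules of craps. The Python sketch below therefore assumes the usual ones. On the first roll, 7 or 11 wins and 2, 3, or 12 loses. Any other total becomes the "point," and the roller then wins if the point comes up again before a 7. The sketch counts the 36 equally likely outcomes of two dice and adds up the winning chances exactly.

```python
from fractions import Fraction
from itertools import product


def craps_win_probability():
    """Roller's chance of winning, under the usual rules assumed here."""
    totals = [a + b for a, b in product(range(1, 7), repeat=2)]
    p = {s: Fraction(totals.count(s), 36) for s in range(2, 13)}
    win = p[7] + p[11]  # natural on the first roll
    for point in (4, 5, 6, 8, 9, 10):
        # After a point is set, only the point (win) or a 7 (lose) ends the game.
        win += p[point] * p[point] / (p[point] + p[7])
    return win


w = craps_win_probability()
print(w, float(w))  # 244/495, a little under one half
```

Under these rules, the roller wins with probability 244/495, about 0.493. The game is close to fair, but slightly against the roller.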
Problem set #5

Use the chi-square procedure to test the goodness of
fit of your theoretical model for tossing 3 tacks in
Activity 2.

APPENDIX C
COURSE OUTLINE FOR THE CONTROL GROUPS

Text: Weiss and Yoseloff, Finite Mathematics, Worth, 1975.
The following material should be covered:

Chapter 3 Combinatorial Analysis
3.1 - 3.7

Chapter 4 Probability
4.1 - 4.12
4.16 Expected value needed for game theory

Some matrix notation: Sections 6.8, 6.9, 6.10, perhaps.
All that is needed later is the ability to write a linear
program as $Ax \le b$, $cx = w \to \max$.

Chapters 7 and 8 Linear Programming
Chapter 11 Matrix games

A possible time division is

combinatorics and probability 30%
linear programming 40%
game theory 20%

The instructions above were given to all lecturers in
the finite mathematics course, mathematics 110, during the
1975-1976 school year at Michigan State University. A
detailed outline of the topics in each chapter mentioned
above is listed below.

Chapter 3    3.1 - 3.7
Tree diagrams; the sequential counting principle;
combinations; permutations; applications
of the counting principles.

Chapter 4    4.1 - 4.12; 4.16
Random experiments and relative frequency as
an interpretation of probability; sample space
as a model for an experiment; events; assigning
probabilities to events; intersections, unions,
and complements of events; addition principle
for the probability of the union of two events;
uniform sample spaces; application of
combinatorics and counting principles to probability;
the binomial distribution; conditional
probability; the law of total probability for
calculating the probability of an event by partitioning
it into disjoint subsets; Bayes' Formula;
expected value.

Chapter 6    6.8 - 6.10
Properties of matrices; systems of equations
in matrix form; the inverse of a matrix.

Chapter 7    7.1 - 7.7
Graphs of linear equations; geometric
interpretation of systems of linear equations; graphs
of linear inequalities; systems of linear
inequalities and polyhedral convex sets; linear
programming from a geometric point of view.

Chapter 8    8.1 - 8.7
Linear functions and linear inequalities in
n unknowns; the vector-matrix form of a linear
program; the Tucker tableau; the simplex method;
the pivoting operation; the simplex algorithm;
applications of linear programming.

Chapter 11   11.1 - 11.7
Two-person, zero-sum matrix games; strictly
determined games; 2 x 2 games; m x n games;
determination of optimal strategies; von Neumann's
Minimax theorem; applications of matrix games.

APPENDIX D

THE INSTRUMENTS

PRETEST

Name

 

Answer all questions below as best you can.

1. a) How many paths are there in this grid?
[grid of X's and O's; layout not recoverable from the source]

b) How many paths are there in this grid?
[grid of X's and O's; layout not recoverable from the source]

2. a) Consider the grids below.

[Figure: Grid A and Grid B; the grid layouts are not recoverable from the source.]

Are there: (circle one)

a) More paths in grid A
b) More paths in grid B
c) about the same number of paths in each grid.

Give a reason for your answer.

2. b) Consider the grid below. Which type of path is more
likely to occur (circle one)?

X X O X X X
X X X O X X
O X X X X X
X X X X O X
[remaining row(s) of the grid illegible in the source]

a) a path that hits 5 X and 1 O
b) a path that hits 6 X and no O

Give a reason for your answer.

c) Give your best estimate for the number of paths in the
grid above.

3. A jar contains 4 blue, 6 red, and 3 white marbles. If
you draw one marble from the jar, it is most likely
that you will (circle one):

a) get a blue marble b) get a red marble

c) have the same chance of getting a red or a blue
marble

Give a reason for your answer.

4. a) A fair die is rolled. What is the probability of
getting a 3?

b) A fair coin is tossed. What is the probability of
getting a head?

c) List the possible outcomes for flipping three coins.

d) What is the probability of getting one head and two
tails in flipping three coins? Write down your best
guess.
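Parts c) and d) can be checked by listing the sample space directly. The sketch below assumes fair coins, so each of the 8 ordered outcomes is equally likely; changing the head count to 2 answers posttest item 12 b) the same way.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of three tosses, e.g. "HTT"; each has probability 1/8 for fair coins.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]
one_head_two_tails = [o for o in outcomes if o.count("H") == 1]
p_one_head = Fraction(len(one_head_two_tails), len(outcomes))

print(outcomes)                          # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(one_head_two_tails, p_one_head)    # ['HTT', 'THT', 'TTH'] 3/8
```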

 

5. You are playing a game in which you are blindfolded and
draw cards out of a box. If you draw a card that has
an X on it, you win the game. In the boxes below,
would you be more likely to win if you (circle one):

a) draw from box A b) draw from box B
c) makes no difference

[Figure: Box A and Box B, each holding cards marked X or O; the contents of the boxes are not recoverable from the source.]

Give a reason for your answer.

6. Three friends agree to change the order in which they
go through the lunch line each day. In how many possible
ways can they arrange themselves?

7. At the start of a party game, eight red, six green,
four blue, and two white slips of paper were thoroughly
mixed in a bowl. The chances that the first slip drawn
at random will be WHITE are given by which of the
following (circle one):

a) $\frac{1}{8+6+4}$ b) $\frac{1}{8+6+4+2}$ c) $\frac{1}{8+6+4+1}$ d) $\frac{2}{8+6+4+2}$

Give a reason for your answer.

8. For four games you have the following chances of gaining
points:

Game A: 20 percent chance of winning 15 points
Game B: 40 percent chance of winning 10 points
Game C: 10 percent chance of winning 25 points
Game D: 50 percent chance of winning 5 points

If you play the game many times, you would be most likely
to gain the greatest number of points in (circle one):

a) Game A b) Game B c) Game C d) Game D

Give a reason for your answer.
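The normative answer to this item rests on expected value: over many plays, the average gain per play approaches probability times points. The sketch assumes a lost game earns zero points, which the item implies but does not state.

```python
from fractions import Fraction

# (probability of winning, points won) for each game in this item.
GAMES = {
    "A": (Fraction(20, 100), 15),
    "B": (Fraction(40, 100), 10),
    "C": (Fraction(10, 100), 25),
    "D": (Fraction(50, 100), 5),
}

# Expected points per play, assuming a loss earns nothing.
expected = {name: p * points for name, (p, points) in GAMES.items()}
best = max(expected, key=expected.get)

print({name: float(e) for name, e in expected.items()}, best)
# {'A': 3.0, 'B': 4.0, 'C': 2.5, 'D': 2.5} B
```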

9. A committee of two people is to be chosen from among
Bill, Sally, Joe, and Beth. List all possible committees
of two from this group.

10. a) There are 162 games in a baseball season. The
manager of the team always bats his pitcher last.
He has eight other players to assign to a batting
order. Are there enough games in one season to
try all possible batting orders for the other eight
players? (circle one)

a) Yes b) No

b) Give a reason for your answer. If you circled
No, how many seasons would it take?

Give your best estimate.

11. A man bets you one dollar that at least two people
at a party you are attending have the same birthday.
How many people would have to be at the party so that
the man has at least a 50% chance of winning the bet?

Give your best estimate.

12. a) List an event that is certain to occur.

b) List an event that is impossible to occur.

13. The chance that a baby will be a boy is about one-half.
Over the course of an entire year, would there
be more days when at least 60% of the babies born were
boys in (circle one):

a) a large hospital b) a small hospital
c) makes no difference

Give a reason for your answer.
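Item 11 above (the birthday bet) can be checked exactly by multiplying the chances that each new guest avoids every birthday already taken. The sketch assumes 365 equally likely birthdays (ignoring leap years and seasonal variation); the same hypothetical function with `days=100` covers the posttest carnival item, where players pick numbers from 1 to 100.

```python
from fractions import Fraction

def smallest_group_for_match(days, target=Fraction(1, 2)):
    """Smallest number of people for which P(at least two share a value) >= target.

    Assumes each person's value (birthday, carnival number, ...) is independent
    and uniform over `days` equally likely possibilities."""
    p_all_different = Fraction(1)
    n = 1
    while 1 - p_all_different < target:
        # Person n + 1 must avoid the n values already taken.
        p_all_different *= Fraction(days - n, days)
        n += 1
    return n

print(smallest_group_for_match(365), smallest_group_for_match(100))   # 23 13
```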

14. A fair coin is flipped and comes up tails 10 times
in a row. If you could win $1 by guessing the next
toss, what would you guess? (circle one)

a) Heads b) Tails

Give a reason for your answer.

15. You are playing a game with two other people. One
person picks a number between 1 and 10 and the other
two try to guess it. The guess closest to the number
wins the game.

a) If you have the first choice, what would you pick?

Give a reason for your answer.

b) If the first player picked seven, what would you
pick?

Give a reason for your answer.

16. A man must select committees from among ten people.
Would there be (circle one):

a) more distinct possible committees of eight
b) more distinct possible committees of two

c) about the same number of committees of eight as
committees of two

Give a reason for your answer.

17. Let H stand for head and T for tail.

i) Which of the following is more likely to occur
for tossing one coin? (circle one in each part)

a) H b) T c) about the same chance

ii) Which sequence is more likely for two tosses?
a) H T b) T T c) about the same chance

Give a reason for your answer.

iii) Which sequence is more likely for six tosses?
a) H T T H T H b) H H H H T H
c) about the same chances

Give a reason for your answer.

iv) Which sequence is more likely for six tosses?
a) H T T H T H b) H H H T T T
c) about the same chances

Give a reason for your answer.

v) What is the probability that in six tosses there
will be three heads and three tails?

Write down your best estimate and give a reason
for your answer.

18. Which is more likely to occur? (circle one)

a) Pulling one red ball from a jar containing 10 red
balls and 90 white balls.

b) Pulling four red balls in a row from a jar containing
50 red balls and 50 white balls.

Give a reason for your answer.
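The counting items in this part of the pretest (items 10, 16, and 17) rest on factorials and binomial coefficients. The sketch below computes the normative answers, assuming in item 10 that one batting order is tried per game and in item 17 that the coin is fair.

```python
from fractions import Fraction
from math import comb, factorial

# Item 10: orderings of the eight non-pitchers, one ordering tried per game.
batting_orders = factorial(8)                  # 40320
seasons_needed = -(-batting_orders // 162)     # ceiling division: 249 seasons

# Item 16: picking the 8 members of a committee is the same as picking the 2 left out.
committees_of_eight = comb(10, 8)              # 45
committees_of_two = comb(10, 2)                # 45

# Item 17: each particular six-toss sequence has probability (1/2)^6 = 1/64,
# while "three heads and three tails" collects comb(6, 3) = 20 such sequences.
p_one_sequence = Fraction(1, 2 ** 6)
p_three_heads = comb(6, 3) * p_one_sequence    # 20/64 = 5/16
```

The sequences in item 17 iii) and iv) are therefore equally likely (1/64 each), even though the event "three heads and three tails" in v) is far more likely than any single sequence.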

19. A jar contains 8 red balls, 4 blue balls, and 3 green
balls. Which is more likely to occur? (circle one)

a) Pulling at least one blue ball in two tries.
b) Pulling two red balls in a row.

Give a reason for your answer.
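Pretest item 19 does not say whether balls are replaced between tries, while posttest item 10 (9 red, 4 blue, 3 green) states draws with replacement. The sketch below handles both cases with two small, hypothetical helper functions and exact fractions.

```python
from fractions import Fraction

def p_at_least_one(marked, total, with_replacement=True):
    """P(at least one of the `marked` balls shows up in two draws from `total` balls)."""
    if with_replacement:
        return 1 - Fraction(total - marked, total) ** 2
    return 1 - Fraction(total - marked, total) * Fraction(total - marked - 1, total - 1)

def p_both(marked, total, with_replacement=True):
    """P(both of two draws come from the `marked` balls)."""
    if with_replacement:
        return Fraction(marked, total) ** 2
    return Fraction(marked, total) * Fraction(marked - 1, total - 1)

# Pretest item 19: 8 red, 4 blue, 3 green (15 balls).
pre_blue = p_at_least_one(4, 15)        # 104/225
pre_red = p_both(8, 15)                 # 64/225
# Posttest item 10: 9 red, 4 blue, 3 green (16 balls), draws with replacement.
post_green = p_at_least_one(3, 16)      # 87/256
post_red = p_both(9, 16)                # 81/256
```

In both versions the "at least one" event is the more likely one, though in the posttest the margin is small (87/256 against 81/256).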

POSTTEST

Name

 

Answer the questions below to the best of your ability.
Supply the reasons for your answers where they are requested.

1. The probability that a baby will be a boy is 1/2. Let
B stand for boy, and G for girl. (Circle one in each
case below.)

i) Which of the following is more likely to occur
for having one child?

a) B b) G c) about the same chance

ii) Which of the following sequences is more likely
to occur for having two children?

a) B G b) G G c) about the same chance

Give a reason for your answer.

iii) Which of the following sequences is more likely
to occur for having six children?

a) B G G B G B b) B B B B G B
c) about the same chance

Give a reason for your answer.

iv) Which of the following sequences is more likely
to occur for having six children?

a) B G G B G B b) B B B G G G
c) about the same chance

Give a reason for your answer.

v) What is the probability that in six children
there will be three boys and three girls?

Give a reason for your answer.

2. A fair coin is flipped and comes up heads 10 times in
a row. If you could win $10 on a $1 bet by guessing
the next toss, what would you guess? Why?

3. Which is more likely to occur? (circle one)

a) Pulling one red ball from a jar containing 10 red
balls and 90 white balls.

b) Pulling four red balls in a row (with replacement)
from a jar containing 50 red balls and 50 white
balls.

Give a reason for your answer.
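The red-ball comparison (this item and pretest item 18) reduces to 1/10 against (1/2)^4. The pretest wording does not mention replacement, so the sketch also computes the without-replacement version; either way, option a) is the more likely event.

```python
from fractions import Fraction

# (a) one draw from a jar of 10 red and 90 white balls
p_one_red = Fraction(10, 100)                              # 1/10

# (b) four reds in a row from a jar of 50 red and 50 white balls
p_four_red_with_replacement = Fraction(50, 100) ** 4       # 1/16
p_four_red_without_replacement = (Fraction(50, 100) * Fraction(49, 99)
                                  * Fraction(48, 98) * Fraction(47, 97))

print(float(p_one_red), float(p_four_red_with_replacement),
      float(p_four_red_without_replacement))
```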

4. The chance that a baby is born a boy is about 1/2.
Over the course of the entire year, would there be
more days when at least 60% of the babies born were
boys in: (circle one)

a) a large hospital b) a small hospital

c) makes no difference

Give a reason for your answer.
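The hospital item (here and as pretest item 13) turns on how much a daily proportion fluctuates with sample size. The instruments give no hospital sizes, so the sketch uses hypothetical daily counts of 15 and 45 births and assumes each birth is independently a boy with probability 1/2.

```python
from fractions import Fraction
from math import comb

def p_at_least_60_percent_boys(births):
    """Exact P(at least 60% of `births` babies are boys) when each birth is a boy
    with probability 1/2, independently of the others."""
    k_min = -(-3 * births // 5)            # smallest whole number >= 0.6 * births
    favourable = sum(comb(births, k) for k in range(k_min, births + 1))
    return Fraction(favourable, 2 ** births)

# Hypothetical daily birth counts for a small and a large hospital.
for births in (15, 45):
    p = p_at_least_60_percent_boys(births)
    print(births, float(p), "expected such days per year:", round(365 * float(p)))
```

The smaller hospital records such days far more often, since a small day's proportion of boys strays further from 1/2; this is the normative answer to both versions of the item.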

5. People at a carnival pick one number from 1 to 100.
If two people match, they win a prize. How many
people would have to be playing the game in order
that there be at least a 50% chance that there would
be winners? Give your best estimate.

6. a) How many paths are there in this grid?
[grid of X's and O's; layout not recoverable from the source]

b) How many paths are there in this grid?
[grid of X's and O's; layout not recoverable from the source]

7. Consider the grids below.

[Figure: Grid A and Grid B, drawn as rows of X's; the grid layouts are not recoverable from the source.]

Are there: (circle one)
a) More paths in grid A
b) More paths in grid B
c) About the same number of paths in each grid.

Give a reason for your answer.

8. Consider the grid below. Which type of path is more
likely to occur? (circle one)

X X O X X
X X X O X
O X X X X
X X X X O
X O X X X

a) a path that hits 4 X and 1 O
b) a path that hits 5 X

Give a reason for your answer.
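The appendix does not define a path. The sketch below assumes, as in the familiar grid task, that a path runs from the top row to the bottom row through exactly one entry in each row, with any entry allowed in each row. It enumerates every path in the 5 × 5 grid of item 8 and tallies how many O's each one hits.

```python
from collections import Counter
from itertools import product

# The grid from this item; each row contains exactly one O.
GRID = [
    "XXOXX",
    "XXXOX",
    "OXXXX",
    "XXXXO",
    "XOXXX",
]

def o_counts(grid):
    """Tally paths by the number of O's they hit.

    Assumes a path takes exactly one entry from each row, top to bottom,
    and that any column may be chosen in each row."""
    tally = Counter()
    for path in product(*grid):        # one character from each row string
        tally[path.count("O")] += 1
    return tally

counts = o_counts(GRID)
print(sum(counts.values()), counts[1], counts[0])    # 3125 1280 1024
```

Under this definition there are 5^5 = 3125 paths; 5 × 4^4 = 1280 of them hit four X's and one O, while only 4^5 = 1024 hit five X's, so option a) is more likely.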


9. A man must select committees from a group of 10 people.
Would there be: (circle one)

a) more distinct possible committees of eight

b) more distinct possible committees of two

c) about the same number of committees of eight as
committees of two

Give a reason for your answer.

10. A jar contains 9 red balls, 4 blue balls, and 3 green
balls. Which of the following would be more likely
to occur? (circle one)

a) Pulling at least one green ball in two tries (with
replacement)

b) Pulling two red balls in a row in two tries (with
replacement)

Give a reason for your answer.

11. A pair of dice are rolled. What is the probability
that the sum of the faces will be a 5?
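Item 11 can be checked by listing the 36 equally likely ordered outcomes of two fair dice, treating the dice as distinguishable.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))    # 36 equally likely ordered pairs (die 1, die 2)
fives = [roll for roll in rolls if sum(roll) == 5]
p_sum_five = Fraction(len(fives), len(rolls))

print(fives, p_sum_five)    # [(1, 4), (2, 3), (3, 2), (4, 1)] 1/9
```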

12. a) List the outcomes from tossing three coins.

b) What is the probability that there will be 2 heads
and 1 tail?

13. The probability that it rains in Seattle on a given
day is 2/3. The probability that Bill forgets his
umbrella is 1/4. What is the probability that it
rains and Bill forgets his umbrella?

14. For three games, you have the following chances of
winning points:

Game 1: 50% chance of winning 8 points
Game 2: 20% chance of winning 20 points
Game 3: 30% chance of winning 15 points

If you play the game many times, in which game would
you be most likely to gain the greatest number of
points? (circle one)

a) Game 1 b) Game 2 c) Game 3

Give a reason for your answer.
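The arithmetic behind items 13 and 14 can be written out briefly. Item 13 does not say that rain and Bill's forgetfulness are independent; the product rule below assumes they are. Item 14 assumes a lost game earns zero points.

$$P(\text{rain and forgets}) = P(\text{rain})\,P(\text{forgets}) = \frac{2}{3}\cdot\frac{1}{4} = \frac{1}{6}$$

$$E_1 = 0.5 \times 8 = 4,\qquad E_2 = 0.2 \times 20 = 4,\qquad E_3 = 0.3 \times 15 = 4.5$$

On these assumptions Game 3 has the largest expected gain per play.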


EXPERIMENTAL COURSE EVALUATION FORM

Please take a few moments and respond thoughtfully
to the questions below. If you wish, type your answers.
You may either sign your responses or not. Hand these
in on or before the day of the final.

1. What suggestions would you have for improving this
course? What would you like to change about the
course? What would you like to leave the same? What
did you like about this course? What did you dislike
about this course?

In answering these questions, please reflect upon:
the required texts
the in-class activities and activity sheets
the log and assignments
learning mathematics by working in groups with
other students

as well as anything else that you would like to say
about the course. Thank you.
