A CLINICAL INVESTIGATION OF COLLEGE STUDENTS' RELIANCE UPON THE HEURISTICS OF AVAILABILITY AND REPRESENTATIVENESS IN ESTIMATING THE LIKELIHOOD OF PROBABILISTIC EVENTS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
J. MICHAEL SHAUGHNESSY
1976

This is to certify that the thesis entitled A CLINICAL INVESTIGATION OF COLLEGE STUDENTS' RELIANCE UPON THE HEURISTICS OF AVAILABILITY AND REPRESENTATIVENESS IN ESTIMATING THE LIKELIHOOD OF PROBABILISTIC EVENTS, presented by J. Michael Shaughnessy, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Mathematics.

ABSTRACT

The purpose of this study was to investigate college students' reliance upon availability and representativeness in estimating the likelihood of events. An experimental activity-based course in elementary probability and statistics was developed. Groups of college students who took the activity-based course were compared to groups who took a lecture-based course for their relative success in overcoming reliance upon the heuristics of availability and representativeness.

The subjects involved in the study were 85 undergraduate students who had enrolled in a finite mathematics course at Michigan State University. Four class sections were randomly chosen, and two each were randomly assigned to either the experimental activity-based course or a lecture-based course. The materials for the activity-based course had been piloted during the quarter preceding the main study and had been revised as a result of the pilot study.

The subjects were pretested and posttested for their reliance upon the heuristics of availability and representativeness and for their knowledge of elementary probability concepts. The instruments used had been piloted prior to the main study and contained a probability concept subscale, an availability subscale, and a representativeness subscale. The data were analyzed by t-tests (α = .05), with the individual as the unit of analysis on the pretest and the class section as the unit of analysis on the posttest.

The pretest analysis indicated that there were no significant differences between the groups on any of the three subscales prior to a formal course in probability. On the posttest, the experimental activity-based groups scored significantly higher than the lecture-based groups on the representativeness subscale. There was a tendency for the experimental groups to score higher on the availability subscale on the posttest, but the difference was not significant at the .05 level. There was no significant difference between the two groups on the probability subscale on the posttest. The activity-based groups attained significantly higher mean gain scores than the lecture-based groups on both the availability and representativeness subscales.

The author concluded that course methodology appears to be an important factor in helping college students to replace heuristic principles with probability theory when making estimates for the likelihood of events. Learning elementary probability concepts may not be sufficient to overcome reliance upon the heuristics of availability and representativeness.

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Mathematics
1976

DEDICATION

To Joan. We really did it!

ACKNOWLEDGMENTS

I would like to thank my Committee Chairman, Professor William M. Fitzgerald, and the members of the committee, Professor Peter Lappan, Professor John Masterson, Professor Bruce Mitchell, and Professor John Wagner, for their support and constructive comments during this investigation.

The patience and encouragement of my wife Jean were greatly appreciated throughout my studies and especially during the writing of this thesis.

The assistance of Joseph Wisenbaker with the analysis of data, and the assistance of Al Stickney, who taught one of the experimental sections, were essential to the development of this study and were most helpful.

A special thanks to Jill Hagan is merited for a superb typing job and patience in deciphering my difficult scrawl.

TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. INTRODUCTION AND DEFINITION OF THE PROBLEM
    Definition of Heuristic
    The Representativeness Heuristic
    The Availability Heuristic
    Introduction to the Problem
    Purpose of the Study
    Importance of the Study
        i) Consequences of the Misuse of Probability and Statistics
        ii) Subjective Probability and Mathematical Probability: A Modeling Point of View
        iii) Psychological Theory and Educational Practice
    A Summary of the Procedure
    Overview of the Organization of the Study

II. REVIEW OF THE LITERATURE RELATED TO THE STUDY
    Introduction
    The Use of Heuristics to Estimate Probability
    Models of Human Judgment and Decision Making
    The Development of the Probability Concept in Young Children and Adolescents
    The Pilot Study

III. A DESCRIPTION OF THE DESIGN OF THE STUDY
    The Experimental Course
    A Description of the Control Course
    Comparison of the Experimental and Control Courses
    Rationale for the Experimental Course
    Subjects
    Procedure
    Measures
    Hypotheses
    Method of Analysis
    Summary

IV. ANALYSIS OF THE RESULTS OF THE STUDY
    Part I: Report on the Experimental Course
        Activities
        Misuses of Statistics
        Experimental Course Evaluation Forms
    Part II: Analysis of the Statistical Results
        Introduction
        Comparisons Between the Experimental and Control Groups on the Four Scales
        Notation
        Reliability
        Individual Item Statistics
        Correlation Matrices

V. SUMMARY, CONCLUSIONS, AND DISCUSSION
    Summary
    Limitations
    Results of the Hypothesis Testing
    Conclusions and Discussion
    Implications for Future Research

BIBLIOGRAPHY

APPENDICES
    A. OUTLINE OF DAILY PLAN FOR THE EXPERIMENTAL COURSE
    B. ACTIVITIES, PROBLEMS, AND NOTES TO THE INSTRUCTOR
    C. COURSE OUTLINE FOR THE CONTROL GROUPS
    D. THE INSTRUMENTS

LIST OF TABLES

Table
3.1    Order of Topics in the Experimental and Control Courses
       Number and Sex of Subjects
       Class Level and Major Field
       Previous Mathematics Course Work
4.1    Scale Means and Standard Deviations for the Four Groups on the Pretest
4.1A   t-Test Results for Pretest Scale Total
4.1B   t-Test Results for Pretest Scale Probability
4.1C   t-Test Results for Pretest Scale Availability
4.1D   t-Test Results for Pretest Scale Representativeness
4.2    Scale Means and Standard Deviations for the Four Groups on the Posttest
4.2A   t-Test Results for Posttest Scale Total
4.2B   t-Test Results for Posttest Scale Probability
4.2C   t-Test Results for Posttest Scale Availability
4.2D   t-Test Results for Posttest Scale Representativeness
4.3    Mean Gain Scores on the Availability and Representativeness Scales
4.3A   t-Test Results on Pre-post Gain Scores on the Availability Scale
4.3B   t-Test Results on Pre-post Gain Scores on the Representativeness Scale
4.4A   Group Results on Item R1
4.4B   t-Test on Posttest Item R1
4.5A   Group Results on Item R2
4.5B   t-Test on Posttest Item R2
4.6A   Group Results for Item R3
4.6B   t-Test on Posttest Item R3
4.7A   Group Results on Item R4
4.7B   t-Test on Posttest Item R4
4.8A   Group Results on Item R5
4.8B   t-Test on Posttest Item R5
4.9A   Group Results on Item R6
4.9B   t-Test on Posttest Item R6
4.10A  Group Results on Pretest Item N1
4.10B  Group Results on Posttest Item N1
4.10C  t-Test on Posttest Item N1
4.11A  Group Results on Item N2
4.11B  t-Test on Posttest Item N2
4.12A  Group Results on Item N3
4.12B  t-Test on Posttest Item N3
4.13A  Group Results on Item N4
4.13B  t-Test on Posttest Item N4
4.14A  Group Results on Item A1
4.14B  t-Test on Posttest Item A1
4.15A  Group Results on Item A2
4.15B  t-Test on Posttest Item A2
4.16   Group Results on Pretest Item N5
4.17A  Group Results on Item A3
4.17B  t-Test on Posttest Item A3
4.18A  Group Results on Item A4
4.18B  t-Test on Posttest Item A4
4.19A  Group Results on Item P1
4.19B  t-Test on Posttest Item P1
4.20A  Group Results on Item P2
4.20B  t-Test on Posttest Item P2
4.21A  Group Results on Pretest Item P3
4.21B  Group Results on Posttest Item P3
4.21C  t-Test on Posttest Item P3
4.22A  Group Results on Posttest Item P4
4.22B  t-Test on Posttest Item P4
4.23A  Group Results on Posttest Item P5
4.23B  t-Test on Posttest Item P5
4.23   Group Results on Pretest Item P6
4.24   Group Results on Pretest Item P7
4.25   Group Results on Pretest Item P8
4.26   Group Results on Pretest Item P9
4.27   Group Results on Pretest Item P10
4.28   Group Results on Pretest Item P11
4.29   Group Results on Pretest Item P12
4.30   Group Results on Pretest Item P13
4.31   Group Results on Pretest Item P14
4.32   Scale-to-Scale Correlation Matrix
4.33   Availability Item-to-Scale Correlation Matrix
4.34   Representativeness Item-to-Scale Correlation Matrix
4.35   Probability Item-to-Scale Correlation Matrix
       Representativeness Item-to-Item Correlation Matrix
       Availability Item-to-Item Correlation Matrix
       Availability and Representativeness Inter-Item Correlation Matrix

CHAPTER I

INTRODUCTION AND DEFINITION OF THE PROBLEM

In a series of studies, Daniel Kahneman and Amos Tversky have found evidence in support of the hypothesis that human beings rely upon certain specific principles when they are asked to estimate the probability of complex events, to predict the likelihood of outcomes, or to make judgments under uncertainty (Tversky and Kahneman, 1971, 1973; Kahneman and Tversky, 1972, 1973, 1974). Kahneman and Tversky call these principles "heuristics." These heuristic principles often lead to incorrect or biased estimates of the likelihood of events.

Definition of Heuristic

A heuristic can be defined as a principle by which an individual reduces a complex task to a simple one. In the present study, a heuristic is a principle by which an individual reduces the complex task of assessing likelihood or predicting outcomes to a simple judgment. Two heuristics which were isolated and studied by Kahneman and Tversky are the representativeness heuristic and the availability heuristic. According to Kahneman and Tversky, these two heuristic principles enable human beings to decode complex probabilistic situations.

The Representativeness Heuristic

According to the representativeness heuristic, subjects will make decisions about the relative likelihood of events based upon how representative an event is of the distribution of the parent population, or of the process by which the outcomes are generated (Kahneman and Tversky, 1972). For example, a long string of heads in tossing a coin does not appear to be representative of the random process of tossing a coin. Subjects who were employing the representativeness heuristic would tend to believe that tails will be more likely than heads on a subsequent toss, even though the tosses are independent of each other.
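
The independence claim in the coin-tossing example above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original study; it assumes a fair coin, and the function name prob_tails_after_heads_run is introduced here purely for illustration. It lists every equally likely sequence of tosses, keeps those that open with a run of heads, and computes the exact share in which the next toss is tails.

```python
from fractions import Fraction
from itertools import product


def prob_tails_after_heads_run(run_length):
    """Exact P(next toss is tails | first `run_length` tosses were all heads).

    Assumes a fair coin, so every sequence of tosses is equally likely.
    (Hypothetical helper, written only to illustrate the text.)
    """
    # Every possible outcome of run_length + 1 tosses.
    sequences = list(product("HT", repeat=run_length + 1))
    # Keep only the sequences that open with the required run of heads.
    after_run = [s for s in sequences if all(t == "H" for t in s[:run_length])]
    # Of those, count the ones in which the final toss is tails.
    tails_next = [s for s in after_run if s[-1] == "T"]
    return Fraction(len(tails_next), len(after_run))


for n in (1, 3, 6):
    print(n, prob_tails_after_heads_run(n))
```

Under the fair-coin assumption the result is 1/2 for every run length, however long the preceding string of heads, which is the sense in which the belief that tails is "due" is a bias.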
Similarly, a subject using the representativeness heuristic would judge the outcome "two heads and two tails" in flipping four coins to occur with a probability of 1/2 (the actual probability is 6/16 = 3/8). The event "two heads and two tails" appears to be representative of the distribution of heads in the parent population of outcomes for flipping one coin once.

Kahneman and Tversky (1974) mention that the representativeness heuristic can be shown to account for fallacies in prediction that arise from:

    i) insensitivity to prior probabilities and disregard for population proportions
    ii) insensitivity to the effects of sample size on predictive accuracy
    iii) unwarranted confidence in a prediction that is based upon invalid input data
    iv) misconceptions of chance, such as the gambler's fallacy
    v) misconceptions about the tendency for data to regress to the mean

When making predictions, subjects tended to ignore sample size, population base-rate data, and the validity of input information. On the other hand, according to Kahneman and Tversky, subjects do make predictions for the likelihood of events based upon how well the events reflect the distribution of the parent population or the process by which the outcomes are generated.

    "People view chance as unpredictable but essentially fair. Thus, they expect that in a purely random allocation of marbles each child will get approximately (though not exactly) the same number of marbles. Similarly, they expect even short sequences of coin tosses to include about the same number of heads and tails. More generally, a representative sample is one in which the essential characteristics of the parent population are represented not only globally in the entire sample, but also locally in each of its parts."
    (Kahneman and Tversky, 1972; 435)

The Availability Heuristic

According to the availability heuristic, subjects will make decisions about the relative likelihood of events based upon the ease with which instances of that event can be constructed or called to mind (Tversky and Kahneman, 1973). For example, if asked whether there are more distinct three-person committees or more distinct nine-person committees that can be formed from a group of twelve people, subjects who employ the availability heuristic will tend to favor the three-person committees. It is easier to call to mind examples of three-person committees than of nine-person committees, even though the number of distinct nine-person committees is the same as the number of distinct three-person committees.

Kahneman and Tversky claim that the availability heuristic causes systematic bias in probability estimates because subjects will tend to believe that those outcomes which can easily be brought to mind will also be more likely to occur (Tversky and Kahneman, 1973). If a subject is asked to estimate the divorce rate in his city, or to estimate the probability of being involved in an automobile accident, the frequency of his own personal contact with these events may lead to bias in the estimates that he gives for the likelihood of these events.

    "In general, availability is a useful clue for assessing frequency or probability, because instances of large classes are recalled better and faster than instances of less frequent classes. However, availability is also affected by other factors besides frequency and probability. Consequently, the reliance on availability leads to predictable biases...."
    (Tversky and Kahneman, 1974; 1128)

In summary, then, Kahneman and Tversky claim that humans often do not apply the theory of mathematical probability in estimating the likelihood of events, nor do humans conform to the laws of statistical decision theory when making predictions.
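
The two counting claims behind the examples discussed above (the four-coin outcome used to illustrate representativeness, and the committee comparison used to illustrate availability) can be verified directly. The sketch below is an illustration added here, not a procedure from the studies cited; it assumes fair coins, and the helper names prob_exact_heads and committee_count are hypothetical.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8 and later


def prob_exact_heads(n_coins, n_heads):
    """Exact probability of exactly `n_heads` heads among `n_coins` fair coins.

    (Hypothetical helper for illustration only.)
    """
    # comb(n, k) arrangements of the heads out of 2**n equally likely outcomes.
    return Fraction(comb(n_coins, n_heads), 2 ** n_coins)


def committee_count(group_size, committee_size):
    """Number of distinct committees of `committee_size` from `group_size` people."""
    return comb(group_size, committee_size)


# "Two heads and two tails" in four flips: 6 of the 16 outcomes.
print(prob_exact_heads(4, 2))  # 3/8

# Three-person and nine-person committees drawn from twelve people.
print(committee_count(12, 3), committee_count(12, 9))  # 220 220
```

Choosing the three members of a committee is the same as choosing the nine people left off it, which is why the two counts agree even though three-person committees are easier to call to mind.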
Instead, human subjects tend to employ principles such as the representativeness heuristic and the availability heuristic when they are asked to make subjective probability estimates.

Most of the subjects involved in this series of studies by Kahneman and Tversky were combinatorially naive college students with no prior training in probability or statistics. It is not surprising that these subjects utilized such heuristics as representativeness and availability in their predictions of the likelihood of events. However, Kahneman and Tversky also found that trained psychologists, who had a substantial background in probability and statistics, were subject to the same types of bias and fallacies as the combinatorially naive college students (Tversky and Kahneman, 1971; Kahneman and Tversky, 1973). Evidently, exposure to the theory of probability and statistics is not necessarily sufficient to overcome the biases induced by the availability heuristic and the representativeness heuristic. Kahneman and Tversky found such strong and widespread evidence for the use of these heuristics that they suggest that the misconceptions of probability and statistics arising from them may be extremely difficult, if not impossible, to overcome.

    "Corrective experiences are those that provide neither motive nor opportunity for spurious explanation. Thus, a student in a statistics course may draw repeated samples of a given size from a population, and learn the effect of sample size on sampling variability from personal observation. We are far from certain, however, that expectations can be corrected in this manner, since related biases, such as the gambler's fallacy, survive considerable contradictory evidence. Even if bias cannot be unlearned, students can learn to recognize its existence and take the necessary precautions.
    Since the teaching of statistics is not short on admonitions, a warning about biased statistical intuitions may not be out of place. The obvious precaution is computation."
    (Tversky and Kahneman, 1971; 109-110)

    "We surely do not mean to imply that man is incapable of appreciating the impact of sample size on sampling variance. People can be taught the correct rule, perhaps even with little difficulty. The point remains that people do not follow the correct rule, when left to their own devices. Furthermore, the study of the conduct of research psychologists (Cohen, 1962; Tversky and Kahneman, 1971) reveals that a strong tendency to underestimate the impact of sample size lingers on despite knowledge of the correct rule and extensive statistical training. For anyone who would wish to view man as a reasonable intuitive statistician, such results are discouraging."
    (Kahneman and Tversky, 1972; 444-445)

Introduction to the Problem

Kahneman and Tversky are psychologists who did their research in the spirit of developing a model of how human beings make judgments and decisions. The results of their work have led them to conclude that neither a Bayesian model (Edwards, 1968) nor a regression model (Hoffman, 1968) is sufficient to describe the human decision making process under conditions of uncertainty. Neither of these models accounts for the use of heuristic principles, such as representativeness and availability, which human subjects were found to employ. A detailed discussion of models of human judgment will be presented in Chapter Two.

The present study is interested in the research of Kahneman and Tversky from the viewpoint of mathematics education. The psychological investigations of Kahneman and Tversky, together with other studies (Cohen and Hansel, 1956; Cohen, 1957; Bruner, Goodnow, and Austin, 1956; Edwards, 1968), have diagnosed some misconceptions that human subjects have about how probability and statistics work.
These studies point out a set of entering characteristics that can be observed in students who are about to take an introductory course in probability and statistics. The evidence from the studies above indicates that students come to such a course with a set of misconceptions about probability and statistics, and with a set of heuristic principles that can propagate and maintain those misconceptions. Prior to any formal training in probability, students have had experience in and have dealt almost exclusively with "subjective probability." Suddenly, in their formal course work, they are confronted with a completely mathematized model of "statistical probability" (Carnap, 1953).

From the viewpoint of mathematics education, the problem that arises can be stated as follows: How should elementary probability and statistics be taught so as to maximize the students' chances of overcoming their misconceptions of probability and statistics? Given that students rely upon such heuristic principles as representativeness and availability to make estimates of the likelihood of events, what is the best way to teach elementary probability so that a student would learn to rely upon probability theory in making estimates of the likelihood of events rather than relying on heuristic principles which may bias his estimates?

Purpose of the Study

The purpose of this study was to:

    1. Develop an activity-based experimental course in introductory probability and statistics at the undergraduate level.
    2. Examine the effects of an activity-based course in probability and statistics upon college students' use of the availability and representativeness heuristics in making estimates for the likelihood of events.
    3. Compare groups of college students who took the experimental course to groups who took a lecture-based course in probability in order to test the relative effectiveness of these two approaches in helping college students to overcome their reliance upon the representativeness and availability heuristics when estimating the likelihood of events.

In order to accomplish these three goals, the study was done in two parts: a pilot study and a main study. In the pilot study, activities for the experimental course were developed and taught. Instruments were devised by the experimenter to measure reliance upon the availability heuristic and the representativeness heuristic in estimating the likelihood of events. Revisions of the activities and the instruments were made as a result of the pilot study. The main study compared two approaches to teaching elementary probability, the experimental activity-based course and a lecture-based course, and examined their relative effectiveness in helping students to overcome reliance upon the heuristics of availability and representativeness when making estimates for the likelihood of events.

Importance of the Study

There are three main factors that provided a rationale for this study:

    i) the importance of a basic reading knowledge of elementary probability and statistics, and the consequences of the misuses of probability and statistics
    ii) the distinction between subjective probability and mathematical probability, and the need for teaching elementary probability from a modeling point of view
    iii) the importance of anchoring educational research and educational practice in psychological theory

Consequences of the Misuse of Probability and Statistics

The importance of teaching elementary probability and statistics in the schools has been made clear by many authors.
The National Council of Teachers of Mathematics (1940, 1959), Wilks (1958), Page (1959), Pieters and Kinsella (1959), The College Entrance Examination Board (1959), and The Cambridge Conference on School Mathematics (1963) have all recommended that topics in probability and statistics be a part of every student's school experience. The arguments for literacy in probability and statistics that these authors have made will not be repeated here.

Darrell Huff sums up the consequences of the misuses of probability and statistics in his little book How to Lie with Statistics.

    "So it is with much that you read and hear. Averages and relationships and trends and graphs are not always what they seem. There may be more in them than meets the eye, and there may be a good deal less. The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, 'opinion' polls, census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense."
    (Huff, 1954; 8)

Exposure to the misuses of statistics, and experiences that enable students to confront their own misconceptions about probability, would seem to be essential to any introductory course in probability and statistics. Such misuses and misconceptions affect the human decision making process, which in turn affects the course of human lives. Material on the misuses of statistics and activities designed to enable students to come to grips with some of their own probabilistic misconceptions form the basis for the experimental course in introductory probability that was developed in this study.
Subjective Probability and Mathematical Probability: A Modeling Point of View

In an attempt to define probability, Rudolf Carnap points out the distinction between subjective probability and mathematical probability.

    "Most scientists will define it (probability) as statistical probability, which means the relative frequency of a given kind of event or phenomena within a given class of phenomena, usually called the 'population'. ... But you will find that there are scientists who define probability in another way. They prefer to use the term in a sense nearer to everyday use, in which it means a measurement, based on the available evidence, of the chances that something is true. ... This concept is called inductive (or subjective) probability. ... Statistical probability characterizes an objective situation, e.g., a state of a physical, biological, or social system. On the other hand, inductive probability, as I see it, does not occur in scientific statements but only in judgments about such statements."
    (Carnap, 1953; 123)

The phenomenon of subjective probability suggests that students enter elementary probability and statistics courses with a set of preconceived notions of how probability works. The experimental course developed in this study emphasized the framework of mathematical model-building as a vehicle to promote a gradual transition from subjective probability to mathematical probability. Thus, rather than beginning with the laws of probability and then attempting to apply these laws to specific problems, the experimental course begins with specific problems and encourages the students to gradually build and constantly modify their own laws of probability.

There are two reasons for a modeling approach in a course in elementary probability and statistics.
First, the transition from preconceptions of probability to probabilistic laws could be greatly facilitated if students are encouraged to see probability as a process of describing observed phenomena more and more accurately, rather than as a system of rules, axioms, and techniques which one attempts to apply to problems. Second, the process of building models gets students involved in a part of applied mathematics that is sorely neglected in low level mathematics courses. Henry Pollak's comments on the subject of applications indicate the need for a model-building approach.

    "All too often, our teaching has failed to present this open-ended and constructive nature of both pure and applied mathematics. We most always say to the student 'Here is a theorem. Prove it.' and say very rarely 'Here is a situation. Think about it. Find out what the problem should be, or what the theorem is that you ought to be trying to prove.' Such a radical improvement in pedagogy and student involvement will help the teaching of mathematics from many angles, not just the problem of applications." ...

    "Instead, the mathematization has become so familiar to the teacher that he forgets all about this part and begins immediately with the mathematical model, all built and ready to go. This deprives the student of the essential experience of participating in the model building - and, incidentally, tends to ossify the mathematical formulation."
    (Pollak, 1968; 25-26)

Pollak is not alone in this opinion. There is considerable support among mathematicians, mathematics educators, and operations research experts for elementary courses taught from a modeling point of view. Thompson (1974) suggests that students should be given a chance to experience the process of doing applied mathematics. Elementary courses should, according to Thompson, spend time encouraging students to identify precise problems from only partially understood situations.
Freudenthal (1968) points out that teachers do not know how to teach mathematics so that it will be useful because they have not experienced the subject as a mathematization of reality, and consequently, students do not get a chance to learn how to apply mathematics. The secondary schools are virtually devoid of experiences in the uses and applications of mathematics, as pointed out by Fitzgerald (1975).

    "In short, in spite of all the attention which has been paid to the mathematics curriculum during the past two decades, most of the mathematics teaching occurring today in the schools in the United States continues to be mechanistic, skill oriented, and motivated principally by the supposed need for those skills in the next mathematics class. The results of these efforts can be seen when one looks at the population which comes out the schools. With the exception of a small percentage, most students leave school with very little conceptual basis of understanding of mathematics. They are not very skillful at using mathematical ideas and have rather negative attitudes about mathematics."
    (Fitzgerald, 1975; 40)

Klamkin (1968) complains that students trained in mathematics who subsequently work in operations research are often unable to solve problems. He makes a plea for the teaching of mathematics so as to encourage students to think problems through for themselves.

    "I have thought for a long time that one of the most important goals of education is to get the students to 'think for themselves'. As I look over the American education scene, it seems that each year more and more material is being crowded into the curriculum. The net result being that most students hardly have any time to sit back and think out various problems for themselves. Consequently, most students will just parrot back the material from their texts or from their classroom notes. Or at best, students will get together to independently work out their problems as a group.
This being the case, it is no wonder that when a student or even a graduate is faced with a problem that is not directly in the books, he will have difficulties." (Klamkin, 1968; 131)

In a later paper, Klamkin (1971) provides an extensive review of the literature on the education of industrial mathematicians, and then outlines a modeling approach to elementary geometry to provide an example of how the modeling approach could be started in secondary school and continued throughout graduate education in mathematics.

There have been several attempts to write materials that develop mathematical models in elementary courses. Among these are: Mathematical Uses and Models in Our Everyday World (Bell, 1972); Man and His Technology (Piel and Truxal, 1973); Statistics: A Guide to the Unknown (Mosteller et al., 1972); and Statistics by Example (Mosteller et al., 1973).

The experimental course developed in this study attempts to integrate this emphasis on model building and on dealing with problem situations, espoused above by Pollak (1968), Fitzgerald (1975), and Klamkin (1968), with an activity-based course methodology.

Psychological Theory and Educational Practice

This study attempts to formulate an approach to the teaching of elementary probability and statistics that builds upon the discoveries of psychological research. The work of Kahneman and Tversky on bias in probabilistic estimation provided an initial diagnosis of the preconceptions that a student in an elementary probability and statistics course is likely to have. It becomes the task of mathematics education to prescribe learning experiences which meet the needs of the student, as suggested by psychological theory. This interplay among the psychology of learning, the subject matter content, and the method of teaching the content is precisely what Joseph Schwab advocates when he speaks of "practical deliberation" (Schwab, 1969).
Schwab suggests that the interplay between psychological theory and educational practice should be a continual process. The diagnosis of Kahneman and Tversky may be incomplete. A proposed project in mathematical curriculum development may fall short of correcting the learning problems that are brought out by psychological theory. The point is that it is unlikely that the psychologist or the mathematics educator will get close to the truth by working independently.

In a paper on the psychology of school subjects, Shulman (1974) expounds on the advantages of co-operation between psychologists and content specialists.

"All these assertions lead to the conclusion that, whereas the psychology of school subjects is undoubtedly deserving of immediate disinterment, its future vitality will be predicated on its no longer remaining the exclusive province of psychologists. It must become the joint focus of subject-matter experts and psychologists, if its study is to be fruitfully pursued, and if useful theoretical statements are to emerge from that research." (Shulman, 1974; 330)

Shulman goes on to say that such co-operation can best be obtained if researchers adopt a view of the teacher as clinician. The implications for research are stated in what follows.

"We must be prepared to broaden the range of methods we employ in our research, as we reformulate the questions we propose to raise. Although good experimental and correlational investigations will continue to be useful, we need add more varied kinds of studies - longitudinal case studies, anthropological analysis of classrooms and teachers, information-processing modelings of the thought processes of teachers and learners using methods of controlled introspection and retrospection, investigations of basic phenomena, such as transfer, under conditions of varying subject matter, to name but a few.
We should be prepared to treat our subject more clinically, both in terms of the teacher and the investigator as clinicians." (Shulman, 1974; 335)

The experimental course designed in this study specifies the role of the instructor to be that of clinician. The instructor in the experimental course diagnoses errors and misconceptions that students encounter as they do the activities. He then suggests other problems for them to look at, or asks them questions about their results, until they encounter sufficient dissonance with previous results to force them to embark in a more profitable direction.

The role of the instructor in the experimental course is, therefore, quite different from the usual role, assumed in many undergraduate courses, of conveyor of vast amounts of information.

Summary of the Procedure

In the winter quarter of 1976 a pilot study was conducted. An experimental activity-based course in elementary probability and statistics was developed by the investigator and taught to twenty-four undergraduate students at Michigan State University. The subjects for the experimental course had all enrolled in the same section of finite mathematics, and were taught the experimental course instead of the lecture-based course from Weiss and Yoseloff's Finite Mathematics which was taught in all the other sections. Students who take the course in finite mathematics at Michigan State are primarily freshmen business, agriculture, and horticulture majors.

The experimental section and a section of the lecture-based course in finite mathematics were both pretested and posttested for knowledge of basic concepts in probability and for reliance upon the heuristics of representativeness and availability when giving estimates for the likelihood of events. The results of the pilot study are reported in chapter two of this thesis.

In the spring quarter of 1976, the main study was carried out.
Two sections of finite mathematics were randomly assigned to the experimental course and two lecture sections were randomly chosen to serve as a control group. Altogether there were 85 undergraduates enrolled in the four sections. One of the experimental sections was taught by the investigator. Three other instructors each taught one of the other sections involved in the study. All four sections were pretested and posttested for knowledge of some basic concepts in probability and for reliance upon the heuristics of availability and representativeness. The instruments used were devised by the investigator and had been revised as a result of information gained in the pilot study. The instruments that were used are presented in Appendix D of this study. A representativeness subscale and an availability subscale were constructed from the instruments. Comparisons between the groups were made on the pretest scores and on the posttest scores for each subscale, as well as for the whole test. A comparison of the groups' responses on each individual test item was also made. A detailed report of the design of the study can be found in chapter three of this thesis.

Overview of the Organization of the Study

The study will be organized and presented in five parts. This first section has presented the problem, the purpose of the study, and a rationale for the study. Chapter two contains a review of the literature of psychology and mathematics education that is related to this study, and a report on the pilot study in which the activities for the experimental course and the instruments used in the main study were developed. The third chapter discusses the design of the study, including a detailed description of the experimental course, a description of the course taught to the control groups, specific statements of the hypotheses tested, and a report on the method of data analysis.
Chapter four reports the results of the main study, both descriptive results and statistical results. The fifth chapter gives a summary of the findings of the study and a discussion of the results. A complete outline of the experimental course, as well as the activities and notes that formed the basis of the experimental course, can be found in the appendices. An outline of the course taught to the control group is also included in the appendices, as are the instruments that were devised by the investigator and used in the study.

CHAPTER II

REVIEW OF LITERATURE RELATED TO THE STUDY

Introduction

The review of the literature related to this study will be presented in four parts. The first section discusses the work of Kahneman and Tversky on the use of heuristics to estimate the likelihood of events. Related results of other investigators will be included in this discussion. The second section contains the literature on models of human judgment and decision making. The third section discusses literature related to the development of the probability concept in elementary and secondary school children. In section four, the results of a pilot study on college students' use of heuristics to estimate the likelihood of events are presented.

The Use of Heuristics to Estimate Probability

A series of studies that dealt with college students' misconceptions of probability was carried out at the Hebrew University of Jerusalem and at Oregon Research Institute at the University of Oregon by two psychologists, Daniel Kahneman and Amos Tversky (1971, 1972, 1973, 1974). These studies demonstrated that college students who are combinatorially naive tend to rely on strategies that simplify complex probabilistic situations when estimating the likelihood of events. Two of these simplification strategies are called the representativeness heuristic and the availability heuristic.
These two heuristics have been defined in chapter one of this study and were discussed briefly in that chapter. A detailed discussion of the work of Kahneman and Tversky will be presented in this chapter. The research of Kahneman and Tversky concerning representativeness will be presented first, followed by their work on the availability heuristic.

Tversky and Kahneman (1971) showed that even trained scientists tend to have little regard for the effects of sample size upon the validity of statistical results. When asked how many subjects should be in a replication study to test an experimentally significant result, researchers were found to favor sample sizes smaller than in the original study. The resulting loss of power in the statistical test makes it only about half as likely that a significant result will be obtained. When confronted with non-significant results from a simulated replication study, these same researchers failed to pool the results of the initial study and replication study together in order to obtain support for their hypotheses. Rather, they attempted to "explain" the non-significant replication results by some quirk in the sample.

Actually, a large sample should be used in a replication study to ensure the power of the statistical test and minimize the chances of missing a statistically significant result. Kahneman and Tversky explain this behavior of scientific researchers by the representativeness heuristic. Apparently the non-significant replication studies are viewed separately from the results of the initial study, rather than as part of it. Thus, evaluators tend to use each study as representative of the same population, and they feel significant results should appear regardless of the sample size of the study. The representativeness heuristic causes people to believe that a "law of small numbers" holds, much like the mathematical law of large numbers.
Tversky and Kahneman (1971) also mention that representativeness may affect the way that people view random sampling. A gambler tends to feel that even small deviations from a 50-50 distribution of heads and tails in a sequence of coin tosses will be corrected with subsequent tosses. Tversky and Kahneman found evidence for this gambler's fallacy among college subjects. The subjects were told that the theoretical mean of all I.Q. scores is 100. They were then told that in a sample of 50 recorded I.Q. scores, the first score was 150. The expected mean for this sample of 50 scores is greater than 100 in light of the evidence of the unusually high score of 150 (taking the other 49 scores to average 100, the expected sample mean is (150 + 49 × 100)/50 = 101). However, when subjects were asked to estimate the mean I.Q. score of the sample of 50, there was a tendency to stick with 100 as the expected mean of the sample. These college subjects believed that the sample of 50 I.Q. scores was representative of the entire population. They felt the remaining scores in the sample should contain some very small entries that would counter-balance the large score of 150.

Kahneman and Tversky point out that systematic use of the representativeness heuristic leads researchers to gamble research hypotheses on small samples (to overestimate the power of small samples), to have unreasonably high expectations for the success of replication studies (to underestimate the breadth of confidence intervals), and to fail to attribute deviations in expected results to sampling variability. The results of this first paper, "Belief in the Law of Small Numbers" (1971), encouraged Kahneman and Tversky to continue their investigations on the use of the representativeness heuristic.

In subsequent investigations, Kahneman and Tversky (1972, 1973) report widespread employment of the representativeness heuristic by college students. College students were told that about one-half of all babies born are boys.
They were then asked to estimate the relative frequency of families in which the order of B B B G G G or B G G B G B would occur for having six children, where B stands for boy and G for girl. Of 92 responses, 75 favored the latter. The sequence B G G B G B appears to be more representative of the random process of having children. A similar tendency was observed for the sequences B B B B G B and B G G B G B. The latter sequence was again preferred, apparently because the former sequence has too many boys to be representative of the population proportion of 50% boys.

Kahneman and Tversky (1972) include further support for their hypothesis that subjects rely upon representativeness and tend to neglect the effects of sample size upon sampling distributions. They asked college students the following question.

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. The exact percentage of baby boys, however, varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which (more/less) than 60% of the babies born were boys. Which hospital do you think recorded more such days? The larger hospital, the smaller hospital, about the same?

Twenty-eight of fifty college students said that it made no difference whether the hospital was large or small, because the probability was still 1/2 that a baby would be a boy. The remaining 22 subjects divided their answers about equally between the other two responses. These subjects tended to see no difference in the effects of sample size on sampling variability within the two hospitals. Reliance upon the representativeness of the 50:50 population proportion of boys to girls tends to obscure the instability of the sampling variation in the smaller hospital.
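The birth-order question and the hospital question can both be checked against the binomial model. What follows is a minimal sketch, not part of the original study. It assumes independent births with P(boy) = 1/2 and exactly 15 and 45 births per day (the question says "about"), and the function names are illustrative only.

```python
from math import comb


def sequence_probability(sequence, p_boy=0.5):
    """Probability of one exact birth order such as 'BGGBGB'."""
    prob = 1.0
    for child in sequence:
        prob *= p_boy if child == "B" else 1.0 - p_boy
    return prob


def prob_more_than_percent_boys(n_births, percent=60, p_boy=0.5):
    """P(more than `percent`% of n_births are boys) under Binomial(n_births, p_boy).

    The comparison 100 * k > percent * n_births uses integers only,
    so the 60% cutoff is not affected by floating-point rounding.
    """
    return sum(
        comb(n_births, k) * p_boy**k * (1.0 - p_boy) ** (n_births - k)
        for k in range(n_births + 1)
        if 100 * k > percent * n_births
    )


# Every exact order of six births is equally likely (1/64).
print(sequence_probability("BBBGGG"), sequence_probability("BGGBGB"))

# Assumed daily birth counts: 15 (smaller hospital) and 45 (larger hospital).
print(prob_more_than_percent_boys(15))
print(prob_more_than_percent_boys(45))
```

Under these assumptions the two birth orders have the same probability, and the smaller hospital has the larger daily chance of exceeding 60% boys (about 0.15 with 15 births), so it should record more such days over a year.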
Kahneman and Tversky (1973) also found that the representativeness heuristic could account for biases in the prediction of categorical events and of numerical events. Given a list of 9 fields of graduate study, and a descriptive personality profile, college subjects were asked to rank the fields of specialization (business, law, education, medicine, social science, etc.) according to the likelihood that the person described in the profile would be in those areas. There was an overwhelming tendency for the subjects to rank the fields according to how well they "fit" the stereotyped personality description. The subjects ignored the actual relative frequencies with which graduate students enter any of these fields. Thus, while education may have more graduate students than any other area and engineering may have the fewest, the personality description of the graduate student could lead people to rank engineering as most likely if a stereotype engineer was described. Information given about a person, regardless of its worth or validity, was deemed more representative of a person's career pursuits than were the actual percentages of people pursuing those careers in graduate school.

The representativeness heuristic is suggested by Kahneman and Tversky to be a major reason why opinions are formed on the basis of stereotypes. Similar results on stereotypes have been obtained by Chapman and Chapman (1967), and by Bruner, Goodnow and Austin (1956). Bruner et al. say that preferential attributes lead to prejudicial categorizing. Overreliance upon preferred but highly unreliable cues, such as length of brow, biases subjects in picking out highly intelligent people. Intelligence is, thus, categorized by reference to the cue of length of brow.

Kahneman and Tversky found that subjects ignore the effects of regression in making numerical predictions.
Even graduate students in educational statistics courses who had recently been exposed to the effects of regression and sample variability ignored regression when making predictions.

Results of Kahneman and Tversky (1972), Peterson, DuCharme, and Edwards (1968), and Edwards (1968) indicate that college subjects are unable to construct optimal binomial distributions. Edwards and his collaborators offer substantial evidence for what they call "conservatism" in human beings' construction of subjective probability distributions. Humans behave suboptimally in their revisions of probability distributions in light of new data. Edwards suggests the theory that man is a conservative Bayesian and behaves according to a Bayesian model of probability revisions when making judgments. A more detailed discussion of Edwards' point of view is included in the next section of this chapter.

Kahneman and Tversky (1972) disagree with Edwards' claim that man is a conservative Bayesian. They prefer to explain humans' suboptimal performance in constructing subjective binomial probability distributions by the use of the representativeness heuristic. Subjects, according to Kahneman and Tversky, use the representativeness heuristic to simplify the complex task of estimating probabilities in binomial contexts. The population proportion of successes, p, is deemed representative of what should happen, even for small samples of a binomial experiment. Thus, when subjects were presented with p = 4/5 and N = 10 in a binomial experiment, there was a significant tendency to predict 6 successes and 4 failures to be more likely than 10 successes. The former is more representative of the population proportion of successes, although the latter is actually more likely to occur: the probability of 10 successes is (4/5)^10, or about .107, while the probability of exactly 6 successes is 210 (4/5)^6 (1/5)^4, or about .088.

In addition to the representativeness heuristic, Tversky and Kahneman (1973) identify and discuss the heuristic of availability.
Subjects are said to employ the availability heuristic when they judge probability according to the ease with which instances of an event can be constructed. Evidence for the accurate assessment of outcomes using availability is presented by Tversky and Kahneman (1973) in experiments with word construction. Some subjects were asked to estimate the number of words that they thought they could form in a two minute time interval from a list of letters. Other subjects were asked to perform the task. The difficulty of the task was controlled by varying the available letters. Correlations between the estimators and the performers were all above .9. Thus, people can be accurate assessors of availability.

While the use of availability can lead to correct judgments, as above, it is also possible that availability can bias the judgments that subjects make for the likelihood of events. When asked whether the letters K, L, R, N, and V are more likely to appear in the first position or the third position in English words (Tversky and Kahneman, 1973), 105 of 152 college subjects favored the initial position. In fact, each of these letters appears more frequently in the third position in the English language. Availability of instances where these letters start words is probably responsible for the incorrect judgments made by these subjects.

Kahneman and Tversky also found uses of availability which cause errors in judgments given for combinatorial outcomes. A path was defined as a sequence of line segments connecting a symbol in the top row to a symbol in the bottom row, such that one and only one symbol was touched in each row. Then subjects were asked whether grid A or grid B (below) had more paths in it.

[Figure: Grid A and Grid B]

An easy application of the sequential counting principle shows that each grid has 512 paths possible in it. However, 46 of 54 subjects (Tversky and Kahneman, 1973) responded that grid A had more paths in it.
Evidently it is easier to construct particular instances of paths in grid A, i.e., subjects tend to see paths as more available in grid A.

In estimating the binomial coefficient $\binom{10}{r}$ for choosing a committee of r persons from 10, 118 college students indicated that $\binom{10}{r}$ was monotone decreasing as r increased. The symmetrical nature of the binomial coefficient escaped subjects. There was an overwhelming tendency for subjects to view small groups from 10 to be more available than large groups. Tversky and Kahneman propose that it is easier to construct instances of three person committees than seven person committees.

Availability also comes up when subjects are asked to extrapolate beyond an initial pattern. College students were exposed for 5 seconds to either the expression 1 × 2 × 3 × 4 × 5 × 6 × 7 × 8, or the expression 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1. Median estimates were 512 for the first expression and 2,250 for the second. The correct answer is much larger, 40,320. The availability of large or small initial partial products had a significant effect on the size of the extrapolated estimate, although both median estimates were very conservative when compared to the correct answer.

In estimating likelihoods in a binomial distribution, subjects were also found to employ the availability heuristic. Tversky and Kahneman presented college students with the following grid.

X X O X X X
X X X X O X
X O X X X X
X X X O X X
X X X X X O
O X X X X X

Then the students were asked to estimate the relative frequency of paths that hit 6 X's, 5 X's and 1 O, 4 X's and 2 O's, ..., 6 O's. Other students were simply asked to decide whether paths with 6 X's or with 5 X's and 1 O would be more likely to occur. In both cases Kahneman and Tversky found a preponderance of subjects who favored the paths with 6 X's. The explanation they give for this phenomenon is that subjects see the X's as more available than the O's.
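The counting claims in the availability tasks above can be verified with a few lines of arithmetic. The following is a sketch only, with illustrative function names. The dimensions used for grids A and B (3 rows of 8 symbols and 9 rows of 2 symbols) are an assumption taken from the layout usually reported for Tversky and Kahneman's task, since the grids are not reproduced here. The six-row grid of X's and O's is taken from the text: each row holds 5 X's and 1 O.

```python
from math import comb, factorial


def count_paths(symbols_per_row, rows):
    """Sequential counting principle: one symbol is chosen in each row."""
    return symbols_per_row**rows


def committee_counts(n=10):
    """Number of r-person committees from n people, for r = 0, 1, ..., n."""
    return [comb(n, r) for r in range(n + 1)]


def paths_by_number_of_os(rows=6, xs_per_row=5, os_per_row=1):
    """Map k -> number of paths that touch exactly k O's (and rows - k X's)."""
    return {
        k: comb(rows, k) * os_per_row**k * xs_per_row ** (rows - k)
        for k in range(rows + 1)
    }


# Assumed grid shapes: A has 3 rows of 8 symbols, B has 9 rows of 2 symbols.
print(count_paths(8, 3), count_paths(2, 9))

# Committee counts are symmetric in r, not monotone decreasing.
print(committee_counts(10))

# The product 1 x 2 x ... x 8.
print(factorial(8))

# Paths through the X/O grid, grouped by the number of O's touched.
print(paths_by_number_of_os())
```

With these inputs both grids give 512 paths, the committee counts rise to a peak at r = 5 and fall symmetrically, and paths with 5 X's and 1 O (18,750) outnumber paths with 6 X's (15,625) among the 46,656 possible paths.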
If the mathematical construction for the above problem was preserved, but the structure of the task was altered, college subjects were found to favor 5 X's and 1 O over 6 X's. Suppose that six players in a card game received a single card at random from a deck in which 5/6 of the cards are marked X and 1/6 are marked O. What is more likely to occur, 5 X's and 1 O or all 6 X's? In this context, Tversky and Kahneman found that a majority of their subjects favored 5 X's and 1 O. These subjects were presented with the actual population proportion of X's, 5/6. Hence, their judgments are "representative" of the population proportion. The representativeness heuristic, rather than the availability heuristic, was utilized by these subjects in evaluating the likelihood of the card outcomes.

Kahneman and Tversky (1972) propose that the major difference between the evaluation of subjective probability by representativeness or by availability is the nature of the judgment presented by the task environment. Representativeness encourages evaluators to judge the degree of correspondence between the sample and the population. Thus, representativeness emphasizes the generic features of the task environment, the connotative distance between sample and population. On the other hand, availability is employed to judge subjective probability by the retrieval of particular instances, and so it focuses on the denotative distance between sample and population. These heuristics, according to Kahneman and Tversky, are adopted not only because they simplify complex tasks and thereby reduce cognitive strain, but also because in many instances they serve to produce quite accurate estimates by human judges. Unfortunately, they can also lead to bias and misconception, as pointed out above.
In addition to the investigations mentioned above, many other authors who explore the areas of probability learning and subjective probability have reported results which support the hypothesis that human judges employ the heuristics of availability and representativeness.

Reviews of the literature on probability-serial-learning experiments by Tune (1964) and Vlek (1970) report that subjects are overconfident of their ability to predict events based upon a long series of observations. A discussion of this phenomenon can be found in Howell (1970). Tune cites considerable evidence for subjects' overestimation of low frequency events and underestimation of high frequency events. Reliance upon representativeness, which tends to make subjects think outcomes should "even out", may explain this behavior. Komorita (1959) also provides experimental results verifying the overestimation of low and underestimation of high probability events.

The work of John Cohen and Mark Hansel (Cohen and Hansel 1956; Cohen 1957, 1960) in the area of subjective probability has also provided support for the use of availability and representativeness. Cohen and Hansel sampled one bead at a time from an urn filled with blue and yellow beads in a 3:1 ratio. Their subjects were not told what the ratio was, but were asked to predict the color of the next bead after each trial. Cohen and Hansel found a tendency among adults to first predict the color which had appeared less often, and then the color which had appeared more often. This behavior is entirely compatible with the representativeness heuristic. Subjects will first assume that things should "even out", and then predict outcomes based on the observed population proportion.

In another experiment, the beads were sampled in groups of 4 and placed in hidden beakers.
Cohen and Hansel told their subjects, who ranged in age from 6 years to adult, that the population proportion of blue beads to yellow was 1:1, or 3:1, or 2:1. Then they asked their subjects to estimate the number of beakers among 16 which would contain 4 blues, 3 blues and a yellow, etc., down to 4 yellows. The subjects were, thus, generating subjective sampling distributions. The chief effect noticed was an induced preference among the subjects for the number of beakers containing the exact proportion of the population in the large urn. Cohen and Hansel's subjects manifested the judgment of representativeness, because the small samples of size 4 were judged to be very representative of the 1:1 or 3:1 population proportions.

Cohen and Hansel mention that subjects fail to grasp the independence of sequential events. They claim that subjects feel forced to believe that failure is more likely to occur after a long run of successes, and vice versa. "Even those familiar with the theory of statistical independence often involuntarily share this belief" (Cohen and Hansel, 1956; 10). This tendency to be influenced by recent events in a series is characteristic of the "local representativeness" attributed by subjects even to small samples. Jarvick (1951) obtained results similar to those of Cohen and Hansel when subjects were asked to predict the next event based upon a sequence of observed events.

Cohen (1957, 1960) reports that 15 year old subjects feel that uncertainty from two or more sources reduces the probability of success more than does a single source of uncertainty. Cohen also reports that when subjects from age 15 to adult are asked if they would prefer to try to pull a winning ticket out of a box of ten tickets, or get ten tries at pulling a winning ticket from a box of 100 tickets (with replacement), the subjects prefer the smaller sample size and the single attempt by 4 to 1. This behavior suggests the use of the availability heuristic.
People may view their chances of winning as more "available" in the 1 in 10 situation than in the additive context of 10 chances at 1 in 100.

In summary then, there is evidence in the literature supporting the theory of Kahneman and Tversky that subjects employ the heuristics of availability and representativeness when making judgments as to the likelihood of events. The theoretical positions of Cohen, and of Kahneman and Tversky, are summed up below.

"It is a fact of great interest that in the long and intricate course of mental development, our subjective probabilities tend, in many respects, increasingly to approach objective or mathematically determined probabilities. This convergence owes something to learning as well as to maturation. Of no less significance is the fact that, in certain respects, subjective and objective or mathematical probabilities never converge. The subjective then obeys laws of its own. From this we may infer that consciousness is not merely 'a reflection of external reality' but creates other realities from within." (Cohen, 1960; 191)

"Perhaps the most general conclusion obtained from numerous investigations, is that people do not follow the principles of probability theory in judging the likelihood of uncertain events. This conclusion is hardly surprising because many of the laws of chance are neither intuitively apparent nor easy to apply. Less obvious, however, is the fact that the deviations of subjective probability seem reliable, systematic, and difficult to eliminate. Apparently, people replace the laws of chance by heuristics, which sometimes yield reasonable estimates and quite often do not." (Kahneman and Tversky, 1972; 430-431)

The quote by Cohen suggests a developmental theory of the probability concept. Literature related to the development of the probability concept in children will be discussed in section three. The current concern is an overview of the representational theories of human judgment.
This is a large area of research in which the studies of Kahneman and Tversky are embedded.

Models of Human Judgment and Decision Making

In the introductory chapter to a symposium on representations of human judgment, Allen Newell (1968) discusses scientific and motivational questions about judgment, and provides an overview of the branches of research in the area of representing human judgment. The scientific questions that researchers ask include: What information is being used to make the judgment? How does the judgment depend upon the input information? What is the process that takes place between input information and the judgment which is the output? What other variables besides the input information affect the judgment or decision?

From a motivational standpoint, research on the judgmental process and simulations of it are pursued in order to find out: Why do human beings fail to make optimal judgments? Can machines simulate the human judgmental process, equalling or perhaps even surpassing the accuracy of human judges? What are the ways in which humans simplify complex tasks, thus rendering them amenable to a simpler analysis, in order to make a decision?

The research on subjective probability that has been surveyed in section one of this chapter is very much in the tradition of the research on the questions listed above. In particular, the studies of Kahneman and Tversky on the uses of heuristics are concerned with the process that occurs between the inputs and outputs of human judgments. Kahneman and Tversky are primarily concerned with the process or strategies that humans employ to simplify the complex task environment in situations of uncertainty. Models of human judgment are, therefore, of central importance to the work of Kahneman and Tversky, and so also to this study.

Newell (1968) mentions that there are four major strands of research that attempt to model the process and products of human judgment.
These are mathematical models of the formal judgmental process, models of the task environment, information processing models, and models that view judgment as embedded in a larger problem. The first three of these strands are concerned with isolating and studying the judgment in and of itself, and are related to the area of subjective probability. The fourth strand, which investigates judgment embedded as a step in a larger process, will not be discussed in this paper. The interested reader is referred to Feigenbaum and Lederberg (1968).

The mathematical models of human judgment are concerned with simulating the judgmental law. They are, therefore, primarily product oriented models and tend to be only secondarily concerned with the process that a human being uses to make a judgment. A mathematization is attempted in order to calculate optimal decisions based upon input cues. According to Newell (1968), the general ingredients for these formal models consist of a state of the environment x, action possibilities by the judge, given by a, the outcome p that results from the environment x and the judge's action a (p is a function of x and a, p = p(x,a)), a utility function v which measures or assigns a value to each outcome p, and a payoff function w of x and a. The payoff resulting from application of a particular outcome is formally represented as w(x,a) = v[p(x,a)].

The usefulness of such a general model is heavily dependent upon some valid and reasonable method of mathematizing x and a. In general, it is not even clear what x and a are, for the state of the environment may be uncertain, and the action options available to the judge are hard to estimate, and may even vary from judge to judge. Thus more specific mathematizations of pieces of this general model must be attempted.
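The ingredients of the general model can be written down as a toy computation. The sketch below is illustrative only: the rain/umbrella states, the two actions, and the utility values are hypothetical and do not come from Newell (1968); only the relation w(x,a) = v[p(x,a)] is taken from the text.

```python
def payoff(x, a, outcome, utility):
    """Payoff w(x, a) = v[p(x, a)]: utility of the outcome of action a in state x."""
    return utility(outcome(x, a))


def best_action(x, actions, outcome, utility):
    """Return the action with the highest payoff, assuming the state x is known."""
    return max(actions, key=lambda a: payoff(x, a, outcome, utility))


# Hypothetical toy problem (not from the source): carry an umbrella or not.
ACTIONS = ["umbrella", "no umbrella"]

# Assumed utility values, chosen only for illustration.
UTILITY = {"stay dry": 5, "get wet": -10, "carry needlessly": -1, "travel light": 3}


def outcome(x, a):
    """Outcome p(x, a) for a state x in {"rain", "dry"} and an action a."""
    if x == "rain":
        return "stay dry" if a == "umbrella" else "get wet"
    return "carry needlessly" if a == "umbrella" else "travel light"


def utility(p):
    """Utility v(p): the assumed numeric value of outcome p."""
    return UTILITY[p]


if __name__ == "__main__":
    for state in ("rain", "dry"):
        print(state, "->", best_action(state, ACTIONS, outcome, utility))
```

The sketch assumes that the state x is known and that the list of actions is fixed. When x is uncertain, or when the available actions differ from judge to judge, neither function can be applied as written, which is the difficulty with the general model noted above.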
Hoffman (1960, 1968) has used the techniques of linear regression to obtain weights for the cues utilized by human judges and to develop regression equations which will predict their decisions. Hoffman's general model has its theoretical roots in a correlational model proposed in Egon Brunswick's probabilistic functionalism (Brunswick, 1947). In one of his studies Hoffman (1968) gave college subjects (the judges) information for nine predictor variables from student profiles. The predictor variables included such items as a high school counselor's rating, study habits, amount of parents' education, emotional stability, anxiety level, and so forth. Subjects then used the predictor variable information to make a judgment of a student's intelligence. Weights were assigned to the predictor variables and a regression equation was computed for each judge, with the judgment being the dependent variable. The simplest possible regression model is the best fitting hyperplane obtained by the method of least squares. In computing a multiple correlation coefficient to measure the precision with which a linear combination of the predictor variables can account for a subject's judgments, Hoffman found that 64-81% of the variability in the judges' decisions could be accounted for by a simple linear regression model.

The question then arises: Do human judges actually process information according to a linear model, or is the regression model just an accurate simulation of the judgmental product? Hoffman declares that even though the linear model is very accurate in predicting judgment, he is unwilling to accept that the actual process of human judgment is a simple linear combination of weighted cues. He indicates that there is some evidence (Hammond, Hurch, and Todd, 1964; Rorer and Slovic, 1966) that humans combine information configurally, that is, in a non-linear manner.
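The regression approach can be sketched in a few lines. This is a minimal illustration, not Hoffman's procedure or data: the cue values and judgments are hypothetical, the function names are invented here, and the fit uses the normal equations solved by Gaussian elimination so that only the Python standard library is needed. The sketch fits cue weights for one judge by least squares and reports the proportion of variability in the judgments that the linear model accounts for.

```python
def fit_linear_judge(cues, judgments):
    """Least-squares weights (intercept first) for one judge, via the normal equations."""
    rows = [[1.0] + list(c) for c in cues]          # prepend a constant term
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, judgments))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(cues, judgments, weights):
    """Proportion of variance in the judgments accounted for by the linear model."""
    preds = [weights[0] + sum(w * c for w, c in zip(weights[1:], cue)) for cue in cues]
    mean_y = sum(judgments) / len(judgments)
    ss_res = sum((y - p) ** 2 for y, p in zip(judgments, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in judgments)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data: two cues per profile and one judge's ratings.
    cues = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
    judgments = [4.1, 5.6, 9.8, 10.9, 14.2, 13.5]
    w = fit_linear_judge(cues, judgments)
    print("weights:", w)
    print("R^2:", r_squared(cues, judgments, w))
```

A high value of R^2 in such a fit shows only that the equation reproduces the judge's output. It does not show that the judge combines the cues linearly.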
Hoffman prefers to look upon the regression model of human judgment as a paramorphic representation of the product of a judgment rather than as an isomorphic copy of the judgmental process. By "paramorphic" Hoffman means that although regression equations may perform like a human judge, they do not necessarily describe the actual processing of the information in the human brain.

Edwards (1968) is a spokesman for those researchers who model human judgments by calculating the outputs of Bayes' theorem and comparing them to human judgments. Proponents of the Bayesian representation of human judgment are primarily concerned with how people revise their judgments in light of new evidence. Probabilities for conditional events can be formally computed from Bayes' theorem, which says if A and B are events, then the probability of event A occurring given that event B did occur, P(A|B), is given by

P(A|B) = P(A ∩ B) / P(B).

Bayes' theorem allows one to calculate a revised probability in a sample space that has been restricted or narrowed down by the introduction of new evidence.

Studies conducted by Edwards (1968), and Peterson, Du Charme, and Edwards (1968), have led them to believe that human beings do, in fact, make decisions that are in accord with the theorem of Bayes. However, the results of human judgments are generally much more conservative than would be predicted by Bayes' theorem. Edwards placed 700 red poker chips and 300 blue poker chips in one bookbag, and the opposite distribution, 300 red and 700 blue, in another. He then drew a sample of 12 chips from one bag - determined at random. The probability that the sample comes from one specific bag is .5 before a subject is given any information as to the composition of the sample. Subjects were then told that the sample consisted of 8 red and 4 blue chips, and were asked to estimate the probability that the sample was drawn from the predominantly red bag in light of the new information.
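The revised probability for the bookbag task just described can be worked out directly. The sketch below is an illustration under stated assumptions rather than Edwards' own computation: the function name is hypothetical, and the 12 draws are treated as independent with a fixed chance of red (sampling with replacement), so that the likelihood of the sample is binomial and the binomial coefficient cancels between the two bags.

```python
from fractions import Fraction


def posterior_red_bag(n_red, n_blue,
                      p_red_in_red_bag=Fraction(7, 10),
                      prior=Fraction(1, 2)):
    """Probability that a sample came from the predominantly red bag.

    Assumes independent draws, and that the other bag has the opposite
    composition (chance of red equal to 1 - p_red_in_red_bag).
    """
    p = p_red_in_red_bag
    like_red_bag = p ** n_red * (1 - p) ** n_blue      # P(sample | red bag)
    like_blue_bag = (1 - p) ** n_red * p ** n_blue     # P(sample | blue bag)
    evidence = prior * like_red_bag + (1 - prior) * like_blue_bag
    return prior * like_red_bag / evidence


if __name__ == "__main__":
    post = posterior_red_bag(8, 4)
    print(post, "=", round(float(post), 3))
```

For a sample of 8 red and 4 blue chips the posterior reduces to 7^4 / (7^4 + 3^4) = 2401/2482, or about .97. Only the difference between the number of red and blue chips matters under these assumptions, not the sample size itself.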
The mean estimates for subjects fell between .7 and .8. The actual probability, calculated from Bayes' theorem, that the sample was drawn from the predominantly red bag is .97. Edwards (1968) cites considerable evidence for the hypothesis that human beings do, in fact, process information according to Bayes' theorem, and do so conservatively. A study by Phillips, Hays, and Edwards (1966) gives strong evidence that human judges are conservative Bayesians.

The phenomenon of conservatism is somewhat distressing to the Bayesians, since it prevents human judges from performing optimally on judgmental tasks. Considerable effort has been devoted to the problem of overcoming conservatism in human judgment in studies by Peterson et al. (1968) and by Wheeler and Beach (1968). Their findings suggest that the phenomenon of conservatism is very difficult to overcome, and that only long and involved training procedures appeared to have any effect upon human subjects' performance on probability revision tasks. The Bayesians cannot agree among themselves as to whether conservatism results because humans misaggregate data, or misperceive the impact of data, or if conservatism is an artifact of the human judgmental process.

Kahneman and Tversky (1974) claim that the reason Bayesians have difficulty fitting their model to the process of human judgment is that humans are not conservative Bayesians at all. They claim that, whereas the Bayesian model may predict fairly accurate products of human judgment, the model does not capture the essential characteristics of the judgmental process. Since sample size has no effect upon subjective probability distributions (see section one of this chapter), human judges must be using the proportion of successes in a binomial experiment as a representative characteristic which will predict the actual distribution.
Thus, for Kahneman and Tversky, human judges apply the representativeness heuristic in the experiments conducted by the Bayesians, and do not process information by applying Bayes' theorem.

The linear regression model and the Bayesian model are the main attempts at mathematization of the human judgmental process. Each of these models has shown some success in predicting human decisions, but neither of them is widely accepted as having accurately depicted the process that actually takes place in the human mind when judgments are rendered. A thorough review of the literature comparing and contrasting these policy capturing models, regression and Bayesian, has been done by Slovic and Lichtenstein (1971).

The actual process that occurs during a judgment is more carefully investigated by models of the task environment and by information processing models, than by formal mathematization of human judgment. Simon and Newell (1971) describe the elements of an information processing model of problem solving, judgment, and decision making. The model is concerned with a problem solver confronted by a task that is objectively defined in terms of a task environment. The task environment is the state of the problem at any given moment in the problem solving process. The problem solver defines the problem (and constantly redefines it) in terms of operations that constitute what Simon and Newell call the problem space. Thus, for example, in chess, the task environment is the state of the board in between moves and the problem space consists of those operations which constitute permissible moves by the players.
The assumptions that information processing models make about human problem solvers include: the existence of a short term memory to temporarily store bits of input information; the existence of a long term memory bank of facts and strategies which can be brought to bear upon a problem; and the assumption that human problem solving occurs in an essentially serial manner. Simon and Newell believe that human information processors proceed step by step in solving a problem or making a decision, rather than carrying out several parallel procedures simultaneously.

Models that deal specifically with the nature of the task environment attempt to analyze the task itself to see if the structure of the task dictates the strategies that are possible for making a judgment. The work of Chapman and Chapman (1967) on categorizing personality on the basis of facial components, and the extensive investigation of Adrian DeGroot involving mid-game position in chess (DeGroot, 1965), as well as Newell and Simon's own work on chess and cryptarithmetic problems (Newell and Simon, 1972), provide examples of attempts by researchers to model the task environment.

Models of the task environment typically do not attempt to describe the entire process of human judgment. They focus on one element of this process. In this respect, information processing models try to simulate not only the task environment and the information that gets used, but also the representations of the environment developed by the decision-maker and how the human decision-maker actually processes these representations. A judgmental law is generated by the information structure of the task environment, and any mathematization representing the process of the judgment is only developed afterwards. Often the model culminates in a computer program which attempts to simulate the actual judgmental process that a human judge uses.
Newell, Simon, and Shaw (1958), and Newell and Simon (1972), have simulated the task environment, the problem space, and the process used by a human judge, and finally written computer programs to carry out the processes of proving theorems in logic, playing chess, and solving cryptarithmetic problems.

Clarkson (1962) tape recorded stockbrokers while they were thinking out loud and making decisions about investments. He then analyzed the protocols that he had taped, identified crucial elements that stockbrokers used in making decisions, including a wealth of information on past stock performances, and wrote a computer program that simulated the process that the stockbrokers go through. Predictions made by the program were tested against those made by stockbrokers over a six-month period, and the program predictions were in agreement with those of the stockbrokers more than 90% of the time.

Kleinmutz (1968) studied the diagnostic decisions made by clinical psychologists in interpreting personality profiles. The psychologists were tape recorded while thinking out loud, protocols were analyzed, and a computer program was written to simulate the optimum human judgment in analyzing the profiles.

Shulman and Elstein (1974) describe a four step information processing model which they claim is used by physicians in doing a diagnostic work up of a patient. The model includes cue acquisition, hypothesis generation, cue interpretation, and hypothesis evaluation. This paper of Shulman and Elstein's includes a comprehensive review of the literature of problem solving models in the information processing tradition, and compares the research in information processing to the aforementioned research on judgment by the Bayesian and regression model builders.
In comparing these two branches of research, Shulman and Elstein point out that the information processors are primarily concerned with isomorphic models of the judgmental process that humans actually go through, while the Bayesian and regression schools deal with paramorphic simulations that have outputs similar to those made by human judges. Bayesian and regression investigators distrust introspective techniques, while process tracers rely heavily upon introspective techniques, such as thinking out loud, in order to represent the process of human decision making. The information process models are concerned first with understanding and explaining the judgmental process, and only secondarily with accurate prediction and control. On the other hand, prediction and control are the primary concerns in the "policy capturing" research of the Bayesian and regression models. Shulman and Elstein suggest that the two areas of research have much to offer each other, and that they complement each other's weak points.

This study is primarily concerned with several heuristics that human information processors use when making judgments. The availability heuristic and the representativeness heuristic (Kahneman and Tversky, 1974) appear to be part of the problem space. That is, they are examples of operations that judges might use in dealing with a task environment of uncertainty. There is evidence that human judges misuse these operations when making estimates for the likelihood of events (see section one of this chapter). Difficulties encountered in overcoming the misuse of the representativeness and availability heuristics appear to be deep-seated and dependent upon the development of the probability concept. The next section, therefore, will contain a discussion of the development of the probability concept in young children and adolescents.

The Development of the Probability Concept in Young Children and Adolescents
In this section, literature from the fields of psychology and mathematics education that deals with the development of probability concepts in both elementary and secondary school-age children will be cited. The section begins with Piaget's study of the probability concept in children, ages 3-15. Although this study is concerned with misconceptions that college age students have about probability, the development of probability concepts in younger subjects is also important to the study. The roots of misconceptions that adults have about probability can sometimes be traced back to earlier experiences, or lack of experiences, with probabilistic concepts.

The work reported by Piaget and Inhelder in their book (Piaget and Inhelder, 1951) is the source of much of the research in the development of the probability concept in young children. Piaget presents clinical evidence from interviews with children and concludes that the learning of probability concepts proceeds in stages, in accord with his theory of the development of thought in children. There are three stages in Piaget's theory of the development of the probability concept in children.

In the first stage, generally characteristic of children under seven years of age, the child is unable to distinguish between the necessary and the possible. In this stage, "uncertainty" means only unpredictability of events in the near future. The child does not possess a concept of logical uncertainty, and so does not understand the true nature of a random mixture. Piaget found that children in this first stage of development tried to superimpose an order or discover a pattern amid the chaos of a random mixture.

Two behaviors that Piaget observed in children in the first stage are worth noting in connection with the present study. In the first place, if a subject was shown instances of events A and B, and if A appeared more frequently than B, the subject would tend to bet on B because it had been skipped too often.
This type of behavior, sometimes referred to as the "gambler's fallacy", exemplifies a subject's use of the representativeness heuristic, in the language of Kahneman and Tversky (1972). A truly representative sequence of instances of A's and B's should not favor one or the other (provided, of course, that P(A) = P(B)).

In the second place, Piaget's subjects tended to predict those events which had been observed most frequently, with total disregard for the population distribution. This type of behavior is characteristic of the availability heuristic (Tversky and Kahneman, 1973), wherein events are predicted based upon constructible instances.

The author does not want to imply that Piaget's subjects were actually employing the representativeness heuristic or availability heuristic in their responses. The use of these heuristics, as described by Kahneman and Tversky, would require the subject to possess logical operations which are not observable in Piaget's preoperational children. However, the fact that behaviors characteristic of these two heuristics exist in small children, for whatever reason, and that these behaviors are manifestly still observable in mature adults (Cohen and Hansel, 1956; Kahneman and Tversky, 1972), indicates that they might present formidable obstacles to learning the correct theoretical rule.

In the second stage of the development of the probability concept (up to about 14 years), Piaget claims that a child recognizes the distinction between the necessary and the possible, but has no systematic approach to generating a list of "the possibles". Thus, a child in this stage lacks the ability to list the sample space for a probability experiment. The second stage child does not possess the formal operations which are crucial in systematizing a combinatorial analysis.
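Two points from the preceding paragraphs, the systematic listing of a sample space and the independence that the gambler's fallacy ignores, can be illustrated by brute-force enumeration. The sketch below is illustrative only and rests on a stated assumption: events A and B are equally likely and successive trials are independent, so every sequence in the listed sample space has the same probability. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sample_space(n_trials, outcomes=("A", "B")):
    """Systematically list every sequence of n_trials outcomes."""
    return list(product(outcomes, repeat=n_trials))


def prob_next_given_run(run_length, next_outcome="B", run_of="A"):
    """P(next trial is next_outcome | the previous run_length trials were all run_of).

    Computed by counting equally likely sequences, which assumes
    P(A) = P(B) and independent trials.
    """
    space = sample_space(run_length + 1)
    after_run = [s for s in space if all(x == run_of for x in s[:run_length])]
    favourable = [s for s in after_run if s[-1] == next_outcome]
    return Fraction(len(favourable), len(after_run))


if __name__ == "__main__":
    print(len(sample_space(3)), "sequences of length 3")
    for run in (1, 4, 8):
        print("after", run, "A's in a row, P(B next) =", prob_next_given_run(run))
```

Under these assumptions the chance of B on the next trial is 1/2 no matter how long A has been running, so B is never "due".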
In the third stage, the child begins to develop a combinatorial analysis, understands probability as the limit of relative frequency (law of large numbers), and can deal with the probability of isolated instances as a function of the whole distribution.

Piaget, and Kahneman and Tversky, give evidence for the same types of behavior in subjects from vastly different age groups. In the language of Kahneman and Tversky, Piaget's subjects (children) exhibit behavior that is characteristic of the representativeness and availability heuristics when they are asked to predict outcomes. In the language of Piaget, Kahneman and Tversky's subjects (mostly college students) exhibit behavior that is characteristic of the first and second stages of cognitive development, prior to the acquisition of formal operations.

Piaget's interview technique requires a high degree of verbalization from the subjects. Some studies have been conducted to see if very young children indicate an understanding of some probability concepts when their decisions are made in a non-verbal format. Davis (1965), and Yost, Siegal, and Andrews (1962) present evidence for the existence of some concepts of probability in children ages 3 and 4. The children were permitted to determine probability or frequency by utilizing a non-verbal decision process. Yost et al. claim that the amount of reinforcement in a probability learning experiment with four year olds had a significant effect upon the accuracy of the children's predictions.

Smock and Belovicz (1968) claim that the children in Yost's experiment really learned about reinforcement, and not about probability. They present substantial evidence that subjects of junior high age have a very poor conception of the laws of probability. Smock's subjects could not consistently generate correct sample spaces, and did not recognize or utilize the concept of independence when predicting outcomes.
Cohen and Hansel (1956) identify four stages that children go through in the development of the idea of a probability distribution. At first there is just a "glimmering belief" that the numbers in a distribution will really vary. This corresponds somewhat to recognizing the distinction between the necessary and the possible in Piaget's theory. Secondly, a child feels that the category of exactly equal proportions will occur most often, that is, that every probability distribution is a uniform distribution. In the third stage, likelihoods are assigned to outcomes based upon their similar structure. For example, the outcome one blue and four yellow beads is judged as likely to occur as the outcome one yellow and four blue beads, regardless of the population composition. In this stage the child applies the principle of symmetry universally. Finally, Cohen and Hansel claim, a child is able to assign a greater probability to the event "one blue and three yellow beads" than the event "four blue beads" in a 50-50 distribution. Cohen and Hansel attribute the stages of mental development both to maturation and physical experience, and say that a child is ordinarily in the fourth stage of development around the age of 15. This theory is very much in accord with that of Piaget.

Cohen and Hansel, Stevenson and Zigler (1958), Messick and Solley (1957), and Kass (1964), have examined the development of the ability in children to "match" their guesses to the actual distribution for binomial outcomes in a serial learning task. Generally, studies on serial learning present the subject with a box containing two lights. The subject guesses which light will come on, and then the light is shown. The frequency for the lights is preset at 80:20 or 70:30 or 50:50, or whatever the researcher desires. Results of such research performed with adult subjects show that the proportion of guesses asymptotically approaches the preset proportion for the lights.
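What matching costs the subject can be made explicit with a short calculation. The sketch below is illustrative and rests on an assumption not stated in the studies cited: on each trial the subject's guess is made independently of which light actually comes on. The function name is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_accuracy(p_light, p_guess):
    """Chance of a correct guess on one trial.

    p_light: probability that the first light comes on.
    p_guess: probability that the subject guesses the first light,
             assumed independent of the light on that trial.
    """
    return p_light * p_guess + (1 - p_light) * (1 - p_guess)


if __name__ == "__main__":
    p = Fraction(80, 100)  # lights preset at 80:20
    print("matching the preset proportion:", expected_accuracy(p, p))
    print("always guessing the more frequent light:", expected_accuracy(p, 1))
```

With the lights preset at 80:20, a subject who matches the proportion is correct on 17/25 (68%) of trials under this assumption, while a subject who always guesses the more frequent light is correct on 4/5 (80%).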
Messick and Solley (1957), and Stevenson et al. (1958) report that children approach the asymptotic response level in about the same way that adults do. Cohen and Hansel (1956) agree with these results, but found a tendency for children age 6 or 7 to simply alternate their guesses. About age 8, Cohen reports, children begin to be influenced by the previous outcomes.

Kass found that boys preferred real gambling distributions more than girls did in binomial probability learning tasks. While girls would rather have a certain chance at a payoff, boys preferred payoff odds of 2:1 or 7:1 against them in gambling situations (Kass, 1964). Kass' study is one of the few studies that found any sex differences in the development of the probability concept in children.

Carlson (1969) and Hoeman and Ross (1971) found that probabilistic reasoning developed with age, and generally followed Piaget's stages. Hoeman supports Smock (1968) in criticizing the studies of Yost and Davis (mentioned above) which attributed to very young children a greater understanding of probability than that claimed by Piaget.

There appears to be a good deal of support and agreement in the literature that the development of the probability concept in children does proceed in stages in accord with the theory of Piaget. However, there is considerable disagreement among investigators as to which probability concepts are actually known by children, and at what age levels.

As a result of Piaget's theory of development and the controversy surrounding the level of probability concept attainment by children at various ages, there has been some relatively recent research by mathematics educators in the area of probability learning.
These studies have also been spurred by the suggestions of the College Entrance Examination Board (1959) and the Cambridge Conference on School Mathematics (1963) which encouraged the inclusion of topics from probability and statistics in the elementary and secondary schools. Most of the studies in mathematics education are either feasibility studies undertaken to determine the teachability of probability and statistics in the elementary or secondary schools, or experimental and correlational studies which attempt to measure the effects of teaching a unit on probability.

Studies by Leake (1962), Doherty (1965), Mullenex (1969), Leffin (1971), and Jones (1974) have investigated children's understanding of probability concepts prior to any formal instruction.

Leake found that seventh, eighth, and ninth grade students had some understanding of sample space, probability of a simple event, and probability of the union of two disjoint events (mutually exclusive events). Mental age and achievement both correlated significantly with understanding of probability. Leake recommends the inclusion of probability topics in these grade levels based upon his results.

Doherty (1965) carried out a similar study with fourth, fifth, and sixth graders. An investigation of children's understanding of independent events was added to the three concepts of sample space, simple probability, and mutually exclusive events of Leake's study. Doherty found that children in grades 4-6 possess considerable familiarity with these concepts prior to formal instruction. As in the Leake study, age, mental age, and achievement were found to be significantly related to the level of understanding of probability concepts. Doherty interprets her results as indicative of the feasibility of teaching probability in the elementary school.
She recommends that topics from probability be included in elementary school curricula, and that teacher training programs make provisions for informing prospective elementary teachers about probability topics that would be suitable for elementary school children.

Mullenex (1969) investigated the relationships between understanding of probability in grades 3-6, and the variables of sex, age, grade level, and skill in other school subjects. His test was based upon the questions that Piaget asked children in interviews. Multiple linear regression techniques indicated a tendency for arithmetic computational skills and reading skills to be relevant predictors of performance on probability measures. Mullenex found sufficient evidence for the understanding of probability in children to warrant inclusion of probability topics in grades 3-6.

In a study of probability concepts possessed by children in grades 4-7 prior to formal instruction, Leffin (1971) reports that children have considerable knowledge of the concepts of finite sample space, probability of a simple event, and quantification of probability. I.Q., sex, and grade level were all found to be significantly related to the understanding of probability. I.Q. was found to be the most accurate predictor of performance on probability tests. In analyzing the children's errors, Leffin mentions that the concept of combinations was very difficult for them to comprehend or to use. When Leffin's subjects could list all the outcomes in a sample space that counted combinations, 92% of them could not use the information from the sample space to calculate a probability. This evidence appears to support Piaget's position that children of this age are in the stage of concrete operations. Leffin's subjects could successfully handle probability in simple situations like drawing balls out of a box given the number of balls of each color that are in the box.
However, the more complicated combinatorially-generated sample spaces were not understood by these children. This finding caused Leffin to speculate on how early children can be taught a systematic method of counting. He recommends taped interviews and the use of manipulatives with children in order to obtain more information about children's readiness to learn counting principles.

Jones (1974) used taped interviews with first, second, and third graders, and embodiments of set and measure to investigate the status of five concepts of probability among early elementary school children. The embodiments were spinners with equal and unequal area divisions, and jars containing discrete objects. Interviews were taped in order to gain insight into the errors made by the subjects. The concepts were sample space; comparison (P1) of the probability of two events within a fixed sample space; comparison (P2) of the probability of a given event across three sample spaces with the number of total outcomes held constant (a/n, b/n, c/n); identification (P3) of a uniform probability distribution; and comparison (P4) of one event across three sample spaces in which the frequency of that event was constant but the total number of outcomes was varied. Jones found evidence in support of the children's understanding of P2, P4, and of sample space. He suggests that for primary children, an apparent understanding of probability in one situation does not guarantee understanding will be evidenced in another situation. There is also further evidence in Jones' study that I.Q. predicts the extent of the development of probabilistic thinking in young children, in accord with the findings of Leake, Doherty, and Leffin. The use of embodiments seemed to help the children understand probability, although Jones reports that the use of manipulatives to perform an experiment sometimes interfered with the children's ability to list the outcomes of a sample space.
Color biases and individual preferences prevented some children from making accurate responses to questions involving the spinners.

The studies listed above universally recommended that topics in probability be included in pre-secondary mathematics curricula, grades 1-9. Attempts at developing materials in probability and testing the effectiveness of these materials have been made by Shepler (1970), Shepler and Romberg (1973), Gipson (1971), White (1974), McKinley (1960), and Shulte (1968).

Shepler (1970) developed a unit on probability dealing with sample spaces of one, two, and three dimensions, and necessary counting techniques. The unit was taught to a class of 25 specially selected sixth graders of above average ability. The unit was taught using a mastery learning model that incorporated self-correcting exercises, specific prescriptions to diagnose and remedy errors, extra help sessions, and extra group instruction when mastery was not satisfactorily attained by a large majority of the class. Objectives included counting outcomes, probability of a simple event, probability of a compound event, equally likely vs. unequally likely probability models, and estimating the probability of an event from data in an experiment. A criterion level of 90% correct by 90% of the students was set for mastery of the objectives. All the behavioral objectives were mastered at this level by the students except those dealing with counting the number of outcomes and estimating probability from data. Shepler's results agree with those of Leffin (1971), and suggest that sixth graders do not yet possess the formal operations that Piaget claims are necessary to count all the outcomes systematically. A follow up study (Shepler and Romberg, 1973) indicated that after four weeks the subjects were able to retain most of what they had acquired at the mastery level.
Studies were carried out by Gipson (1972) to determine what materials would be appropriate for introducing probability concepts to third graders. In one study children received instruction in small groups and in another instruction was individualized. The instructional sequence dealt with the concept of sample space and the probability of a simple event. Audio and video tapes of the subjects were made to gain deeper insight into the process through which children learn about probability concepts. Gipson, like Shepler, reports that the children had difficulty specifying estimated probability from an experiment.

White (1974) compared pre- and post-test results and found that seventh and eighth grade subjects demonstrated significant increases in achievement of probability concepts. Achievement in probability was correlated significantly with concept attainment, computational ability, and reading ability in White's study.

McKinley (1960) and Shulte (1968) developed and tested units on probability for secondary school students. McKinley reports that intelligence, language skills, reading comprehension, and math achievement all correlate significantly with achievement on a unit in probability taught to 12th graders. Shulte tested the effects of a unit on probability on attitude, computational skill, understanding mathematics symbols, and the ability to use formulas. No significant effects upon attitude or computational skill were found, but the probability unit did have a significant effect upon the students' ability to use formulas and interpret mathematical symbols.

In addition to the studies concerned with the feasibility of teaching probability at the elementary and secondary levels, which we have discussed above, there have been several comparative studies of the relative effectiveness of two or more approaches to teaching probability at various levels.
Comparative studies have been undertaken at the elementary level by McLeod (1972), at the secondary level by Geeslin (1974) and Meyer (1975), and at the collegiate level by Barz (1970), Austin (1974), and Kipp (1975).

Three treatments in a unit on probability were administered to second and fourth grade children by McLeod (1972). The treatments were laboratory experience, a teacher demonstration, and a control in which no probability was taught. The unit on probability covered the law of large numbers, prediction of a set of outcomes from an experiment involving repeated trials, and uses of probabilistic terms such as "certain", "impossible", "likely", and "unlikely". McLeod found no differences among the three treatments in probability achievement.

Moyer (1975) examined the effects of a unit in probability upon arithmetic computation skills, reasoning ability, and attitudes. The experimental group showed no significant improvement in these three areas over a control group which received no probability instruction. Meyer reports that the experimental group did, however, learn a lot about probability.

In an attempt to compare the content structure of probability with secondary school students' cognitive structure, Geeslin (1974) prepared a programmed text covering several concepts in probability. Students were allowed to work through the programmed material at their own pace. A control group worked through a programmed text on an unrelated mathematical topic. After ten days the groups were tested on their ability to solve probability problems, and a representation of each group's cognitive structure with respect to probability was made and compared to a theoretical structure for probability content. Geeslin found close correspondence between the experimental group's cognitive structure and the structure of the probability concepts.
Although the two structures corresponded closely, Geeslin warns that the learning of the structure of probability and the ability to actually solve problems in probability may develop independently of each other.

Kipp (1975) investigated the effects of integrating topics from probability with those of elementary algebra in an experiment with college students. She compared experimental and control groups on achievement, retention, and attitude. Greater retention and improved attitude towards mathematics were found in the groups receiving the algebra integrated with probability. Kipp recommends that experimentation be introduced before college students are taught probability formally. She suggests that college students should encounter physical models of both uniform and non-uniform probability distributions.

The studies of Barz (1970) and Austin (1974) are closely related to the present study in that they compare several methods of teaching probability to college students. Barz taught two different courses in probability to liberal arts majors and to elementary education majors. The groups were broken up into an x population with three years or less of high school mathematics and a y population with more than three years of high school mathematics. Each of these groups was then broken up into two parts. One part received a course in probability that was set-theoretic based and the other part a probability course that actively involved the students and presented probability from an historical perspective. The y-population of liberal arts majors was the only group in which the historical-practical-involvement approach yielded significant differences over the set-theoretic based course. Barz noted tendencies in the data to favor the historical-active-involvement course in all the groups.
Three methods of teaching probability and statistics, symbolic, pictorial, and manipulative-pictorial, were compared for their effects upon the achievement of probability concepts by college students in a study by Austin (1974). The subjects were freshmen and sophomores who were not majoring in mathematics or sciences. The manipulative-pictorial treatment used the results of student-performed experiments to introduce and motivate the development of mathematical models in the study of probability. The pictorial treatment replaced the actual experiments of the manipulative approach with graphs, pictures, and diagrams of experimental data. In the symbolic treatment, the same concepts and material were covered, except that the use of experimental data in the form of graphs or diagrams was deleted. Comparisons on computation, comprehension, application, and analysis measures indicated that the pictorial and manipulative-pictorial treatments resulted in significantly higher achievement scores than the symbolic treatment. The pictorial and manipulative-pictorial treatments yielded the same results. Austin concludes that it appears that college students can give up the use of manipulatives, but not the use of graphs, pictures, and diagrams representing vicarious experiments, in learning about probability.

The review of the literature has, thus far, dealt with subjective probability, models of human judgment, and the development of the probability concept in children and young adults. Literature from the areas of psychology and mathematics education has been discussed in connection with these three fields of research. This study is concerned with misconceptions that college students have about probability. Very little research has been done in this area. A study was conducted by Smock and Belovicz (1968) with college students and junior high school students.
The major purpose of the paper was to investigate whether junior high school students have a capacity for understanding probability. Smock and Belovicz's results contradict those research results which say that children have considerable intuitive knowledge of probability (Yost et al., 1962; Davis, 1965; Leake, 1962; Doherty, 1965; Mullenex, 1969; Leffin, 1971). The results in Smock and Belovicz's paper warn educators who would implement probability in the elementary schools that they should not assume too much. While children in the Smock study did indicate some knowledge of probability in very simple situations, the extent and generalizability of their knowledge was found to be very limited.

A secondary purpose of the Smock and Belovicz study was to use data obtained from college students' responses to probability items in order to construct similar items for the junior high subjects. An analysis of the college students' responses indicated that the college subjects had not acquired the rules or strategies of responding in probabilistic situations that would correspond to the concepts underlying probability. Smock and Belovicz conclude that probabilistic learning, which might be expected to exist among college students on the basis of their experience, has not taken place. They recommend further study of the processes underlying the concepts of probability among college students.

The Pilot Study

In the winter quarter of 1976, a pilot study was conducted by the author to investigate college students' misconceptions of probability. The subjects for the study were 54 college students enrolled in a finite mathematics course at Michigan State University. The specific purposes of the pilot study were:

1. To ascertain the degree to which college students use the availability and representativeness heuristics to give estimates for the probability of events.

2. To develop and teach an experimental activity-based course in elementary probability and statistics.
3. To test the effectiveness of the experimental activity-based course in helping college students to overcome reliance upon the availability and representativeness heuristics when making estimates for the likelihood of events.

4. To compare the effectiveness of the experimental course in overcoming reliance upon the availability and representativeness heuristics to that of a lecture-based course in finite mathematics.

The subjects had pre-registered into sections of finite mathematics and were grouped accordingly. One section was selected as an experimental group to receive the activity-based course and another was selected as a control group. The control group was given a lecture course in finite mathematics. The text used was Finite Mathematics by Weiss and Yoseloff (1975). An outline of the topics that are covered in the lecture-based course is listed in Appendix C of this study.

Personal background data was gathered on the subjects, and it indicated that most of the subjects were second-term freshmen. There were two juniors and one senior in the experimental group, and several upperclassmen in the control group. The subjects were business, horticulture, agriculture, or biological science majors for the most part. A few of the freshmen had not yet declared a major.

The finite mathematics course was created at Michigan State University for the purpose of providing an alternative to College Algebra and Trigonometry II for students in business, agriculture, or biological sciences who would not continue on to calculus. All of the subjects in this study had completed a course, College Algebra and Trigonometry I.

The experimental group had 24 subjects and the control group 30. On the first day of class, the nature of the experimental course was explained by the experimenter to the students. The subjects were told that the course would be activity-based and that they would work in small groups on experiments and problems.
If they felt uncomfortable about learning mathematics on their own or in small groups, the students were encouraged to transfer into a regular lecture section of finite mathematics. No one transferred.

On the second day of the term, a pretest constructed by the experimenter was administered to both the experimental and the control groups. The pretest was designed to give information about the subjects' reliance upon the availability and representativeness heuristics in estimating the likelihood of probabilistic events prior to any formal training in probability. Personal background data indicated that only a very few of the subjects had had any course work on probability, and that it was usually only a few days in high school. The pretest also contained items to give information about the subjects' knowledge of some probability concepts, such as simple probability, one-, two-, and three-dimensional sample spaces, combinations, permutations, and expected value.

The pretest questions consisted of several items that were used by Kahneman and Tversky in their research (1972, 1973), several items from the National Assessment of Educational Progress, and items constructed by the experimenter. In order to gain insight into the reasoning process used by subjects to make their responses, the subjects were asked to supply a reason for their responses to some of the items which dealt with availability and representativeness. A revised version of the pretest can be found in Appendix D of this study.

An analysis of responses on the pretest items indicated the following:

1. The subjects relied heavily upon the heuristics of availability and representativeness in making their responses. Results identical to those of Kahneman and Tversky were obtained in all but one item (item 3 on the revised pretest). Reasons given by subjects for their responses support the contention that the heuristics of availability and representativeness influenced their decisions.

2.
There was no significant difference between the responses of the experimental group and the control group on any of the pretest items. The degree of reliance upon availability and representativeness was the same for both groups prior to a course on probability. There was no difference between the two groups in knowledge of probability concepts prior to formal course work in probability.

The results of the pretest, therefore, supported the hypothesis of Kahneman and Tversky that combinatorially naive college students rely upon the heuristics of representativeness and availability in estimating probability. The pretest results also indicate that there is no reason to believe that the two groups have not been drawn from the same sample of college students.

In addition to the pretest, taped interviews with 12 subjects randomly selected from the experimental group were carried out during the first two weeks of the quarter. The subjects were asked questions similar to those that dealt with representativeness and availability on the pretest, and were asked to think out loud as they responded. Analysis of the protocols from these interviews yielded results in agreement with the written pretest. The subjects employed the representativeness and availability heuristics in order to decode complex probabilistic situations and make a judgment for the likelihood of events. The subjects who were taped often verbalized processes that were highly indicative of availability and representativeness: "There are more X's available to choose from." "This sequence of heads and tails is not random." "I can draw more paths in this grid than in this one because each row has more X's."

Following the pretest, course work was begun on the experimental or control materials. Each class met 5 days a week for a 50 minute period.

The content of the experimental and the control courses in probability was nearly identical for the first 4 1/2 weeks of the quarter.
Both courses covered counting principles, simple probability, applications of counting principles to probability, independent events, and uniform and non-uniform finite probability models in the first 4-5 weeks of the term. The control group spent a little more time on conditional probability. The experimental class spent a little more time on calculating the probability of sequential events where it was necessary to multiply the probabilities of successive independent events.

During the second half of the 9 1/2 week quarter, the content of the two courses differed. The control group studied linear programming and the simplex algorithm, and then studied expected value and game theory. The experimental group studied game theory and expected value, and then spent time on elementary statistics. The statistics included measures of central tendency and variability, the binomial distribution, and an introduction to the chi-squared procedure. In addition, several periods in the experimental course were devoted to the misuse of statistics. The subjects read How to Lie with Statistics by Huff (1954), and then reported on instances of misuse of statistics that they found in newspapers, magazines, textbooks, and on television. The fundamental difference in content was that the experimental course spent time on some elementary statistical concepts, while the control course covered linear programming. Subjects in the experimental class were also given several problems out of Mosteller's Fifty Challenging Problems in Probability (1962) to work on over a period of several weeks in the second half of the term.

The main differences between the two courses were the method of presentation of the topics, the materials and texts that were used in the course, the sequence in which the topics were presented, and the requirements of working in small groups and keeping a log.
The experimental group worked through a set of 9 in-class activities that were developed by the experimenter. These activities were carried out by small groups of four students each. There were six small groups in the experimental class. The members of the groups were interchanged after each activity so that every student had an opportunity to work in a group with every other student during the term. A copy of the revised versions of these activities that were used in the main study can be found in Appendix B. A detailed description of each of the activities is contained at the beginning of Chapter 3 in a discussion of the main study, and will not be repeated here.

At the end of the quarter a posttest was administered to both groups to assess the degree to which the two groups relied upon the availability and representativeness heuristics after exposure to a course in probability. A revised version of the posttest can be found in Appendix D of this study. The posttest was similar to the pretest, except that it was shorter and did not include many of the items on simple probability concepts. The items dealing with availability and representativeness were the same as on the pretest.

A method of scoring the responses to each question was devised. The subjects were asked to give reasons for their responses. Their responses and reasons were graded.

Points   Response
  3      Correct reasoning and correct answer
  2      Correct answer with a good start on the correct reasoning, but reasoning was incomplete
  1      Correct answer, but no reason supplied, or incorrect reasoning
  0      Incorrect answer or no response

A total test score was calculated for each subject, and ranks were assigned on the basis of total test score. In addition, an availability score and a representativeness score were calculated for each subject. A low availability score indicated that a subject was employing the availability heuristic on those items that dealt with availability.
Similarly, a low representativeness score indicated reliance upon the representativeness heuristic instead of application of correct probability principles.

The Mann-Whitney U-test was used to analyze the data and compare the two groups on the total test score and on the two subscales. Siegel (1956) mentions that this non-parametric test is appropriate for use in experimental designs when the experimenter does not wish to make the assumptions of a t-test. The only assumptions of the Mann-Whitney U-test are that the two groups are independent of each other and that the test scores represent a distribution which has underlying continuity. The experimenter has no reason to believe that either of these assumptions was violated.

The following hypotheses were tested using the Mann-Whitney U-test.

1. There is no significant difference between the two groups on total test scores.

2. There is no significant difference between the two groups on availability scores.

3. There is no significant difference between the two groups on representativeness scores.

Cronbach's α-coefficient of reliability was calculated for the availability subscale and the representativeness subscale. The coefficient represents how well the scores from one administration of a test are indicative of the universe of scores. The α-coefficients for posttest scores were .70 on the representativeness subscale and .48 on the availability subscale. The α-coefficient for availability is low, but there are only four items on the availability subscale. Results on a four-item test are more likely to be influenced by chance or guessing than results on a longer test.

All three hypotheses were rejected at the .001 level of significance. Significant differences were found between
the experimental and control groups on total test score, and on both subscales.

Comparisons of pretest and posttest responses were made separately on each item within the groups. Chi-squared statistics were calculated for each question comparing pretest and posttest responses of each group. There was a tendency for both groups to improve on most items on the posttest. This tendency was much stronger in the experimental group than in the control group. Chi-squares between the two groups on each posttest item indicated that there was a tendency for the experimental group to do better than the control group on most items. The chi-square statistics were significant at the .01 level on 7 test items, and at the .05 level on one test item, of the combined 11 items on the representativeness and availability scales. The frequency of correct responses was higher in the experimental group on each of these 8 items where significant differences occurred.

The results of the pilot study encouraged the experimenter to follow the pilot study with a main study in which revised versions of the activities and instruments were used. Items on the pretest and posttest were reworded and resequenced to avoid some confusion that arose during the administration of the initial versions.

Item-to-scale correlations were calculated for each item on each of the two subscales of the posttest, and item-deleted reliability α-coefficients were calculated for each item within each subscale. As a result, the question

    If heads has come up ten times in a row on a fair coin, and you could win $10 by guessing the result of the next toss, what would you guess?

was deleted from the representativeness scale in the main study. This item had a negative or near zero correlation with every other item on the representativeness subscale. Furthermore, the α-coefficient for the representativeness subscale rose from .63 to .70 with this item deleted. Apparently there is more to this question than the expected representativeness of a 50-50 sample of heads and tails.
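The item-deleted reliability check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the small score table are hypothetical, the scores are not the study's data, and the sample variance is assumed as the variance estimate.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one list per subject, one entry per item (hypothetical layout).
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(scores):
    """Alpha of the subscale recomputed with each item removed in turn."""
    k = len(scores[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in scores])
        for i in range(k)
    ]

# Hypothetical 0-3 rubric scores for five subjects on a four-item subscale.
example = [
    [3, 2, 3, 0],
    [2, 2, 1, 3],
    [0, 1, 0, 2],
    [3, 3, 2, 1],
    [1, 0, 1, 3],
]
print(cronbach_alpha(example))
print(alpha_if_item_deleted(example))
```

An item whose removal raises alpha noticeably, as the coin question did here, is a candidate for deletion from the subscale.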
While 14 of the 49 subjects said that tails should come up because "things should even out", 12 said that they would "stick with a winner". The fairness of the coin bothered some subjects even though they were told the coin was fair. It is also possible that the context of a "gambling" situation complicates the question so that the decision made by the subject is not based solely on representativeness, but also upon superstition.

As a result of teaching the experimental materials in the pilot study, each activity was completely rewritten and enlarged. Many extra questions and problems were added to the final version of the activities, which are in Appendix B. A set of notes to the instructor was developed for each activity to assist other instructors in identifying probable trouble spots in each activity, to suggest additional topics for inclusion during each activity, and to point out places where brief mini-lectures might help to introduce new concepts or to summarize the results of an activity. A complete outline of the course and a day-by-day list of plans were written based upon the experience gained from teaching the pilot materials. This outline is contained in Appendix A.

The results of the pilot study on the experimental course were encouraging as to the possibility of teaching a college course in introductory probability and statistics by an activity-based method with these materials. The apparent success of the experimental group in overcoming some reliance upon availability and representativeness has been noted above. The results should be interpreted cautiously, because the subjects within each group are certainly not independent. The Mann-Whitney test used the individual subjects as the unit of analysis. This non-parametric analog of a t-test does not require each observation within a group to be independent.
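For readers unfamiliar with the Mann-Whitney procedure, the following Python sketch shows how the U statistic is obtained from pooled ranks. It is a minimal illustration, not the computation used in the study: the function names and the two score lists are hypothetical, tied scores receive average ranks, and the two-sided p-value uses the large-sample normal approximation with no tie or continuity correction.

```python
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, U_y, z, two-sided p) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2   # pairs in which x exceeds y
    u_y = n1 * n2 - u_x
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_x, u_y, z, p_two_sided

# Hypothetical total test scores, not the study's data.
experimental = [28, 31, 25, 30, 27, 33]
control = [19, 24, 22, 26, 18, 21]
print(mann_whitney_u(experimental, control))
```

With samples as small as these, exact tables of U would normally be consulted instead of the normal approximation.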
One can conclude on the basis of the results of the Mann-Whitney test that the control group and the experimental group scores on the posttest were not drawn from the same population, with only a .001 probability that this conclusion is incorrect.

In addition to the results of the hypothesis testing, support for the success of the activity-based course was obtained from student evaluation forms written up at the end of the course. (The form can be found in Appendix D.) It is no overstatement to say that nearly every student in the experimental course felt that it was the best mathematics course he/she had ever taken. The students expressed an initial hesitancy about working in groups, and about not being told "how to do it" when working on problems. However, after several weeks of working in groups and having success in working things out for themselves, their initial fears subsided. Several evaluations indicated that "the instructor has not taught us anything, but has made it easy for us to teach ourselves". It should be pointed out that the learning was very highly guided by question sequences in the experimental activities. The students were indeed teaching each other, but in a very controlled context.

As a result of the pilot study, two experimental sections were taught the following term, and compared to two control sections of finite mathematics. The main study utilized the revised pretest and posttest, and the rewritten activities. The design and results of this study are presented in the next three chapters.

CHAPTER III

A DESCRIPTION OF THE DESIGN OF THE STUDY

In this chapter a detailed description of and rationale for the experimental activity-based course is presented. In addition, the course in finite mathematics which was taught to the control groups is discussed. The two courses are compared on the basis of content and teaching method.
This chapter also includes a description of the subjects, the procedure of the study, a statement of the hypotheses tested, a section on the instruments used to test the hypotheses, and a section on the method of data analysis.

The Experimental Course

The experimental course was constructed by the author to provide an alternative approach to the course in finite mathematics (Mathematics 110) at Michigan State University. The experimental course covered much of the same content ordinarily treated in the finite mathematics course, such as probability, expected value, and simple game theory. However, some topics from elementary statistics were integrated into the course, and the materials used and teaching method employed were very different from those normally utilized in the Mathematics 110 course. The primary purpose of the experimental course, as described in chapter one, was to provide a learning environment and learning experiences which might enable college students to overcome their reliance upon the heuristics of availability and representativeness when making estimates for the likelihood of events. In this section the material, the content, and the teaching method used in the experimental course will be discussed. The role of the instructor and the role of the students will also be presented.

A series of nine activities in probability, combinatorics, game theory, expected value, and some elementary statistics were developed by the experimenter. These activities formed the foundation for the content of the experimental course in finite mathematics. Each activity is accompanied by a set of notes to the instructor. These notes contain suggestions to the instructor for his role during the activity. The notes indicate problems that are likely to come up as the students do the activities, suggest procedures for overcoming some of the trouble spots, and provide possible directions for pursuing the activities in more depth.
A complete set of the activities with notes to the instructor can be found in Appendix B. A brief description of each activity will be given here.

The first three activities are concerned with simple probability models, both uniform and non-uniform. Coins, tacks, and dice are thrown, the outcomes are recorded, and experimental probabilities for events are calculated based upon relative frequency of the outcomes. Then the student is asked to make a theoretical model of the experiment. The model involves listing all the possible outcomes from the experiment and assigning probabilities to those outcomes. Comparisons are then made among the guesses for the likelihood of events made by the students prior to carrying out the experiments, the experimental probabilities, and the probabilities based upon the theoretical model. The theoretical distribution is graphed against the experimental distribution, and comparisons between the two are made. The students are asked to list the assumptions and limitations of their experiments and the assumptions of their mathematical model. The data gathered from all three of the first activities is used in later activities and problem sets.

The fourth activity provides an introduction to counting principles. Difficulties encountered in listing all the outcomes from the coin activity (64 outcomes) and the dice activity (216 outcomes) are used to motivate the investigation of a more systematic way to count the outcomes of an experiment. The concepts of permutation and combination are introduced via a sequence of spelling problems. Students are asked to list all the distinct "words" that can be spelled using all of the letters in G A K, E Z A K L, L Z A K L, L Z A L L, and so forth, where any arrangement spells a word. Then they are asked to count the number of "redundant" spellings of each word that can occur when listing words with repeated letters.
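The counts in these spelling problems can be checked mechanically. The Python sketch below is an illustration only, with hypothetical function names: it lists the distinct arrangements of each letter set by brute force and compares that count with the total number of arrangements divided by the number of redundant rearrangements of the repeated letters.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_words_brute_force(letters):
    """Count distinct arrangements by listing every one and discarding repeats."""
    return len(set(permutations(letters)))

def distinct_words_by_division(letters):
    """Count distinct arrangements as (# arrangements) / (# redundancies per word)."""
    redundancies = 1
    for count in Counter(letters).values():
        redundancies *= factorial(count)
    return factorial(len(letters)) // redundancies

for letters in ("GAK", "EZAKL", "LZAKL", "LZALL"):
    print(
        letters,
        distinct_words_brute_force(letters),
        distinct_words_by_division(letters),
    )
```

For L Z A L L, for example, the 120 arrangements of five letters collapse to 20 distinct words, because the three L's can be rearranged among themselves in 6 ways without changing the word.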
The students are subsequently led via a sequence of questions to the model

    $\frac{\#\ \text{arrangements}}{\#\ \text{redundancies per word}}$.

This model can be used to count the number of distinct "words" that can be formed. The concept of a combination is isolated as a special case of these spelling problems, when there are only two distinct letters from which to choose. The words "combination" and "permutation" are not used at all, either in the activities or by the instructor, during the course. The sequential counting principle forms the basis for every [...] four activities took about four weeks to complete.

The second half of the course concerns applications of probability to game theory, expected value, and some simple inferential statistics. A set of notes on game theory and another on expected value were written by the author and distributed to the class. These notes can be found in Appendix B. Prior to a formal discussion of game theory, the students were asked to play several two-person games and to record the outcomes. In activity 5 they were asked to suggest what they thought were the best strategies for each of the players in each of these two-person games. Of the games that were played some had "mixed" and some had "strictly determined" optimal theoretical strategies. After activity 5 was completed, the instructor gave a mini-lecture on two-person games in which optimal strategies for 2 x 2 games were thoroughly discussed.

Activity 6 introduces expected value. In this case, a lecture on expected value preceded the activity. A method of calculating the expected value for any 2 x 2 two-person game was presented. The method, called the method of oddments, was from The Compleat Strategyst (Williams, 1954). Many of the examples that appear in the notes on game theory and in activities 5 and 6 were based upon problems from Williams's book.
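A minimal Python sketch of the oddments calculation for a 2 x 2 zero-sum game follows. It is offered as an illustration rather than as the course's own presentation: the function name, the dictionary layout, and the example payoffs are hypothetical, and integer payoffs to the row player are assumed. The game is first checked for a saddle point (the "strictly determined" case); otherwise each player's mixing proportions are taken from the absolute payoff differences in the opposite row or column.

```python
from fractions import Fraction

def solve_2x2_game(a, b, c, d):
    """Solve the zero-sum game with row-player payoffs [[a, b], [c, d]].

    Payoffs are assumed to be integers. Returns a dict describing the solution.
    """
    maximin = max(min(a, b), min(c, d))      # best guaranteed row payoff
    minimax = min(max(a, c), max(b, d))      # best guaranteed column ceiling
    if maximin == minimax:
        return {"kind": "strictly determined", "value": Fraction(maximin)}

    # Oddments: each row's weight is the payoff difference in the other row,
    # and each column's weight is the payoff difference in the other column.
    row_oddments = (abs(c - d), abs(a - b))
    col_oddments = (abs(b - d), abs(a - c))
    p = Fraction(row_oddments[0], sum(row_oddments))   # P(row 1)
    q = Fraction(col_oddments[0], sum(col_oddments))   # P(column 1)
    value = Fraction(a * d - b * c, a + d - b - c)
    return {
        "kind": "mixed",
        "row": (p, 1 - p),
        "column": (q, 1 - q),
        "value": value,
    }

# Hypothetical payoff tables, not taken from the course notes.
print(solve_2x2_game(1, -1, -1, 1))   # matching pennies
print(solve_2x2_game(3, -1, -2, 4))
print(solve_2x2_game(4, 2, 3, 1))     # has a saddle point
```

In the second example the row player mixes 3:2 and the column player 1:1, and either pure reply by the opponent then yields the same expected payoff of 1, which is the value of the game.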
Activities 7 and 8 involve the effects of sample size upon measures of central tendency such as the median and mean, and upon measures of variability such as standard deviation. In activity 7, the students are asked to guess the number of cards that should be turned over from the top of a well-shuffled deck of cards in order to assure at least a 50% chance of getting an ace. The experiment is then carried out for samples of size 10, 20, and 100. The median number of cards necessary to get an ace is calculated for the samples of size 10, 20, and 100. Finally, the theoretical number of cards necessary to assure a probability of 1/2 of getting an ace is calculated. Comparisons are made among guesses, experimental medians, and the theoretical number of cards. This activity was suggested by an example in Probability with Statistical Applications (Mosteller, Rourke, and Thomas, 1969). In activity 8, the students calculate means and standard deviations for samples of random two-digit numbers. Samples of size 5 and of size 25 are used to obtain data on the behavior of means and standard deviations. The effects of sample size on the range of observed means and standard deviations are treated in activity 8 by means of a series of questions and problems. Data from the dice experiment is also used to generate samples of various sizes in order to observe the behavior of means and standard deviations.

The ninth activity is less structured than the previous eight. It presents the students with a challenge and allows them to decide for themselves what the direction of the activity will be. The challenge is presented in the form of the statement: "Pulse rates go up when taken by a member of the opposite sex". The problem is to design and carry out an experiment which will test the truth of this statement. The challenge was suggested by Dr. William Fitzgerald of Michigan State University.
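The theoretical calculation in activity 7 can be sketched in a few lines. The sketch below assumes a standard 52-card deck containing 4 aces and finds the smallest number of cards that gives at least the target probability of turning up an ace; the function names are hypothetical and the code is not taken from the course materials.

```python
from math import comb


def prob_ace_within(k, deck=52, aces=4):
    """Probability that at least one ace appears among the top k cards."""
    # Complement rule: 1 - P(all k cards come from the non-aces).
    return 1 - comb(deck - aces, k) / comb(deck, k)


def cards_needed(target=0.5, deck=52, aces=4):
    """Smallest k for which P(at least one ace in the top k cards) >= target."""
    k = 0
    while prob_ace_within(k, deck, aces) < target:
        k += 1
    return k


print(cards_needed())
print(round(prob_ace_within(8), 4), round(prob_ace_within(9), 4))
```

Under these assumptions the probability first reaches 1/2 at the ninth card, which students can compare with their guesses and with the experimental medians.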
The nine activities were supplemented by several texts and by homework problem sets. The texts used in the course were Statistics by Example: Exploring Data and Weighing Chances (Mosteller et al., 1973), Fifty Challenging Problems in Probability (Mosteller, 1962), and How to Lie with Statistics (Huff, 1954). Six of the chapters in the two Statistics by Example books were assigned as homework. Problems were selected from these chapters to be written up and handed in. The problem sets assigned from Statistics by Example included work on regression, counting circular arrangements, and estimating wildlife population by the capture-recapture method (Exploring Data: Sets 4, 10, and 12 by Shulte, Cohen, and Chatterjee, respectively). The sets from Weighing Chances dealt with random digits and simulation, the binomial distribution, and the chi-square procedure (Sets 2, 4, and 6 by Carlson, Link and Brown, and Carlson).

Five problem sets constructed by the author were also assigned in the experimental course. The problems concerned sampling with and without replacement, game theory, appli- [...] encouraged to cooperate with one another to solve problems as a group rather than individually, to share ideas with one another, and to help all the members of their group understand the concepts in each activity. The groups were changed after every in-class activity so that everyone had a chance to work with everyone else during the course. Thus, the members of the group taught each other as they interacted while working on an activity in class. In order to facilitate the small-group work, the class was conducted in the mathematics laboratory at Michigan State University. The mathematics laboratory contains enough small tables and chairs to accommodate six small groups of four students each.

The instructor in the experimental course provided feedback to the groups and to the individual students related to their progress on the activities and homework assignments.
During an activity, the instructor circulated among the groups, clarifying questions and assisting groups who had stalled on a particular problem. Sometimes this assistance took the form of a series of questions put to the groups by the instructor. The questions were intended to take the group back to a concept which they already knew, and then, step-by-step, lead them up to the source of their original question. Thus, all hints on activities that were given by the instructor were of an indirect nature. The technique of answering a question with another question was used to encourage the groups to work out the problems for themselves, and to keep the investigation on each activity as open-ended as possible. The activity often contained questions or problems which had several possible solutions. The responsibility for determining the direction in which a particular activity would go was left to each group. The instructor only intervened if a group was not working in a direction which was in accord with the goals of the activity.

Outside of class, the instructor's job consisted of reading all the activities and problem sets that were entered in the log by each student. The activities and problem sets were collected one at a time upon completion and graded by the instructor for completeness and correctness. If anything was not complete or correct, the difficulty was pointed out to the student and the student was asked to make the necessary revisions. Every assignment was circulated back and forth between instructor and student until it was complete and correct. The assignments, mentioned above, included 9 in-class activities, 6 problem sets from Statistics by Example, 5 problem sets on probability devised by the author, and 10 written critiques of the misuses of statistics. Students and instructors thus constantly exchanged information on the progress of each activity and each problem set.
The role of the instructor in the experimental course, as outlined above, is somewhat like that of a judge, a diagnostician, a devil's advocate and a critic. There are several instances where the notes to the instructor suggest that a short lecture be given on a topic, such as the sequential counting principle, game theory, or expected value. However, for the most part, the instructor's role is that of a resource person and an evaluator.

A complete day by day outline of the activities and assignments for the experimental class can be found in Appendix A. A report on what happened in the groups during the conduct of each activity, and on some of the successes and difficulties encountered by the students on the problem assignments, will be presented in chapter 4, along with the results of the statistical analysis of this study.

A Description of the Control Course

The text that was used for the course in finite mathematics (Mathematics 110) at Michigan State University is Finite Mathematics by Weiss and Yoseloff (1975). This text was used in all sections of Mathematics 110 that were taught in the spring quarter of 1976 except for the two sections that received the experimental course. A complete outline and description of the topics that were covered in the course taught from Weiss and Yoseloff can be found in Appendix C of this study.

The course in finite mathematics began with counting principles and probability. The sequential counting principle; permutations; combinations; mathematical models of sample spaces; probability of a simple event; probability of unions, intersections, and complements; uniform and non-uniform sample spaces; applications of combinatorial counting techniques to probability; the binomial distribution; expected value; and conditional probability were all covered in the first four weeks of the term in mathematics 110 in the control classes.

The next three to four weeks of the course were concerned with linear programing.
Linear programing was first introduced in two dimensions from a geometric point of view. Solutions were obtained by testing extreme points of the intersection set of a system of linear inequalities. Following the geometric introduction to linear programing, the simplex algorithm was taught in order to handle general linear programing problems in higher dimensions. Operations on matrices and row transformations on matrices were discussed in order that they might be applied to the simplex algorithm.

The course concluded with a two week unit on game theory. Two-person two-by-two games were completely characterized. The simplex algorithm was used to obtain optimal strategies for n x n games.

The method of teaching used in the control classes was by lecture to a class of approximately 30 students. The instructor's primary role in the course was to prepare and deliver a daily lecture on material from Weiss and Yoseloff's Finite Mathematics.

Comparison of the Experimental and Control Courses

The experimental course integrated some topics from elementary statistics into the study of probability. Measures of central tendency, variability, the effects of sample size on statistical parameter estimates, the chi-square procedure, simulation of experiments by means of random numbers, and examples of the misuses of statistics were covered in the experimental sections. The control sections covered linear programing instead of the elementary statistics. Combinatorial counting techniques, simple probability models, expected value, and game theory were discussed at length in both courses. Table 3.1 indicates the order of the topics presented in each course and the approximate length of time spent on the topics in each class.
TABLE 3.1

ORDER OF TOPICS IN THE EXPERIMENTAL AND CONTROL COURSES

EXPERIMENTAL CLASSES                       CONTROL CLASSES

(1st) Probability              4 1/2       (1st) Counting Techniques
(2nd) Counting Techniques      weeks       (2nd) Probability
(3rd) Game Theory - 2 weeks                (3rd) Linear Programing and Matrices - 3 weeks
(4th) Statistics - 3 weeks                 (4th) Game Theory - 2 weeks

The materials used in the two courses were obviously different. The control classes used the text Finite Mathematics (Weiss and Yoseloff, 1975). The experimental classes performed the nine activities, used notes and problem sets that were written by the experimenter, and used Statistics by Example, How to Lie with Statistics, and Fifty Challenging Problems in Probability as texts and references.

The most striking difference between the two courses was in the teaching method. In the control classes, the role of the instructor was primarily as lecturer. The lecturer interprets and conveys large quantities of mathematical information and concepts. The role of the students in the control classes was primarily to receive and process the information conveyed by the instructor. The roles of instructor and student in the control course are typical of most of the teaching that presently occurs in undergraduate mathematics courses at Michigan State University.

In the experimental classes, the students assumed a much more active role and were responsible for teaching the material to each other. The instructor acted primarily as a guide, a counselor, and a resource person. In the control classes, the instructor presented the formulas for counting techniques and developed models of sample spaces and simple probability experiments for the students. In the experimental class, the students isolated their own formulas and built their own probability models. The first attempts at these formulas or models were sometimes inadequate.
However, concepts in probability were constantly refined by each successive activity, and so the models could constantly be revised. The students in the experimental class were required to revise all written assignments until they were complete and until they had satisfactorily resolved all the problems.

Hand-held calculators were used in the experimental classes to obtain immediate numerical values for complex probability problems and combinatorial expressions. The calculators were not used in the control classes as an integral part of the course.

Rationale for the Experimental Course

The experimental course was intended to help students become better intuitive statisticians. Specifically, it was hoped that the style of this course would help students to overcome their reliance upon the heuristics of availability and representativeness. In order to reach this goal, small group work, experiments, guessing, model building, the use of hand-held calculators, and the role of the instructor as diagnostician were all incorporated into the experimental course. Each of these components was considered by the experimenter to be essential to the process of replacing subjective probability intuitions with statistical probability models.

Activities 1-3 were constructed to contend with the problem of representativeness by confronting students with the inaccuracy of their own guesses for the likelihood of events, and subsequently having them build their own theoretical models of the coin, tack, and dice experiments. The problems on counting principles in activity 4 asked students first to guess, then to list outcomes long-hand, and finally to isolate counting principles. This slow introduction to counting techniques was devised to develop the alternative of actually counting the outcomes instead of relying upon the heuristic of availability to get an estimate for the likelihood of events.
In situations where it was impossible to count all the instances of an event, the importance of obtaining a large enough unbiased sample of the outcomes was emphasized. Activities 7 and 8, and problem sets from Statistics by Example on wildlife population and on simulation by means of random digits, all dealt specifically with sampling techniques and the effects of sample size on means and variability. It was hoped that considerable exposure to the effects of sample size might reduce the widespread belief in the "law of small numbers" (Tversky and Kahneman, 1971).

The use of calculators was considered to be an essential component in the development of probabilistic intuition. Subjective probability estimates and empirical probability results from experiments could be instantaneously compared to theoretical probability values once the model for an experiment had been developed by the students. The results could be graphed, and students could begin to contend with the problem of why their estimates were often so far off from the real probability of an event. The constant feedback on the accuracy of their guesses was intended to help make the students more cautious, or perhaps even more accurate, when they estimated the likelihood of events.

These sections have described and compared the experimental and control courses that were taught in the spring term of 1976 at Michigan State University, and which were used in this study. The next sections describe the design of the study itself.

Subjects

The subjects in this study were 85 undergraduate students who had enrolled in a course in finite mathematics in the spring quarter of 1976 at Michigan State University. A personal background form was filled out by the subjects at the beginning of the quarter to obtain information about major field, previous high school and college mathematics courses, and exposure to probability and statistics prior to the course in finite mathematics.
Eighty of the subjects in the study, 48 men and 32 women, completed and handed in the form, which can be found in Appendix D of this study. There were 51 freshmen, 12 sophomores, 9 juniors, and 8 seniors among the subjects. Most of the subjects were business or accounting majors, or were majoring in some branch of agriculture, horticulture, or natural resources. The breakdown of the subjects according to major was as follows: business or some branch of business - 28, accounting - 17, agriculture or horticulture - 7, parks and natural resources - 7, animal husbandry - 4, food systems - 2, communications - 1. The remaining 14 subjects had not yet declared a major at the time of the study.

In response to the question on high school mathematics courses, 5 subjects indicated that they had five high school mathematics courses, 13 had four courses, 37 had three courses, 20 had two courses, 4 had one course, and 1 subject had no high school mathematics courses. The mean number of high school courses taken by these subjects was 2.75. Most of the subjects indicated that they had taken one year of high school algebra (2 courses), and one-half year of geometry (1 course).

The results of the question on previous college courses taken at Michigan State University showed that 78 subjects had successfully completed the prerequisite course in college algebra, mathematics 108. Furthermore, a substantial number of the subjects took one or more remedial courses in high school algebra prior to attempting mathematics 108. It was found that 51 of the subjects took mathematics 082-104, which is essentially a review of high school algebra II level material. Of these 51, 14 also took mathematics 081-103, which introduces and reviews the material that corresponds to high school algebra I.

The majority of the subjects involved in the study indicated that they had had no previous course work that dealt with probability or statistics in any way.
Only 21 subjects mentioned that they had taken a course which involved any probability or statistics. Of these 21, 14 indicated that their previous experience was limited to one or two weeks in a high school business course or precalculus course, or to a very brief exposure in a college genetics course. Only 7 subjects among the 80 in the study had prior formal course work in probability or statistics.

Procedure

In the spring quarter of 1976, students registered into seven sections of finite mathematics, mathematics 110, at Michigan State University. Four of these sections were randomly selected for this study. The mathematics 110 course was offered at 1:50 p.m. and at 3:00 p.m. during the spring quarter. The four sections were randomly assigned to either the experimental activity-based course in elementary probability and statistics, or the finite mathematics course based upon the Weiss and Yoseloff text (1975). One section of each treatment for each time slot was included in the study. The two sections of finite mathematics were designated as control groups (C1 and C2), and the activity-based sections were designated as experimental groups (E1 and E2). Information concerning the subjects in each group can be found in tables 3.2, 3.3, and 3.4.

TABLE 3.2

NUMBER AND SEX OF SUBJECTS WITHIN EACH GROUP

Group     MALE    FEMALE     N
C1         14       12      26
C2          9        5      14
E1         11        9      20
E2         14        6      20
TOTAL      48       32      80

TABLE 3.3

CLASS LEVEL AND MAJOR FIELD

                       # Upper-     Business    Accounting    Other*
Group    # Freshmen    classmen     Majors      Majors        Majors
C1           19            6           7            9            9
C2            9            5           3            3            8
E1            9           11           8            4            8
E2           14            7          10            1           10
TOTAL        51           29          28           17           35

* Includes no preference for a major.

TABLE 3.4

PREVIOUS MATHEMATICS COURSE WORK

         Aver. # High School    Math 081    Math 082    Previous
GROUP    Math Courses           -103        -104        Prob. or Stat.
C1            2.81                 1           14            2
C2            3.08                 2            6            1
E1            2.45                 5           14            2
E2            2.73                 1           17            2
TOTAL         2.75                 9           51            7

Tables 3.2, 3.3, and 3.4 above indicate that there is not much difference among the four groups with respect to the subjects within each group. The majority of the subjects in each group are freshmen, except for group E1 which has 9 freshmen and 11 upperclassmen. There was a predominance of business and accounting majors in all the groups. The average number of high school mathematics courses is about the same in all the groups except C2, in which it is slightly higher. Each group contained a substantial number of subjects who took the remedial mathematics course, mathematics 082-104, at Michigan State University. Only 6 of the 14 subjects in C2 took 082, but this group also had the highest average number of high school mathematics courses. At most two subjects in each group said that they had a course in probability or statistics prior to the study. Two of the seven subjects who had previous experience with probability were repeating mathematics 110.

The sections labeled C1 and E2 were taught at 1:50 p.m., and those labeled C2 and E1 were taught at 3:00 p.m. The classes met 5 days a week for a 50 minute period.

Table 3.2 indicates that the sample sizes of the two control groups were 26 and 14 respectively. The two experimental groups each had 20 subjects. The imbalance in the control group sample sizes resulted from the way students had pre-enrolled into sections at registration. The students had favored enrollment at the 1:50 period rather than the 3:00 o'clock period. The author had no control over the pre-registration process, as pre-registration section assignments were made by a computer. During the subsequent registration process, not enough students added mathematics 110 to fill the 3:00 o'clock sections.

Four different instructors each taught one of the sections. The experimenter taught section E1 of the study.
Section E2, the other experimental section, was taught by Al Stickney, an instructor at Michigan State University. The two instructors of the control groups taught the topics from Weiss and Yoseloff's Finite Mathematics that are outlined in Appendix C. Robert Bentley, an instructor, taught section C1. John Novak, a graduate assistant, taught C2. The two instructors in the experimental group taught the activity-based course that is outlined in Appendices A and B. Both these courses have been discussed in detail in previous sections of this chapter.

On the first day of the term, the personal background forms described above were distributed in all four sections. The material and methodology of the experimental course, including the course requirements, the necessity for access to hand calculators, and the log notebook containing all the assignments of the course were carefully explained to the experimental classes. The students who had enrolled in the two experimental sections had no idea prior to the first day of class that they would receive a different course. After the explanation of the experimental course, students were given the option of dropping the course and enrolling in a regular section of mathematics 110. One subject from each experimental group dropped the course.

A pretest instrument, contained in Appendix D, was devised by the author and administered to the control classes on the first day, and to the experimental classes on the second day. The description of the course and requirements took up the entire first day in the experimental sections. The subjects were not told that they were involved in an experiment. They were only told by their instructor that he wished to gather information about their knowledge of some probability concepts prior to the course. The pretest instrument measured knowledge of some probability concepts, and the use of the availability and representativeness heuristic prior to formal course work in probability.
A detailed description of the pretest is presented in the next section of this chapter.

During the last week of the quarter, the groups were tested for knowledge of certain probability concepts and for their reliance upon the heuristics of availability and representativeness after formal course work in probability. The posttest, presented in Appendix D, contained the same questions on availability and representativeness as the pretest. The questions on probability concepts were limited on the posttest to those which were concerned with complex situations. Some very simple questions on the pretest that dealt with relative frequency and uniform probability models, and with elementary counting problems, were deleted from the posttest. The results of the pilot study had indicated that nearly every subject got the simple questions correct, so these questions were not included because they were not yielding any information. A description of the posttest can also be found in the next section of this chapter.

There were 80 subjects in both the pretest and the posttest samples. Of these 80 subjects, 75 took both the pretest and the posttest measures, 39 in the two experimental groups and 36 in the two control groups. The control and experimental samples each had 40 subjects on both measures.

Measures

The pretest and posttest instruments used in this study were constructed by the experimenter. Copies of the instruments are in Appendix D. Three subscales, a probability concepts subscale (P), an availability subscale (A), and a representativeness subscale (R) were contained in each of the two instruments. The items on the tests were compiled from several questions used by Kahneman and Tversky (1972, 1973), several items from the instrument used in the National Assessment of Educational Progress, and items constructed by the experimenter.
The items used in the availability scale (A) and the representativeness scale (R) were the same on both the pretest and the posttest. The availability scale consisted of four items, questions #2a, 2b, 16, and 19 on the pretest and questions #7, 8, 9, and 10 on the posttest. The questions were labeled A1-A4 respectively. A1-A3 were constructed by Kahneman and Tversky (1973) and A4 was constructed by the author.

There were six items on the representativeness subscale. They were questions #17ii, 17iii, 17iv, 17v, 18, and 13 on the pretest and questions #1ii, 1iii, 1iv, 1v, 3, and 4 on the posttest. They are labeled R1-R6 respectively. R2, R3, and R6 were constructed by Kahneman and Tversky (1972), while R1, R4, and R5 were constructed by the author.

The probability scale (P) contained 12 items on the pretest and 5 items on the posttest. The pretest contained some questions about elementary and counting concepts. These items are labeled P6-P14, and are, respectively, pretest questions #3, 4a, 4b, 5, 6, 7, 9, 10, 12, and 16. These items were included on the pretest to find out how much knowledge of simple probability, sample space, and counting principles existed among the subjects prior to formal course work in probability.

There were three probability items that appeared on both the pretest and posttest. They were pretest #4c, 4d, and 8 and posttest #12a, 12b, and 14. These items were labeled P1-P3 respectively. The posttest also included two items on probability that did not appear on the pretest. These were posttest questions #11 and 13, labeled P4 and P5 respectively.

There were several items on each of the instruments which were not part of any subscale. Two questions on counting the number of paths in a grid with two or three rows were included on each instrument. They are labeled N3 and N4 (N for "not on a scale").
These items were included to see if the subjects understood the definition of a path prior to answering the questions on availability. A path in a grid of symbols was defined as a sequence of line segments intersecting one and only one symbol in each row of the grid. The definition was explained orally by the instructor of each group before the administration of each of the test instruments. The instructor drew several grids on a chalkboard and emphasized that paths could "zig-zag" a great deal. A question on estimating the number of paths in a 6 x 6 grid (N5) was on the pretest.

Both tests also contained off-scale items on the gambler's fallacy (N2) and the Birthday Problem (N1).

The subjects were asked to give a reason for their answers to every item on the availability and representativeness subscales, and for their answers to the more complicated probability questions. Each item received a score of 0, 1, 2, or 3 based upon the response and the reason given for the response. A "0" was the worst and a "3" the best possible score on a particular item. The method for assigning points to each item is discussed in chapter four in the analysis of the results of this study.

Hypotheses

The following hypotheses were tested in this study:

1. There is no significant difference on the total test score (T) between the groups taught the experimental activity-based course (E1 and E2) and the groups taught the lecture-based course in finite mathematics (C1 and C2) on either the pretest or on the posttest.

2. There is no significant difference on the probability concepts subscale (P) between the groups taught the experimental activity-based course (E1 and E2) and the groups taught the lecture-based course in finite mathematics (C1 and C2) on either the pretest or on the posttest.

3. There is no significant difference on the availability subscale (A) between the groups taught the experimental activity-based course (E1 and E2) and the groups taught the lecture-based course in finite mathematics (C1 and C2) on either the pretest or on the posttest.

4. There is no significant difference on the representativeness subscale (R) between the groups taught the experimental activity-based course (E1 and E2) and the groups taught the lecture-based course in finite mathematics (C1 and C2) on either the pretest or on the posttest.

Method of Analysis

The pretest data was analyzed by using t-tests on the four scale means of the pooled experimental groups (E1 ∪ E2) and pooled control groups (C1 ∪ C2) with the individual subjects as the unit of analysis. The assumptions of the t-test model are that the two populations are normally and independently distributed with equal variances. The author had no reason to suspect that the assumption of independence of the individual subjects was violated prior to the course in finite mathematics. Therefore, the individual subjects were used as the unit of analysis on pretest data. There was also no reason to suspect violation of the assumptions of normality and equal variances. Thus, the four hypotheses were tested using t-tests with a level of rejection set at α = .05.

The posttest data was analyzed by using t-tests on the scale means with the class sections as the unit of analysis. The class sections were the largest independent units after the courses had been taught. There was no reason to suspect violation of the assumptions of normality, independence, and equal variances with the class section as the unit of analysis. Therefore, the four hypotheses were tested using t-tests with a level of rejection set at α = .05.

Differences between the two groups in mean gain scores on the four scales were also compared using t-tests with the class section as the unit of analysis.
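The equal-variance t statistic used in these comparisons can be sketched with the Python standard library alone. This is a minimal illustration, not the SPSS procedure used in the study; the function name is hypothetical, and the section means in the example are invented placeholders rather than data from this study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# HYPOTHETICAL section means (two sections per treatment), for illustration only.
experimental_sections = [10.0, 12.0]
control_sections = [6.0, 8.0]
t, df = pooled_t(experimental_sections, control_sections)
print(round(t, 3), df)
```

With the class section as the unit of analysis, each treatment contributes only two observations, so the test has just 2 degrees of freedom; the same function applies unchanged when individual subjects are the unit.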
The hypotheses were tested at the .05 level.

A reliability coefficient for the posttest was calculated from posttest scores using Cronbach's α-coefficient. Descriptive statistics for each item on the pre- and posttest instruments were compiled and compared.

The analysis of the raw data was performed by a CDC 6500 computer at Michigan State University using SPSS (Statistical Package for the Social Sciences, Nie et al., 1975) packaged programs for the data analysis and output. The results of the analysis of the data are reported in section two of chapter four.

Summary

In the spring quarter of 1976, four classes of finite mathematics were randomly selected and randomly assigned to one of two courses in finite mathematics. Two classes were assigned to an experimental activity-based course and two to a lecture-based course in finite mathematics. There were a total of 85 subjects involved in the study. The groups were pretested and posttested for knowledge of probability concepts and for reliance upon the heuristics of availability and representativeness in estimating the likelihood of events. The pretest and posttest instruments contained items devised by the author and items used by Kahneman and Tversky (1972, 1973).

This chapter has discussed the nature of the experimental course and the control course in detail. Differences and similarities between the courses in both content and teaching methodology have been elucidated. A description of the subjects, the procedure of the experiment, the hypotheses tested, and the method of data analysis has been set forth.

In the next chapter, descriptive and statistical results of the study are reported.

CHAPTER IV

ANALYSIS OF THE RESULTS OF THE STUDY

Introduction

The analysis of the results is presented in two parts. Part one contains a report on the experimental activity-based course. The second part presents the results of the statistical analysis of the pretest and posttest measures.
The results of the hypotheses testing and descriptive statistics on the four classes can be found in part two.

Part I: Report on the Experimental Course

The day-by-day occurrences within experimental group E1 were recorded in a log kept by the experimenter. This section reports on the observations of the experimenter made during the experimental course. The discussion is presented in three parts: the activities, student critiques of some misuses of statistics, and the course evaluation forms filled out by both experimental groups.

The Activities

Activity 1. Activity one begins by asking the groups to guess the probability of getting various numbers of heads in tossing 6 coins. The groups performed the experiment 48 times, recording the number of heads. The students were slow to get started on this first activity. They spent a long time reading the activity, and appeared hesitant to begin working on the experiment. After approximately 15 minutes the groups began tossing coins and recording the outcomes. Experimental probabilities for the outcomes 6 heads, 5 heads, ..., 1 head, 0 heads were calculated from the data using the relative frequency model.

The groups had difficulty setting up the mathematical model for the experiment because they could not agree among themselves how to list the outcomes. Some students felt that only the number of heads should determine "an outcome". For these students there were seven outcomes, from 0 heads up to 6 heads. Others felt that the result of each separate flip changed the outcome. For this latter group of students, the position of the heads among the six coins changed the outcome. The first three coins might be heads, or the second, third, and fifth coin heads. The issue was debated in the small groups. The instructor suggested to the groups that they might consider assigning probabilities to the number of heads using each of their two approaches to the model.
The first approach to the model was abandoned, for it assigned a probability of 1/7 to each of the outcomes. The experimental data had indicated that it was improbable that the outcomes "6 heads" and "3 heads" were equally likely to occur. In fact, in 192 tosses, the outcome 6 heads occurred only once in the pooled small-group experimental data.

The second approach to the model was adopted. It soon became apparent to the students that there were a large number of outcomes to list. The first attempts to list the outcomes as sequences of six heads and tails failed because the groups had not yet developed a systematic way of enumerating the outcomes. Gradually, a systematic approach to listing the outcomes developed in each group. The groups discovered that if they held the values of some of the coins fixed while changing the others, the list of outcomes became much more manageable.

When the 64 outcomes in the model had been listed, theoretical probabilities for the number of heads were calculated. Many students were surprised that their guesses for the probabilities were so far off. Over half the students had guessed that the probability that three heads would occur was at least 1/2. The probability of three heads based on their mathematical model was only 20/64. They were also surprised that the probability of 6 heads was so small. Only a few of the students had estimated the probability of 6 heads to be below 10%.

In their written reports on activity one almost all the students mistook the assumptions of the experiment for the assumptions of the model. The assumptions of the experiment were that the coins were fair and that there was uniformity in the tossing procedure in the groups. These assumptions were also listed as assumptions of the mathematical model. Only two students in 20 wrote that the model assumed that all 64 outcomes were equally likely to occur, and that the coins were independent of each other.
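The second approach to the model, in which each ordered sequence of six heads and tails is one equally likely outcome, can be checked with a short enumeration. The following Python sketch is an illustration added here, not part of the course materials; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered sequence of H/T for six coins is one outcome of the model.
outcomes = list(product("HT", repeat=6))

# Group the outcomes by number of heads and convert counts to probabilities,
# assuming all sequences are equally likely and the coins are independent.
heads_counts = Counter(seq.count("H") for seq in outcomes)
prob_heads = {k: Fraction(heads_counts[k], len(outcomes)) for k in range(7)}

for k in range(7):
    print(k, "heads:", heads_counts[k], "of", len(outcomes))
```

The enumeration yields 64 outcomes, 20 of which contain exactly three heads and one of which contains six heads, so the number of heads is far from uniformly distributed over its seven possible values.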
The students did not have a clear conception of what a mathematical model was during the first activity. It was not until much later, after the first four or five activities, that the necessity for determining the assumptions of the model itself and the limitations placed by those assumptions became apparent to the students in the experimental group.

Activity 2. In the second activity, a model for tossing three tacks, listing the outcomes, and assigning probabilities was developed. The groups first had to find an estimate for the probability P(U) that a tack lands point up. The range of values for P(U) obtained by 20 students each tossing a tack 72 times was from .48 to .76. As a result of the wide range of outcomes for P(U), a discussion arose concerning the factors that may have affected the outcomes: the way the tack was dropped, the height it was dropped from, and the surface on which it landed. The class decided to rerun the experiment on estimating P(U) and to attempt to control for as many nuisance variables as possible.

Some of the students stood a textbook on end and pushed a tack, sitting point upright, off the top edge of the book. Thus the groups could control for height, uniformity in dropping procedures, and landing surface. The range for P(U) on the rerun experiment from the top of the book was from .52 to .71, with a cluster of values around .60. Other students controlled for nuisance variables by pushing tacks off the table top onto the floor. The values for P(U) from the higher distance off the table were mostly between .5 and .6.

Values for P(U) were determined by averaging the results for each procedure. It was decided that P(U) was about 2/3 if the tack was pushed off the book onto the table, and about .55 if the tack was pushed off the table onto the floor.
In any case, there was agreement among the subjects that the outcomes point up, U, and point down, D (on its side), were not equally likely to occur when based upon the experiment. However, when the subjects constructed a mathematical model for the experiment, the 8 outcomes for tossing three tacks were each assigned a probability of 1/8. This overwhelming tendency to see the tack model as a uniform probability model persisted even with evidence from the second part of the experiment in which three tacks were tossed and the outcomes were recorded as ordered triples. The data indicated that the outcomes were not equally likely to occur, since UUU, UDU, and UUD occurred much more often than DDD, DDU, or DUD. The subjects tended to indicate in their logs that there was probably something wrong with the tacks. "Theoretically, U and D should be equally likely, even though experimentally they were not", was written in several logs.

The feeling among the students that every probability model was really a uniform model was difficult to overcome. Manifestations of this belief persisted throughout the experimental course. Even in game theory in activities 5 and 6 the subjects tended to say that in a two-choice two-person game, each choice should be played 50% of the time, regardless of the payoffs.

The instructor assisted the groups in discovering a model for the non-uniform case by means of questions.

Instructor: "Suppose that three tacks were tossed on the table 1200 times. You have decided that P(U) = 2/3. In how many of those 1200 tosses would you expect to find the first tack land upright?"

Student: "800, because that's 2/3 of 1200."

Instructor: "Now, of those 800, in how many would you expect to see the second tack land down?"

Student: "2/3 of the 800."

In this manner, the model of multiplying probabilities for independent outcomes was slowly elicited from the groups.
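The non-uniform model toward which the instructor was leading the groups can be sketched as follows. This Python fragment is an added illustration that assumes the three tacks land independently and that P(U) = 2/3, the class estimate for the table; the names `p_up` and `tack_model` are hypothetical.

```python
from fractions import Fraction
from itertools import product

p_up = Fraction(2, 3)              # class estimate of P(U) on the table
p_side = {"U": p_up, "D": 1 - p_up}

# Probability of each ordered triple, multiplying the three independent tacks.
tack_model = {}
for triple in product("UD", repeat=3):
    prob = Fraction(1)
    for side in triple:
        prob *= p_side[side]
    tack_model["".join(triple)] = prob

# Expected number of tosses, out of 1200, in which the first tack lands up.
expected_first_up = 1200 * p_up

for outcome, prob in tack_model.items():
    print(outcome, prob)
```

Under these assumptions the eight outcomes are not equally likely: UUU receives probability 8/27 and DDD only 1/27, in contrast to the value of 1/8 that the subjects first assigned to every triple.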
The written responses to activity 2 were cycled back and forth between the instructor and most of the students several times before all the errors in applying uniform probability model properties to the non-uniform tack situation were cleared up.

The theories for P(U) = 2/3 and P(U) = .55 for table and floor respectively were tested by the chi-square procedure later in the course. The observed frequencies for the 8 outcomes from tossing three tacks were tested for goodness of fit with each theory, depending on whether the groups had tossed the tacks on the floor or on the table. Results tended to support P(U) = 2/3 for the table, and to contradict P(U) = .55 for the floor.

Activity 3. This activity on modeling the outcomes for tossing three dice was similar to the coin and tack experiments. The difficulties with "equally likely vs. unequally likely outcomes" or "the best way to model the outcomes" (as ordered triples or as the sum of the three faces) that appeared in activities one and two were not as predominant in activity three. The subjects were not happy about having to list 216 outcomes in order to find theoretical probabilities. However, many of them discovered patterns during the process of making the list, or noticed the symmetry of the frequency distribution. This simplified the job of listing the outcomes. At the conclusion of this tedious listing process, the subjects were demanding counting principles that would help them list the outcomes for an experiment.

Activity 4. The fourth activity was constructed to lead subjects to discover several counting principles. The instructor gave a brief talk on counting at the start of the activity. The students were led to a point where they could state the sequential counting principle in their own words.

The outcomes for the first few problems on activity 4 were listed longhand.
Very gradually students began to see that the sequential counting principle would help them list the outcomes for the number of words that could be spelled from the letters L Z A K E. It was 5 × 4 × 3 × 2 × 1 = 120, using each letter once. It took the groups a long time to discover what to do when some of the letters occurred more than once. If the letters were L Z A K L, or L Z A L L, the first conjecture made in each group was that only 1/2 (respectively 1/3) of the 120 possibilities would actually be distinct. The instructor encouraged the subjects to list the outcomes. When only 20 outcomes for words from L Z A L L could be found, the search for an alternative approach was begun. The groups each discovered that although the first L could account for three redundancies per word, the second L still accounted for two more redundancies per word. Thus they first divided the total number of 120 arrangements of five letters by 3, and then reduced the remaining 40 by one-half. With the help of examples, this long process eventually produced the formula $n!/(K_1!\,K_2!\cdots K_p!)$, where $n$ is the total number of letters in the word and $K_i$ is the number of repetitions of the $i$th letter. The subjects were elated when they discovered this formula. The classroom was filled with triumphant smiles.

The concept of a combination was isolated as a special case of this formula, with only two distinct letters to choose from. The first step made by the group towards a general solution for "combinations" consisted of writing (5 × 4 × 3) ÷ (# redundancies), for problems such as picking the number of groups of three runners who could finish in the top three slots in a field of 5 runners. This gradually changed to (5 × 4 × 3)/3!, and finally to 5!/(3! × 2!).
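The counting formula that the groups arrived at can be compared against longhand listing. The Python sketch below is an added illustration; the function names are hypothetical, and the letter sets are those used in the activity, together with the two-letter word C C C N N that corresponds to choosing 3 runners from 5.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """Count distinct words by listing every arrangement and discarding repeats."""
    return len(set(permutations(word)))

def arrangements_by_formula(word):
    """n! divided by the factorial of each letter's repetition count."""
    count = factorial(len(word))
    for repetitions in Counter(word).values():
        count //= factorial(repetitions)
    return count

for word in ["LZAKE", "LZAKL", "LZALL", "CCCNN"]:
    print(word, distinct_arrangements(word), arrangements_by_formula(word))
```

Listing and formula agree in each case: 120 words for L Z A K E, 60 for L Z A K L, 20 for L Z A L L, and 10 for the two-letter word, which is the value of 5!/(3! × 2!).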
The process of counting the number of groups of x people that could be chosen from a group of y people was seen to be equivalent to the process of counting the number of distinct words that could be spelled with x C's and (y-x) N's, where C stands for "chosen" and N for "not chosen". The equivalence among the two-letter spelling problem, the # subsets of size x from a set of size y, the binomial coefficient, and Pascal's triangle was pointed out by the instructor later in the course.

Activity four provided a more efficient and systematic approach to counting the outcomes for flipping 6 coins (activity one) or tossing three dice (activity three). The students in the experimental course were quite pleased after they had worked through activity four. They indicated that they felt they had learned a great deal even though at times it had been very frustrating for them.

Activity 5. In this activity three games were played by pairs of students in order to provide an introduction to two-person game theory. In the first game one or two fingers were thrown by the players. Player one received payoffs of $10 or $30 when different numbers of fingers were shown. Player two received $20 whenever there was a match. After playing the game 20 times, results indicated that there was a tendency for player two, the matcher, to win.

The subjects generally attributed the success of the winner of the game to his ability to "psyche out" the other player and guess what the other player would show. No one indicated that they thought the game was rigged. The instructor suggested that the game be simulated to make it difficult to pick up a pattern in the opponent's choices. Every pair of subjects simulated an equally likely game, making their choices on a 50:50 schedule with coins. It did not occur to the subjects that perhaps a 50:50 schedule was not in the best interests of both players.
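The consequence of a 50:50 schedule in the first game can be sketched by computing player two's expected payoff for each of his two pure choices. The text above does not say which mismatch costs $10 and which costs $30, so the assignment in this Python fragment is an assumption made for illustration, as are all of the names.

```python
from fractions import Fraction

# Payoff to player two (the matcher), keyed by (player two's fingers, player one's fingers).
# ASSUMPTION: the $10 loss occurs when player two shows one finger and player one shows two;
# the source only states that mismatches pay player one $10 or $30.
payoff_to_matcher = {
    (1, 1): 20, (1, 2): -10,
    (2, 1): -30, (2, 2): 20,
}

# Player one chooses one or two fingers with equal probability.
player_one_schedule = {1: Fraction(1, 2), 2: Fraction(1, 2)}

expected_value = {
    choice: sum(prob * payoff_to_matcher[(choice, shown)]
                for shown, prob in player_one_schedule.items())
    for choice in (1, 2)
}
print(expected_value)
```

Against a 50:50 opponent, always showing the finger whose worst case is a $10 loss gives the matcher an average gain per play, while always showing the other finger gives an average loss of the same size.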
In fact, if player one does play a 50:50 schedule and player two, "the huckster", catches on, player two has an expected value of $5.00 if he always plays the finger on which the worst he can do is to lose $10.

The advantages of carefully alternating among the choices became more apparent to the subjects when they played game 2. This 4 × 4 game had black and red cards as entries in the payoff matrix. There were so many more choices in this game than in the first game that the students began to develop strategies for playing the game. Rows or columns with too many of the opponent's entries were disdained or altogether avoided. There was a tendency to pick the "safer" rows which had two cards of each color. High payoff cards like 9's and 10's that were imbedded in a row that otherwise contained all opponent's cards were only occasionally gambled upon. The beginnings of naive "mixed strategies" were used by the students in this game.

The last game was a 4 × 4 game that contained a saddle point. Pairs of students decided upon a strategy in this game that resulted in choosing the saddle point in 5 of the 8 pairs. Each of the other three pairs picked one of the two co-ordinates of the saddle point.

This activity was conducted prior to any formal instruction on game theory or strategies for two-person games. At the end of the activity the subjects were already displaying some intuition for both "mixed" strategies and "pure" strategies in their choices.

Activity 6. This activity on expected value consisted primarily of working out solutions to problems and games in order to calculate the long-run payoff. A lecture on expected value concerning the method of calculating the payoff for a 2 × 2 game was given by the instructor. The optimal strategy for playing a 2 × 2 game was simulated with coins and played 25 times. The subjects were surprised at how close the mean payoff for 25 plays came to the theoretical payoff calculated from the optimal mixed strategy.
The 2 × 2 game had a theoretical payoff of 4.5, while the range of 10 means for 25 plays of the game with coins was 4.2 to 4.7.

Another surprise in this activity occurred in the Carnival game that has two dice in a cage. The students were asked to guess which of the outcomes had the highest expected value, and then to calculate the expected values of all the outcomes. Bets paid even money on 8 (or 6), two-to-one on 9 (or 5), four-to-one on 10 (or 4), six-to-one on 11 (or 3), and ten-to-one on 12 (or 2). The house won everything on 7. Most of the students felt that 8 (or 6) was the best bet because it had the greatest probability of occurring. It turned out that 10 (or 4) was the bet that minimized the gambler's losses on this game.

Activity 7. This was the first of two activities on the effects of sample size upon the variation of measures of central tendency and variability. The subjects guessed the number of cards that they would have to turn over to have at least a 50% chance of getting at least one ace. Guesses were somewhat high, mostly from 12-15 cards. Only one guess of 26 cards was made. The guesses indicated that subjects were much more aware of the deceptive nature of the probability of disjunctive events than they had been at the beginning of the course. A pretest question had asked for an estimate of the number of people needed so that there would be a 50% chance that at least two people had the same birthday. 62 out of 80 subjects responded that it would take 183 people or more. 20 subjects responded that it would take exactly 183 (see the next section). The tendency to use 50% as a representative multiplier of the total population had almost disappeared in the experimental group E1 by the time activity 7 was done.

Experimental data on the number of cards necessary to obtain an ace was gathered for sample sizes of 10, 20, and 100.
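Both of the disjunctive-event questions mentioned in this activity can be worked out by computing the complementary probability. The Python sketch below is an added illustration under the usual assumptions, none of which are spelled out in the activity itself: a well-shuffled 52-card deck containing 4 aces, and 365 equally likely birthdays. The function names are hypothetical.

```python
from math import comb

def p_at_least_one_ace(n_cards):
    """1 minus the chance that all n cards come from the 48 non-aces."""
    return 1 - comb(48, n_cards) / comb(52, n_cards)

def p_shared_birthday(n_people, days=365):
    """1 minus the chance that all n birthdays are different."""
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

# Smallest counts that give at least a 50% chance.
cards_needed = next(n for n in range(1, 49) if p_at_least_one_ace(n) >= 0.5)
people_needed = next(n for n in range(1, 366) if p_shared_birthday(n) >= 0.5)
print(cards_needed, people_needed)
```

Under these assumptions 9 cards and 23 people suffice, both far below the halves of the population (26 cards, 183 people) that the representative "50%" multiplier suggests.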
The median was used by the subjects as an estimate for the number of cards necessary to have at least a 50% chance of getting an ace. The medians for sample size 10 ranged from 4 to 13. The medians for sample size 20 ranged from 4 to 9. The median for sample size 100 was 7. The true value was calculated to be 9.

The experimenter has reason to believe that several of the really low median estimates for 10 trials were the result of very poor shuffling. The subjects did not take time to carefully reshuffle the cards between each trial. It surprised some subjects that the medians for sample size of 20 did not narrow down to the theoretical value better than the observed 4 to 9 range. It is likely that poor shuffling was partly responsible for this range of medians for sample sizes of 20. Mosteller (1961) reports means ranging between 9.25 and 11.75 for five samples of 20 trials. Mosteller was counting the card on which the first ace appeared, so his theoretical value was 10. A machine did the shuffling in Mosteller's experiment. The subjects discussed how bias could be introduced into a sample by improper or careless sampling techniques.

Activity 8. Means and standard deviations for sets of two-digit numbers of various sample sizes were calculated in this activity. The samples of size 5 yielded means from 34 to 71, while samples of size 25 had means from 43 to 52. The standard deviations calculated for samples of size 5 or 25 indicated a similar "narrowing" of the range of observations in the larger samples. The subjects concluded that measures of central tendency and variability are rather unstable for small samples, and may not be very accurate indicators of the true population parameters.

Activity 9. The students were presented with a challenge which was much less structured than the first eight activities.
The problem was to design and carry out an experiment to test the truth of the statement "Pulse rates go up when taken by a member of the opposite sex". The design of the experiment was set up by the experimental class during an open class discussion. This activity was different from the previous ones in that it was not handled in the small group format.

The activity began with a brainstorming session. Suggestions for parts of the design and the experiment were given while the instructor acted as a secretary and recorded all suggestions on a chalkboard. Experimental class E1 decided to carry out the experiment on themselves. Pulses were taken on the temple or neck in order to maximize the chance of raising the pulse rates. Each person in the class took his (her) own pulse first. The pulse-by-self outcome was used as a basis for comparison with pulse rates found by members of the same sex or members of the opposite sex. Each "subject" had his (her) pulse taken by two members of the class of the same sex and by two members of the opposite sex. The pulses were recorded for a 30 second interval and a 60 second interval on each of the five trials. The sample in the class contained 11 males and 9 females. The data was organized into 2 × 2 contingency tables of the forms

                  # up   # not up                      # up   # not up
    A)  males                            B)  same sex
        females                              opp. sex

Form A was set up for the two cases where members of the same sex or members of the opposite sex took the pulse. Form B was set up for the two cases where males or females took the pulse rates.

Decisions concerning the design of the experiment or the experimental procedure were all made by the students themselves. After the data had been collected and organized, the instructor made up a series of questions to help the students analyze the data. The question sheet can be found in Appendix B following the notes to the instructor on activity nine.
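Tables of these two forms lend themselves to a chi-square test of independence. The Python sketch below is an added illustration of that computation for a single table of form B. The cell counts are HYPOTHETICAL, since the class data are not reported here, and the use of the uncorrected Pearson statistic is an assumption about the procedure.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], using the shortcut formula for 2 x 2 tables."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# HYPOTHETICAL counts for illustration only (not data from the study).
same_sex_up, same_sex_not_up = 18, 22
opp_sex_up, opp_sex_not_up = 21, 19

chi2 = chi_square_2x2(same_sex_up, same_sex_not_up, opp_sex_up, opp_sex_not_up)

# Critical value of chi-square with 1 degree of freedom at the .05 level.
CRITICAL_05_DF1 = 3.841
print(round(chi2, 3), "reject independence" if chi2 > CRITICAL_05_DF1 else "do not reject")
```

With counts like these the statistic falls well below the critical value, so independence of pulse-taker and pulse response would not be rejected.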
All contingency tables were set up for both the 30 second data and the 60 second data. Chi-square statistics were calculated for all contingency tables to test the independence of males vs. females or same-sex vs. opposite-sex with respect to raising the pulse rates. The calculations were carried out for both the 30 second and 60 second data. No significant differences were found on any of the contingency tables. Several students tested the 30 second data against the 60 second data and wrote the results in their logs. No differences were found between the pulse rates for the two time intervals.

The last question on the work sheet for activity nine asked the students to write up a critique of their experiment. This activity was performed at the end of the course after the students had already analyzed many articles in newspapers and magazines for misuses of statistics. Most of their logs included the following sorts of suggestions and criticisms of the pulse-rate experiment.

1. We all knew each other and that may have biased the results. It would have been interesting to have done this activity at the beginning of the course to see if there were any differences.

2. It would be better to have one fixed person of each sex take everyone's pulse. This should be done by a very handsome man and a very beautiful woman and they each should be an expert at taking pulses. We are not very good at taking pulses and this may have biased the results.

3. Knowing who is taking your pulse might affect the pulse rate. Thus the design of this experiment might not really help to answer the original question. The subjects should be blindfolded so that any bias that might occur from knowing the pulse-taker could be controlled.

The subjects in the experimental class realized that their experimental design and procedure admitted many sources of bias due to uncontrolled nuisance variables.
These observations on activities 1-9 were made in experimental class E1. Conversations between the experimenter and the instructor of experimental class E2 indicated that similar things happened in the other experimental class on all of the activities. The design of the experiment for pulse rates was exactly the same in E2, except that the 30 second data was not gathered.

Misuses of Statistics

Each student in the two experimental classes was required to read How to Lie with Statistics (Huff, 1954), and to write at least ten short critiques of misuses of statistics that they found. Articles were taken from newspapers and analyzed for correct or incorrect uses of statistics. Examples of misleading graphs, inflated percentages, biased samples, insufficient sample size, and verbose quantitative descriptions with little or no foundation in fact were ferreted out by the students.

Many examples of the use of statistics to mislead the consumer were found in advertisements. "Nine out of ten trucks made by the company since 1972 are still on the road". Students wrote: "Did this company sell the same number of trucks every year? Were most of their trucks made and sold in the last two years?" There were many criticisms of reports on products that claimed "a statistical test showed that product A was better than ...". The students pointed out: "Who says that this is better? If a statistical test was performed, why not state and publish the results? What is meant by the word 'better', better in what way?"

Examples of graphs that were conveniently chopped off to make a certain point, or graphs that were unclearly labeled with sliding scales on either the ordinate or the abscissa, or even graphs that had no labels or units of measure at all, were included in every student's log. Many of the misleading graphs were found in prominent weekly periodical magazines, such as Time or Newsweek.

Misuses of percentages were mentioned by practically every student.
Many times percentages were used to mask small sample sizes. The percentage of cost increase or of profit was found to be conveniently inflated or deflated by merely changing the denominator units. One student reported that a recent tuition hike at his university was purported to be a 13% increase when in fact it was a 22% increase. The university calculated the percentage of increase by I/(C + I), where I was the increase in cost per credit hour and C was the cost before the tuition hike. The student calculated the increase by I/C.

Several examples of statistics that were based upon a biased sample were analyzed. One student found an article that appeared on page one of the largest selling newspaper in a major city. The article mentioned that 93% of the people that were interviewed were against cross district busing to achieve racial integration in the schools. A continuation of the article on page 14 mentioned that 70% of the people that were interviewed did not have children in the schools.

Another student criticized the charts that are published by bookstores and record stores that include the "top ten" books or records for the week. These figures are often based upon sales to a particular class of people, with special interests. Yet the lists appear to claim jurisdiction over all age groups, interests, and races.

Several students found articles which misused the word "average" in one way or another. A recent sports article in a major newspaper for a large metropolitan area claimed that the average salary of a player in the National Basketball Association was $110,000. One dubious student gathered data on all the N.B.A. players' 1975-76 salaries. He concluded that the average salary for a starting player in the N.B.A. was around $90,000 - $110,000 depending on the team. However, most of the players receive much lower salaries. Salaries of second and third string players were only around $20,000 - $30,000.
In another case of misuse of averages, a report on suicide in the Sunday magazine section of this same newspaper was found to be inaccurate. The report claimed that the suicide rate in the United States had gone up 10% in the last few years. A student who had access to an almanac with all the population statistics for the past 10 years reported that the suicide rate had gone down 2% over the last few years. The magazine article did not take the population growth of the country into account in their calculations.

These are only a few of the misuses of statistics found by the students in the experimental classes. By the time the pulse-rate activity (activity nine) was carried out these students had been sensitized to misuses of statistics, and they were able to pick out nearly every flaw in the design and procedure of the pulse experiment that they had set up.

Experimental Course Evaluation Forms

At the end of the experimental course the students in the experimental sections were asked to respond to a questionnaire. The students were asked to comment on working in groups, the activities, the log that they kept, the texts, what they liked about the course and what they disliked. The questions used to gather this information can be found at the end of Appendix D. The responses generally were in the form of a letter to the instructor. The students had the option of signing their responses, or of keeping their identity secret by not signing them or by typing their responses.

There was general agreement among the students that the log kept by each student was essential to the course. It provided a study guide, a reference book, and a tremendous sense of accomplishment for the subjects in the experimental course.

It was also agreed by everyone that working in groups was an excellent way to learn mathematics. Interaction and cooperation in solving mathematics problems was a new experience for these students.
Their comments on the evaluation forms indicated that they thoroughly enjoyed working in small groups. Several students did mention that a few of their group members had a tendency to rely on other people's work and to not contribute much. However, most of the students were very active and cooperated well in the groups.

The activities performed in class were called "relevant to everyday life". Several evaluations mentioned that the activities really helped to "prove" the theory that was being learned in the course.

Reactions to the Statistics by Example (Mosteller et al., 1973) texts were mixed. Some students felt that these texts were very helpful and had applications to their major field of study. Other students felt that the books were hard to read, confusing, and, at times, poorly written. The students had a great deal of difficulty understanding the chapters on the chi-square procedure and on the binomial distribution in the Weighing Chances volume.

On the other hand, How to Lie with Statistics (Huff, 1954) was generally considered to be a highpoint in the course. Most of the students indicated that Huff's book and their own critiques of misuses of statistics had made them much more aware of numbers, and of the deceptive manipulations that could be performed with numbers in order to publish slanted statistical information.

Overall attitude towards the experimental course was very positive. Almost every evaluation indicated that the students had thoroughly enjoyed the class. Several students wrote that they were "amazed to think they had enjoyed a mathematics class". Initial frustration at not having the answer or the rules or the "formulas" provided for them by the instructor had disappeared for most of the students by the end of the course.

Part II: Analysis of the Statistical Results

Introduction

The statistical analysis is presented in three sections.
Section one contains the results of the hypothesis testing and compares the experimental and control groups on pretest, posttest, and mean gain scores. Section two contains an analysis of each individual item on the pretest and posttest. The third section contains scale-to-scale, item-to-item, and item-to-scale correlation matrices.

Comparisons Between the Experimental and Control Groups on the Four Scales

In this section the results of the hypothesis testing on pretest, posttest, and mean gain scores are reported. Comparisons between the experimental and control groups were made on the total test score and on the subscales (probability, availability, and representativeness) using t-tests with α = .05 as the level for hypothesis rejection. The results of the pretest comparisons are reported in tables 4.1 and 4.1 A - 4.1 D. The results of the posttest comparisons are reported in tables 4.2 and 4.2 A - 4.2 D. The results of the comparisons in mean gain scores on the availability and representativeness subscales are reported in tables 4.3, 4.3 A, and 4.3 B.

Notation

In the analysis of results, C1 and C2 stand for the two control groups and E1 and E2 stand for the two experimental groups. The total test score is designated by TOTAL. The three subscales are designated by PROB. (probability), AVAIL. (availability), and REP. (representativeness).

Reliability

Cronbach's coefficient α was calculated for the posttest total score. The reliability coefficient α was found to be .70. The coefficient estimates the percentage of the variance in the test scores that is due to non-error variance (Cronbach, 1970).

TABLE 4.1
SCALE MEANS AND STANDARD DEVIATIONS FOR THE FOUR GROUPS ON THE PRETEST

SCALE (points on scale)   TOTAL (81)    PROB. (42)    AVAIL. (12)   REP. (18)
GROUP
C1 (N = 26)               x̄ = 32.62     x̄ = 22.70     x̄ = 1.81      x̄ = 4.27
                          s = 9.31      s = 3.96      s = 1.44      s = 4.23
C2 (N = 14)               x̄ = 34.64     x̄ = 22.86     x̄ = 2.43      x̄ = 2.71
                          s = 9.23      s = 5.67      s = 1.55      s = 2.55
E1 (N = 20)               x̄ = 33.50     x̄ = 22.15     x̄ = 1.75      x̄ = 3.95
                          s = 9.24      s = 4.43      s = 1.52      s = 4.37
E2 (N = 20)               x̄ = 37.20     x̄ = 23.30     x̄ = 1.85      x̄ = 5.30
                          s = 8.18      s = 3.33      s = 1.35      s = 4.45
C1 ∪ C2 (N = 40)          x̄ = 33.33     x̄ = 22.75     x̄ = 2.02      x̄ = 3.72
                          s = 9.21      s = 4.56      s = 1.49      s = 3.77
E1 ∪ E2 (N = 40)          x̄ = 35.25     x̄ = 22.73     x̄ = 1.80      x̄ = 4.62
                          s = 8.81      s = 3.90      s = 1.41      s = 4.41
GRAND (N = 80)            x̄ = 34.24     x̄ = 22.74     x̄ = 1.91      x̄ = 4.17
                          s = 9.02      s = 4.22      s = 1.45      s = 4.10

TABLE 4.1 A
t-TEST RESULTS FOR PRETEST SCALE TOTAL

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    33.32   9.22   1.00      78     .318
E1 ∪ E2    35.35   8.18
t is not significant

TABLE 4.1 B
t-TEST RESULTS FOR PRETEST SCALE PROBABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    22.75   4.56   -.03      78     .98
E1 ∪ E2    22.72   3.91
t is not significant

TABLE 4.1 C
t-TEST RESULTS FOR PRETEST SCALE AVAILABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.02    1.49   -.69      78     .49
E1 ∪ E2    1.80    1.42
t is not significant

TABLE 4.1 D
t-TEST RESULTS FOR PRETEST SCALE REPRESENTATIVENESS

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    3.72    3.77   .98       78     .33
E1 ∪ E2    4.62    4.41
t is not significant

The results of t-tests on the pretest data (TABLES 4.1 A - 4.1 D) indicated that there were no significant differences between the experimental groups and the control groups on the total test score or on any of the three subscales prior to the course in finite mathematics.

TABLE 4.2
SCALE MEANS AND STANDARD DEVIATIONS FOR THE FOUR GROUPS ON THE POSTTEST

SCALE (points on scale)   TOTAL* (57)   PROB.* (15)   AVAIL. (12)   REP. (18)
GROUP
C1 (N = 26)               x̄ = 19.70     x̄ = 9.04      x̄ = 2.15      x̄ = 5.96
                          s = 8.24      s = 3.65      s = 1.91      s = 3.48
C2 (N = 14)               x̄ = 30.57     x̄ = 12.86     x̄ = 3.50      x̄ = 7.57
                          s = 7.64      s = 2.48      s = 2.28      s = 4.34
E1 (N = 20)               x̄ = 34.60     x̄ = 12.65     x̄ = 4.30      x̄ = 10.50
                          s = 6.98      s = 2.85      s = 1.83      s = 4.66
E2 (N = 20)               x̄ = 37.35     x̄ = 11.85     x̄ = 4.05      x̄ = 12.10
                          s = 10.13     s = 3.34      s = 2.72      s = 4.05
C1 ∪ C2 (N = 40)          x̄ = 23.50     x̄ = 10.37     x̄ = 2.62      x̄ = 6.52
                          s = 9.52      s = 3.74      s = 2.29      s = 3.83
E1 ∪ E2 (N = 40)          x̄ = 35.98     x̄ = 12.25     x̄ = 4.17      x̄ = 11.30
                          s = 8.70      s = 3.09      s = 2.30      s = 4.39
GRAND (N = 80)            x̄ = 29.74     x̄ = 11.31     x̄ = 3.40      x̄ = 8.91
                          s = 11.02     s = 3.54      s = 2.33      s = 4.74

*This scale was shorter on the posttest than on the pretest.

TABLE 4.2 A
t-TEST RESULTS FOR POSTTEST SCALE TOTAL

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    25.13   7.70   1.93      2      .19
E1 ∪ E2    35.98   1.94
t is not significant

TABLE 4.2 B
t-TEST RESULTS FOR POSTTEST SCALE PROBABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    10.95   2.70   .67       2      .57
E1 ∪ E2    12.25   .57
t is not significant

TABLE 4.2 C
t-TEST RESULTS FOR POSTTEST SCALE AVAILABILITY

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.83    .95    1.97      2      .19
E1 ∪ E2    4.17    .18
t is not significant

TABLE 4.2 D
t-TEST RESULTS FOR POSTTEST SCALE REPRESENTATIVENESS

GROUP      Mean    S.D.   t-value   d.f.   Sig.
C1 ∪ C2    6.77    1.13   3.99      2      .05
E1 ∪ E2    11.30   1.13
t is significant, P(t ≥ 3.99) < .05

The results of the t-tests on the posttest data (TABLES 4.2 A - 4.2 D) indicated that there was a significant difference between the experimental and control groups on the representativeness scale in the direction of the experimental groups. There was a tendency for the experimental groups to achieve higher scores on the total test scale and on the availability scale, although the difference was not significant. There was no difference between the experimental and control groups on the posttest probability scale.
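The posttest comparisons treat the class section as the unit of analysis, which is why each t-test has only 2 degrees of freedom. As a rough illustration, the sketch below recomputes the representativeness comparison of Table 4.2 D from the four section means reported in Table 4.2. It assumes a pooled-variance two-sample t statistic, which the text does not state explicitly; the function name `pooled_t` and the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / std_err, df


# Section means on the posttest representativeness scale (Table 4.2).
control_sections = [5.96, 7.57]          # C1, C2
experimental_sections = [10.50, 12.10]   # E1, E2

t_value, df = pooled_t(control_sections, experimental_sections)
print(round(t_value, 2), df)
```

Under these assumptions the statistic comes out close to the reported t = 3.99 with 2 degrees of freedom, which suggests the tabled standard deviations are those of the two section means in each condition.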
Next, it was of interest to compare the mean gain scores of the experimental and control groups on the availability and representativeness scales. The results are reported below in tables 4.3, 4.3 A, and 4.3 B.

TABLE 4.3
MEAN GAIN SCORES ON THE AVAILABILITY AND REPRESENTATIVENESS SCALES

GROUP   AVAILABILITY*         REPRESENTATIVENESS
C1      (x̄2 - x̄1) = .34       (x̄2 - x̄1) = 1.69
C2      (x̄2 - x̄1) = 1.07      (x̄2 - x̄1) = 4.86
E1      (x̄2 - x̄1) = 2.55      (x̄2 - x̄1) = 6.55
E2      (x̄2 - x̄1) = 2.20      (x̄2 - x̄1) = 6.80

* x̄2 is the posttest group mean on the scale. x̄1 is the pretest group mean on the scale.

TABLE 4.3 A
t-TEST RESULTS ON PRE-POST GAIN SCORES ON THE AVAILABILITY SCALE

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .71    .52    8.35      2      .01
E1 ∪ E2    2.38   .25
t is significant, P(t ≥ 8.35) < .01

TABLE 4.3 B
t-TEST RESULTS ON PRE-POST GAIN SCORES ON THE REPRESENTATIVENESS SCALE

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    3.28   2.24   4.30      2      .03
E1 ∪ E2    6.68   .18
t is significant, P(t ≥ 4.30) < .03

Significant differences were found on the mean gain scores between the experimental and control groups on both the availability subscale and the representativeness subscale. The differences indicated significantly higher pretest to posttest mean gain scores by the experimental groups.

Individual Item Statistics

Descriptive statistics are reported for each item on the pretest and posttest in this section. The statistics include: the method of assigning points to each item; a distribution of the responses on each item for each class section; means and standard deviations on each item for each class section; and the grand mean and standard deviation for the entire sample of subjects for each item.

The section first presents the representativeness subscale items (R1-R6). The availability subscale items follow (A1-A4). Next the probability items that were on the posttest are presented (P1-P5). Several items that did not appear on any scale (N1-N5) are also discussed. These off-scale items are interspersed among related on-scale items.
Finally, elementary probability items that were included on the pretest but not incorporated into the posttest are analyzed. These probability items are labeled P6-P14.

Each table containing a distribution of responses on an item has the method of assigning 3, 2, 1, or 0 points on that item in the column headings. All items that appeared on both the pretest and the posttest measures are reported in "double" tables with the pretest results at the top and the posttest results at the bottom. There were 26 subjects in group C1, 14 in C2, 20 in E1, and 20 in E2. When the total of the row frequencies for a group does not add up to the proper number on an item, it is because several subjects in that group left the item blank. A brief discussion of the results follows the statistical tables on each item.

(R1) Which of the following sequences is more likely to occur for having two children?
  a) B G
  b) G G
  c) about the same chance
(note: coins were used on the pretest versions of R1, R2, R3, and R4)

TABLE 4.4 A
GROUP RESULTS ON ITEM R1

Pretest    3-same with      1-same
GROUP      correct reason   no reason   0-B G   0-G G   Mean   S.D.
C1         10               1           14      0       1.27   1.51
C2         5                0           8       0       1.07   1.49
E1         8                1           10      0       1.10   1.45
E2         11               3           5       1       1.65   1.42
TOTAL      34               5           37      1       1.29   1.46

Posttest
C1         15               0           11      0       1.85   1.49
C2         8                0           6       0       2.36   1.28
E1         16               0           4       0       2.40   1.23
E2         19               0           1       0       2.85   .67
TOTAL      58               0           22      0       2.32   1.26

TABLE 4.4 B
t-TEST ON POSTTEST ITEM R1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.10   .36    1.54      2      .26
E1 ∪ E2    2.63   .32
t is not significant

This item was constructed by the experimenter to test the use of the representativeness heuristic for a short sequence of events. The subjects who chose B G said that the outcome was more likely to occur "because B and G have an equal chance of happening and so the odds favor one of each". These subjects evidently felt that the outcome B G was representative of the 50:50 odds and so would be more likely to occur than the outcome G G.
There is evidence (TABLE 4.4 A) that these college subjects used the representativeness heuristic to estimate the likelihood of this short sequence of events on the pretest. There was a tendency for the experimental groups to do better on the posttest on this item, although the t-test (TABLE 4.4 B) was not statistically significant. There was improvement in all four groups on this item from pretest to posttest.

(R2) Which of the following sequences is more likely to occur for having six children?
  a) B G G B G B
  b) B B B B G B
  c) about the same chance

TABLE 4.5 A
GROUP RESULTS ON ITEM R2

Pretest    3-same with      1-same but          0-a)          0-b)
GROUP      correct reason   incorrect reason    B G G B G B   B B B B G B   Mean   S.D.
C1         7                1                   16            0             .85    1.35
C2         1                1                   11            0             .29    .83
E1         4                3                   10            2             .75    1.21
E2         5                2                   13            0             .85    1.31
TOTAL      17               7                   50            2             .72    1.22

Posttest
C1         10               1                   15            0             1.19   1.47
C2         7                0                   7             0             1.50   1.55
E1         10               2                   7             1             1.60   1.47
E2         14               0                   5             1             2.10   1.41
TOTAL      41               3                   34            2             1.57   1.48

TABLE 4.5 B
t-TEST ON POSTTEST ITEM R2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.34   .21    1.72      2      .23
E1 ∪ E2    1.85   .35
t is not significant

Those subjects who chose B G G B G B indicated that B B B B G B was less likely to occur because of the long string of B's. Fifty subjects responded in this manner on the pretest. This result supports the hypothesis of Kahneman and Tversky (1972) that combinatorially naive subjects believe that B B B B G B is not representative of the distribution of boys and girls. Reliance upon representativeness appeared to be less on the posttest. Half the subjects correctly said that both sequences were equally likely to occur. The subjects who got the item correct on the posttest either mentioned that each successive child was independent of previous children, or calculated the probability of each of the sequences to be 1/64. There was a tendency for the experimental groups to score higher on this item, although the t-test was not significant.
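The 1/64 figure cited by the successful subjects can be checked with a short calculation. The sketch below multiplies the probabilities of six independent births, assuming (as item R6 does) that the chance of a boy is 1/2; the function name `sequence_probability` is illustrative and does not come from the study.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(sequence, p_boy=Fraction(1, 2)):
    """Probability of one specific birth order, treating births as independent."""
    prob = Fraction(1)
    for child in sequence:
        prob *= p_boy if child == "B" else 1 - p_boy
    return prob


p_mixed = sequence_probability("BGGBGB")   # option a)
p_run = sequence_probability("BBBBGB")     # option b)

# Every one of the 2**6 possible birth orders is a single, equally likely outcome.
all_sequences = ["".join(order) for order in product("BG", repeat=6)]

print(p_mixed, p_run, len(all_sequences))
```

Both sequences are single points in a sample space of 64 equally likely orders, so neither can be "more representative" in any sense that changes its probability.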
(R3) [The item text and TABLE 4.6 A, group results on item R3, are not legible in the source. The discussion below indicates that the item compared the birth orders B G G B G B and B B B G G G, with "the same chance" as a third option.]

TABLE 4.6 B
t-TEST ON POSTTEST ITEM R3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.29   .30    2.17      2      .16
E1 ∪ E2    2.00   .35
t is not significant

Kahneman and Tversky (1972) found that a majority of their college subjects chose B G G B G B to be more likely because B B B G G G was not "random" enough to be representative of the process of having children. The format for their question differed from the multiple choice item in this study. Kahneman and Tversky asked their subjects to estimate the percentage of families in which the six children in the family were born in one of the two orders. In item R3 the subjects are given the option of picking "the same chance". Results different from those of Kahneman and Tversky were obtained. Many more subjects chose "the same chance" than chose B G G B G B. However, an analysis of the reasons given for the responses to this item indicated that the subjects were relying upon the representativeness heuristic. About 1/2 of those subjects who chose "the same chance" on the pretest did so because "there are the same number of H's and T's" in both sequences. About 1/3 of the "same chance" responders on the posttest also reasoned in this manner. Subjects evidently tended to feel that each of these sequences was equally representative of the 50:50 distribution of boys and girls among children.

Gains in all four groups were made on this item from pretest to posttest. There was a strong tendency for the experimental groups to do better on this item on the posttest, although the t-test was not significant.

(R4) What is the probability that in six children there will be three boys and three girls?
TABLE 4.7 A
GROUP RESULTS ON ITEM R4

Pretest    3-20/64,             0-1/2,   0-other
GROUP      30-33%      1-1/64   50%      responses   Mean   S.D.
C1         0           0        12       9           .00    .00
C2         0           1        11       1           .07    .27
E1         1           0        11       6           .15    .67
E2         1           1        12       2           .15    .49
TOTAL      2           2        46       18          .08    .43

Posttest
C1         0           3        9        14          .11    .33
C2         0           0        10       4           .07    .27
E1         6           9        3        2           1.25   1.07
E2         2           10       6        2           .80    .90
TOTAL      8           22       28       22          .56    .87

TABLE 4.7 B
t-TEST ON POSTTEST ITEM R4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .09    .03    4.12      2      .05
E1 ∪ E2    1.02   .32
t is significant, P(t ≥ 4.12) < .05

The pretest results on this item indicate that a substantial number of subjects felt that the probability of 3 boys and 3 girls was 1/2 (see TABLE 4.7 A). The outcome appears to be representative of the distribution of boys and girls, one-half of each. Several subjects even claimed that this outcome "had to happen" or "had 100% chance of occurring". On the posttest many experimental subjects switched their answer from 1/2 to either 1/64 or 20/64, while the control subjects stayed with 1/2. There was a significant difference between the experimental and control groups on this item (see TABLE 4.7 B). The notation "6C3" or "6P3" was used by many subjects in the control groups. This notation was usually put down with a "?" along with a guess for the probability, often a guess of 1/2. The control subjects were evidently not sure how to use the formula to calculate the probability.

(R5) Which is more likely to occur?
  a) Pulling one red ball from a jar containing 10 red balls and 90 white balls.
  b) Pulling four red balls in a row from a jar containing 50 red balls and 50 white balls.

TABLE 4.8 A
GROUP RESULTS ON ITEM R5

Pretest    3-1/10 >   2-1/9 >                0-b)
GROUP      1/16       1/16       1-guessed   4 red   Mean   S.D.
C1         3          6          3           13      .84    1.00
C2         0          1          3           10      .36    .63
E1         1          4          2           12      .65    .99
E2         4          1          4           10      .90    1.21
TOTAL      8          12         12          45      .72    1.01

Posttest
C1         1          5          8           11      .80    .90
C2         1          3          4           6       .85    .94
E1         7          4          3           6       1.60   1.27
E2         10         5          3           2       2.15   1.04
TOTAL      19         17         18          25      1.35   1.17

TABLE 4.8 B
t-TEST ON POSTTEST ITEM R5

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .83    .03    3.78      2      .06
E1 ∪ E2    1.87   .39
t is significant, P(t ≥ 3.78) < .06

There was a strong tendency on the pretest for the subjects to extend the probability of getting a red ball in one pull from the 50:50 distribution to all four pulls. Subjects felt that the chance of getting four red in a row was 1/2, and generally were unaware of the multiplication principle for independent events during the pretest. The probability of getting one red ball on one pull from the 50:50 distribution was apparently considered by the subjects to be "representative" of the probability of getting several red balls in a row. The experimental groups improved substantially on this item on the posttest. There was a significant difference between the experimental and control groups (see TABLE 4.8 B).

(R6) The chance that a baby is born a boy is about 1/2. Over the course of an entire year, would there be more days when at least 60% of the babies born were boys in:
  a) a large hospital
  b) a small hospital
  c) makes no difference

TABLE 4.9 A
GROUP RESULTS ON ITEM R6

Pretest
GROUP      3-Small   0-Large   0-Same   Mean   S.D.
C1         3         6         17       .34
C2         3         3         8        .64    1.28
E1         4         4         12       .60    1.23
E2         7         2         11       1.05   1.47
TOTAL      17        15        48       .64    1.23

Posttest
C1         8         2         15       .92    1.41
C2         6         1         7        1.29   1.54
E1         14        1         5        2.10   1.41
E2         13        0         7        1.95   1.47
TOTAL      41        4         35       1.53   1.51

TABLE 4.9 B
t-TEST ON POSTTEST ITEM R6

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.10   .26    4.69      2      .04
E1 ∪ E2    2.02   .10
t is significant, P(t ≥ 4.69) < .04

Kahneman and Tversky claim that subjects tend to believe that both hospitals should have birth rates that are equally representative of the population proportion of boys to girls. Thus, sample size should make no difference. The pretest results support this claim, as 48 subjects picked "makes no difference".

Reliance upon the representativeness heuristic diminished substantially on the posttest, particularly in the experimental groups.
There was a significant difference (P(t ≥ 4.69) < .04) between the experimental groups and the control groups on this item. The subjects from the experimental groups who chose "the small hospital" wrote that there was "more chance for a higher proportion of boys in a smaller hospital because the sample size was smaller".

The following item, N1, was asked in two different versions. The pretest version is presented first, followed by the posttest version.

(N1-pretest) A man bets you one dollar that at least two people at a party you are attending have the same birthday. How many people would have to be at the party so that the man has at least a 50% chance of winning the bet?

TABLE 4.10 A
GROUP RESULTS ON PRETEST ITEM N1

           3-20    2-15
GROUP      to 30   to 19   0-183   0-365   0-730   0-other   Mean   S.D.
C1         4       3       4       2       3       10        .65    1.16
C2         0       0       5       3       1       5         .00    .00
E1         0       3       5       3       5       4         .20    .62
E2         2       1       6       5       3       3         .30    .92
TOTAL      6       7       20      13      12      22        .34    .89

Many subjects used "50%" as an indicator. They either halved or doubled the number of days in a year. Five subjects gave estimates in the thousands. In general, the subjects gave very inaccurate estimates. They did not seem to be aware of the pigeonhole principle. Many of them gave estimates for which there were more people than there are days in a year.

(N1-posttest) People at a carnival pick one number from 1 to 100. If two people match, they win a prize. How many people would have to be playing the game in order that there be at least a 50% chance that there would be winners? Give your best estimate.

TABLE 4.10 B
GROUP RESULTS ON POSTTEST ITEM N1

           3-9     2-17    1-21
GROUP      to 16   to 20   to 30   0-50   0-100   0-200   Other   Mean   S.D.
C1         1       1       1       7      4       3       9       .23    .71
C2         2       0       5       4      1       2       0       .79    1.05
E1         6       2       7       3      0       0       2       1.45   1.19
E2         14      1       5       0      0       0       0       2.45   .89
TOTAL      23      4       18      14     5       5       11      1.19   1.27

TABLE 4.10 C
t-TEST ON POSTTEST ITEM N1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .50    .39    2.52      2      .13
E1 ∪ E2    1.95   .71
t is not significant

There was a strong tendency for the experimental groups to estimate the number of people necessary (13) more accurately than the control groups. There was little change in the responses of the subjects in the control groups from pretest to posttest.

(N2) A fair coin is flipped and comes up heads 10 times in a row. If you could win $10 on a $1 bet by guessing the next toss, what would you guess and why?

TABLE 4.11 A
GROUP RESULTS ON ITEM N2

Pretest                  2-hedged on   1-heads, go
GROUP      3-no diff.    no diff.      with winner   0-tails   Mean   S.D.
C1         6             1             6             12        .96    1.25
C2         2             0             3             9         .64    1.08
E1         0             1             2             17        .20    .52
E2         2             1             4             13        .60    .99
TOTAL      10            3             15            51        .62    1.04

Posttest
C1         2             1             8             15        .61    .90
C2         2             1             6             5         1.00   .95
E1         1             5             4             10        .85    1.27
E2         7             6             0             7         1.65   1.04
TOTAL      12            13            18            37        1.00   1.17

TABLE 4.11 B
t-TEST ON POSTTEST ITEM N2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .80    .27    1.00      2      .42
E1 ∪ E2    1.25   .57
t is not significant

On the pretest a majority of the subjects fell for the gambler's fallacy. Their reasons for picking tails on the next toss included: "the odds are 1/2, but [illegible] for 11 times in a row"; "the probability of a head diminishes each time"; "the law of averages has been broken"; and "tails is due!". There were not as many subjects on the posttest who fell for the gambler's fallacy. Experimental class E2 scored better than the other three classes on this item. There were no significant differences on the posttest between the experimental and control groups on this item. Control group C1 appears to have regressed slightly from pretest to posttest on this item. There may be interference on this item caused by learning to multiply the probability of successive independent events. Subjects may be focusing on the entire sequence of tosses as if it were an event that had not yet occurred rather than considering the single independent toss.

(N3) How many paths are in this grid?
(pretest version)      (posttest version)
X O X X                X O X X O
X X O                  X X O X

TABLE 4.12 A
GROUP RESULTS ON ITEM N3

Pretest    3-12    2-11    1-10
GROUP      paths   paths   paths   0-other   Mean   S.D.
C1         10      1       3       12        1.35   1.41
C2         12      0       0       2         2.57   1.08
E1         15      1       0       4         2.35   1.23
E2         18      0       0       2         2.70   .92
TOTAL      55      2       3       20        2.15   1.31

Posttest   3-20    2-19    1-28
GROUP      paths   paths   paths   0-other   Mean   S.D.
C1         10      1       0       15        1.23   1.48
C2         11      0       1       2         2.42   1.16
E1         17      0       1       2         2.60   .99
E2         15      1       0       4         2.35   1.23
TOTAL      53      2       2       23        2.06   1.36

TABLE 4.12 B
t-TEST ON POSTTEST ITEM N3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.83   .85    1.05      2      .40
E1 ∪ E2    2.48   .18
t is not significant

There was little change in the results of this item from pretest to posttest even though all classes had covered the sequential counting principle. Control group C1 did not do as well as the other three groups on this item. Most of the responses that were in the "other" column listed 8 paths on the pretest and 11 on the posttest. These responses may have occurred because some subjects forgot to count the diagonal paths that skip over one or more columns. The definition of a path was carefully explained in all sections prior to both tests. Examples of the different possibilities for the paths were drawn for the subjects and were visible during the tests. There was no significant difference between experimental groups and control groups on item N3.

(N4) How many paths are in this grid?

(pretest version)      (posttest version)
X X X                  X X X
X O                    X O
X O O X                X O X X

TABLE 4.13 A
GROUP RESULTS ON ITEM N4

Pretest    3-24    2-22    1-28
GROUP      paths   paths   paths   0-other   Mean   S.D.
C1         5       0       2       19        .65    1.20
C2         11      0       0       3         2.36   1.28
E1         12      0       2       6         1.90   1.41
E2         14      3       0       3         2.40   1.10
TOTAL      42      3       4       31        1.70   1.43

Posttest   3-24    2-22    1-26
GROUP      paths   paths   paths   0-other   Mean   S.D.
C1         7       0       2       17        .88    1.33
C2         11      0       1       2         2.42   1.16
E1         16      0       0       4         2.25   1.33
E2         17      0       0       3         2.55   1.10
TOTAL      51      0       3       26        1.91   1.42

TABLE 4.13 B
t-TEST ON POSTTEST ITEM N4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.66   1.09   .95       2      .44
E1 ∪ E2    2.40   .21
t is not significant

In accordance with item N3, there were only slight gains made by the groups on the posttest version of item N4. The C1 control group did not do as well on this item as the other three groups. It was more difficult to "draw" all the paths in item N4 than in item N3. The results on N3 and N4 on the posttest were nearly identical. Those who got both items correct on the posttest used the sequential counting principle. The t-test found no significant difference between the experimental groups and the control groups on this item.

(A1) Consider the grids below.

Grid A
X X X X X X X X
X X X X X X X X
X X X X X X X X

Grid B
X X
X X
X X
X X
X X
X X
X X
X X
X X

Are there:
  a) more paths in A
  b) more paths in B
  c) about the same number in A and B

[TABLE 4.14 A, GROUP RESULTS ON ITEM A1: the table is not legible in the source.]

TABLE 4.14 B
t-TEST ON POSTTEST ITEM A1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.11   .55    1.58      2      .25
E1 ∪ E2    1.73   .04
t is not significant

According to Kahneman and Tversky (1973), subjects should favor grid A over grid B because there appear to be more paths "available" in grid A. This was indeed the case on the pretest where 53 of the 80 subjects favored grid A. The reasons given for the choice of grid A included "there are more X's in grid A" and "it is easier to draw a path in grid A". The pretest results support Kahneman and Tversky's contention that the availability heuristic is employed by subjects who are unable to count all the outcomes.

There was still a tendency to favor grid A on the posttest even though the subjects had been exposed to counting techniques.
However, the tendency was much slighter, as only 29 subjects chose grid A.

There was a tendency for the experimental groups to do better on this item. Nineteen of the 40 subjects in the experimental groups calculated 512 paths in each grid, while only 9 of the 40 control subjects successfully computed the result. This tendency was borne out by the results of the t-test (TABLE 4.14 B), although the test did not find any significant difference.

(A2) Consider the grid below. Which type of path is more likely to occur?

X X O X X
X X X O X
O X X X X
X X X X O
X O X X X

  a) a path that hits 4 X's and 1 O
  b) a path that hits 5 X's
(The pretest version used a 6×6 grid. See next item, N5.)

[TABLE 4.15 A, GROUP RESULTS ON ITEM A2: the table is not legible in the source.]

TABLE 4.15 B
t-TEST ON POSTTEST ITEM A2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .70    .02    -9.40     2      .01
E1 ∪ E2    .60    .00
t is significant, P(t ≤ -9.40) < .01

The results on this item contradict the results of Kahneman and Tversky (1973), who found that a majority of their college subjects claimed that there were more paths that hit all X's. A majority of the subjects in this study chose paths that hit 5 X's and 1 O (in a 6×6 grid) on the pretest. The two kinds of paths were about equally favored on the posttest. Kahneman and Tversky claim that subjects favor paths that hit all X's because there are "more X's available". An analysis of the reasons given for the responses to this item showed that availability was being used by the subjects, but not in the manner claimed by Kahneman and Tversky.
Thirty-two subjects on the pretest chose 4 X's and 1 O to be more likely because "there was an O available". Sixteen subjects on the pretest chose 4 X's and 1 O because "the probability of getting an X at each level is 4/5". The former reasoning indicates the use of the availability heuristic, while the latter reasoning suggests that subjects felt 4 X's and 1 O was representative of the probability of getting an X in a single row. On the posttest similar reasons were given for the choice 4 X's and 1 O. However, only 13 were indicative of "availability of O" on the posttest while 26 suggested the "representativeness" of 4/5. It appears that the reasons given for the response 4 X's and 1 O tended to switch from depending upon availability on the pretest to depending upon representativeness on the posttest.

Those subjects (26 on pretest and 32 on posttest) who chose "5 X's" did so because there were more X's, as Kahneman and Tversky claimed. The subjects who made this choice did seem to rely upon availability.

In summary, there was no clear-cut case for favoring either of the two responses "4 X's and 1 O" or "5 X's". Furthermore, it was possible that a mixture of both availability and representativeness was used in the responses to this item. There were more instances of the use of availability than of representativeness among the responses.

Table 4.15 A indicates that practically no one answered this item correctly by applying counting techniques. The significance of the t-value simply indicates that more control subjects scored a 1 for choosing "4 X's and 1 O". The answer is correct, but the reasons given for it were based upon availability or representativeness, and not upon counting principles. The significance on this item is an artifact of the scoring procedure.

(N5) Give your best estimate for the number of paths in the grid below.
X X O X X X
X X X O X X
O X X X X X
X X X X O X
X O X X X X
X X X X X O

TABLE 4.16
GROUP RESULTS ON PRETEST ITEM N5

                                   0-way off,
GROUP      3-6^6   2-6^5   1-6^4   often 36     Mean   S.D.
C1         3       1       2       20           .53    1.07
C2         5       0       0       9            1.07   1.49
E1         5       3       1       11           1.10   1.33
E2         4       1       2       13           .80    1.20
TOTAL      17      5       5       53           .84    1.25

Most of the incorrect responses to this item said there were 36 possible paths. Many subjects who had successfully used a multiplication principle on items N3 and N4, in which there were only 2 or 3 rows in the grid, failed to generalize the multiplication to this 6×6 grid. It was somewhat surprising that 17 subjects answered N5 successfully, and yet no one could use similar counting techniques on item A2 to estimate the number of paths that hit 5 X's. This item was not included on the posttest.

(A3) A man must select committees from a group of 10 people. Would there be:
  a) more distinct possible committees of 8
  b) more distinct possible committees of 2
  c) about the same number of committees of 8 as committees of 2

TABLE 4.17 A
GROUP RESULTS ON ITEM A3

Pretest    3-the same,      1-the same,   0-Comm.   0-Comm.
GROUP      correct reason   a guess       of 8      of 2      Mean   S.D.
C1         1                2             3         19        .19    .63
C2         1                5             1         6         .57    .85
E1         0                4             2         12        .50    .81
E2         1                5             1         10        .40    .75
TOTAL      3                16            7         47        .31    .67

Posttest
C1         6                0             9         11        .69    1.29
C2         4                1             3         5         .93    1.38
E1         7                3             4         6         1.35   1.42
E2         5                2             4         8         .85    1.30
TOTAL      22               6             20        30        .93    1.34

TABLE 4.17 B
t-TEST ON POSTTEST ITEM A3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .81    .17    1.05      2      .40
E1 ∪ E2    1.10   .35
t is not significant

Only 3 of the 80 subjects correctly calculated 45 committees for each type of committee on the pretest, and these 3 had taken a probability course prior to the experiment. There was an overwhelming tendency on the pretest for the subjects to choose "committees of 2". Remnants of this tendency persisted on the posttest.
According to Kahneman and Tversky (1973), subjects rely upon availability on this item and tend to believe there are more committees of 2 since instances of committees of two are easier to construct. The results in TABLE 4.17 A strongly support this contention of Kahneman and Tversky.

Quite a few subjects had learned how to calculate the number of committees by the time the posttest was administered. It is somewhat surprising that there were not more than 22 subjects who calculated that 10!/(8!2!) = 10!/(2!8!). The subjects were generally unaware of the fact that choosing a committee of 8 was the same as choosing a "non-committee" of 2. There was not a significant difference between the two groups on item A3.

(A4) A jar contains 9 red balls, 4 blue balls, and 3 green balls. Which would be more likely to occur?
  a) Pulling at least one green ball in two tries (with replacement)
  b) Pulling two red balls in a row (with replacement)
(Note: The distribution of balls on the pretest was 8 red, 4 blue, and 3 green, and "at least one blue" was compared to "two red".)

TABLE 4.18 A
GROUP RESULTS ON ITEM A4

Pretest    3-a) with       2-noted       1-a)      0-b)
GROUP      correct calc.   disjunction   guessed   2 reds   Mean   S.D.
C1         0               0             8         16       .35    .56
C2         0               0             6         8        .50    .52
E1         0               0             8         11       .40    .50
E2         0               0             11        7        .55    .51
TOTAL      0               0             33        42       .44    .52

Posttest
C1         0               0             1         21       .03    .20
C2         0               0             5         8        .36    .50
E1         2               1             7         11       .70    .98
E2         2               3             6         9        .85    1.04
TOTAL      4               4             19        49       .46    .81

TABLE 4.18 B
t-TEST ON POSTTEST ITEM A4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    .20    .22    3.28      2      .08
E1 ∪ E2    .78    .10
t is significant, P(t ≥ 3.28) < .08

It was not expected that the subjects would be able to successfully answer this problem prior to a course in probability. The subjects favored the outcome "2 reds" on the pretest. Reasons given for this on the pretest included "two pulls makes it twice as likely" and "there are more reds". On the posttest, after a course in probability, the subjects still favored "2 red", claiming that 9/16 > 3/16.
There were more red balls available, and the subjects appeared to have relied heavily upon the availability heuristic in their responses to this item. Eight subjects in the experimental groups recognized the disjunctive nature of the event "at least one green". No one in either of the control groups recognized the disjunction. The results of the t-test are significant at the .1 level. The experimental groups performed better on this item.

(P1) List the outcomes for tossing three coins.

TABLE 4.19 A
GROUP RESULTS FOR ITEM P1

Pretest    3-8        2-7        1-4
GROUP      outcomes   outcomes   outcomes   0-other   Mean   S.D.
C1         6          5          12         3         1.54   .99
C2         4          0          9          1         1.50   1.02
E1         3          2          14         1         1.35   .81
E2         2          1          15         2         1.15   .74
TOTAL      15         8          50         7         1.39   .89

Posttest
C1         24         0          1          1         2.80   .69
C2         14         0          0          0         3.00   .00
E1         19         0          0          1         2.85   .67
E2         18         0          0          2         2.70   .92
TOTAL      75         0          1          4         2.82   .69

TABLE 4.19 B
t-TEST ON POSTTEST ITEM P1

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.90   1.36   -1.06     2      .401
E1 ∪ E2    2.78   1.06
t is not significant

It is evident that on the pretest most of the subjects listed the four outcomes "3 heads", "2 heads-1 tail", "1 head-2 tails", and "3 tails" for the sample space. The results of the next item, P2, suggested that the subjects also felt that these four outcomes were equally likely to occur (see TABLE 4.19 A). Almost everyone listed the eight outcomes correctly on the posttest. Only one subject in 80 persisted in listing four outcomes for the sample space. There was no significant difference between the experimental and the control groups on this item.

(P2) What is the probability of getting 2 heads and 1 tail in tossing 3 coins?

TABLE 4.20 A
GROUP RESULTS FOR ITEM P2

Pretest
GROUP      3-3/8   2-rel. fre.   1-1/4   0-other   Mean   S.D.
C1         2       3             10      11        .84    .92
C2         2       0             8       4         1.00   .96
E1         0       2             13      5         .85    .59
E2         1       0             11      8         .65    .75
TOTAL      5       5             42      28        .82    .80

Posttest
C1         17      0             2       7         2.03   1.37
C2         13      0             0       1         2.79   .80
E1         18      0             1       1         2.75   .79
E2         16      0             1       3         2.35   1.18
TOTAL      64      0             4       12        2.42   1.13

TABLE 4.20 B
t-TEST ON POSTTEST ITEM P2

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.41   .53    .33       2      .78
E1 ∪ E2    2.55   .28
t is not significant

Most of the subjects listed 1/4 on the pretest for the probability of getting 2 heads and 1 tail. This was the result of listing the four outcomes "0 heads - 3 heads" on item P1 and assuming that the four outcomes were equally likely. Several subjects listed only 7 of the 8 outcomes on P1, and then correctly used the sample space to assign a probability to the outcome "2 heads and 1 tail". Although on the pretest 15 subjects correctly listed the 8 outcomes for tossing 3 coins on item P1, only 5 of these 15 used the sample space correctly to assign a probability of 3/8 to "2 heads and 1 tail". The posttest results indicated that most of the subjects had learned how to use the sample space of 8 outcomes to assign a probability of 3/8 to "2 heads and 1 tail". There were, however, 11 subjects on the posttest who correctly listed the outcomes (on P1), but did not apply the sample space when calculating a probability on P2. The control and experimental groups performed about the same on this item.

The pretest version of the following item was slightly different from the posttest version. The pretest version is presented first.

(P3-pretest) For four games you have the following chances of gaining points:
  Game A: 20% chance of winning 15 points
  Game B: 40% chance of winning 10 points
  Game C: 10% chance of winning 25 points
  Game D: 50% chance of winning 5 points
If you play the game many times, you would be most likely to gain the greatest number of points in:
  a) Game A   b) Game B   c) Game C   d) Game D

TABLE 4.21 A
GROUP RESULTS ON PRETEST ITEM P3

GROUP      3-Game B   0-Game A   0-Game C   0-Game D   Mean   S.D.
C1         13         1          0          11         1.50   1.53
C2         10         0          2          2          2.14   1.40
E1         10         1          2          6          1.50   1.54
E2         12         2          2          4          1.80   1.50
TOTAL      45         4          6          23         1.69   1.50

(P3-posttest) For three games you have the following chances of winning points:
  Game 1: 50% chance of winning 8 points
  Game 2: 20% chance of winning 20 points
  Game 3: 30% chance of winning 15 points

TABLE 4.21 B
GROUP RESULTS ON POSTTEST ITEM P3

GROUP      3-Game 3   0-Game 1   0-Game 2   Mean   S.D.
C1         12         12         1          1.38   1.53
C2         11         2          1          2.36   1.28
E1         15         5          0          2.40   1.23
E2         15         2          2          2.25   1.33
TOTAL      53         21         4          2.03   1.41

TABLE 4.21 C
t-TEST ON POSTTEST ITEM P3

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.87   .69    .92       2      .45
E1 ∪ E2    2.32   .10
t is not significant

There was a tendency for the subjects to choose the game with the highest probability of winning and to neglect the payoffs on both the pretest and posttest measures. A majority of the subjects did, however, get the item correct on each instrument. Control group C1 did not do as well on this item as the other three groups. This item is an NAEP item on which 47% of the 17-year-old population and 23% of the adult population responded correctly. The proportion of correct responses for these college students was 56% on the pretest and 69% on the posttest. There was no significant difference between the control and experimental groups on this item.

The following two probability items were included on the posttest but not on the pretest. Both were constructed by the experimenter.

(P4) What is the probability that the sum of the faces will be 5 when a pair of dice are rolled?

TABLE 4.22 A
GROUP RESULTS ON POSTTEST ITEM P4

GROUP      3-4/36   [heading illegible]   0-1/11   0-other   Mean   S.D.
C1         15       1                     2        8         1.77   1.49
C2         13       0                     0        1         2.79   .80
E1         12       0                     0        8         1.80   1.50
E2         13       1                     0        6         2.00   1.41
TOTAL      53       2                     2        23        2.01   1.40

TABLE 4.22 B
t-TEST ON POSTTEST ITEM P4

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    2.28   .72    -.73      2      .54
E1 ∪ E2    1.90   .14
t is not significant

Only 2 subjects assumed an equally likely model for the 11 outcomes after a course in probability.
Most of the incorrect responses in the "other" column were "1/6" or "(1/6)^2 = 1/36". These responses indicated that some subjects may have misread the question, thinking that there was only one die or that two fives had to show. It is somewhat surprising that 27 subjects actually got this item wrong after taking a course in probability.

(P5) The probability that it rains in Seattle on a given day is 2/3. The probability that Bill forgets his umbrella on any given day is 1/4. What is the probability that it rains and Bill forgets his umbrella?

TABLE 4.23 A
GROUP RESULTS ON POSTTEST ITEM P5

GROUP      3-1/6   0-2/3 + 1/4   0-other   Mean   S.D.
C1            9         9           8      1.04   1.46
C2            9         2           3      1.93   1.49
E1           20         0           0      3.00   0.00
E2           17         1           2      2.55   1.10
TOTAL        55        12          13      2.06   1.40

TABLE 4.23 B
t-TEST ON POSTTEST ITEM P5

GROUP      Mean   S.D.   t-value   d.f.   Sig.
C1 ∪ C2    1.48    .63    2.59       2    .12
E1 ∪ E2    2.78    .31
t is not significant

There was a tendency for the experimental groups to perform better on this item, although the result was not significant at the .05 level. A number of students in the control groups attempted to set the problem up using P(A ∪ B) = P(A) + P(B) - P(A ∩ B), where A is the event "it rains" and B is the event "bring umbrella". These students evidently confused the concepts of "mutually exclusive" and "independent" events.

The results of items P4 and P5 are quite different. The experimental and control groups performed about the same on P4, while the experimental groups did considerably better on P5 than the control groups. Item P4 dealt with a uniform probability model for two dice. On the other hand, P5 was concerned with independent events occurring in sequence. The experience of developing a multiplicative model for the independent tacks in activity 2 may have been responsible for the higher success of the experimental groups on this item.
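The two items and the section-level test above can be checked with a short calculation. The following is a minimal sketch, not the thesis's own procedure: it assumes the reported t-tests were pooled-variance two-sample tests on the two section means per treatment (an assumption that happens to reproduce the values in Table 4.23 B), and the function names pooled_t and two_tailed_p_df2 are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from statistics import mean, variance

# Item P4: uniform model on the 36 ordered outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))
p4 = Fraction(sum(1 for a, b in rolls if a + b == 5), len(rolls))

# Item P5: for independent events the probabilities multiply.
p_rain, p_forget = Fraction(2, 3), Fraction(1, 4)
p5 = p_rain * p_forget


def pooled_t(group_a, group_b):
    """Pooled-variance two-sample t statistic (b minus a) and its d.f."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_b) - mean(group_a)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def two_tailed_p_df2(t):
    """Exact two-tailed p-value for Student's t with 2 degrees of freedom."""
    return 1 - abs(t) / sqrt(2 + t * t)


# Section means for item P5, read from Table 4.23 A.
control_means = [1.04, 1.93]       # C1, C2
experimental_means = [3.00, 2.55]  # E1, E2

t, df = pooled_t(control_means, experimental_means)
p = two_tailed_p_df2(t)
print(p4, p5, round(t, 2), df, round(p, 2))
```

With only four sections there are 2 degrees of freedom, which is why a difference as large as the one on P5 still falls short of the .05 level.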
The following elementary items on probability and sample space (P6 - P14) were included only on the pretest to obtain information about the subjects' conceptions of probability prior to formal course work.

(P6) A jar contains 4 blue, 6 red, and 3 white marbles. If you draw one marble from the jar, it is most likely that you will:
   a) get a blue marble
   b) get a red marble
   c) have about the same chance of getting a red or a blue marble

TABLE 4.23
GROUP RESULTS ON PRETEST ITEM P6

GROUP      3-red   0-blue   0-same   Mean   S.D.
C1           24       0        2     2.76    .81
C2           13       0        1     2.78    .80
E1           19       0        1     2.85    .67
E2           19       0        1     2.85    .67
TOTAL        75       0        5     2.81    .73

This item was easily handled by almost all the subjects. Those who chose "the same chance" gave as a reason for their answer that all three colors had the same chance of being drawn. These subjects did not pay attention to the relative frequency of each color in the sample.

(P7) a) A fair die is rolled. What is the probability of getting a 3?
     b) A fair coin is tossed. What is the probability of getting a head?

TABLE 4.24
GROUP RESULTS ON PRETEST ITEM P7

GROUP      3-both right   1-b) right   Mean   S.D.
C1              23             3       2.76    .65
C2              13             1       2.85    .53
E1              18             2       2.80    .61
E2              18             2       2.80    .61
TOTAL           72             8       2.80    .60

It was expected that almost everyone would answer this item correctly. The item was included to find out the different ways that the subjects would express probability prior to a formal course. For part a), the responses included "1/6", "1 out of 6", "1 in 6", "1 to 6", "1:5", and "one in six rolls". The responses to part b) were similar except that "50%" was added. There was a great deal of variation among the subjects in the language which they used to express probability. The 8 subjects who got part a) incorrect misread the question and thought there were two dice.

(P8) You are playing a game in which you are blindfolded and draw cards out of a box. If you draw a card that has an X on it, you win the game.
In the boxes below, would you be more likely to win if you:
   a) draw from box A
   b) draw from box B
   c) makes no difference

[Figure: boxes A and B, each holding X and O cards in equal proportion; box A holds more cards in total.]

TABLE 4.25
GROUP RESULTS ON PRETEST ITEM P8

GROUP      3-no diff.   0-Box A   0-Box B   Mean   S.D.
C1             22           2         2     2.54   1.10
C2             10           2         2     2.14   1.40
E1             17           3         0     2.55   1.10
E2             17           1         2     2.55   1.10
TOTAL          66           8         6     2.48   1.15

Those subjects who chose box A wrote that there were more X's in A, so there would be a better chance of getting an X. Those subjects who chose B wrote that since there were "less O's in box B", the chances of losing would be less in B. The reasons given for either type of incorrect response indicate that the "availability" of the X's was the deciding factor rather than the relative frequency of X's within a given box. The correct answers to this question were given by subjects who claimed that the ratio was 50:50 in either case.

Leffin (1971) used this same item in his investigation of the concept of probability possessed by elementary school children in grades 4 through 7. His results indicated a tendency among the children to pick box A, the box with the most X's. This tendency decreased as grade level increased (from 58% in grade 4 to 36% in grade 7). Only 8% of the college students in the present study chose box A. Apparently older subjects are less susceptible to the influence of the total number of X's.

(P9) Three friends agree to change the order in which they go through the lunch line each day. In how many possible ways can they arrange themselves?

TABLE 4.26
GROUP RESULTS ON PRETEST ITEM P9

GROUP      3-6 ways   0-other   Mean   S.D.
C1            21         5      2.46   1.14
C2            12         2      2.57   1.08
E1            16         4      2.40   1.23
E2            20         0      3.00   0.00
TOTAL         69        11      2.60   1.01

Most of the 69 subjects who correctly answered "6 ways" actually listed the 6 possible arrangements, using names or symbols to identify the three friends. Of the 11 incorrect responses, 5 said there were 3^2 = 9 distinct arrangements.
This item appeared on the mathematics inventory taken by the National Assessment of Educational Progress (NAEP). The results found by NAEP showed that 47% of the 17 year old population and 28.3% of the adult population responded correctly. 86% of the college students in this study got the item correct prior to a formal course in probability.

(P10) At the start of a party game, eight red, six green, four blue, and two white slips of paper were thoroughly mixed in a bowl. The chances that the first slip drawn at random will be WHITE are given by which of the following:
   a) 2/(8+6+4)
   b) 1/(8+6+4+2)
   c) 1/(8+6+4+1)
   d) 2/(8+6+4+2)

TABLE 4.27
GROUP RESULTS ON PRETEST ITEM P10

GROUP      3-d)   0-a)   0-b)   0-c)   Mean   S.D.
C1          17      1      8      0    1.96   1.45
C2          12      0      2      0    2.57   1.08
E1          12      1      4      3    1.65   1.53
E2          13      2      3      1    1.95   1.47
TOTAL       54      4     17      4    1.99   1.43

A substantial number (17) of the subjects chose response b) on this item. Reasons listed for this response indicated that "there is one chance out of the total number of papers". Those subjects who chose b) apparently focused on the word "first" rather than on the number of slips of white paper. The word "first" may have suggested that there was only one chance at the draw.

Results on the NAEP analysis showed that 30.8% of the 17 year old population and 28.5% of the adult population responded correctly on this item. 68% of the college students in this study responded correctly.

(P11) A committee of two people is to be chosen from among Bill, Sally, Joe, and Beth. List all possible committees of two from this group.

TABLE 4.28
GROUP RESULTS ON PRETEST ITEM P11

GROUP      3-[heading illegible]   1-[heading illegible]   0-other   Mean   S.D.
C1                  24                      0                  2     2.77    .81
C2                  13                      0                  1     2.79    .80
E1                  18                      1                  1     2.75    .78
E2                  18                      0                  2     2.70    .92
TOTAL               73                      1                  6     2.75    .82

Nearly every subject correctly listed the 6 pairs. Only one subject counted the outcomes as ordered pairs and obtained 12 committees of two people. The remaining errors were 4^2 = 16 (one), 2^3 = 8 (three), and 10 (one).
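The correct answers to the counting items P9, P10, and P11 can be confirmed by direct enumeration. The sketch below is only an illustration; the labels "A", "B", and "C" for the three friends are hypothetical, and the ordered-pair count is included to show where the response of 12 committees comes from.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Item P9: orderings of three friends in the lunch line.
friends = ["A", "B", "C"]  # hypothetical labels
orders = list(permutations(friends))

# Item P10: chance that the first slip drawn is white.
slips = {"red": 8, "green": 6, "blue": 4, "white": 2}
p_white = Fraction(slips["white"], sum(slips.values()))

# Item P11: unordered committees of two, and the ordered pairs that
# result when (Bill, Sally) and (Sally, Bill) are counted separately.
people = ["Bill", "Sally", "Joe", "Beth"]
committees = list(combinations(people, 2))
ordered_pairs = list(permutations(people, 2))

print(len(orders), p_white, len(committees), len(ordered_pairs))
```

Response d) on P10, 2/(8+6+4+2), is the same value as the reduced fraction 1/10 that the code reports.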
(P12) There are 162 games in a baseball season. The manager of the team always bats his pitcher last. He has eight other players to assign to a batting order. Are there enough games in one season to try all possible batting orders for the other eight players? If not, how many seasons would it take?

TABLE 4.29
GROUP RESULTS ON PRETEST ITEM P12
[Column headings are only partly legible in the scan; the legible fragments are shown.]

           3-250 or 8!/162   2-[...] 300   1-4 to 10
GROUP      seasons           seasons       [...]       0-[...]   Mean   S.D.
C1              1                3             3          19      .50    .86
C2              1                1             2          10      .50    .94
E1              0                3             3          14      .45    .76
E2              0                1             3          16      .40    .82
TOTAL           2                8            11          59      .46    .83

Unlike items P9 and P11, the sample space for this question could not be listed. The item was included to see how many subjects knew the sequential counting principle prior to the course in probability. Only 2 students in 80 could do this problem correctly. Eight other subjects gave a reasonable estimate for the number of seasons necessary, somewhere between 100 and 300. Three-fourths of these college students believed that either one season contained enough games to admit all possible batting orders, or that two or three seasons would suffice. There was an overwhelming tendency to underestimate the number of possible arrangements of the 8 players. Prior to the course in probability the subjects apparently had very little intuition for the magnitude of combinatorial expressions. This is consistent with the results of the taped interviews in the pilot study. Both 8 x 7 x 6 x 5 x ... x 2 x 1 and 1 x 2 x 3 x ... x 7 x 8 were greatly underestimated in the interviews. Similar results have also been obtained by Kahneman and Tversky (1973).

(P13) List an event that is certain to occur. List an event that is impossible to occur.

TABLE 4.30
GROUP RESULTS ON PRETEST ITEM P13
[Column headings are only partly legible in the scan; the legible fragments are shown.]

                     2-one logically
GROUP      3-[...]   correct event     1-[...]   0-[...]   Mean   S.D.
C1            1            0              6        19       .46    .70
C2            1            2              4         7       .71    .99
E1            0            1              6        13       .40    .60
E2            2            2              7         9       .95    .94
TOTAL         4            5             23        48       .61    .82

All 3 possible points were given only when the "events" listed by the subject were "logically certain" and "logically impossible". Examples of acceptable "certain" events were: "pulling a red ball from a jar containing all red balls", "pulling either a red or a black card from a deck of cards", and "getting either a head or a tail when flipping a coin". Examples of acceptable "impossible" events were: "pulling a black ball from a jar of all red balls", and "rolling a 7 with one ordinary die".

The list of "impossible" events given by the subjects included: I become a millionaire; Nixon re-elected as president; an elephant in a refrigerator; everyone graduates from Michigan State with a 4.0 G.P.A.; landing on the sun; living forever; George Wallace getting the Democratic nomination for president; a man jumps 50 feet vertically; and peace. The list of "certain" events contained similar responses, only these were very likely to occur. TABLE 4.30 indicates that the subjects may have had a great deal of imagination but had a poor concept of "certain" and "impossible" in the probability sense prior to course work in probability.

(P14) You are playing a game with two other people. One person picks a number between 1 and 10 and the other two try to guess it. The person who guesses closer to the number wins the game.
   a) If you have the first choice, what would you pick?
   b) If the first person picked seven, what would you pick?

TABLE 4.31
GROUP RESULTS ON PRETEST ITEM P14
[Column headings are only partly legible in the scan; the legible fragments are shown.]

           3-6 or 5 on a),             1-5 on a),   0-other
GROUP      6 on b)           2-[...]   [...]        response   Mean   S.D.
C1              19              3         2            2       2.5     .95
C2               7              1         2            4       1.5    1.28
E1              15              2         2            1       2.1    1.25
E2              17              1         1            1       2.5     .95
TOTAL           58              7         7            8       2.22   1.14

Subjects received 3 points only if they responded "6" or "5" on part a), and "6" on part b). These responses optimized their chances of winning the game. 7 subjects chose "5" for part b) also.
They failed to realize that the number 6 would result in a tie game. The reason most frequently given for a choice of "5" in part a) was "it's in the middle". The reason most frequently given for the response "6" in part b) was that "then I'll have all the numbers below 6". Several students responded "3" to part b) in order to "balance off the 7". Table 4.31 shows that most of the subjects could find the best strategy for this game prior to a formal course in probability. The item was suggested by Professors John Masterson and Bruce Mitchell.

Correlation Matrices

This section contains scale-to-scale, item-to-item, and item-to-scale correlation matrices. Pearson product-moment correlations were calculated. The significance level of each coefficient is listed beneath it. Relationships that were significant at the .05 level are singled out in the discussion following each matrix.

TABLE 4.32
SCALE-TO-SCALE CORRELATION MATRIX FROM POSTTEST DATA

SCALE      TOTAL        PROB.        AVAIL.       REP.
TOTAL      1.000
PROB.      r = .730     1.000
           sig = .001
AVAIL.     r = .648     r = .385     1.000
           sig = .001   sig = .001
REP.       r = .775     r = .307     r = .323     1.000
           sig = .001   sig = .003   sig = .002
(N = 80)

Significant relationships were found between all pairs of subscale scores on the posttest.

TABLE 4.33
AVAILABILITY ITEM-TO-SCALE CORRELATION MATRIX FROM POSTTEST DATA

ITEM       TOTAL        PROB.        AVAIL.       REP.
A1         r = .618     r = .442     r = .754     r = .247
           sig = .001   sig = .001   sig = .001   sig = .013
A2         r = .101     r = .004     r = .357     r = .055
           sig = .187   sig = .486   sig = .001   sig = .314
A3         r = .240     r = .116     r = .602     r = .092
           sig = .016   sig = .153   sig = .001   sig = .207
A4         r = .351     r = .170     r = .343     r = .327
           sig = .001   sig = .066   sig = .001   sig = .002
(N = 80)

The availability items all correlated significantly (p < .001) with the availability subscale score. Items A2, A3, and A4 were not related to the probability subscale score. A2, A3, and A4 dealt with paths, the symmetry of the combinations formula, and disjunctive events, respectively.
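For readers who want to see how the entries in these matrices are formed, the following is a minimal sketch of the Pearson product-moment coefficient and of the t statistic commonly used to test it. The item and scale scores below are hypothetical toy values, not data from the study, and the function names are illustrative. The reported sig = .013 for r = .247 with N = 80 appears consistent with a one-tailed test of a t of about 2.25 on 78 degrees of freedom; that reading is an inference, since the text does not state which test was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Hypothetical item scores (0-3) and subscale totals for six subjects.
item_scores = [3, 0, 3, 1, 3, 0]
scale_scores = [9, 2, 7, 4, 8, 3]
r_toy = pearson_r(item_scores, scale_scores)

# t statistic for the A1-by-REP entry of Table 4.33 (r = .247, N = 80).
t_a1_rep = t_from_r(0.247, 80)
print(round(r_toy, 3), round(t_a1_rep, 2))
```

Because each item score is also a component of its own subscale total, a high item-to-subscale correlation such as A1 with AVAIL. is partly built in; the cross-scale entries are the more informative ones.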
TABLE 4.34
REPRESENTATIVENESS ITEM-TO-SCALE CORRELATION MATRIX FROM POSTTEST DATA

ITEM       TOTAL        PROB.        AVAIL.       REP.
R1         r = .400     r = .065     r = .131     r = .618
           sig = .001   sig = .284   sig = .122   sig = .001
R2         r = .498     r = .078     r = .127     r = .802
           sig = .001   sig = .244   sig = .131   sig = .001
R3         r = .548     r = .127     r = .156     r = .832
           sig = .001   sig = .130   sig = .084   sig = .001
R4         r = .448     r = .197     r = .337     r = .491
           sig = .001   sig = .040   sig = .001   sig = .001
R5         r = .535     r = .407     r = .338     r = .432
           sig = .001   sig = .001   sig = .001   sig = .001
R6         r = .424     r = .300     r = .212     r = .417
           sig = .001   sig = .003   sig = .030   sig = .001
(N = 80)

All of the representativeness items correlated significantly (p < .001) with the representativeness subscale score. Items R1, R2, and R3 did not have a significant relationship with the probability subscale scores. These three items dealt with the relative likelihood of various sequences of heads and tails or of boys and girls.

TABLE 4.35
PROBABILITY ITEM-TO-SCALE CORRELATION MATRIX FROM POSTTEST DATA

ITEM       TOTAL        PROB.        AVAIL.       REP.
P1         r = .365     r = .594     r = .332     r = .088
           sig = .001   sig = .001   sig = .001   sig = .219
P2         r = .107     r = .350     r = .060     r = -.036
           sig = .172   sig = .001   sig = .300   sig = .377
P3         r = .394     r = .667     r = .184     r = .070
           sig = .001   sig = .001   sig = .051   sig = .267
P4         r = .607     r = .589     r = .291     r = .405
           sig = .001   sig = .001   sig = .004   sig = .001
P5         r = .497     r = .570     r = .189     r = .276
           sig = .001   sig = .001   sig = .047   sig = .007
(N = 80)

All the probability items had a significant relationship (p < .001) with the probability subscale score. Items P1, P2, and P3 do not have a significant relationship with the representativeness subscale score. P2 does not have a significant relationship with the availability subscale score. P1 asks for a list of the outcomes for tossing 3 coins, P2 asks for the probability of 2 heads and 1 tail in three tosses, and P3 is an expected value problem.

TABLE 4.36
REPRESENTATIVENESS ITEM-TO-ITEM CORRELATION MATRIX FROM POSTTEST DATA
[Table printed rotated in the original; its entries are not recoverable from the scan.]

Representativeness variables R1, R2, R3, and R4 were significantly related to each other. These items dealt with sequences of independent events. Variables R5 and R6 were significantly related to each other. R5 concerned conjunctive events and R6 concerned sample size. Variable R4 was significantly related to all the other representativeness variables. R4 asks for an estimate for the probability that there will be 3 boys and 3 girls in a family with six children.

TABLE 4.37
AVAILABILITY SCALE ITEM-TO-ITEM CORRELATION MATRIX FROM POSTTEST DATA

ITEM       A1           A2           A3           A4
A1         1.000
A2         r = .188     1.000
           sig = .047
A3         r = .139     r = -.039    1.000
           sig = .109   sig = .364
A4         r = .123     r = -.049    r = -.101    1.000
(N = 80)

The only significant relationship between availability variables occurred between A1 and A2, the two questions dealing with paths.

TABLE 4.38
AVAILABILITY AND REPRESENTATIVENESS INTER-ITEM CORRELATION MATRIX FROM POSTTEST DATA

ITEM       A1           A2           A3           A4
R1         r = .101     r = .078     r = -.048    r = .198
           sig = .185   sig = .246   sig = .337   sig = .039
R2         r = .116     r = .053     r = .012     r = .102
           sig = .152   sig = .320   sig = .458   sig = .183
R3         r = .148     r = .028     r = .051     r = .088
           sig = .095   sig = .403   sig = .326   sig = .219
R4         r = .278     r = -.116    r = .193     r = .308
           sig = .006   sig = .152   sig = .043   sig = .003
R5         r = .208     r = .046     r = .183     r = .308
           sig = .032   sig = .343   sig = .052   sig = .003
R6         r = .115     r = .052     r = .067     r = .280
           sig = .155   sig = .325   sig = .278   sig = .006
(N = 80)

There were few significant relationships found between the availability variables and the representativeness variables.
Item A4 (on disjunctive events) tended to have significant relationships with the representativeness variables (R1, R4, R5, R6). Representativeness items R4 and R5 tended to have significant relationships with the availability variables (A1, A3, and A4).

The results of the correlation investigations in Tables 4.32 - 4.38 suggest that neither availability nor representativeness items were necessarily related to probability items. Relationships of items on the two heuristics scales with the probability subscale were generally weak or nonexistent. It may be possible, therefore, for a person to be able to solve probability problems and yet still not use probability theory in situations that are susceptible to either availability or representativeness.

CHAPTER V

SUMMARY, CONCLUSIONS, AND DISCUSSION

Summary

This study investigated college students' use of "availability" and "representativeness" in estimating the likelihood of events. Considerable evidence was found in the literature to support the contention that people do not estimate probabilities in accordance with the theoretical laws of probability. On the other hand, the evidence suggests that people do rely upon the heuristic principles of availability and representativeness when they estimate the likelihood of events.

The subjects involved in the study were 85 undergraduate students who had enrolled in a finite mathematics course at Michigan State University. Four class sections were randomly chosen and two each were randomly assigned to either the activity-based course (experimental) developed by the author, or a lecture-based course (control). The subjects were pretested and posttested on instruments devised by the experimenter. The instruments contained a probability subscale, an availability subscale, and a representativeness subscale. Cronbach's coefficient alpha was calculated for the posttest.
The reliability coefficient was .70.

The events that took place in one of the experimental sections were recorded in a daily log by the experimenter. The students in the experimental sections worked on in-class activities in small groups. The activities were written by the experimenter and involved probability and statistics concepts. The log was kept by the experimenter to gather additional information concerning the way in which college students learn probability while working in small groups. A complete report of the observations made in the class can be found in part one of chapter four.

The classes were compared on the posttest on the total test score and on the three subscales. The data were analyzed by t-tests (α = .05) performed on the total scale and the three subscales. A significant difference was found between the groups receiving the lecture-based course and the activity-based course on the representativeness subscale. The experimental groups scored significantly higher on the representativeness subscale. There was a tendency for the experimental groups to score higher on the availability subscale, but the difference was not significant at the .05 level. The experimental groups attained significantly higher mean gain scores than the lecture-based groups [...] or the total test score. Analysis of the pretest showed that there were no differences between the groups on any of the subscales prior to a formal course in probability.

The experimenter concluded that the experimental activity-based course appeared to have been more successful than the lecture-based course in helping college students to overcome their reliance upon availability and representativeness.

Limitations

A personal background inventory indicated that most of the subjects in the study did not have strong high school mathematics backgrounds. Over 75% of the subjects responded that they had had only a year of high school algebra and 1/2 year of high school geometry.
Furthermore, 65% of the subjects indicated that they had taken at least one remedial mathematics course in college in order to raise the level of their competence in high school algebra before they took the finite mathematics course. Any conclusions or generalizations made from this study should be limited to the population of students with similar backgrounds.

The study was also limited by the fact that there were only four independent units of analysis upon which to base the posttest comparisons, namely, the four class sections. Individual differences in teaching style among the four instructors must also be considered in interpreting the results of this study.

Results of the Hypothesis Testing

Hypothesis 1. There will be no difference between the activity-based and the lecture-based sections on the total test scale.

Hypothesis 2. There will be no difference between the activity-based and the lecture-based sections on the probability subscale.

Hypothesis 3. There will be no difference between the activity-based and the lecture-based sections on the availability subscale.

Hypothesis 4. There will be no difference between the activity-based and the lecture-based sections on the representativeness subscale.

All four hypotheses were tested on both the pretest and posttest scores. Hypotheses 3 and 4 were also tested by comparing mean gain scores from pretest to posttest. T-tests were used to test the hypotheses with a level of rejection set at α = .05.

The comparisons from the pretest indicated that there were no significant differences between the sections on any of the subscales prior to a course in probability.

The posttest comparisons led to the rejection of hypothesis 4. The activity-based sections scored significantly higher than the lecture-based sections on the representativeness subscale. The activity-based sections also attained significantly higher mean gain scores on both the representativeness and availability subscales.
There was a tendency for the experimental sections to score higher on the posttest availability subscale than the lecture-based sections. The difference was not, however, significant at the .05 level. No significant differences between the sections on either total test score or the probability subscale were found on the posttest. Therefore, hypotheses one and two were not rejected.

Conclusions and Discussion

The results of the hypothesis testing suggest that the experimental activity-based course was apparently more successful than the lecture-based course in helping college students overcome their reliance upon representativeness and availability to estimate the likelihood of events. The level of probability concept attainment was not significantly different in the two courses. It appears that the learning of probability concepts may not be sufficient to overcome reliance upon the heuristics of availability and representativeness. Course methodology may be an important factor in overcoming reliance upon the heuristics.

These conclusions are supported by the analysis of individual test items and the item-to-item and item-to-scale correlations. The activity-based sections scored significantly higher than the lecture-based sections on 4 of the 10 items on the heuristics subscales, and tended to score higher on 4 others. Correlations between the probability subscale and items on the two heuristics subscales were generally weak or nonexistent. A detailed discussion of the analysis of individual test items and of the correlation matrices can be found in chapter four.

The observations made by the experimenter during one of the activity-based classes have been reported in detail in chapter four. It appears that college students can learn to discover sound elementary probability models and formulas while working on probability experiments in small groups.
Furthermore, the effects of sample size upon measures of central tendency and variability may be learned by students working in small groups on activities such as those developed in this study. Making guesses for the probability of events and checking the guesses with a hand calculator seems to help make college students more cautious about probability and more aware of some of their own misconceptions about probability. Small group work, keeping a log of all the class work, and investigating misuses of statistics all appear to have a positive effect upon college students' attitudes towards mathematics.

There were differences observed between the two control groups C1 and C2 on the posttest total score. The mean score in C1 was 19.70, while the mean score in C2 was 30.57. There were also differences between C1 and C2 on the three subscale scores, with C2 scoring higher on each subscale. There does not appear to be a difference between the posttest scores of the two experimental groups E1 and E2 on any of the subscales. The mean gain scores of E1 and E2 on the availability and representativeness subscales are nearly identical (see Table 4.3). On the other hand, the mean gain scores of C2 are higher than those of C1 on both of the heuristics subscales. Table 4.1 indicates that there were no apparent differences among any of the four groups on the pretest measures. There appears to be uniformity in the posttest results across the experimental groups and non-uniformity across the control groups.

There are several possible explanations for the difference in performance between C1 and C2. The difference could be due to sampling variability. The study had only four independent groups, two within each treatment, and so it is not possible to say whether the scores of C1 are low relative to the population of all such classes that are taught finite mathematics in a lecture format. The difference could be the result of variability in teaching styles.
It is also possible that class size had an effect on the learning process. C1 had N = 26, while C2 had N = 14. Olson (1971) has done extensive research on the factors that influence "quality education" and has found that class size has a dramatic effect upon the teaching-learning process. Over a period of seven years, Olson developed an index called "Indicators of Quality". Individualization, interpersonal regard, group activity, and creativity were the four indicators isolated by Olson. Trained observers obtained data from classroom observations. The data were converted into quantitative scores. Olson's investigations took place in 18,528 classrooms in 112 suburban school districts of 11 metropolitan regions in the United States. Olson found seven significant predictors of his four indicators. The top three predictors of quality education were teaching style, subject matter, and class size. For secondary school classes, which comprised about half of his sample, Olson noted a significant drop in the quality of education when class size 1-15 was compared to class size 16-40.

Although Olson's research was done in secondary schools, it is possible that his results on class size may have some implications for college courses. The subjects in this study were mostly freshmen, and the lecture format used in the control classes was similar to much of the teaching that occurs in secondary school mathematics courses. In any case, the higher scoring control group C2 enjoyed the advantage of class size N = 14.

The results of this study on the pretest generally support the results of Kahneman and Tversky (1972, 1973), which claim that combinatorially naive college students rely on the availability and representativeness heuristics to estimate probability. In chapter one it was noted that Kahneman and Tversky were skeptical about the possibility of helping people to overcome their reliance upon availability and representativeness.
The results on the posttest in this study suggest that the manner in which college students learn probability may make a difference in their use of availability or representativeness. The activity-based course was significantly better at overcoming the use of representativeness than the lecture-based course, and also tended to surpass the lecture course in overcoming the use of availability.

It is impossible to isolate the specific factors of the experimental course which may have helped more students in the experimental sections to use probability theory rather than availability or representativeness to answer the posttest items. However, the extensive study of Olson (1970, 1971) cited above found that teaching style was the best predictor of quality education. The teaching styles that were high scoring quality predictors were small group work, individual work, discussion, laboratory work, and student report and demonstration. The styles that were low scoring predictors were lecture, question-and-answer, and movies. Every one of the high scoring predictors was included in the methodology of the experimental activity-based course in the present study. It is very likely that differences between the experimental and control groups in overcoming reliance upon availability and representativeness were primarily a result of different classroom methodologies.

Implications for Future Research

This study was limited in its method of analysis by the fact that there were only four class sections. A repetition of the experiment with many more classes involved would allow for a more appropriate method of data analysis such as an analysis of variance or an analysis of covariance with the pretest as covariate.

Taped interviews were used to gather information about college students' reliance upon the heuristics of availability and representativeness prior to formal course work in the pilot study to this thesis.
It might be interesting to tape periodic interviews with students who were taking the experimental course to find out where changes in their use of availability and representativeness occur and what activities or class-related experiences might account for the changes. These interviews might also help to explore the reasons why some students still rely on representativeness and availability even after the activity-based course. Perhaps such students have not learned the probability concepts in the activities. Perhaps they have learned the probability concepts in the activities and yet still rely upon availability and representativeness when they are in a situation outside of the classroom.

A study on retention is necessary to determine whether the positive effects of the activity-based course on overcoming reliance upon availability and representativeness are maintained over a period of time.

Studies involving populations other than college students in a finite mathematics course should be carried out to determine the extent of the use of availability and representativeness. In particular, the posttest instrument in this study could be administered to undergraduate mathematics majors, or to graduate students in mathematics, or even to faculty members in a university mathematics department. Such studies might elucidate the pervasiveness of the use of the heuristics of availability and representativeness.

Finally, it should be recalled that availability and representativeness are heuristics which are used to simplify complex decisions and judgments. This study has investigated the uses of these two heuristics in a very special and somewhat artificial context: a college class in probability.
Perhaps a more important question is: To what degree do the availability and representativeness heuristics affect the decision making process of professional people who must make judgments based upon information that may be only partially valid, or based upon probabilistic cues? In particular, doctors diagnose disease based upon symptoms which come in the form of probabilistic cues. Investors in securities, court judges, airplane pilots, administrators in business and education, and classroom teachers also sometimes have to make judgments and decisions based upon uncertain information. The literature review in chapter two has cited evidence that people do not always make optimal decisions and judgments in probabilistic situations.

Most of the studies referenced were carried out in laboratory or experimental environments. What do people do in real life situations? What are the practical consequences of their judgments and decisions? Clarkson (1962) researched the decision making processes of a trust investor and gathered information on the investor's underlying "policy". The cues and strategies used by the investor as he made decisions involving the allocation of investment funds were modeled by Clarkson. There is currently a program of research in medical education at Michigan State University which is investigating the decision making process of experienced doctors as they perform a diagnostic work-up on a patient. This research is being conducted by the Office of Medical Education, Research, and Development (OMERAD) in order to obtain information for the training of medical students.

The author has been unable to find examples of research that have looked specifically at the possible effects of availability or representativeness upon the judgments of people while they are performing their professional tasks.
The author strongly recommends that research be undertaken to attempt to identify instances of the uses of availability and representativeness among professional people while they are on the job, and to determine the possible practical consequences of the uses of the heuristics.

BIBLIOGRAPHY

Austin, J.D., An experimental study of effects of three instructional methods in basic probability and statistics. Journal of Research in Mathematics Education, 1973, 5, 3, 146-154.

Barz, T.J., A Study of Two Ways of Presenting Probability and Statistics at the College Level. Unpublished doctoral dissertation. Columbia, 1970.

Bell, M., Mathematical Uses and Models in our Everyday World. Studies in Mathematics, volume XX. School Mathematics Study Group: 1972.

Brunswik, E., Representative design and probabilistic theory in a functional psychology. Psychological Review, 1955, 62, 193-217.

Brunswik, E., Systematic and Representative Design of Psychological Experiments. Berkeley: University of California Press, (2nd Ed.), 1956.

Cambridge Conference on School Mathematics. Goals for School Mathematics. Boston: Houghton-Mifflin, 1963.

Carlson, J.S., Children's probability judgments as related to age, intelligence, socio-economic level, and sex. Human Development, 1969, 13, 2, 192-203.

Carnap, R., What is probability? Scientific American, 1953, 189, 128-138.

Chapman, J.C., and Chapman, J.P., Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 1967, 72, 193-204.

Clarkson, G.P.E., Portfolio Selection: A Simulation of Trust Investments. Englewood Cliffs: Prentice-Hall, 1962.

Cohen, J., and Hansel, M., Risk and Gambling. New York: Philosophical Library Incorporated, 1956.

Cohen, J., Subjective probability. Scientific American, 1957, 197, 128-138.

Cohen, J., Chance, Skill, and Luck: The Psychology of Guessing and Gambling. Baltimore: Penguin Books, 1960.

College Entrance Examination Board. Commission on Mathematics.
Introductory Probability and Statistical Inference for Secondary Schools: An Experimental Course. New York, 1959.

Cronbach, L.J., Essentials of Psychological Testing. New York: Harper and Row, (3rd Ed.), 1970.

Davis, C.M., Development of the probability concept in children. Child Development, 1965, 36, 779-788.

DeGroot, A.D., Thought and Choice in Chess. The Hague: Mouton, 1965.

Doherty, J., Level of Four Concepts of Probability Possessed by Children of the Fourth, Fifth, and Sixth Grade Before Formal Instruction. Unpublished doctoral dissertation. Missouri, 1965.

Edwards, W., Conservatism in human information processing. In: Formal Representations of Human Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Feigenbaum, E.A., and Lederberg, J., Mechanization of inductive inference in organic chemistry. In: Formal Representations of Human Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Fitzgerald, W., The role of mathematics in a comprehensive problem solving curriculum in secondary schools. School Science and Mathematics, 1975, 1, 39-47.

Freudenthal, H., Why to teach mathematics so as to be useful. Educational Studies in Mathematics, 1968, 1, 1, 3-8.

Geeslin, W., An Analysis of Content Structure and Cognitive Structure in Context of a Probability Unit. ERIC Document ED 090 036.

Gipson, J.H., Teaching Probability in Elementary School: An Experimental Study. Unpublished doctoral dissertation. Illinois, 1971.

Hammond, H.R., Hursch, C.J., and Todd, F.J., Analyzing the components of clinical inference. Psychological Review, 1964, 71, 438-456.

Hoemann, H.W., and Ross, B.M., Children's understanding of probability concepts. Child Development, 1971, 42, 221-236.

Hoffman, P.S., The paramorphic representation of clinical judgment. Psychological Bulletin, 1960, 57, 116-131.

Hoffman, P.S., Cue-consistency and configurality in human judgment. In: Formal Representations of Human Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.
Howell, W.C., Intuitive counting and tagging in memory. Journal of Experimental Psychology, 1970, 85, 2, 210-215.

Huff, D., How to Lie with Statistics. London: W.W. Norton, 1954.

Jarvik, M.E., Probability learning and a negative recency effect in serial anticipation of alternative symbols. Journal of Experimental Psychology, 1951, 41, 291-297.

Jones, G.A., The Performances of First, Second, and Third Grade Children on Five Concepts of Probability and the Effects of Grade, I.Q., and Embodiments on Their Performances. Unpublished doctoral dissertation. Indiana, 1974.

Kahneman, D., and Tversky, A., Subjective probability: A judgment of representativeness. Cognitive Psychology, 1972, 3, 3, 430-454.

Kahneman, D., and Tversky, A., On the psychology of prediction. Psychological Review, 1973, 80, 4, 237-251.

Kass, N., Risk and decision-making as a function of age, sex, and probability preference. Child Development, 1964, 35, 577-582.

Kipp, W.E., An Investigation of the Effects of Integrating Topics of Elementary Algebra with Those of Elementary Probability within a Unit of Mathematics Prepared for College Basic Mathematics Students. Unpublished doctoral dissertation. Florida State, 1975.

Klamkin, M., On the teaching of mathematics so as to be useful. Educational Studies in Mathematics, 1968, 1, 126-160.

Klamkin, M., On the ideal role of an industrial mathematician and its educational implications. The American Mathematical Monthly, 1971, 78, 1, 53-76.

Kleinmuntz, B., The processing of clinical information by man and machine. In: Formal Representations of Human Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Komorita, S.S., Factors which influence subjective probability. Journal of Experimental Psychology, 1959, 58, 386-389.

Leake, L., The Status of Three Concepts of Probability in Children of the Seventh, Eighth, and Ninth Grades. Unpublished doctoral dissertation. Wisconsin, 1962.

Leffin, W.W., A Study of Three Concepts of Probability Possessed by Children in Grades Four-Seven.
ERIC Document ED 070 657.

McKinley, J.E., Relationship Between Selected Factors and Achievement in a Unit on Probability and Statistics for Twelfth Grade Students. Unpublished doctoral dissertation. Pittsburgh, 1960.

McLeod, G.R., An Experiment in the Teaching of Selected Concepts of Probability to Elementary School Children. Unpublished doctoral dissertation. Stanford, 1971.

Messick, S.J., and Solley, C.M., Probability learning in children: Some exploratory studies. Journal of Genetic Psychology, 1957, 90, 23-32.

Mosteller, F., Fifty Challenging Problems in Probability with Solutions. Reading: Addison-Wesley, 1962.

Mosteller, F., Rourke, R.E.K., and Thomas, G.G., Probability with Statistical Applications. Reading: Addison-Wesley, (2nd Ed.), 1970.

Mosteller, F., Kruskal, W.H., Link, R.F., Pieters, R.S., and Rising, G.R., Statistics: A Guide to the Unknown. Ed. by Judith M. Tanur. San Francisco: Holden-Day, 1972.

Mosteller, F., Kruskal, W.H., Link, R.F., Pieters, R.S., and Rising, G.R., Statistics by Example. Reading: Addison-Wesley, 1973.

Moyer, R.E., Effects of a Unit in Probability on Ninth Grade General Mathematics Students' Arithmetic Computation Skills, Reasoning, and Attitudes. Unpublished doctoral dissertation. Illinois, 1974.

Mullenex, J.L., A Study of the Understanding of Probability Concepts by Selected Elementary School Children. Unpublished doctoral dissertation. Virginia, 1968.

National Council of Teachers of Mathematics. The Place of Mathematics in Secondary Education. Fifteenth Yearbook of the NCTM, 1940.

Newell, A., Shaw, J.C., and Simon, H.A., Elements of a theory of human problem solving. Psychological Review, 1958, 65, 151-166.

Newell, A., Judgment and its representation: An introduction. In: Formal Representations of Human Judgment. Ed. by B. Kleinmuntz. New York: Wiley, 1968.

Newell, A., and Simon, H., Human Problem Solving. Englewood Cliffs: Prentice-Hall, 1972.
Nie, N.H., Hull, C.H., Jenkins, J.C., Steinbrenner, K., and Bent, D.H., Statistical Package for the Social Sciences. New York: McGraw-Hill, (2nd Ed.), 1975.

Olson, M.N., Ways to achieve quality in school classrooms: Some definitive answers. Phi Delta Kappan, 1971, 53, 1, 63-65.

Page, D.A., Probability. In: The Growth of Mathematical Ideas Grades K-12. Twenty-fourth Yearbook of the NCTM, 1959, 229-271.

Peterson, G.R., Du Charme, W.M., and Edwards, W., Sampling distributions and probability revision. Journal of Experimental Psychology, 1968, 76, 236-243.

Phillips, L.D., Hays, W.L., and Edwards, W., Conservatism in complex probabilistic inference. IEEE Transactions on Human Factors in Electronics, 1966, 7, 7-18.

Piaget, J., and Inhelder, B., La Genèse de l'Idée de Hasard chez l'Enfant. Presses Universitaires de France, 1951.

Piaget, J., and Inhelder, B., The Origin of the Idea of Chance in Children. Translated by Leake, Burnell, and Fishbein. London: W.W. Norton and Company Incorporated, 1975.

Piel, E.J., and Truxal, J.S., Man and His Technology. New York: McGraw-Hill, 1973.

Pollak, H.O., On some problems of teaching applications of mathematics. Educational Studies in Mathematics, 1968, 1, 1, 24-30.

Rorer, L.G., and Slovic, P., The measurement of changes in judgmental strategy. American Psychologist, 1966, 21, 641-642.

Schwab, J.J., The practical: A language for curriculum. School Review, 1969, 78, 1-23.

Shepler, J., Parts of a systems approach to the development of a unit in probability and statistics for the elementary school. Journal of Research in Mathematics Education, 1970, 1, 4, 197-205.

Shepler, J., and Romberg, T., Retention of probability concepts: A pilot study into the effects of mastery learning with sixth grade students. Journal of Research in Mathematics Education, 1973, 5, 1, 26-32.

Shulman, L.S., The psychology of school subjects: A premature obituary? Journal of Research in Science Teaching, 1974, 11, 4, 319-339.
Shulman, L.S., and Elstein, A., Studies of problem solving, judgment, and decision making: Implications for educational research. In: Review of Research in Education. Ed. by F.N. Kerlinger. Baltimore: Peacock and Company, 1975.

... Students and Teachers of a Ninth Grade General Mathematics Class. Unpublished doctoral dissertation. Michigan, 1968.

Simon, H.A., and Newell, A., Human problem solving: The state of the theory in 1970. American Psychologist, 1971, 26, 145-159.

Slovic, P., and Lichtenstein, S., Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organizational Behavior and Human Performance, 1971, 6, 649-744.

Smock, C., and Belovicz, G., Understanding of Concepts of Probability Theory by Junior High School Children. Final Report. ERIC Document ED 020 147, 1968.

Stevenson, H.W., and Zigler, E.E., Probability learning in children. Journal of Experimental Psychology, 1958, 56, 185-192.

Thompson, M., Models, Problems, and Applications of Mathematics. Unpublished pre-conference paper for a conference on topical resource books for mathematics teachers. Eugene, Oregon, 1974.

Tune, G.S., Response preferences. Psychological Bulletin, 1964, 61, 4, 286-302.

Tversky, A., and Kahneman, D., Belief in the law of small numbers. Psychological Bulletin, 1971, 76, 2, 105-110.

Tversky, A., and Kahneman, D., Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 1973, 5, 207-232.

Tversky, A., and Kahneman, D., Judgment under uncertainty: Heuristics and biases. Science, 1974, 185, 4157, 1124-1131.

Vlek, C.A.J., Multiple probability learning. Acta Psychologica, 1970, 207-232.

Weiss, N.A., and Yoseloff, M.L., Finite Mathematics. New York: Worth, 1975.

Wheeler, G., and Beach, L.R., Subjective sampling distributions and conservatism. Organizational Behavior and Human Performance, 1968, 3, 36-46.
White, C.W., A Study of the Ability of First and Eighth Grade Students to Learn Basic Concepts of Probability and the Relationship Between Achievement in Probability and Selected Factors. Unpublished doctoral dissertation. Pittsburgh, 1974.

Wilks, S.S., Teaching statistical inference in elementary mathematics courses. The American Mathematical Monthly, 1958, 65, 143-153.

Williams, J.D., The Compleat Strategyst. New York: McGraw-Hill, 1954.

Yost, P., Siegal, A., and Andrews, J., Nonverbal probability judgments by young children. Child Development, 1962, 33, 769-780.

APPENDICES

APPENDIX A

OUTLINE OF DAILY PLAN FOR THE EXPERIMENTAL COURSE

2 Days - Introduction and administration of pretest.

2-3 Days - Activity 1.

Assignments: Activity 1 written up. S.B.E. Set #4, Exploring Data, pp. 27-31. Write up and hand in p. 29, #1-3, and pp. 30-31, #4.

About 4 days - Activity 2 on tacks. Go over any problems that came up with the Activity 1 write up when handing back. Towards the end of the week, see if anyone has come up with any articles showing the misuses of statistics yet. Suggest a day for in-class discussion of any articles found.

Assignments: Activity 2 written up. First homework assignment on generalization of coin activity.

About 5 days - Activity 3. Go over Activity 2 problems when handing back. Discuss the equally likely model from the coin experiment vs. the unequally likely model from the tack experiment. Discuss "multiplying" probabilities (i.e., independent events). One day in class will be spent on a mini-lecture introducing the sequential counting principle. Another day in class will be spent playing the game of picking 5 numbers from 1-12 and betting with the class that at least two are the same, and then calculating the probability that this will happen (they should know about multiplying probabilities by this time). Go over Activity 3 in class if there are any big problems in the write ups. Go over Sets #4 and #10 from S.B.E. when handing back.
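The calculation behind the "5 numbers from 1-12" game can be sketched as follows. This is only an illustration under stated assumptions: it assumes that five numbers are each chosen independently and uniformly from 1-12 (real people's choices are unlikely to be uniform), and the function name prob_at_least_one_match is hypothetical. The probability that all five numbers differ is found by multiplying probabilities, and the probability of at least one match is its complement.

```python
from fractions import Fraction


def prob_at_least_one_match(picks: int, choices: int) -> Fraction:
    """Probability that at least two of `picks` independent, uniform
    selections from `choices` equally likely values coincide."""
    p_all_different = Fraction(1)
    for i in range(picks):
        # The (i+1)-th pick must avoid the i values already taken.
        p_all_different *= Fraction(choices - i, choices)
    return 1 - p_all_different


# Assumed setup of the classroom game: 5 picks from the numbers 1-12.
p_match = prob_at_least_one_match(5, 12)
print(p_match, float(p_match))
```

Under these assumptions the bet that at least two numbers coincide wins somewhat more often than it loses, which is what makes the game a useful surprise before the Birthmonth and Birthday problems.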
Assignments: Write up of Activity 3 is due about 2 days after finishing the in-class part. Assign S.B.E., Set #10, Exploring Data, pp. 87-90. Write up and hand in pp. 89, 90, #1-3. (Note: have them draw diagrams of the outcomes in #2 and #3.) Assign the second homework assignment, probability problems, after having talked about multiplying probabilities and having played the number 1-12 game in class. The assignment will be on a separate sheet, and will consist of the Birthmonth, Birthday, and Flippant Juror problems from Fifty Challenging Problems, and of the problems on pulling two colored balls. They should have about 2-3 days to work on these at home. Solutions to the Flippant Juror and Birthday problems can be presented in class by individual students.

About 7 days - Activity 4 on counting principles. Go over the second homework (Birthday set). Spend some time discussing their articles on misuses of statistics. Have them present individual solutions of the "harder" problems of Activity 4.

Assignments: Activity 4 written up. (Or you can wait to grade this one when they hand in their logs at mid-term.) Assign S.B.E. Weighing Chances, Set #2 about random digits. Write up and hand in pp. 12, 13, #1; p. 14, #1,2; p. 20, #1-5.

About 2 days - Pull together any loose ends on problem set #2B or Activity 4. Review for first in-class test.

One day - First in-class test.

For the second part of the course, try to set some time aside each week to discuss any misuses of statistics that the students may have found. It may help to set aside some specific time slot for this during each week.

1 Day - Go over the test. Clear up any problems that may still exist in Activity 4. Outline the rest of the term's work.
(This will include a brief introduction to game theory and the use of probability in calculating the expected value of a game; an introduction to inferential statistics (as distinguished from the descriptive statistics that they have had up until now) via the chi-squared test, and the importance of probability in the decision making process of inferential statistics; and some work on the effect of sample size upon the stability of means and variability measures.) At the end of the course there will be an open ended activity that involves setting up a controlled experiment (or uncontrolled if they decide to go that way!), gathering some data, analyzing the data, and making a decision based upon the statistical information that they have accumulated. The experiment will then be criticized for validity and strength using the experience that they have obtained from analyzing the articles on the misuses of statistics.

4 Days - Activity 5. After they do Activity 5, discuss the results that they have come up with for their strategies. Then, give a mini-lecture on the mini-max theory for strictly determined games with a saddle point, working through several examples. Continue your lecture on game theory by introducing games in which mixed strategies are necessary, and explain the method of elements, a way to calculate the best mixed strategy for a 2 x 2 two person game. (See Williams' The Compleat Strategyst, as well as the notes that will be handed out on game theory.) Set aside a day to answer questions on the problem set dealing with the binomial distribution, "Classifying Pebbles", assigned below.

Assignments: Read S.B.E. Set #11 in Exploring Data, "Testing Beer Tasters", pp. 91-97. This is a good example of a misuse of statistics. Project 2, p. 96, can be done for fun and extra credit by anyone that is interested. Read Set #4 in Weighing Chances, "Classifying Pebbles", pp. 33-43. Write up and hand in: p. 40, #1; p. 41, #2,3,5,6,9; and p. 43, #15a,b,c.
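For the binomial problems in the "Classifying Pebbles" set, a small numerical check may help students see what the binomial coefficients contribute. The sketch below uses the standard binomial formula P(k) = C(n, k) p^k (1 - p)^(n - k); whether this matches the notation of the S.B.E. text is an assumption, and the values n = 5 and p = 0.3 are hypothetical, chosen only for illustration. It compares the probabilities computed with the coefficients against the same terms with the coefficients left off.

```python
from math import comb


def binomial_probability(n: int, k: int, p: float) -> float:
    """Standard binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical illustration values (not taken from the S.B.E. problems).
n, p = 5, 0.3

with_coefficients = [binomial_probability(n, k, p) for k in range(n + 1)]
# The same terms with the binomial coefficient omitted.
without_coefficients = [p**k * (1 - p) ** (n - k) for k in range(n + 1)]

print(sum(with_coefficients))     # a full distribution: sums to 1 up to rounding
print(sum(without_coefficients))  # falls well short of 1
```

Asking whether the probabilities of all possible outcomes add up to 1 is one quick test students can apply to their own answers.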
(Note: It is likely that the students will have questions on S.B.E. #4 on the binomial distribution. The formula at the top of page 40 appears rather suddenly, and sometimes the students do not see what it has to do with the previous text. There may be a tendency to leave the coefficients off when applying this formula to problem 15. Nip it in the bud.)

Assign Problem Set #3 (the five games) when you have finished lecturing on best strategies for playing a 2 x 2 game. Activity 5 can be handed in with this problem set.

About 5 Days - Activity 6. (About three days) Before they do Activity 6, give a mini-lecture on expected value. Include among your examples the game which they will simulate 25 times in Activity 6 to see how close "long run average payoff" comes to theoretical expected value. Talk about the chi-squared problem set from S.B.E. (see assignment). They are likely to have difficulty understanding the table on page 58, to have difficulty calculating degrees of freedom since the text really does not do a very good job with this, and to have difficulty understanding how to calculate the expected frequencies from a grid of observed frequencies. Unfortunately, the text does not do a very good job explaining this clearly either. Spend time on misuses of statistics per usual. Explain the game of craps to them before assigning the problem of calculating the probability that the roller wins at craps.

Assignments: Read S.B.E. Set #6 in Weighing Chances on the chi-squared procedure. Write up and hand in: p. 56, #1; p. 58, #2; pp. 63 and 64, #1-3. Hand in Activity 6 written up. Assign the problem of calculating the probability that the roller wins at craps, due in about a week.

About 5 Days - Activities 7 and 8.
(About two days each) After you hand back the set on chi-square, give them the problem of testing their "theory" for the tack experiment; that is, use their theoretical probabilities for "ups" and "downs" that they calculated in Activity 2 to get expected frequencies for each of the outcomes (eight) in throwing these tacks, and use the chi-square test to compare these to the observed frequencies (in 63 tosses) to test the goodness of fit between their theory and what really happened. (There are 7 degrees of freedom.) If you have data from Activity 2 that involves two (or more) theories for the tacks, depending on how the tacks were dropped, i.e., on the floor or on a table top, then have them test each theory out. If you don't have such data, and time permits, have them throw three tacks against the wall and record the outcomes, and test this against an equally likely model. In general, let discussion run as long as they are interested in this experiment. The calculations can be made at home, and the results can be discussed the next day in class. Spend some time going over any misuses of statistics that they may have found.

Assignments: Read S.B.E. Set #12 in Exploring Data on Estimating the Size of Wildlife Populations. Due, pp. 101 and 102, #1,2,3, and 6. The write up of Activities 7 and 8 is due whenever it is convenient.

About 5 Days - Finish any discussion of the tack-theory-testing that may still not be completed. Go over any problems or questions before the test, paying particular attention to any questions on the binomial distribution or on the chi-squared procedure that may still bother students. If time permits, discuss the misuses of statistics that they may have found, or perhaps have a student present the solution for the crap problem, if anyone has gotten it.

Second in-class test. (one day)

Go over test. If time permits, introduce Activity 9. (one day)

Introduce pulse problem to them.
"Pulse rates go up when taken by a member of the opposite sex." This is the beginning of Activity 9, an open ended problem that could go many different directions in different classes. Good luck!

About 4-5 Days - Activity 9 on pulse rates. Whenever it is convenient during the last two weeks, spend a day on the three cornered duel, and have them play with their calculators and the series that evolves in the one case.

The posttest will be administered on one day. Announce this well in advance, say a week to ten days, and emphasize that the test is for your information as to how much they know about certain concepts, and will in no way count towards their grade. Emphasize the importance that they attend that day.

APPENDIX B

ACTIVITIES, PROBLEMS, AND NOTES TO THE INSTRUCTOR. NOTES ON GAME THEORY AND EXPECTED VALUE

Activity 1.

Before doing this activity, write down your best guess for each of the following: If you flip six coins, what is the probability that you will get
a) six heads
b) five heads
c) four heads
d) three heads

Perform this activity in groups of six.

1. Flip six coins. Record the number of heads. Repeat this experiment 48 times in your group. Arrange the data in a 4 x 12 grid. Use your data to answer each of these questions.
a) What is the probability of getting 6 heads? 5 heads? 4 heads? three heads? two heads? one head? no heads?
b) What is the probability of getting at least one head? At least two heads?

2. a) Make a list of all the possible outcomes for flipping six coins.
b) Develop a mathematical model to find the theoretical probability for the outcomes of flipping six coins.
c) What is the theoretical probability for getting at least one head? At least two heads?
d) What are the assumptions of your mathematical model?

3. Compare the experimental probabilities from part 1 with the theoretical probabilities in part 2 above. How well do they agree? Make a graph to compare the experimental (observed) probabilities in 48 flips for 0 heads, 1 head, ...
6 heads with the theoretical probabilities (how many times they should happen). Plot the two graphs on the same set of co-ordinate axes. Where is there close agreement between the two graphs? Where is there not close agreement? Why do you suppose that this happens?

4. What assumptions have you made in your experiment when flipping the coins? What suggestions do you have to improve the experiment?

5. List any other comments, questions, observations, or reactions that you might have to this activity.

Note to the instructor for Activity 1.

Before the Activity: Activity 1 is intended as an introduction to equally likely outcomes, and is meant to be tapped later on as an example of a binomial experiment and utilized in sampling variation. (Thus the 4 x 12 grid!) Before doing this activity, ask them (for definitions of) "what is probability", soliciting as many different responses as you can get. Then give several examples using, say, coins, dice, cards -- something that involves equally likely outcomes and is simple -- and ask them what the probability of several things is. Then ask them which, if any, of the definitions help calculate probability, or describe accurately how they obtained the probability. The goal is to get them to isolate the relative frequency model

    (# favorable outcomes) / (total # outcomes).

During the Activity: As a general rule, do not answer their questions outright. Try to solicit answers from someone else in their group. If this fails, try to ask them questions that might have bearing on their question and for which they do know the answer, in an attempt to build back up to their original question. The goal is for them to do the mathematics and solve the problems themselves, and for the instructor to act as a "mathematical physician and clinician".
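For the instructor's own reference, the equally likely model that Activity 1 is aiming at can be sketched by listing all outcomes for six coins and counting heads. This is a minimal sketch that assumes fair, independent coins; the variable names are hypothetical. It applies the relative frequency model (# favorable outcomes) / (total # outcomes) and also gives the number of times each result would be expected in 48 repetitions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_COINS = 6
REPETITIONS = 48  # number of repetitions each group performs in Activity 1

# List every possible outcome, e.g. ('H', 'T', 'H', 'H', 'T', 'T').
outcomes = list(product("HT", repeat=N_COINS))

# Count how many outcomes show 0, 1, ..., 6 heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Relative frequency model: favorable outcomes / total outcomes.
theoretical = {
    k: Fraction(heads_counts[k], len(outcomes)) for k in range(N_COINS + 1)
}

# How many times each number of heads "should happen" in 48 repetitions.
expected_in_48 = {k: float(theoretical[k] * REPETITIONS) for k in theoretical}

p_at_least_one = 1 - theoretical[0]
p_at_least_two = 1 - theoretical[0] - theoretical[1]

for k in range(N_COINS + 1):
    print(k, theoretical[k], expected_in_48[k])
print(p_at_least_one, p_at_least_two)
```

The expected counts for 0 and 6 heads are each below one in 48 repetitions, so groups should not be surprised if those results never appear in their own data.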
After the Activity: After they have performed the experiment, gathered the data, and answered several questions about it, put all their results from the coin tosses on the board and pool their results. (Try to get them to suggest this!) They could use the pooled results to calculate more experimental probabilities, compare these both to their own experimental probabilities and, later on, to the mathematical model probabilities, and graph all three on the same chart in question 3. Thus, you could get into the effects of sample size in an informal way, right off.

Activity 2. Thumbtacks - A pointed affair.

Part 1

a) Before doing this activity, write down your best guess for the probability that your tack will land upright when dropped.
b) Devise some uniform way of dropping your tack. Toss the tack 72 times. Arrange your data in a 6 x 12 array with U to indicate upright and D to indicate down.
c) Based upon the data you have collected, calculate the probability that the tack lands upright; that it lands down.

Part 2

Do this part of the activity in groups of 4. Get a person in your group with each one of the (three) colored tacks. Use the probabilities calculated in part 1 to list the probability that the red tack lands up, that the silver tack lands up, and that the gold tack lands up. You will toss the three colored tacks together and record the results. Before performing this experiment, write down your best guess for each of the following:
a) The probability that all the tacks land up.
b) The probability that no tack lands up (what's another way to say this?)
c) The probability that at least one tack lands up (what's another way to say this?)
d) The probability that the red tack lands up.
e) The probability that 2 tacks land up and one lands down.

Devise some uniform way of dropping the three tacks and perform the experiment 63 times.
Record your data in a 7 x 9 array by listing the results as triples, i.e., UDD could stand for red tack up, gold tack down, silver tack down. Use the data you have gathered to calculate the experimental probabilities for the events in a-e listed above. Compare these calculations to your guesses. Any surprises? What assumptions have you made in doing the experiment? What suggestions do you have to improve this experiment?

Part 3

a) Develop a mathematical model to assign theoretical probabilities to the outcomes of this experiment. First, list all possible outcomes for the experiment, then devise a way of assigning a probability to each outcome.

b) Use your data from part 2 to determine experimental probabilities for each of the outcomes listed above in part a. Compare these experimental probabilities with the theoretical probabilities for the outcomes given by your model. Make a graph to compare your experimental probabilities with the theoretical probabilities. How well do the graphs agree?

c) What assumptions have you made in your mathematical model? Does the model for the tack experiment differ in any way from the model for the coin experiment? If so, how? Is there any similarity between the two experiments?

d) List any other comments, questions, observations, or reactions that you might have to this activity.

"This branch of mathematics (probability) is the only one, I believe, in which good writers frequently get results entirely erroneous."
-- Charles Sanders Peirce

Notes to Instructor for Activity 2.

1. Post results by tack color after part 1. If there is (probably will be!) wide dispersion in the results, see if they can give some reasons for it. They may be willing to redo part 1, controlling for some factors (e.g., height, surface, way of dropping).

Post results of part 2 by group and outcome.
Before they start writing up part 3 (which can be done outside class), attempt to get some kind of "average" probability for U and D as predictors for the theoretical model. Help them list the outcomes, as in Activity 1. (Try to talk about the coin problem with 1 coin, 2 coins, 3 coins, ..., 6 coins as possible motivation for a multiplicative principle in the model they are about to make.)

They may or may not be interested in tossing the tacks against the wall to see if the results come up close to 50-50. If they are tired of tack tossing, this part can be saved until later when we do the chi-squared statistic. They can then test the various theoretical models they have using χ² and their experimental data.

Activity 3.

Do this activity in groups of 4. Before performing this activity, write down guesses for each of the following:

a) the most likely sum for throwing 3 dice, and the probability of that sum
b) the least likely sum for throwing 3 dice, and the probability of that sum
c) the probability that you get a 7
d) the probability that you get an odd number.

1. Toss the dice 84 times in some uniform way. Arrange the outcomes in a 7 x 12 array by both triples and the sum of the faces. For example:

       3 5 4
       R W G     12

   shows that the red die was a 3, the white die a 5, and the green die a 4, and that the sum of the faces was 12. Record the frequency of each sum, and make a histogram (see page 5 of SBE Exploring Data) that will indicate the number of times that each sum occurred. (This can be done at home.)

2. Use the data you have collected to calculate experimental probabilities for each of the following:

   a) Each of the possible sums of the faces of the three dice. Compare your guesses for most likely and least likely outcomes to the experimental most likely and least likely ones. How did you do?
   b) The probability that the sum was odd.
   c) The probability that the red die came up a 3.
   d) The probability that you got a 6 on one die and something other than a six on the other two dice.
   e) The probability that at least one die had a 6.
   f) What assumptions have you made in doing this experiment?

3. a) Develop a mathematical model to describe the experiment and to get theoretical probabilities for the outcomes. How many different outcomes are there?
   b) How many different ways can each sum occur, i.e., how many ways are there to make a 12? List the number of ways that each outcome can occur. Use your information to calculate the theoretical probabilities that a 3, 4, ..., 18 occurs.
   c) Superimpose a histogram for the theoretical probabilities on top of a histogram for the experimental probabilities. How well do the two compare? Any surprises? Why do you suppose the "surprise" happened?
   d) Calculate theoretical probabilities for those events listed above in parts 2b, 2c, 2d, and 2e.

4. Is the mathematical model for this activity more similar to the model for the tacks or the model for the coins? How so?

Statistics are no substitute for judgement.
-- Henry Clay

Notes to Instructor for Activity 3.

1. Post results by group and sum after the experiment is done. Pool results. Suggest a third histogram with class totals to be superimposed on the group total and theoretical histograms. Talk about most likely and least likely outcomes.

This activity may draw discussion on "how" to list outcomes, i.e., 16 sums or 216 triples. The triples yield more information and can be used to calculate probabilities of the sums. The equally likely model applies. The general partition problem is hard, unsolved in part! It may bear mentioning (i.e., no formula to yield the number of partitions of a given number).

Activity 4. Counting Outcomes and Counting Redundancies: Tools for calculating probabilities.

The sequential counting principle will help you get started on these problems.

1. Slobbobic Spellink.
The country of Lower Slobbobia uses the Arabic alphabet (26 letters) in their written language. However, in Slobbobian, any arrangement of the letters makes a word. How many different words can be written in Slobbobian using the letters in each case below?

   a) G A K (List them)
   b) P Z U B (List them) How about Z Z U B?
   c) E Z A K L (Just say how many)
   d) L Z A K L (List them)
   e) L Z A L L (List them)
   f) In parts d) and e), some spellings are redundant, that is, yield the same word over again. How many redundant spellings of the single word L Z A L K (from part d) are there? How many redundant spellings of the word L Z A L L (from part e)? Give a reason for your answer. So, what percentage, or fraction, of the total number of possible arrangements of the five letters L Z A K L will be redundant? Same question for the letters L Z A L L.

2. How many different words in Slobbobian can be written from each of the following:

   a) J Z E K E K
   b) J Z U Z U
   c) H T H T T
   d) M I S S I S S I P P I

3. Generalizations

   a) If you have N distinct letters, how many different words (in Slobbobian of course!) can you write?
   b) Now, suppose some of those N letters are identical (as happens above). How would you modify your answer to part a) so that you count only distinct words?
   c) Can you suggest a general rule for these kinds of spelling problems?
   d) Using H for heads, T for tails, and using your general rule from part c), count the number of ways you can get 6 heads; 5 heads, 1 tail; 4 heads, 2 tails; ...; 1 head, 5 tails; no heads, for flipping coins. (Do this at home - you did part of it in 2c already.)

4. Suppose that six people are running a race.

   a) How many different ways could there be a first place, then a second place, then a third place finish, that is, how many different one - two - three finishings are there?
   b) Suppose we are only interested in whether or not a runner finishes in the top three.
That is, we are concerned about who the first three runners are, but we don't care about the order that they finished in. How many different groups of three people could cross the finish line? (Hint: How many times was each group of three counted in part a) above? i.e., how many redundancies are there for each group?)
   c) How many different groups of 4 people could cross the finish line? Of 5 people? Of 2 people? Of 1 person? Of 6 people? Where have you seen these numbers before?

5. Write up answers to question 4 (a-c) when there are 5 people running the race, and when there are 4 people running the race. (This can be done outside of class provided that you are able to answer question 6 at this point.)

6. a) Can you give a general rule for the number of ways of choosing a subset of x persons (or things) from a set of y persons (or things)?
   b) How many different groups of 12 can be chosen from a group of 25?
   c) How many different ways can you get 3 heads if you toss 8 coins?

II Counting Challenges

Using the sequential counting principle, as well as what you have learned about redundancies, you can get a good start on these problems.

1. How many Michigan license plates can be made?

2. How many double dip Baskin and Robbins ice cream cones can be made (if all flavors are in)? Does your answer depend on anything? Is there another possible answer?

3. How many different pizza toppings are possible if you had cheese, mushrooms, pepperoni, onions, and sausage to work with?

4. An ordinary deck of 52 cards has 4 suits with 13 denominations in each suit (ace, 2, 3, ..., king).
   a) How many different pairs of jacks are there? How many different triples of jacks?
   b) How many possible pairs can be made in the whole deck? How many possible triples?

5. a) How many different ways can you get 5 cards of the same suit? (Called a flush)
   b) How many different flushes would there be in the whole deck?
   c) How many different ways can you get a hand of five cards that goes 4-5-6-7-8?
(8 high straight!)
   d) How many different ways can a straight start, i.e., having a lowest card?
   e) Use c and d to count the number of possible different straights.

6. a) What is the probability that you get dealt a pair in a 5 card poker game? (Hint: What is the probability of not getting a pair?)
   b) How many different 5 card poker hands are there?
   c) Using part b), and your answers to 5b) and 5e) above, find the probability that you get dealt a flush. A straight. Have you made any assumptions in your answers for the probabilities of a straight and a flush?

Notes to Instructors on Activity 4.

1. Before doing Activity 4, give a mini-lecture with several examples that lead into the sequential counting principle, such as: How many roads are there from N.Y. to L.A. if there are three from N.Y. to St. Louis, and 4 from St. Louis to L.A.? How many telephone numbers are possible? How many on campus? and so on!

The first objective of Activity 4 is to get students to isolate the principle of dividing out by redundancies in counting problems. Asking such questions as, "For a fixed word (of N letters), how many ways can you get the same word? (then) What fraction of the total number of possible words would be redundant?" may help them.

The special case of this principle which is usually called "combinations" is developed in question 4, leading up to a general rule. Please do not use the words "combinations" or "permutations" in referring to any of the problems in this activity. The sequential counting principle will suffice to approach all these problems, and then the concept of redundancies will help to keep from double (triple, etc.) counting the outcomes.

Try to get the students to develop their own rules. If they get an incorrect one, suggest an example that will lead them to a contradiction. This process is slow but helps students to avoid misusing formulas instead of analyzing the problem.
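For the instructor's own checking (not for handing to students, who should develop the rule themselves), the "dividing out by redundancies" principle can be sketched in Python as below. This is a minimal sketch under the usual assumption that every arrangement of the letters counts as a word; the function names are hypothetical. The first function applies the rule (N! divided by the factorial of each letter's repeat count), and the second simply lists every arrangement and discards repeats, which is a useful cross-check on short words.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_words(letters):
    """Count distinct arrangements: N! divided by the redundancies
    contributed by each repeated letter."""
    count = factorial(len(letters))
    for repeats in Counter(letters).values():
        count //= factorial(repeats)  # divide out identical rearrangements
    return count

def distinct_words_by_listing(letters):
    """Brute-force check: list all arrangements and drop duplicates."""
    return len(set(permutations(letters)))

if __name__ == "__main__":
    for word in ("GAK", "LZAKL", "LZALL"):
        print(word, distinct_words(word), distinct_words_by_listing(word))
```

When a student proposes an incorrect rule, comparing it against the brute-force listing for a five-letter example is one quick way to produce the contradiction suggested above.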
Some of the problems in Counting Challenges will require time outside of class. Copious hints may be needed on these problems. The students can do the first part, 1a) - 1f), at home after your mini-lecture (in the interest of saving class time). In fact, they'll need to spend time every day outside of class on this activity. Encourage them to do so.

Some of the harder poker hands such as 2 pair, 3 of a kind, and full house could be given as extra challenges.

Activity 5. Introduction to game theory.

1. You are playing a game with a friend in a bar. Both of you show either one or two fingers. If you show one finger and your friend shows two, you win $10. If you show two fingers and he shows one finger, you win $30. If you both show the same number of fingers - both one or both two - your friend wins $20. Play this game with a partner 20 times. Keep a record of the payoffs and who's getting them. Switch roles and play it another 20 times, recording the payoffs as above. As you play the game, try to figure out what's the best thing for you to do, i.e., what's your best strategy. After you have played it both ways, write down what you think the best strategy is for each player.

2. Play this game with a partner. Take the cards from 2 to 9 in a black suit and in a red suit out of a deck, so you'll have 16 cards in all. Shuffle and arrange them in a 4 x 4 grid. Let one person be black, the other be red. Black picks a row and red picks a column, in secret of course! Whatever the card is in the row and column that was picked is the payoff. A red 6 gives 6 points to red, a black 8 gives 8 points to black. Play the game about 20 times with a partner, keeping a running tally. Can you figure out a best strategy for each player? If you can't figure out a "best" one, can you suggest several good possibilities for each player? If you got all black cards in a row, or all red cards in a column, what happens? If this does happen to you, reshuffle and try again.
Draw a picture of the game you play for your log.

CHALLENGE QUESTION (for outside of class work). What is the probability that you do get all blacks in a row or all reds in a column when you set up your grid?

3. A husband and wife, Jack and Beth, are mountain climbing in the Rockies. Beth likes trails and campsites that are at high altitudes; Jack likes low altitudes. The area of the mountains they are interested in exploring is criss-crossed by a network of trails, four running north-south and four running east-west. The campers have agreed to camp at a junction of two of these roads. The intersections of the N-S and E-W trails are at various altitudes, so to make it as fair as possible, the couple decides that Beth will choose a north-south road and Jack an east-west road, and they will camp at the intersection. Of course, Beth would like to be as high up as possible, and Jack as low down as possible. Below is a matrix of the four choices for each camper, with the numbers in each slot representing the altitude in thousands of feet at the intersection of those roads. Each camper gets only one choice. What should they choose?

                    Jack
                1   2   3   4
            1   7   2   5   1
     Beth   2   2   2   3   4
            3   5   3   4   4
            4   3   2   1   6

Write down what you have concluded, and why you think it should be done that way.

Notes to Instructor on Activity 5.

It is not important that the groups figure out the best answer or the right answers for these problems. This activity is intended to get them thinking about strategies: playing some alternatives some of the time and then switching, or perhaps choosing one strategy and then sticking with it. After the activity has been completed, discuss the various strategies that came up in the groups for each of the three games. The last game may be used as a lead-in to a mini-lecture on the mini-max pure strategy. The first game can be used when you begin to talk about mixed strategies. The Challenge Question is meant for outside of class.
It is intended as an extra problem for anyone who is interested.

Activity 6. Expected value.

1. The oddments for this game have been calculated previously.

   [2 x 2 payoff matrix not recoverable]

   Use coins, or the table of random numbers, to simulate the playing of this game. (Your simulation needs to be 50-50 for the column choice, but 3 to 1 for the row choice.) Write down how you simulated the game, play the game 25 times using the simulation for a row player's choice and a column player's choice, and list the payoff each time. Find the mean of the 25 payoffs. How close is your mean to the theoretical expected value of this game? (This is what we mean by long run average expected payoff.)

2. Carnivals often have a game called OVER and UNDER. (This game was notorious a couple of years ago in church festivals.) Two dice are rolled down a chute. Someone playing the game can bet that the dice will show a number OVER 7, UNDER 7, or they can bet on 7. OVER and UNDER each pay even money, and 7 pays 4 times the bet. Where is the best place to bet?

   a) Calculate the expected value of a $1 bet on OVER. How about for UNDER?
   b) Calculate the expected value of a $1 bet on 7.
   c) Is this game a fair game? (Note: Recall what the outcomes are for rolling two dice!)

3. Another Carnival game involves a cage with two dice in it. Players bet on the number that comes up. The payoffs are as follows:

   a) 8 (or 6) pays even money
   b) 9 (or 5) pays two to one
   c) 10 (or 4) pays four to one
   d) 11 (or 3) pays six to one
   e) 12 (or 2) pays ten to one.

   If a 7 shows up, the house always wins everything! Where is the best place to bet? Write down what you think at first glance.

   Calculate the expected value for a $1 bet on each of 8, 9, 10, 11, and 12. (Keep in mind, if you bet on 8, anything else loses when you calculate the probability of losing!) What is the best place to play? Did you guess it? How do you feel about this game?

4. In the notes the game "$5 if you roll a 6, otherwise lose $1.50" was discussed.
The expected value for that game was

       (5/6)(-$1.50) + (1/6)($5.00) = -$2.50/6 ≈ -$0.42

   How could you change the payoffs to make it a fair game?

5. Optimal strategies for playing each of the games below have already been calculated in problem set #3. Use them to find the expected value for each of the following 2 x 2 games:

   a) [ 20  -10 ]     b) [ 4   6 ]     c) [ 3   1 ]
      [-30   20 ]        [ 5   2 ]        [ 4   3 ]

   d) [  60  100 ]    e) [  0  -1 ]
      [ 100   80 ]       [ -3   0 ]

6. In each of the following, set up the payoff matrix for the 2 x 2 games described, calculate the best strategy for each player, and find the expected value of the game.

   a) Fast Eddie and Slow Sam are deep into a TGIF party at Lizard's. Fast Eddie says, "Sam, let's play a game. We'll throw fingers, either one or two. If we both throw one finger, you buy me a beer. If we both throw two fingers, you buy me two beers. If we don't match, you just pay me a dime." Slow Sam knows that he is probably at a disadvantage if he plays this game, but he decides to try to figure out how much advance compensation he should receive from Eddie each time they play the game. Set up the payoff matrix, determine the best strategies for each player, determine the value of the game, and then determine how much Fast Eddie should cough up on each play to make the game fair. (Determine your own price for a beer.)

   b) After leaving the bar, Slow Sam tries to remember whether today is his anniversary or not. If it is, he should bring his wife some flowers. He reasons, in his disabled condition, as follows: "If it is and I do bring them, I will have at least 2 good days (no griping) to look forward to. If it is, and I don't bring them, I will have at least 10 bad days (constant griping) to look forward to. If it isn't and I do bring the flowers, I will get 1 good day in. If it isn't and I don't bring the flowers, nothing is lost and nothing is gained!" Set up the payoff matrix, and calculate the strategy and expected payoff to Slow Sam.

Notes to Instructor on Activity 6.

1.
Give a mini-lecture on expected value, for 2 x 2 games and in general, before the students do Activity 6. Cover several (or many!) examples thoroughly.

Post the results of each group for number 1 on Activity 6. It may be interesting to find the Grand Mean Payoff for the pooled results.

When they set up a payoff matrix, they may forget our convention of letting payoffs be for the row player, and be dealing with the "transpose" of the correct payoff matrix. This is particularly interesting for game 6b. Try it both ways and see what Slow Sam's decision would be.

Problem 5 can be done at home, and then results discussed in the groups.

Activity 7. Sample size.

How many cards would you have to turn over from the top of a well-shuffled deck so that there was at least a 50% chance that an ace was among them? Put down a guess.

a) Carry out an experiment with a deck of cards. Shuffle well each time. See how many cards you have to turn over until the first ace. Do 10 trials of the experiment, and list the number of cards turned over each time in a column. Using your data from these trials, how many cards would you have to turn over so that you hit an ace half the time? What is the median number of cards you turned over?

b) Now, do the experiment ten more times and list the outcomes in a second column of 10, so that you now have 20 trials. Using your data from these 20 trials, how many cards would you have to turn over so that you hit an ace half the time? What is the most number of cards you had to turn over? What is the least number of cards you had to turn over? What is the median number of cards turned over?

c) Post your results for a) and b) on the board.

d) Make a 10 [remainder missing]

[Grid A and Grid B figures not recoverable]

   a) More paths in grid A
   b) More paths in grid B
   c) About the same number of paths in each grid.

   Give a reason for your answer.

2. b) Consider the grid below. Which type of path is more likely to occur (circle one)?

       X X O X X X
       X X X O X X
       O X X X X X
       X X X X O X
       [remaining rows of the grid not recoverable]

      a) a path that hits 5 X's and 1 O
      b) a path that hits 6 X's and no O

      Give a reason for your answer.

   c) Give your best estimate for the number of paths in the grid above.

3. A jar contains 4 blue, 6 red, and 3 white marbles. If you draw one marble from the jar, it is most likely that you will (circle one):

   a) get a blue marble
   b) get a red marble
   c) have the same chance of getting a red or a blue marble

   Give a reason for your answer.

4. a) A fair die is rolled. What is the probability of getting a 3?
   b) A fair coin is tossed. What is the probability of getting a head?
   c) List the possible outcomes for flipping three coins.
   d) What is the probability of getting one head and two tails in flipping three coins? Write down your best guess.

5. You are playing a game in which you are blindfolded and draw cards out of a box. If you draw a card that has an X on it, you win the game. In the boxes below, would you be more likely to win if you (circle one):

   a) draw from box A
   b) draw from box B
   c) makes no difference

   [Box A and Box B figures not recoverable]

   Give a reason for your answer.

6. Three friends agree to change the order in which they go through the lunch line each day. In how many possible ways can they arrange themselves?

7. At the start of a party game, eight red, six green, four blue, and two white slips of paper were thoroughly mixed in a bowl. The chances that the first slip drawn at random will be WHITE are given by which of the following (circle one):

   a) 1/(8+6+4)
   b) 1/(8+6+4+2)
   c) 1/(8+6+4+1)
   d) 2/(8+6+4+2)

   Give a reason for your answer.
8. For four games you have the following chances of gaining points:

      Game A: 20 percent chance of winning 15 points
      Game B: 40 percent chance of winning 10 points
      Game C: 10 percent chance of winning 25 points
      Game D: 50 percent chance of winning 5 points

   If you play the game many times, you would be most likely to gain the greatest number of points in (circle one):

   a) Game A   b) Game B   c) Game C   d) Game D

   Give a reason for your answer.

9. A committee of two people is to be chosen from among Bill, Sally, Joe, and Beth. List all possible committees of two from this group.

10. a) There are 162 games in a baseball season. The manager of the team always bats his pitcher last. He has eight other players to assign to a batting order. Are there enough games in one season to try all possible batting orders for the other eight players? (circle one)

       a) Yes   b) No

    b) Give a reason for your answer. If you circled No, how many seasons would it take? Give your best estimate.

11. A man bets you one dollar that at least two people at a party you are attending have the same birthday. How many people would have to be at the party so that the man has at least a 50% chance of winning the bet? Give your best estimate.

12. a) List an event that is certain to occur.
    b) List an event that is impossible to occur.

13. The chance that a baby will be a boy is about one-half. Over the course of an entire year, would there be more days when at least 60% of the babies born were boys in (circle one):

    a) a large hospital
    b) a small hospital
    c) makes no difference

    Give a reason for your answer.

14. A fair coin is flipped and comes up tails 10 times in a row. If you could win $1 by guessing the next toss, what would you guess? (circle one)

    a) Heads   b) Tails

    Give a reason for your answer.

15. You are playing a game with two other people. One person picks a number between 1 and 10 and the other two try to guess it. The guess closest to the number wins the game.
    a) If you have the first choice, what would you pick? Give a reason for your answer.
    b) If the first player picked seven, what would you pick? Give a reason for your answer.

16. A man must select committees from among ten people. Would there be (circle one):

    a) more distinct possible committees of eight
    b) more distinct possible committees of two
    c) about the same number of committees of eight as committees of two

    Give a reason for your answer.

17. Let H stand for head and T for tail. (Circle one in each part.)

    i) Which of the following is more likely to occur for tossing one coin?
       a) H   b) T   c) about the same chance

    ii) Which sequence is more likely for two tosses?
       a) H T   b) T T   c) about the same chance
       Give a reason for your answer.

    iii) Which sequence is more likely for six tosses?
       a) H T T H T H   b) H H H H T H   c) about the same chances
       Give a reason for your answer.

    iv) Which sequence is more likely for six tosses?
       a) H T T H T H   b) H H H T T T   c) about the same chances
       Give a reason for your answer.

    v) What is the probability that in six tosses there will be three heads and three tails? Write down your best estimate and give a reason for your answer.

18. Which is more likely to occur? (circle one)

    a) Pulling one red ball from a jar containing 10 red balls and 90 white balls.
    b) Pulling four red balls in a row from a jar containing 50 red balls and 50 white balls.

    Give a reason for your answer.

19. A jar contains 8 red balls, 4 blue balls, and 3 green balls. Which is more likely to occur? (circle one)

    a) Pulling at least one blue ball in two tries.
    b) Pulling two red balls in a row.

    Give a reason for your answer.

POSTTEST

Name

Answer the questions below to the best of your ability. Supply the reasons for your answers where they are requested.

1. The probability that a baby will be a boy is 1/2. Let B stand for boy, and G for girl. (Circle one in each case below.)

   i) Which of the following is more likely to occur for having one child?
      a) B   b) G   c) about the same chance

   ii) Which of the following sequences is more likely to occur for having two children?
      a) B G   b) G G   c) about the same chance
      Give a reason for your answer.

   iii) Which of the following sequences is more likely to occur for having six children?
      a) B G G B G B   b) B B B B G B   c) about the same chance
      Give a reason for your answer.

   iv) Which of the following sequences is more likely to occur for having six children?
      a) B G G B G B   b) B B B G G G   c) about the same chance
      Give a reason for your answer.

   v) What is the probability that in six children there will be three boys and three girls? Give a reason for your answer.

2. A fair coin is flipped and comes up heads 10 times in a row. If you could win $10 on a $1 bet by guessing the next toss, what would you guess? Why?

3. Which is more likely to occur? (circle one)

   a) Pulling one red ball from a jar containing 10 red balls and 90 white balls.
   b) Pulling four red balls in a row (with replacement) from a jar containing 50 red balls and 50 white balls.

   Give a reason for your answer.

4. The chance that a baby is born a boy is about 1/2. Over the course of the entire year, would there be more days when at least 60% of the babies born were boys in: (circle one)

   a) a large hospital
   b) a small hospital
   c) makes no difference

   Give a reason for your answer.

5. People at a Carnival pick one number from 1 to 100. If two people match, they win a prize. How many people would have to be playing the game in order that there be at least a 50% chance that there would be winners? Give your best estimate.

6. a) How many paths are there in this grid?

      [grid figure not recoverable]

   b) How many paths are there in this grid?

      [grid figure not recoverable]

7. Consider the grids below.

      [Grid A and Grid B figures not recoverable]

   Are there: (circle one)

   a) More paths in grid A
   b) More paths in grid B
   c) About the same number of paths in each grid.

   Give a reason for your answer.

8. Consider the grid below.
   Which type of path is more likely to occur? (circle one)

       X X O X X
       X X X O X
       O X X X X
       X X X X O
       X O X X X

   a) a path that hits 4 X's and 1 O
   b) a path that hits 5 X's

   Give a reason for your answer.

9. A man must select committees from a group of 10 people. Would there be: (circle one)

   a) more distinct possible committees of eight
   b) more distinct possible committees of two
   c) about the same number of committees of eight as committees of two

   Give a reason for your answer.

10. A jar contains 9 red balls, 4 blue balls, and 3 green balls. Which of the following would be more likely to occur? (circle one)

    a) Pulling at least one green ball in two tries (with replacement)
    b) Pulling two red balls in a row in two tries (with replacement)

    Give a reason for your answer.

11. A pair of dice are rolled. What is the probability that the sum of the faces will be a 5?

12. a) List the outcomes from tossing three coins.
    b) What is the probability that there will be 2 heads and 1 tail?

13. The probability that it rains in Seattle on a given day is 2/3. The probability that Bill forgets his umbrella is 1/4. What is the probability that it rains and Bill forgets his umbrella?

14. For three games, you have the following chances of winning points:

       Game 1: 50% chance of winning 8 points
       Game 2: 20% chance of winning 20 points
       Game 3: 30% chance of winning 15 points

    If you play the game many times, in which game would you be most likely to gain the greatest number of points? (circle one)

    a) Game 1   b) Game 2   c) Game 3

    Give a reason for your answer.

EXPERIMENTAL COURSE EVALUATION FORM

Please take a few moments and respond thoughtfully to the questions below. If you wish, type your answers. You may either sign your responses or not. Hand these in before or on the day of the final.

1. What suggestions would you have for improving this course?

What would you like to change about the course?

What would you like to leave the same?

What did you like about this course?
What did you dislike about this course?

In answering these questions, please reflect upon:

   the required texts
   the in-class activities and activity sheets
   the log and assignments
   learning mathematics by working in groups with other students

as well as anything else that you would like to say about the course.

Thank you.