THE EFFECT OF STIMULUS EMPHASIS ON STRATEGY SELECTION IN THE ACQUISITION AND TRANSFER OF CONCEPTS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
Thomas Robert Trabasso
1961

This is to certify that the thesis entitled THE EFFECT OF STIMULUS EMPHASIS ON STRATEGY SELECTION IN THE ACQUISITION AND TRANSFER OF CONCEPTS presented by Thomas Robert Trabasso has been accepted towards fulfillment of the requirements for the Ph. D. degree in Psychology.

ABSTRACT

THE EFFECT OF STIMULUS EMPHASIS ON STRATEGY SELECTION IN THE ACQUISITION AND TRANSFER OF CONCEPTS

by Thomas Robert Trabasso

This dissertation reports the results of a theoretical and experimental analysis of attention in a simple concept formation task.

An analysis of the literature suggested that attention in learning can be studied objectively. Aspects of the stimulus situation can be arranged so that they control or affect the locus of attention. The speed of learning is an indicator of the probability that a relevant dimension is attended to and serves as an index of the effect of an attempt to direct S's attention. When a problem involves more than one relevant cue, or more than one irrelevant cue, or a relevant cue which can be diminished or enlarged, measurements of transfer-of-training can be used to gain more detailed information regarding the distribution of attention over the parts of the stimulus pattern.

Salient stimuli, such as (1) color or (2) a large difference between discriminanda, which affect the direction of S's attention to cues, were varied in their contextual role in order to study their effect on learning and transfer of concepts.
A stimulus which increases the probability of attending to a relevant stimulus is called a stimulus "emphasizer," while one which directs attention away from a relevant stimulus is called a "counter-emphasizer."

A mathematical model of discrimination learning (Restle, 1961a; 1961b) is used in the formulation of the experimental problems. The model explicitly treats the mechanisms by which S selects strategies in cue learning and defines the probability of solution as equal to the proportion of correct strategies in the problem. The goodness-of-fit and predictive value of the theory were tested against the data. The attention-value of a stimulus aspect was evaluated by using the model to estimate the measure of a set of strategies based on that aspect.

A transfer-of-training design known as "easy-to-hard" transfer was used to study (1) the role of attention and (2) efficiency in concept formation. The degree of efficiency was hypothesized to depend upon stimulus emphasis and relationships of the original learning (easy) and transfer (hard) problems.

The stimuli were complex flower patterns and the correct responses (two-choice) depended upon one or two aspects of the pattern. The relevant dimension of the hard problem was the angle of the leaves to the stem of the flower. Nine groups, of 20 Ss each, worked on different original learning problems and were all transferred to the same hard problem after criterion in original learning. A tenth group had the hard problem as its original learning problem and served as a control. All comparisons of acquisition and transfer were relative to this control group.

In two problems, emphasis of the relevant angle dimension was achieved by either (1) doubling the difference between discriminanda or (2) removing irrelevant cues during original learning. Both groups were highly efficient: they learned very rapidly and showed nearly perfect transfer.
Color on the angle of the leaves to the stem constituted an "emphasizer." When a constant color was used on all trials, the effect was not strong; red had a detectable effect, but green did not facilitate learning at all. When color varied from trial to trial in a third problem, and was an irrelevant dimension, the net effect was slight facilitation of learning. In these three groups, color could not serve as the basis for a correct strategy and transfer to the hard problem was perfect.

Two problems had color added as a redundant and relevant dimension during original learning. Both problems were learned faster than problems with only one dimension relevant, an example of "additivity of strategies." In one problem, the color was also an emphasizer and transfer to the hard problem was somewhat positive. In a second problem, color was a counter-emphasizer, appearing over the flowers during original learning, and transfer to the hard problem was slightly negative.

Two control problems had color relevant and the angle dimension fixed. Color was found to be more salient as a cue than the angle. There was no evidence for transfer of an "observing response" to the angle in these groups.

The stochastic properties of the data were consistent with the expectations of the Strategy Selection Theory. Analyses of Ss' performances before criterion indicated that errors occurred at random with probability near one-half, constant and independent of how long S was in the pre-solution phase. Fitted theoretical error distributions yielded good approximations in eleven of twelve cases.

There was some evidence that Ss use "wrong" as well as irrelevant strategies in the pre-solution phase. Since wrong strategies depend upon the same cues as correct strategies, it was predicted that estimates of the measure of wrong strategies would be about the same as estimates of correct strategies. This quantitative prediction was verified.
Wrong strategies were detected and their measure correlated with the measure of correct strategies. Inter-correlations between practice, original learning and transfer problems indicated no stable individual differences.

By taking account of stimulus emphasis, and using the Strategy Selection Theory, the additivity of relevant strategies and additivity of irrelevant strategies were accurately predicted. The degree of transfer was predicted in three ways: (1) number of Ss showing perfect transfer, (2) mean errors in transfer and (3) cumulative distributions of error scores in transfer. All predictions were based on parameters which had been estimated from original learning data and independent groups. Seven of eight predictions on transfer were accurate.

Efficiency in concept learning was discussed in relation to the present and other findings. The question of the precise role of a stimulus emphasizer was examined and further investigations on emphasizers suggested.

References:

Restle, F. The selection of strategies in cue learning. Psychol. Rev., 1961a (in press).

Restle, F. Statistical methods for a theory of cue learning. Psychometrika, 1961b, 26, 291-306.

Approved: Frank Restle, Major Professor

THE EFFECT OF STIMULUS EMPHASIS ON STRATEGY SELECTION IN THE ACQUISITION AND TRANSFER OF CONCEPTS

By Thomas Robert Trabasso

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1961

DEDICATION

To Sue

ACKNOWLEDGMENT

The author wishes to express his sincere gratitude to Dr. Frank Restle, chairman of his committee, who willingly gave time and energy to the planning, execution and development of this manuscript.

In addition, he wishes to thank Drs. T. M. Allen, A. B. Barch, D. M. Johnson, and M. R.
Denny, members of his committee, who have lent their criticism and advice during the preparation of this thesis.

Finally, he thanks John N. Schneider, who helped collect some of the data in the experiment.

TABLE OF CONTENTS

CHAPTER
I. INTRODUCTION
II. THE STRATEGY SELECTION THEORY
III. METHOD
IV. STOCHASTIC PROPERTIES OF THE DATA AND TEST OF THE STRATEGY SELECTION THEORY
V. EXPERIMENTAL RESULTS
VI. DETAILED PREDICTIONS
VII. DISCUSSION
VIII. SUMMARY
REFERENCES

LIST OF TABLES

Experimental Groups and Problems
Test of Stationarity: 1st versus 2nd Half Errors in the Pre-solution Phase
Maximum-likelihood Estimates of c
Maximum Discrepancies Between Theoretical and Observed Error Distributions
Test of Consecutive Errors: Proportion of Ss Making More, As Many, or Fewer Errors Following Errors than Correct Responses Following Errors
Estimates of w, the Proportion of Wrong Strategies, Compared with Estimates of c
Inter-correlations Between Practice (P), Original Learning (OL), and Transfer (T) Problems
Comparison of Large Angle and Angle Only with Angle on Original Learning
Transfer to the Angle Problem by Large Angle and Angle Only Compared with Original Learning of Angle
Comparison of Red Angle and Green Angle with Angle on Original Learning
Transfer to the Angle Problem by Red Angle and Green Angle Compared with Original Learning of Angle
Comparison of Angle + Angle Color Irrelevant with Angle on Original Learning and Transfer
Comparison of Angle + Angle Color and Angle + Flower Color with Angle on Original Learning
Transfer to the Angle Problem by Angle + Angle Color and Angle + Flower Color Compared with Original Learning of Angle
Comparison of Angle Color Control and Flower Color Control with Angle on Original Learning
Transfer to the Angle Problem by Angle Color Control and Flower Color Control Compared with Original Learning of Angle
Formulas of c for Experimental Groups Used in Predictions
Estimates of Sets of Strategies Used in Predictions

LIST OF FIGURES

Stochastic Structure of Strategy Selection
A Geometric Distribution showing the probability of obtaining k tails before the first head in tossing a fair coin six times
Hypothetical geometric and normal distributions of error scores in a learning experiment
Examples of flower patterns used in experimental problems
The probability of an error, conditional on S being in the pre-solution phase
Observed and theoretical cumulative distributions of the proportion of Ss making n or fewer errors
Predicted and observed cumulative distributions of the proportion of Ss making n or fewer errors

CHAPTER I

INTRODUCTION

The present study addresses itself to three central problems in current learning theory:

1. What makes a problem easy or difficult to learn?
2. Under what conditions will learning transfer to new problems?
3. What role does attention play in learning and transfer?

Answers to these questions have a practical value (e.g., for use in educational devices such as teaching machines) and are relevant to basic theoretical issues (e.g.,
the question of continuity versus non-continuity of learning).

Attention, as will be shown, has been neglected in learning theory. The term carries several meanings, so that its usage is frequently imprecise. Attention, as used in the present study, is defined as:

the active selection of, and emphasis on, one component of a complex experience and the narrowing of the range of objects to which the organism is responding; the maintenance of a perceptual set for one object and disregard for others. (English and English, 1958, p. 49)

An illustration of the role of S's attention to cues in a learning situation is given by the following example: An instructor in a biology laboratory wishes to train a student to discriminate a micro-organism from the background debris on a slide. To facilitate the student's identification of the micro-organism the instructor might (a) color the micro-organism or (b) point to it with a black line. The instructor's intervention into the stimulus situation via the color or line elicits an attending response of the student to the micro-organism. The additional stimulus in the situation has the effect of "pushing" another stimulus, the micro-organism, to the fore.

To facilitate learning, the biology instructor's method of pointing out the micro-organism must make it easy to identify. Later, for the training to be judged effective, the student must be able to identify the micro-organism without the added stimulus. An effective teaching method should facilitate learning and lead to good transfer of training.

The learning problem can be complicated by requiring the student to learn to differentiate the micro-organism not only from the background but also from other micro-organisms. This requires further training, which might consist of the student looking at a set of slides, in a series, with each slide containing a different micro-organism.
The student's job would now be to (1) discriminate each micro-organism from the background and (2) discriminate each micro-organism from the others in the collection and correctly label them. The task has now become one involving conceptualization of the stimuli as well as discrimination from background cues. The training procedure in this example is analogous to one commonly used in concept formation.

In this thesis, aspects of the stimulus situation which can be used to direct S's attention are studied. An aspect, such as the color or black line in the example, which increases the probability of attending to a relevant stimulus is called a stimulus "emphasizer." The emphasizer itself may or may not be a cue to solution and thus its own role can vary. The effect of an emphasizer is to facilitate learning by increasing the likelihood that S uses the relevant stimulus.

A second role of a salient stimulus such as color may be to distract S away from the relevant cue. Suppose the instructor inadvertently colored a smudge on the slide. The student's attention would be directed by the color away from the micro-organism. Identification would then be retarded. An aspect of the stimulus situation which decreases the probability of S attending to the relevant cue is called a "counter-emphasizer." In the example, if the colored smudge served as a counter-emphasizer and the student responded to it as if it were the micro-organism, it is likely that no transfer of learning would occur on a test situation which contained the micro-organism and no smudge.

This example, with salient stimuli as emphasizers and counter-emphasizers, suggests a way in which attention and its effect on learning can be investigated in a set of concept formation problems.

Concept formation has become a focal point of experimental effort during the last decade (see Kendler, 1960, for a good review).
Methods of investigating concept formation are formally like those used in the study of discrimination learning. In general, S is required to learn the same response to objects of the same class, but a different response or no response at all to objects belonging to other classes.

Hull (1920) used what may be called a "modified memory" method in an early study of efficiency of concept learning. Nonsense names were assigned to Chinese characters and S was required to learn to name each character. The relevant aspect of the character was called a "radical" and the radicals were embedded in many compound characters. Each character contained one of twelve radicals and all characters with the same radical were assigned the same name. By use of a memory drum, packs of 12 radicals were presented serially and after a practice trial, learning was by the anticipation method.

A second approach is that of Heidbreder (1946), who had Ss sort a pack of 144 cards into nine piles. Each pack contained three kinds of objects, three kinds of forms and three kinds of number groups. Objects were the easiest and numbers the hardest to learn.

In both the Hull and Heidbreder approaches, S had to search for a basis of classification within each stimulus pattern. In order to perform the task accurately, S must resort to some form of "conceptual behavior."

Concept formation in the Hull approach resembles learning of paired associates. One of the most successful approaches toward understanding of the learning of paired associates is the idea that errors arise through confusion between stimuli. Similar stimuli within a list are difficult to learn (provided different responses are required) and errors confusing two similar stimuli are more frequent than errors confusing two dissimilar stimuli. Factors in the difficulty of learning paired associates can be interpreted as depending upon a tendency to make the same response to two stimuli which are similar, i.e.
, to "generalization." The main statement of this approach is by Gibson (1940).

The similarity of concept formation to discrimination learning allows the application of theory and advanced experimental techniques of discrimination learning to the study of concept formation. At the same time, the similarity revives theoretical issues which have been raised in discrimination learning; namely, continuity of learning, the nature of selectivity and, as mentioned above, the role of attention.

Since the position has been taken that attention is an important part of the learning process and that attention has been neglected as a factor in learning, a review of some of the thinking on this problem by learning theorists is made first (question 3 above). Then, stimuli which influence the ease or difficulty of a problem are considered (question 1). Finally, the discussion is centered on those studies which have dealt with efficiency (question 2) via the "easy-to-hard" transfer paradigm. This review will attempt to show that divergent results on transfer can be obtained from seemingly similar procedures. These topics, (1) attention in learning theory, (2) stimulus sources of difficulty, and (3) experiments on easy-to-hard transfer, together provide the setting for the present experimental problem. Chapter II contains a detailed treatment of the experimental problem formulated in terms of a theory of cue learning called the Strategy Selection Theory (Restle, 1961a; 1961b).

Attention and Learning Theory

The complaint that attention has been neglected is neither new nor infrequent. William James (1890), in an introduction to his chapter on attention, had the following to say:

Strange to say, so patent a fact as the perpetual presence of selective attention has received hardly any notice from psychologists of the English empiricist school.
The Germans have explicitly treated of it, either as a faculty or as a resultant, but in the pages of such writers as Locke, Hume, Hartley, the Mills, and Spencer, the word hardly occurs, or if it does, it is parenthetically and as if by inadvertence. (p. 402)

James defined attention as a process with a locus in the "mind," where one out of several possible objects was selected and made clear or vivid. Attention was not an isolate, for:

The immediate effects of attention are to make us: (a) perceive-- (b) conceive-- (c) distinguish-- (d) remember-- better than we otherwise could-- both more successive things and each thing more clearly. It also (e) shortens 'reaction time'. (p. 425)

The set of problems which have arisen out of the study of attention, before and since James, has been summarized by Woodworth and Schlosberg (1954). In general, the experiments deal with the problem of making some selective response such as looking at one of several simultaneously presented objects. The response depends on physically defined stimuli and organismic variables such as past experience or set. Several problems are distinguished; namely, (1) the stimulus determinants of attention, (2) shifting and fluctuation of attention, (3) distraction, (4) divided attention--doing two things at once, and (5) span of attention.

In the recent Zeitgeist on attention, experimental effort has been directed toward effects of simultaneous presentation of two stimuli and the question of which is attended to first (Broadbent, 1958) and on internal states of arousal (Berlyne, 1960).

In the concept formation task (e.g., Hull's) a number of stimuli are competing for S's attention at the same time. Some are relevant and can be used as a basis for solution while others are irrelevant and do not lead to a correct strategy. Stimuli which are salient would be the ones most likely to be attended to and be used as the basis for a strategy.
Thus, a salient stimulus which was relevant would be more likely to facilitate learning than one which was relevant but not salient.

Hull, in his 1920 monograph on concept formation, did not neglect the role of attention. He considered the use of a salient stimulus in posing the question:

What is the relative efficiency of evolving functional concepts from concrete cases in which the attention of the subject is continuously attracted to the significant common element, as compared with the ordinary simple-to-complex method? (p. 51)

Hull assumed that a saturated red on the common element (radical) would attract S's attention to the relevant aspect and lead to faster learning. For each S, each list of twelve Chinese characters had the same six radicals colored red, while the remaining six were left black. Colored and black radicals were counter-balanced over Ss. The red symbols were learned faster and showed more transfer on a test series with no red than those symbols which were black in the training series. Hull concluded that there was a distinct advantage where the attention of S is attracted to the common element in situ.

Despite Hull's early demonstration of the role of attention in a learning task, S-R theorists have given attention no detailed treatment. Attention was excluded from the study of learning largely because it had "mentalistic" import. Instead, attentional theories (e.g., Lashley, 1929) became a source of criticism of behaviorism. S-R theorists continued to concentrate on observable responses and their reinforcement, excluding stimulus factors which direct S's attention.

The classic S-R approach of Hull and Spence regards learning mainly as the manifestation of approach responses to reinforced stimuli and avoidance responses to non-reinforced or punishing stimuli.
Spence (1937; 1940) recognized that during discrimination training, S learns responses which increase the likelihood of his being stimulated by the relevant stimuli. Spence called these responses "receptor-orienting" but did not attempt to account for their development.

Berlyne (1951) made a plea for the inclusion of perception and, particularly, attention, into S-R theory. Berlyne suggested that the perceptual process be treated as an intervening variable in Hull's system. The selective nature of learning was stressed since it reveals the direction of attention. Stimulus variables, such as (1) intensity, (2) change, (3) postural adjustments and preparatory sets, and (4) organization of the perceptual field, were mentioned as some determinants of attention.

A stimulus determinant of attention which has been singled out for investigation in the present study is a "salient" stimulus, used as an emphasizer of another stimulus. The question, What is salience? may now be asked. Salience may be regarded as a symptom of attention as well as an influence on its direction. William Stern defined salience phenomenologically as the degree to which an experience stands out sharply and is relatively disconnected from the rest of the experience. An antonym of salience is "embeddedness." A salient stimulus, then, is one which is relatively prominent in the psychological field in relation to other stimuli. Salience is not intensity of the stimulus (though they may be related) but it is a "distinctiveness" and a sort of immediately perceived importance (English and English, 1958, p. 471). Color, then, as used by Hull (1920), seems to fit this definition of salience.

Wyckoff (1952), following Spence's receptor-orienting notion, published a probability model calling the learned response an "observing response." The observing response, Ro, is learned by the principle of secondary reinforcement. The probability, po, of the observing
response bears a circular relation to the speed of learning. The observing response is learned to the extent that its occurrence increases the probability of reinforcement, but the observing response can increase the probability of reinforcement only if S is responding above chance, i.e., learning. Wyckoff tested his theory on pigeons (reported in Prokasy, 1956), where S was required to learn to step on a pedal (observing response) to the flashing of a white light in order to be presented with two colors to be discriminated and receive reinforcement. The pedal press response was learned without direct reinforcement.

Prokasy (1956) showed that rats would develop right or left turning habits (observing responses) to the side where consistent reinforcement of a black-white discrimination could be obtained.

The observing response notion of Wyckoff offers some possibility of a rapprochement between S-R and attentional theorists. However, Spence and Wyckoff seem more concerned with the form of the observing response and the reinforcement of it than with either the stimulus conditions (cues) to observing or the detailed changes in the stimulus input which may be brought about by observing.

Lawrence (1949; 1950) offered a mediating process called "acquired distinctiveness of cues" which takes into account the previous experience of S in determining the selection of cues and rate of learning in new situations. Lawrence trained rats on simultaneous and then on successive discriminations with the same stimulus dimensions. New instrumental responses were learned faster when the cues were familiar (had been previously reinforced). Selectivity of responding was demonstrated with respect to the relevant stimuli which would become associated in the new learning situation.
Lawrence stated that discrimination learning consisted of two processes: (1) a change in the perceptual character of the stimuli which was brought about by prior learning, and (2) then the association of stimuli with instrumental responses.

With human Ss, Kurtz (1955) showed positive transfer of a response where the training stimuli were not identical to the test stimuli but were distinguished by the same property as the latter, and negative transfer where the training stimuli were not identical to the test stimuli and were distinguished by a different property. This study supported Lawrence's notion of acquired distinctiveness, but the cues were relations rather than specific stimulus dimensions. Lawrence's concept of a mediating process which causes "acquired distinctiveness of cues" brings out changes in the effectiveness of stimuli due to training but says little about the cues which give rise to the mediating process.

A third body of experimental literature has dealt with learning to "ignore" stimuli. Hammer (1955) tried to determine whether human Ss learn to not attend to irrelevant stimuli. In one group, letters which were irrelevant in a training problem were made relevant in the transfer problem. In a second group, irrelevant letters of the training problem were retained as irrelevant stimuli in the transfer problem. If Ss learned to not attend to irrelevant stimuli, the first group should show negative and the second, positive transfer. No differences in transfer performance were found, and there was no evidence for Ss learning to ignore irrelevant stimuli during training.

LaBerge and Smith (1957) derived a hypothesis from stimulus sampling theory (see Estes, 1959), and found that Ss who respond asymptotically ignore "background common elements" which were associated with partial reinforcement.
Blank trials were inserted in the training series as a test, and when the partial reinforcement schedule associated with the background stimuli changed, Ss at asymptote did not change their responses but those who were not at asymptote did change on the blank trials.

Hughes and North (1955), working with rats, found that Ss attend to partially correlated cues after learning had taken place on other cues, a result in opposition to the LaBerge and Smith finding. Hughes and North first trained the rats to discriminate form. After criterion was reached, during a series of overlearning trials, color (black-white) was partially reinforced (75% versus 25%). Training was then given on a black-white discrimination problem and transfer to this problem was found to be positive.

Restle (1955), in a mathematical theory of discrimination learning, explicitly describes a dual process by which S learns to make correct responses. Cues which are relevant become conditioned to the response while those which are irrelevant become "adapted." Restle (1959) tested the effect of making one of two relevant cues irrelevant (by "scrambling") after conditioning and adaptation had taken place. He found that about 50% of the previously adapted cues became unadapted and interfered on transfer. Restle claimed that "background cues" which are irrelevant throughout the experiment are neutralized or adapted during original learning. Neutralization is not a form of "learning to ignore" cues, for a cue will remain neutralized only if the relevant cues remain present and relevant. Thus an irrelevant cue is neutralized with respect to some relevant cue. Restle (1958) has applied the same reasoning to learning set data and found some good approximations of his model to the data. The results were used to explain Hammer's failure to find a transfer of neutralization, since relevant cues were not carried from one problem to the next.
Thus, Restle's notion of adaptation of irrelevant cues gives a quantitative account of changes in the stimuli, due to a sort of observing process. His hypothesis that the rate of adapting depends upon the proportion of relevant cues relates these changes in perception to characteristics of the stimulus presented. The connection is, however, an ad hoc assumption and makes S somewhat prescient.

In a different approach to selectivity, Harlow (1959) operationally defines "hypotheses" that monkeys use in forming concepts and discriminating objects. Harlow calls this analysis "error factor theory." For example, a "stimulus perseveration error factor" is said to occur when S makes repeated choices of the incorrect stimulus object. Moon and Harlow (1955) studied a number of error factors and found that these responses extinguish progressively throughout the course of learning set formation. Harlow and Hicks (1957), in a "uniprocess learning theory," describe discrimination learning as a process of eliminating error factors. By "uniprocess" these writers mean that S is not trying to learn correct responses but is learning to not make incorrect responses. Although Harlow's error factor theory describes the gradual removal of erroneous stimulus-response connections, it does not give either the cue to observing or the changes in stimuli which might form a basis for the removal of error factors.

Restle (1961a; 1961b) has recently published a new discrimination learning theory (the Strategy Selection Theory, see Chapter II) which represents a complete revision of his position in the 1955 model. In the new theory, explicit assumptions regarding the mechanisms by which S solves cue problems are made. S is assumed to select "hypotheses" or "strategies" at random from a set of strategies (which are determined by the stimulus situation) and to respond accordingly.
If the response is correct, S continues to use the same strategy; if the response is in error, S resamples a new strategy (with replacement) and continues testing. Sampling of strategies may be (a) one-at-a-time, (b) all-at-once, or (c) a random sample of strategies. The result is the same set of probabilities of success or failure. Learning is described as discontinuous and the model is similar to the earlier attentional theories of Lashley (1929; 1938) and Krechevsky (1932). The theory handles most of the quantitative results as accurately as the 1955 model and its applications (Restle, 1957; 1958; 1959).

Lashley (1929) emphasized the perceptual and selective nature of discrimination learning. He criticized S-R theory by stating:

"The description of discrimination as a mere combination of a positive and a negative reaction misses the essential features of the process, which are isolation of figure, the discovery of differences and the generalizing characteristic of the responses. These are prior to and not a result of training." (p. 184)

Krechevsky (1932; 1938), in his emphasis on the use of "hypotheses" by rats, was influenced by the thinking of Lashley with respect to the discontinuity and the selective nature of learning. Krechevsky described the discrimination process as one where the S selects out of the situation certain stimuli to which he attends and continues to do so until he learns that they are not correct. The S then gives up responding to these stimuli and proceeds to select another set of stimuli to respond to. Krechevsky invoked certain Gestalt principles to account for the formation of hypotheses, such as the stimulus configuration forcing a specific response.

". . . A hypothesis is the individual's interpretation of the data; it is not a phenomenon deriving from the presented data alone." (p. 532, 1932)
Lashley and Wade (1946), again opposing S-R theories, credited the stimulus situation with an important role in the determination of selective responding. The concept of "perceptual dominance" was used to describe the fact that one stimulus dimension (e.g. color) may predominate over another (e.g. size) in the field even though both are relevant.

"If a monkey is trained to choose a large red circle and avoid a small green one, he will usually choose any red object and avoid any green but will make chance scores when like colored large and small circles are presented." (p. 82)

Warren (1954) tested this statement by training monkeys on problems where two or more stimuli were relevant and redundant (either one or both could be used as the basis for a correct strategy) and then transferred them to problems where one of the relevant dimensions was removed. Although color controlled responses more than other dimensions, the monkeys showed positive transfer to other dimensions, indicating that they learned something about the second redundant dimension during training. Although Lashley and Wade concentrate on the distribution of attention over stimuli, they do not take much account of how modifiable this distribution is or of the details of the stimuli involved.

In summary, S-R theories, most of which assume learning to be continuous, have neglected attention in learning and offer no detailed explanation for it as a part of the discrimination process. On the other hand, attentional theories, most of which assume learning to be discontinuous, have been used to criticize S-R theories, but are vague and imprecise as to mechanisms and conditions for selectivity of responding.

Stimulus Sources of Difficulty

Consider a two-choice learning problem. The S must attempt to perform two tasks simultaneously. Using Woodworth and Schlosberg's (1954) familiar notation, the double task may be represented by the formula:

$$R_1 R_2 = f(O_1 O_2, S_1 S_2).$$

The question is whether $R_1$ can be connected with $S_1$ and $R_2$ with $S_2$. If other stimuli, $S_1'$ and $S_2'$, are present and they call for $R_1$ and $R_2$ respectively, then similarity between $S_1$ and $S_1'$ facilitates learning but similarity between $S_2$ and $S_1$ retards learning.

The degree of difference between two discriminanda, $S_1$ and $S_2$, is a well-established variable which can influence the ease or difficulty of learning. A reduction in the difference (by making $S_1$ and $S_2$ more similar on the same continuum) results in slower learning. This relation holds for identifiable physical continua such as brightness, color, size, shape, pitch, number, etc. Murdock (1960) has recently defined "distinctiveness" on the basis of difference between discriminanda. In the easy-to-hard training procedure, one of the main methods for making a hard discrimination easier is to manipulate the magnitude of the difference between discriminanda (with color, Lawrence, 1952; with pitch, Baker and Osgood, 1954; and with size, Restle, 1955).

Early experimenters on concept identification (Hull, 1920) and discrimination learning (Lashley, 1929; 1938) used "complexity" of the stimulus as a source of difficulty. Hull, in his monograph, made the radicals more difficult to identify by embedding the radical in a context with several irrelevant stimuli. Lashley trained rats on embedded figures and tested them on simple ones in an effort to determine to which aspects the rats were responding.

The modern view of complexity of the stimulus is analytic, and systematic research has dealt with the major variables of relevant and irrelevant dimensions. Archer, Bourne and Brown (1955) and Bourne and Restle (1959) report cases where difficulty increases as a function of the number of irrelevant dimensions in the problem. Archer, Bourne and Brown used information theory to explain the result, contending that each added irrelevant dimension increases the alternatives and slows learning.
Bourne and Restle assumed that learning rate depends upon the proportion of relevant cues and this proportion decreases as irrelevant cues are added.

Relevant dimensions, when added and made "redundant," lead to faster learning (Eninger, 1952; Warren, 1953; 1954; Restle, 1959; Bourne and Restle, 1959; Trabasso, 1960). This effect of faster learning with added redundant relevant cues has been described as "additivity of cues" (Restle, 1955). Additivity of relevant and irrelevant cues can be handled quantitatively, even though their effects are opposite (Bourne and Restle, 1959).

The nature of the concept to be learned is related to difficulty. Heidbreder (1946), over a long series of studies, has consistently demonstrated that object, form and number concepts are ordered in difficulty with object concepts the easiest. In more recent studies, where the stimuli are binary dimensions of color, form, size, etc., the rate of learning on each of the several dimensions appears to be about equal, except that color is somewhat faster (Bourne and Restle, 1959). Warren (1953) has also confirmed that color is a more salient cue than form or size for monkeys, although form and size are about equal.

However, Hara and Warren (1961) have indicated that inequalities of the strength of dimensions occur because the stimuli have not been psychophysically scaled. Scaling the stimuli on the basis of discriminability with cats, Hara and Warren were able to show that, for form, size and brightness, discriminations combining equally detectable differences in different sensory continua produced faster learning in cats (additivity of cues). Ss were also trained on problems where two cues were correct (small black figures versus large white ones). Critical tests of equivalence were then made by posing small white figures (+ form and - brightness) against large black ones (- form and + brightness).
No preferences were shown when the cues were equated in terms of discriminability for an S and then opposed in these critical tests and averaged over Ss. Heidbreder concluded that the function of the concept, or the S's familiarity with it, controlled the difficulty of learning. This later work suggests that the differences in difficulty among her concepts were the result of unequal discriminability.

Separation of the relevant aspect from the background in the stimulus situation leads to faster learning. Object (3-dimensional) discriminations are easier to learn than pattern (2-dimensional) discriminations, even where the relevant cues are identical (Harlow, 1945). A similar result has been shown with mentally retarded children (House and Zeaman, 1960). In these studies, by making the patterns three dimensional, the investigators were adding stimuli to the situation which lead to a clearer separation of the relevant cue (figure) from the irrelevant cues (ground). Hull achieved a similar effect by coloring the radicals of the Chinese characters red. Blazek and Harlow (1955) made discrimination problems easier by increasing the color area on a two dimensional surface. North (1959) achieved the same result with rats by filling in forms such as triangles and by making bars, over which rats had to crawl, thicker. Warren (1953) found that larger forms were easier to learn even though the forms to be discriminated were of the same size. Restle (1958, p. 88) described this result as an effect of an increase in the proportion of "valid" cues, and derived a theoretical function to fit Warren's data.

The interpretation here is that these results depend on the use of a stimulus emphasizer. In each case, some stimulus is added which makes the relevant aspect more salient and increases the learning rate.

Easy-to-Hard Transfer

The review of the work on sources of difficulty in
learning suggests that any discrimination or concept formation problem can be made easier by changing the stimulus situation. For example, instead of a small difference between discriminanda, one inserts a larger difference and the problem becomes easier. But what is the effect on subsequent transfer to the hard problem, which involves the same relevant dimension and a reduced discriminative difference?

Earlier theories such as those of Thorndike (1914) and Guthrie (1935) would expect the transfer to be positive since there are "identical elements" or a number of common "conditioners" in the two tasks. The degree of transfer could not be specified in detail from either of these theoretical positions. Precisely what is meant by "identicality" is not clear in the Thorndikian viewpoint, but an interpretation is that it means any clearly discriminable aspect which is the same in the two tasks (McGeoch and Irion, 1952, p. 343).

Estes' (1959) stimulus sampling theory, which is formally similar to the model of Bush and Mosteller (1951), represents a modern version of Guthrian theory. Estes assumes that the probability of a response is equal to the proportion of stimulus elements in the trial sample which are connected to the response. In the transfer problem, the stimulus sample would presumably contain elements which became connected to the correct response in the easy problem. Transfer would then be positive. The transfer situation in the hard problem would resemble what Estes has termed "stimulus compounding." In a study by Schoeffler (1954), human Ss were trained to first discriminate two "disjoint" (non-overlapping) sets of signal lights. After Ss reached 100% discrimination in responding, they were tested on new combinations of the lights. Schoeffler made very accurate predictions on the proportion of responses to sets of stimuli which represented various combinations of the previously discriminated sets.
S-R theorists, notably Gibson (1940) and Spence (1938), rely on stimulus generalization as a basic working assumption. For Gibson, if the stimuli in the two problems are similar and call for the same response then transfer would be positive. Spence and Hull (1950) base their expectations on the concept of generalization gradients. Spence assumes that each response which is reinforced to the positive stimulus results in some increment in habit strength and non-rewarded responses to the negative stimulus add to the habit of not responding to that stimulus. These tendencies of positive and negative responses generalize to similar stimuli along the continuum.

Both the "identical elements" and gradient theories agree that transfer between two tasks is greater the more similar the tasks. It seems to follow that the most efficient way to teach any task is to give training on that task itself. If training is given on any other task, there will be a loss in transfer. The theoretical result is that any program of training on a task A and transfer on task B must take longer, or produce more errors, than a program in which all training is on task B.

Lawrence (1952) trained rats on a black-white discrimination problem and found that it is more efficient to train the S first on an easier discrimination along a dimension and shift to a harder discrimination on the dimension, than to give all the training on the hard problem. This easy-to-hard transfer result is in conflict with the expectation that transfer is most positive when the training and test discriminations are identical. Lawrence (1955) has shown that to account for his results by generalization gradients, one must postulate very specific gradients and make unlikely assumptions about how habit strengths add.
In a review of Lawrence's work, Estes (1956) commented:

"These findings seem intuitively reasonable, but I do not see (and neither, evidently does Lawrence (1955)) that they can be handled in detail by any available theory." (p. 23)

Estes failed to note that Restle's (1955) theory, reviewed in another context, did handle Lawrence's transfer data.

In the Lawrence 1952 study, the easy problem was a black versus white discrimination and the hard problem pitted dark gray against light gray, so that the easy-to-hard transfer constituted what Lawrence called "transfer along a stimulus continuum." Transfer from one easy problem (at the end of 30 fixed trials) to the hard was found to be positive but a marked disruption in performance occurred at the point of transfer. A second group, which had a problem of intermediate difficulty, was transferred at the end of 50 trials and showed positive transfer with no disruption. A third group, which worked through a series of gradually more difficult discriminations, showed the best positive transfer and very little disruption.

Baker and Osgood (1954) trained groups of human Ss on discrimination problems involving pairs of tones which differed only in frequency. The design was similar to Lawrence's. A fixed number of trials was used for training and the dependent measure was a difference in errors between a pre- and post-test. Positive transfer was obtained where the test was approached through a series of problems which became more difficult gradually. Ss who were trained on a very easy pitch discrimination and then shifted to the test showed a deterioration (not significant) in performance on the test problem. This result suggests that practical application of such a training design to human learning may be limited by the possibility that fine and difficult discriminations are often required in real life and that transfer from an artificially easy problem to a hard one may be poor.

Restle (1955),
in the same study in which he accurately predicted Lawrence's transfer data, trained human Ss to criterion on an easy discrimination (large vs. small black squares) and transferred them to a hard one where the difference in size was reduced. Transfer was positive with some disruption at the point of transfer. Again, the model correctly predicted the amount of transfer.

North (1959) performed two easy-to-hard transfer experiments on rats. Ss were trained first on very "distinctive" forms (solid black triangles or thick bars) and then transferred to less distinctive forms (striated triangles or thin bars). The relevant relation was horizontal vs. vertical position of the forms. In the triangle experiment, Ss learned the solid triangle problem faster than controls with the striated triangles. Transfer, after 40 interpolated overlearning trials, was almost perfect. In the bar experiment, two groups were run, one with a gradual transition (big-medium-small thickness) and one with an abrupt transition (medium-small). No overlearning trials were interpolated. The gradual transition group from big to medium showed nearly perfect transfer but performance on the small bars was disrupted somewhat. The abrupt transition group made nearly perfect transfer from the medium to the small bars. In both experiments, North assumed that the filled triangles and thicker bars furnished "richer cues for discrimination" and were "more structured" than the striated triangles or thinner bars. The results were interpreted as a demonstration of Lawrence's "acquired distinctiveness of form stimuli."

House and Zeaman (1960) trained mentally retarded children on a hard pattern problem using an easy-to-hard design. Ss in two groups were trained to criterion on objects (3-dimensional) and then transferred to patterns (2-dimensional).
In one group, the pattern was the same in both the object and pattern problems so that relevant cues were identical in the training and transfer problems. In a second group, the set of forms and colors which appeared in the object problem were different from those in the transfer problem. A control group had the hard pattern problem without the benefit of prior training. The two experimental groups were about equal in performance on the object problems. The group which had the same cues on transfer showed nearly perfect transfer. The group which had different cues in the object and pattern problems made more errors on transfer. The control group performed worst of all. An application of Restle's 1955 model to the transfer data of the identical relevant cue object-to-pattern group failed to predict the amount of transfer. House and Zeaman interpreted their results as supporting both Lawrence and Wyckoff. The interpretation was that an observing response was transferred. The observing response was defined as consisting of "looking at the color and form cues."

In two further experiments on easy-to-hard transfer, Ss were trained first on an easy problem which contained two relevant and redundant cues. Transfer was to problems where the number of relevant cues was reduced. Warren (1954) trained monkeys on six types of problems, using combinations of color, form and size as stimuli. Restle (1959) trained human Ss on problems involving consonant letter-patterns. In both experiments, Ss made a large number of errors on transfer to reduced cue problems. Overall transfer tended to be slightly positive. The easy-to-hard program as a whole took more trials and errors than direct training on the hard, single cue relevant, problem.

These easy-to-hard transfer experiments can be classified into three procedural types with differing results:

1. Both abrupt and gradual movement in a physical stimulus dimension produces faster learning and sizeable positive transfer. Performance is somewhat disrupted at the point of transfer when the transition is abrupt. The easy-to-hard program produces fewer total errors than the hard problem given alone (Lawrence, 1952; Baker and Osgood, 1954; Restle, 1955).

2. Separation of the relevant cue from irrelevant cues (background) by increasing the salience of the relevant cue during training leads to faster learning. Transfer to a problem involving the same relevant cue without the salient aspect is nearly perfect. This easy-to-hard program is much shorter than the hard problem given alone (North, 1959; House and Zeaman, 1960).

3. Addition of relevant and redundant cues makes the training problem easier to learn but the removal of one of the redundant cues on transfer produces disruption at the point of transfer. Overall transfer is only slightly positive. This easy-to-hard program produces as many or more errors than the hard problem given alone (Warren, 1954; Restle, 1959).

Applying the notion of a stimulus emphasizer to these data, the supposition is that Lawrence's method of working in one dimension has a dual effect. The increased difference between the discriminanda makes the relevant dimension more salient and draws the attention of the S to the relevant cue, thereby increasing the probability of the S using that dimension as the basis for a correct strategy. In this case, the stimulus emphasizer is a part of the relevant dimension. The black-white discrimination draws the rat's attention to the brightness dimension by making it "stand out" at the expense of other, irrelevant stimuli. At the same time, the quality of the dimension has been changed somewhat. A kind of observing response oriented to these grosser aspects of the differential cue is thereby produced.
On transfer, some rats overlook the now subtle difference between the cues and errors occur. An identical supposition is made with respect to the findings of Baker and Osgood and Restle (1955).

In the North and House and Zeaman studies, stimulus emphasis is achieved in a different way. In their investigations, the emphasizer was not an integral part of the relevant dimension (i.e. it was not relevant itself) but a stimulus aspect (3-dimensionality) was added which made the relevant cues salient. The problem was made easier because the emphasizer drew the attention of S directly to the relevant cues or relevant relations which appear in the hard problem. On transfer, the emphasizer was removed but had no disruptive effect. This nearly perfect transfer indicates that the emphasizer itself was not a basis for correct strategies in the easy problem.

With regard to the Warren and Restle (1959) results, an alternative way of solving the problem was inserted by the addition of a relevant and a redundant cue, making the training problem easier. If the S attends to one relevant dimension and solves the problem with strategies based on that dimension, he may not learn that the other dimension is also relevant. If he solves on the cue which is removed on transfer, the effect is to have distracted the S from the cue which is required for solution in the hard problem. If this is so, no transfer for this S would occur. The extra relevant cue during original learning would thus serve as a counter-emphasizer with respect to the relevant cue of the hard problem.

If the foregoing analysis is correct, optimal easy-to-hard transfer would be obtained by using what may be termed a "pure" emphasizer. A pure emphasizer is one which increases the probability of S attending to the relevant cue, but at the same time, is not a basis for a correct strategy.
The emphasizer when removed on transfer would not interfere with transfer since it is not a basis for a correct strategy. A stimulus emphasizer might, in some cases, be a relevant and redundant cue. If so, the roles might compete; the S would attend to the emphasizer as a cue rather than to the relevant dimension being emphasized.

The analysis of the literature suggests that attention in learning can be studied objectively. Aspects of the stimulus situation can be arranged so that they control or at least affect the locus of attention. This possibility affords a powerful and reasonably precise independent variable for investigation. The speed of learning is an indicator of the probability that a relevant dimension is attended to, and serves as an index of the effect of an attempt to direct the S's attention. When a problem involves more than one relevant cue, or more than one irrelevant cue, or a relevant cue which can be diminished or enlarged, measurements of transfer-of-training can be used to gain more detailed information regarding the distribution of attention over the parts of the stimulus situation. With objective and practical independent and dependent variables, the main questions are open to experimental investigation.

This dissertation reports the results of such a theoretical and experimental analysis of attention in a simple concept formation task.

CHAPTER II

THE STRATEGY SELECTION THEORY

In the present study, a number of flower designs are presented to the S in a series. On each trial, the S is required to make one of two responses and then is told the correct choice. Each pattern is complex and the correctness of the S's response depends upon one or two aspects of the pattern. The S must, in each instance, try to determine which aspect or aspects of the patterns are relevant. To resolve the discrimination, the S might use "strategies" or hypotheses which are based upon some stimulus aspect.
A strategy is any consistent pattern of behavior to the cues in the situation. The stimulus patterns give rise to a number of strategies, some of which conflict. The S has difficulty in solving whenever he uses strategies which conflict with the strategy intended by the experimenter.

If all patterns were presented at once, the S could make systematic tests of strategies in order to discover the relevant aspect or aspects. Since in a serial presentation only one pattern is present at the time of reinforcement, the S must remember what has been presented and what was reinforced. In many tasks, the capacity of the S to remember aspects is exceeded. One approach is to assume that S can remember only the strategy or strategies that he is using. If the strategy is wrong, he then shifts to a new strategy. The shift from one strategy to another is assumed to be random, permitting the S flexible rather than stereotyped responding.

The concept that S uses strategies is similar to the attentional views of Lashley (1929) and Krechevsky (1932) on the learning process. The S selects a strategy and makes the indicated response. If the response is rewarded, then the same strategy is used again on the next trial. If the strategy is in error, then the S chooses at random from the set of strategies available to him in the problem.

Restle (1961a; 1961b) has worked out a number of consequences of these ideas where sampling of strategies is with replacement. Restle's theory is referred to as the Strategy Selection Theory. This model, as explicitly stated by Restle, determines the detailed stochastic properties of the data and gives a rational theory of "learning rate." We shall proceed with the case where S is assumed to use only one strategy per trial.¹

When applied to the two-choice concept formation task, the theory postulates a total set of strategies,
H, of which a subset C always leads to a correct response, a subset W always leads to a wrong response, and the remainder I lead to correct and wrong responses half the time at random. Let c, w and i represent the proportion of each type, with c + w + i = 1. The speed of learning depends upon the probability of selecting a correct strategy, c.

¹Restle (1961a) has shown that S may (a) use only one strategy per trial, (b) use all strategies at once and attempt to narrow down to a correct one, or (c) use a random sample of all strategies and attempt to narrow down his sample, and each approach leads to the same set of probabilities in acquisition.

Stochastic Properties of the Data

The S begins the task in a "starting state" where he chooses a strategy with probabilities: c, it is a correct strategy; w, it is a wrong strategy; and i, it is an irrelevant strategy. If S samples a correct strategy, he is in the "solution phase" and makes no more errors. If he samples either a wrong or an irrelevant strategy, he is in the "pre-solution phase" and will make at least one more error. A wrong strategy always leads to errors whereas an irrelevant strategy leads to an error half the time. Each time he makes an error, he is returned to the starting state, where he resamples (with replacement) a strategy with probabilities c, w, and i that it is correct, wrong or irrelevant.

As long as S is in the pre-solution phase (using wrong or irrelevant strategies) he shows no net gain or improvement, since he will always make at least one more error and return to the starting state. The sequence of correct and wrong responses is "stationary" in the sense that the probability of an error does not change during the pre-solution phase. This analysis of S's behavior is illustrated by a "tree" diagram in Fig. 2.1. A second stochastic property of the model
is that the transition probability, c, from the pre-solution to the solution phase is independent of the number of trials S has been in the pre-solution phase. Let $T_a$ be the total errors made by subject a. Since the probability of no more errors is c, the probability of exactly k errors is $\Pr(T_a = k)$, which is Pr(at least one error) times Pr(at least one more error) . . . (k times) . . . times Pr(no more errors):

$$\Pr(T_a = k) = (1-c)^k\,c, \tag{2.1}$$

a geometric distribution. The mean of a geometric distribution of error scores is

$$E(k) = \bar{T} = \frac{1-c}{c}, \tag{2.2}$$

and the variance is

$$\mathrm{var}(k) = \frac{1-c}{c^2}. \tag{2.3}$$

[Figure 2.1. Stochastic structure of strategy selection. From the starting state S samples one of three kinds of strategy. Correct strategy: S goes into the "solution phase" and uses the same correct strategy with no more errors. Irrelevant strategy: S is still in the "pre-solution" phase and uses the same irrelevant strategy with probability 1/2 of being correct or wrong; an error returns S to the starting state. Wrong strategy: S is still in the "pre-solution" phase and makes an error which returns him to the starting state.]

Compared with the normal distribution of error scores, a geometric distribution is positively skewed and its mean is positively correlated with its variance. To illustrate the shape and characteristics of the geometric distribution, consider a problem of obtaining the first head in tossing a coin. If the coin is fair, the probability of a head is $\tfrac{1}{2}$ on each trial. On trial 1, the probability is $\tfrac{1}{2}$ of obtaining a head and $(1-\tfrac{1}{2})$ of a tail. The conditional probability of the last tail on trial 1 is $(1-\tfrac{1}{2})(\tfrac{1}{2})$. The conditional probability of the last tail (error) on trial 2 is $(1-\tfrac{1}{2})(1-\tfrac{1}{2})(\tfrac{1}{2})$, etc. The probability of obtaining k tails is, in general, $(1-\tfrac{1}{2})^k(\tfrac{1}{2})$, a geometric distribution. The expected number of tails before the first head (Eq. 2.2) is $(1-\tfrac{1}{2})/(\tfrac{1}{2}) = 1$ and the variance (Eq. 2.3) is $(1-\tfrac{1}{2})/(\tfrac{1}{2})^2 = 2$.
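The geometric results of Eqs. 2.1 to 2.3 can be checked numerically. The following is a minimal sketch, not part of the original analysis: it simulates hypothetical subjects who follow the one-strategy-per-trial process described above and compares their mean total errors with (1 - c)/c. The function names, the number of simulated subjects, the seed, and the parameter values are all assumptions chosen for illustration.

```python
import math
import random


def geometric_pmf(k, c):
    """Eq. 2.1: probability of exactly k total errors, (1 - c)**k * c."""
    return (1.0 - c) ** k * c


def geometric_mean(c):
    """Eq. 2.2: expected total errors, (1 - c) / c."""
    return (1.0 - c) / c


def geometric_variance(c):
    """Eq. 2.3: variance of total errors, (1 - c) / c**2."""
    return (1.0 - c) / c ** 2


def simulate_total_errors(c, w, i, rng):
    """Simulate one hypothetical subject; return total errors before solution.

    c, w, i are the proportions of correct, wrong and irrelevant strategies.
    """
    if c <= 0 or not math.isclose(c + w + i, 1.0):
        raise ValueError("need c > 0 and c + w + i = 1")
    errors = 0
    while True:
        # Starting state: sample one strategy, with replacement.
        u = rng.random()
        if u < c:
            # Correct strategy: solution phase, no more errors.
            return errors
        if u >= c + w:
            # Irrelevant strategy: each response is correct with probability
            # 1/2, and the strategy is kept until the first error.
            while rng.random() < 0.5:
                pass
        # Wrong or irrelevant strategy: one error, back to the starting state.
        errors += 1


def simulated_mean_errors(c, w, i, n_subjects=20000, seed=0):
    """Mean total errors over n_subjects simulated subjects (assumed values)."""
    rng = random.Random(seed)
    total = sum(simulate_total_errors(c, w, i, rng) for _ in range(n_subjects))
    return total / n_subjects


if __name__ == "__main__":
    c, w, i = 0.25, 0.25, 0.50  # illustrative proportions only
    print("theoretical mean:", geometric_mean(c))
    print("simulated mean:  ", simulated_mean_errors(c, w, i))
```

In this sketch the split between w and i does not affect the distribution of total errors, which depends only on c; it changes only how many correct responses are interspersed among the errors during the pre-solution phase.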
A geometric distribution of the number of tails until the first head occurs is shown in Fig. 2.2 for six tosses. In the present experiment, learning scores are expected to be geometrically distributed. Fig. 2.3 compares a hypothetical geometric distribution of error scores with one that is normal.

[Fig. 2.2. A geometric distribution showing the probability of obtaining k tails before the first head in tossing a fair coin six times. Horizontal axis: tails (k). Vertical axis: probability of k tails before the first head.]

[Fig. 2.3. Hypothetical geometric and normal distributions of error scores in a learning experiment. Horizontal axis: errors. Vertical axis: frequency of subjects.]

Sets of Strategies and Parameters of Learning

According to the theory, the probability of solving the problem on any trial is c, the proportion of correct strategies. Correct and wrong strategies depend upon stimulus aspects which vary and are relevant during training. Irrelevant strategies depend upon those aspects which are fixed or vary and are uncorrelated with reward, and on other factors such as outcomes of previous trials, boredom, etc.

The relevant and most of the irrelevant strategies in the present experiment arise from flower patterns. Elementary notions of set theory can be applied to the stimulus situation to permit

1. a logical analysis of the problem in terms of the strategies present and
2. estimation of the measure of each set which Ss use in solving.

The measure of a set reflects the set's "weight" in influencing the direction of the S's attention and his speed of learning.

Suppose that the strategies which arise from the angle of the leaves to the stem of the flower are correct. Such strategies from the angle dimension are called A. The number of strategies in the set A is written m(A) and is called its "measure."² If S samples from a set of H total strategies, with measure m(H),
then the probability that the strategy chosen is in the correct set A is

$$P(A) = m(A)/m(H). \tag{2.4}$$

If A is the only set of strategies which is correct, the rate of learning parameter, c, is the probability of choosing a strategy in A. Therefore,

$$c = P(A) = m(A)/m(H). \tag{2.5}$$

The total strategy set, H, may be subdivided into subsets which arise from dimensions of the stimulus pattern. Suppose that in the pattern there are subsets A (correct angle strategies), A* (wrong angle strategies), L (leaf strategies) and I (all other irrelevant strategies). The set H may be written as the union ($\cup$) of the subsets, $H = (A \cup A^* \cup L \cup I)$ and $m(H) = m(A \cup A^* \cup L \cup I)$, which is read "the measure of the set of H strategies which contains all the strategies in the sets A, A*, L, and I and in all combinations of the subsets." If the subsets are assumed to be disjoint (have no common strategies), then m(H) = [m(A) + m(A*) + m(L) + m(I)]. The probability of selecting correct strategies in A is now

$$c = P(A) = m(A)/[m(A) + m(A^*) + m(L) + m(I)]. \tag{2.6}$$

3. Elementary set theory, its rules and relationship to probability theory may be found in Feller (1950), Kemeny, Snell and Thompson (1957) or Restle (1961c).

[...] 1), the probability of selecting strategies in A is now

$$c = P(A) = r\,m(A)/[r\,m(A) + r\,m(A^*) + m(L) + m(I)]. \tag{2.9}$$

As in Eq. 2.8, the rate of learning, c, is increased over the rate of Eq. 2.6 where A is not emphasized, provided c < 1/2 and r > 1.

3. An increase in the salience of a relevant dimension by making it a larger cue. If the difference between discriminanda is increased, the effect is two-fold: (1) new strategies, B, are added by the change of the stimulus qualities, and (2) the strategies which already existed are emphasized. Suppose that the angle difference in the problem is doubled. Then the new measure of angle strategies is written d m(A) + m(B), where d > 1.
The probability of selecting a correct strategy in A or B is now

$$c = P(A \cup B) = \frac{d\,m(A) + m(B)}{d\,m(A) + d\,m(A^*) + m(B) + m(B^*) + m(L) + m(I)}. \tag{2.10}$$

The rate of learning in this problem would be faster than the one in Eq. 2.6, provided c < 1/2, since the numerator is increased relatively more than the denominator.

Transfer

A general hypothesis of this thesis is that the degree of transfer depends upon the stimulus relationship between the training (problem 1) and transfer (problem 2) situations. The hypothesis may be written as a conditional probability of solution on problem 2 after solution on problem 1. The proportion of Ss who transfer perfectly after mastery of problem 1, with the set of correct strategies $C_1$, to problem 2, with the set of correct strategies $C_2$, is

$$P(C_2 \mid C_1) = m(C_1 \cap C_2)/m(C_1). \tag{2.11}$$

The set $C_1 \cap C_2$ is the "intersection" ($\cap$) of the sets $C_1$ and $C_2$, and denotes the common strategies. If the S is using one strategy at a time, solution of problem 1 would ensure that the strategy being used is in $C_1$. If $C_1 = C_2$ (the problems have the same relevant dimensions), then Eq. 2.11 equals 1.00. Transfer in this case is perfect for all Ss.

Suppose that problem 1, the training problem, has two redundant sets of correct strategies, A and L, but on transfer L is removed. After problem 1 is learned, it is known that S is using a strategy in the set $C_1 = (A \cup L)$. The conditional probability that the strategy is in the set A, where $C_2 = A$, is

$$P(A \mid A \cup L) = m(A)/[m(A) + m(L)]. \tag{2.12}$$

Transfer from problem 1, where two redundant sets of correct strategies are present, might vary according to the emphasizer role. If A is emphasized during training on problem 1, Eq. 2.12 becomes

$$P(A \mid A \cup L) = r\,m(A)/[r\,m(A) + m(L)], \tag{2.13}$$

and more Ss are expected to transfer. If A is counter-emphasized by coloring another set L, Eq. 2.12 now becomes

$$P(A \mid A \cup L) = m(A)/[m(A) + r\,m(L)], \tag{2.14}$$

and fewer Ss show perfect transfer.

In summary, Eqs. 2.6 to 2.
14 formulate, in mathematical terms, the hypotheses regarding stimulus emphasis and the relationships of original learning (easy) to transfer (hard) problems. Eq. 2.6 is the basic formula for c, the probability of solving the original learning problem, and is developed in Eqs. 2.7 to 2.10 to apply to the various acquisition problems. Eq. 2.11 is the basic formula for transfer of training, and is developed in Eqs. 2.12 to 2.14 to apply to experimental variations on transfer. These formulas summarize the hypotheses to be tested by experiment.

Estimation of the Parameter c4

When learning is complete (all Ss solve), the maximum-likelihood estimator of c is

$$\hat{c} = 1/[\bar{T} + 1] \tag{2.15}$$

and its variance

$$\operatorname{var}(\hat{c}) = \hat{c}^2(1-\hat{c})/N \quad \text{(Restle, 1961a)} \tag{2.16}$$

where $\bar{T}$ is the mean total error score and N is the number of Ss.

4. This section is technical and the general reader may skip to the next chapter (Method) if he so wishes without any loss of information.

In the present experiment, a learning criterion of 10 successive correct responses and a fixed number of trials are used. Some Ss might fail to solve within the allotted time. These non-solvers produce what is known as a "censored" distribution of error scores. The estimate of c must be modified in order to take into account the presence of non-solvers in a group.

Consider a group of N Ss who make a total of T errors. Of these, $N_s$ are observed to reach learning criterion within the fixed number of trials, making a total of $X_1$ errors, and the other $N - N_s$ Ss make a total of $X_2$ errors. The likelihood of such an outcome as a function of the possible values of the parameter c is

$$L(\text{set of data with } X_1 \text{ errors, } N_s \text{ solvers}; c) = (1-c)^{X_1} c^{N_s},$$

and

$$L(\text{set of data with } X_2 \text{ errors, } N - N_s \text{ non-solvers}; c) = (1-c)^{X_2}.$$

The joint likelihood of solvers and non-solvers is

$$L(c) = (1-c)^{X_1 + X_2} c^{N_s} = (1-c)^{T} c^{N_s}. \tag{2.17}$$

The value of L is to be maximized with respect to c. We shall maximize log(L).

$$\log(L) = T \log(1-c) + N_s \log(c).$$
Taking the derivative of log(L) with respect to c, and setting it equal to 0,

$$\frac{d \log(L)}{dc} = 0 = \frac{-T}{1-\hat{c}} + \frac{N_s}{\hat{c}}.$$

Solving for $\hat{c}$,

$$\hat{c} = N_s/[T + N_s]. \tag{2.18}$$

If all Ss solve, then $\hat{c} = 1/[\bar{T} + 1]$, which is the maximum-likelihood estimator of c obtained by Restle (1961a) and given in Eq. 2.15. It can be shown that the derived maximum-likelihood estimator of c in Eq. 2.18 is biased and an unbiased estimate of c is

$$\tilde{c} = (N_s - 1)/[T + N_s - 1]. \tag{2.19}$$

However, in the present study, Eq. 2.18, the maximum-likelihood estimator, is used to permit statistical tests of hypotheses such as likelihood ratio tests.

Variance of c

With large samples of Ss, the sampling variance of a maximum-likelihood estimate (provided the estimates are normally distributed) can be calculated by the formula

$$\operatorname{var}(\hat{c}) = \frac{-1}{E\left[\dfrac{d^2 \log(L)}{dc^2}\right]},$$

which in the present case is

$$\operatorname{var}(\hat{c}) = \frac{-1}{E\left[\dfrac{-T}{(1-c)^2} - \dfrac{N_s}{c^2}\right]}.$$

In a censored distribution with number of trials = 64,

var(c) = [c^2(1-c)] / N[1 - (1-c)^(?) - 33c(1-c)^32 + 32c(1-c)^33 + 32c(1-c)^31],

where the exponent of the first (1-c) term is illegible in the source. This is very close to

$$\operatorname{var}(\hat{c}) = [c^2(1-c)]/N_s$$

and is approximated by substituting $\hat{c}$ for c so that our estimate of the variance of c is

$$\operatorname{var}(\hat{c}) = [\hat{c}^2(1-\hat{c})]/N_s. \tag{2.20}$$

Estimation of w, the Proportion of Wrong Strategies

A high frequency of consecutive errors in the pre-solution phase is an indication of a relatively large proportion of wrong strategies, whereas a small frequency of consecutive errors is an indication that there are relatively few wrong strategies. To estimate w, the proportion of wrong strategies, one counts the "trial zero" (initial starting state) as an error. Then for each S, the number of errors which are followed by correct responses, $M_0$, and the number of errors which are followed by errors, $M_1$, are obtained. Then averaging $\sum M_0$ and $\sum M_1$ (each summed over all Ss) for the group,

$$\hat{w} = (\bar{M}_1 - \bar{M}_0 + 1)/(\bar{M}_1 + \bar{M}_0) \tag{2.21}$$

which is a corrected maximum likelihood estimate of w (Restle, 1961b, p. 299).
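The estimators of Eqs. 2.18 to 2.20 reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the input totals (390 errors, 14 solvers) are example values chosen for the demonstration, not a claim about how the original computations were carried out.

```python
import math


def estimate_c(total_errors, n_solvers):
    """Eq. 2.18: maximum-likelihood estimate of c with non-solvers present."""
    return n_solvers / (total_errors + n_solvers)


def estimate_c_unbiased(total_errors, n_solvers):
    """Eq. 2.19: unbiased estimate of c."""
    return (n_solvers - 1) / (total_errors + n_solvers - 1)


def sd_of_c(c_hat, n_solvers):
    """Square root of Eq. 2.20, the approximate sampling variance of c-hat."""
    return math.sqrt(c_hat ** 2 * (1 - c_hat) / n_solvers)


# Example totals (assumed for illustration): T errors in the group, Ns solvers.
total_errors, n_solvers = 390, 14
c_hat = estimate_c(total_errors, n_solvers)
print("ML estimate:", round(c_hat, 3))
print("unbiased estimate:", round(estimate_c_unbiased(total_errors, n_solvers), 3))
print("standard deviation:", round(sd_of_c(c_hat, n_solvers), 4))
```

When every S solves, the total T equals N times the mean error score, so `estimate_c` reduces to 1/(mean + 1), which is Eq. 2.15.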
CHAPTER III

METHOD

Subjects

The Ss were 215 students in elementary psychology courses at Michigan State University who received credit for participation in experiments. Ten groups, each composed of 11 or 12 men and 9 or 8 women, were formed by assigning Ss in a haphazard, pre-arranged order. Fifteen Ss failed to solve their original learning problems within the allowed 64 trials. Each group with non-solvers was supplemented by more Ss until 20 solvers were obtained, so that 20 Ss, all of whom had solved the original learning problem, were available for testing on the transfer problem.

Apparatus

From S's point of view, the apparatus consisted of a 2 x 2 ft. black screen with a centered 4 x 5 in. window. Stimulus patterns were presented in the window on a hinged card holder. S classified the patterns by saying aloud "A" or "B" while E recorded each pattern and response. S self-paced his responses. Correct pattern classifications were signaled by a lighted letter, A or B, located 2 in. below and to the left and right of the window, respectively. Duration of reinforcement was about 2 sec. and patterns were removed from view approximately 3 sec. after the light was turned off. S sat facing the screen with his head approximately 20 in. from the window. Except during the reading of instructions, E was shielded from S's view throughout the course of the experiment.

Procedure

The same instructions were read to all groups. Each S was told that the study intended to find out how college students form concepts. S's job was to divide correctly a set of cards into two categories, A and B. The cards could be classified on the basis of a simple principle. Each card would contain a different figure but all the cards within a category have something in common. S could take as much time as he wished to say aloud his classification, either A or B.
After he responded, one of two letters would light, telling him the correct class of the card, A or B. Each time, he was to remember the basis upon which he classified the card and whether or not he was correct. Guessing of the response sequence was discouraged by informing S that the cards were shuffled. Ss were told that two problems would be given. The first, a practice one, would be unrelated to the second (original learning-transfer). On the practice problem, each card would contain a single geometric figure. After S solved the practice problem, he was told that all cards with triangles were A's and those with circles were B's. After the correct solution for the practice problem was stated, the second problem was introduced. In the second problem, the patterns would be floral designs, each card bearing a flower, stem and leaves. Instructions regarding classification and use of a simple principle were repeated.

The practice problem consisted of a sequence of geometric forms and was given to ensure that the procedure was understood and to orient the S to the task. In the problem, form (circle vs. triangle) was relevant and color (black vs. white) and size (large vs. small) were irrelevant. Single stimulus figures were drawn in India ink on white 3 x 5 in. file cards. Large and small figures could be inscribed within 2 and 1 in. squares, respectively. The stimulus deck contained eight different cards, representing all combinations of the three dimensions. Cards were shuffled before and during learning at the end of 8-trial blocks until S reached a criterion of 8 successive correct responses.

After practice, each S received an original learning problem for 64 trials or until 10 successive correct responses were given, whichever occurred first. Those Ss who solved the problem were continued without interruption to the transfer problem which was the same for all groups.
One group learned the "transfer" problem as its original learning problem and served as a control; all other groups had different original learning problems. Comparison of each group with the control constituted an example of the transfer-of-training paradigm, X → Y versus Y.

Selection of the Learning Problem

A two-choice concept formation task was designed to meet several considerations:
1. Two distinct response choices in order to prevent response generalization.
2. A completely specified set of stimuli with some degree of complexity and spatial separation of parts within each pattern.
3. Continuous variation of a relevant dimension above psychophysical threshold.
4. Variation in level of difficulty by addition of relevant or irrelevant dimensions.
5. A distinctive dimension (color) as a stimulus emphasizer.
6. An intrinsically interesting and plausible problem.

To meet the above requirements, flower patterns were abstracted from those published by Hovland (1953). These proved to be satisfactory in a pilot study.

Stimuli

The stimuli in all experimental problems were flower patterns. Figure 3.1 shows four examples used in original learning and transfer problems. To ensure uniformity of the drawings, separate templates of the main dimensions were used. Patterns were drawn in India ink on white 3 x 5 in. file cards. Each figure could be inscribed within a 1 x 3 in. rectangle. The base was 1 in. long, stems were 2 or 2 1/2 in. high and leaf stems were 3/4 in. long. The vertical angle of each of the three leaves to the stem was the same on any card. The five dimensions of the patterns were:
1. The angle of the leaves to the stem was relevant and had two values, either 30° versus 60° or 15° versus 75°. When the angle dimension was constant, all angles equalled 45°.
2. The flower dimension had four values: tulip, daisy, pansy, and fleur-de-lis.
3. The leaf shape dimension had two values: smooth or scalloped.
4.
The leaf position dimension had two values: 2 left and 1 right or 1 left and 2 right. In nine problems, the flower, leaf shape and leaf position dimensions were varied independently of one another and were irrelevant. On a tenth problem, they were fixed and only the angle dimension, which was relevant, was varied (30 vs. 60 degrees).
5. Color was used as an "emphasizer" or "counter-emphasizer" with respect to the angles. When color varied, it had two values: red versus green.

A different order of presentation was given to each S by shuffling the deck before each session. Cards were reshuffled at the end of 32 trials of each problem, if S had not yet reached criterion.

[Fig. 3.1. Examples of flower patterns used in experimental problems (actual size). The degrees indicate the size of the angle of the leaves to the stem. Panels: A: 30°, B: 15°, C: 75°, D: 60°.]

Experimental Groups

Group Angle, which served as a control for acquisition and transfer, had a problem with angle relevant and flower, leaf shape and leaf position irrelevant. No color appeared in this problem. Patterns with 30° angles were A's and those with 60° angles were B's. This same Angle problem was the transfer problem for all groups.

Groups Red Angle and Green Angle had the Angle problem but the angles and stem included were always colored: red in the Red Angle problem and green in the Green Angle problem. Color did not vary from trial to trial and could not be used to solve the problem directly. The constant color thus served as a "pure" emphasizer. Since color was spatially contiguous with the relevant angles, it was intended to make the angles "stand out" and emphasize that relevant dimension.

Group Angle + Angle Color had the Angle problem but either red or green appeared on the angles. The A patterns had 30° red angles and the B patterns had 60° green angles, making angle and color relevant and redundant.
S could learn that A was either a 30° angle or a red angle, and that B was either a 60° angle or a green angle. Color was presumed to be an emphasizer and a redundant relevant dimension.

Group Angle Color Control had a problem with angles all at 45° during original learning. Patterns with red angles were A's and green angles were B's, making color relevant. Correct responses depended upon the color of the angle.

Group Angle + Flower Color had the Angle problem but either red or green appeared on the flowers. The A patterns had 30° angles and red flowers and the B patterns had 60° angles and green flowers, making angle and color relevant and redundant. S could learn either that A was a 30° angle or red flower and that B was either a 60° angle or green flower. Color was presumed to be a counter-emphasizer with respect to the angle dimension, and also a redundant relevant dimension.

Group Flower Color Control had a problem with all angles fixed at 45° during original learning. Patterns with red flowers were A's and patterns with green flowers were B's, making color relevant. Correct responses depended upon the color of the flower.

Group Angle + Angle Color Irrelevant had the Angle problem but either red or green appeared on both angles. Ss could not use the irrelevant color to solve the problem, but color was spatially contiguous with the angles and was intended to serve as an emphasizer.

Group Large Angle had the Angle problem but the angle difference was increased from 30 to 60 degrees. Patterns with 15° angles were A's and those with 75° angles were B's. The larger angle difference was intended to make the relevant angle dimension "stand out" and serve as another kind of emphasizer.

Group Angle Only had a problem where all the irrelevant dimensions of the Angle problem were fixed. The 30° angles card was an A and the 60° angles card was a B.
The removal of all the irrelevant dimensions by fixing them was intended to draw the attention of the S to the relevant angle and make the problem easier to learn.

Table 3.1 summarizes these ten experimental groups, and describes the problems in detail.

Data

The data used in the individual comparisons (Chapter V) are total errors made by individual Ss in reaching learning criterion or within 64 trials, whichever occurred first. For comparisons of acquisition during original learning, the data are from the first 20 Ss in each group. For comparisons on transfer, the data are from the 20 Ss who solved their original learning problems and were transferred to the Angle problem.

Table 3.1. Experimental Groups and Problems

Group                           Theoretical Significance             Relevant Dimensions                  Irrelevant Dimensions            Number of Patterns
Angle                           Control                              Angles (30-60)                       Flower, leaf shape and position  32
Red Angle                       Constant Emphasizer                  Angles (30-60)                       Flower, leaf shape and position  32
Green Angle                     Constant Emphasizer                  Angles (30-60)                       Flower, leaf shape and position  32
Angle + Angle Color             Emphasizer Relevant                  Angles (30-60), Color (red-green)    Flower, leaf shape and position  32
Angle Color Control             Emphasizer Relevant Control          Color (red-green)                    Flower, leaf shape and position  32
Angle + Flower Color            Counter-Emphasizer Relevant          Angles (30-60), Color (red-green)    Flower, leaf shape and position  32
Flower Color Control            Counter-Emphasizer Relevant Control  Color (red-green)                    Flower, leaf shape and position  32
Angle + Angle Color Irrelevant  Emphasizer Irrelevant                Angles (30-60)                       Flower, leaf shape and position  64
Large Angle                     Lawrence Easy-Hard                   Angles (15-75)                       Flower, leaf shape and position  32
Angle Only                      Irrelevant Cues Constant             Angles (30-60)                       none                             2

Statistical Tests

In the experiment, mean errors are asymptotically normal with large numbers of Ss so that individual comparison t-tests are used.
The program described in Chapter II is feasible only if reliable estimates of the probability of sampling given sets of strategies can be obtained. Predictions on the rate of learning and transfer are to be made (Chapter VI), so it is desirable to make them as accurately as possible and statistically test any discrepancies between the predicted and obtained results.

To test the hypotheses: 1. $c = c_0$ or 2. $c_1 = c_2$, the theory offers likelihood ratio tests which use the maximum-likelihood estimates of c. In a likelihood ratio test one maximizes the likelihood over a restricted parameter subspace (the null hypothesis) and also over the entire space of logical possibilities. The ratio of these two likelihoods is called $\lambda$. With large samples, the value of $-2 \ln(\lambda)$ is distributed approximately as chi-square, provided the null hypothesis is true. The degrees of freedom are equal to the number of free parameters (see Restle, 1961b, pp. 301-304).

If $\hat{c}$ is assumed normally distributed with $\operatorname{var}(\hat{c}) = \hat{c}^2(1-\hat{c})/N_s$, where $N_s$ is the number of Ss who solve, then the difference between $c_1$ and $c_2$, if independent, is normally distributed with variance equal to $\operatorname{var}(c_1) + \operatorname{var}(c_2)$. Normal distribution z-tests of differences between two parameters are made in Chapter V.

CHAPTER IV

STOCHASTIC PROPERTIES OF THE DATA AND TEST OF THE STRATEGY SELECTION THEORY

According to the Strategy Selection Theory, the responses of an S are composed of a sequence of correct and wrong responses in an irregular order (the "pre-solution phase"), followed by an infinite sequence of correct responses (the "solution phase"). In the experiment, 10 correct responses in a row represented the learning criterion and constituted the "solution phase." Some Ss failed to solve and consequently had no solution phase.

The model may be said to describe the data well if 1. the responses before the last error do constitute a statistically stationary sequence of correct and wrong responses and 2.
transition from the pre-solution to the solution phase occurs with a probability which is constant and independent of how long S was in the pre-solution phase. These two questions about the stochastic structure of the data can be answered by considering, first, behavior during the pre-solution phase, and second, the distribution of errors made in the pre-solution phase.

Analysis of the Pre-solution Data

The pre-solution phase contains all the trials before the last error observed. This last error may be followed by a criterion run of 10 consecutive correct responses or, in the case of non-solvers, may be the error at which training was terminated by E. Let the trial of last error for subject a be $m_a$. Let $X_{a,n}$ be a random variable which takes on the value 1 if S makes an error on trial n and 0 otherwise. Then the proportion of errors during the pre-solution phase is

$$p_a = \sum_{n=1}^{m_a - 1} X_{a,n} / (m_a - 1). \tag{4.1}$$

If w, the proportion of wrong strategies, is small, then the proportion of errors during pre-solution trials ($p_a$) should be near 1/2 for all Ss, and in fact should have a binomial distribution,

$$\Pr[p_a = k/(m_a - 1)] = \binom{m_a - 1}{k} (1/2)^{m_a - 1}. \tag{4.2}$$

To obtain a statistical test, approximate the binomial distribution in Eq. 4.2 by a normal distribution with mean = 1/2 and variance $pq/N = (1/2)^2/(m_a - 1)$. With this approximation, the statistic

$$D^2(a) = 4(m_a - 1)(p_a - 1/2)^2 \tag{4.3}$$

has a chi-square distribution with 1 degree of freedom. $D^2(a)$ was computed for each S with 2 or more errors and the values summed for each group and then over all groups. These pooled values should have an approximately chi-square distribution with df corresponding to the number of Ss involved. In original learning, 163 Ss made 2 or more errors and the pooled $D^2$ = 147.32. Since the value is less than the degrees of freedom, it is not significant. In transfer, 76 Ss made at least 2 errors and the pooled $D^2$ = 63.94, again less than the df.
No statistically significant deviation of individual error proportions from 1/2 during the pre-solution phase was observed, a result in accord with the strategy selection theory. Furthermore, pooled values of $D^2$ were not significant for any one of the 10 original learning groups and the 4 transfer groups which contained Ss with 2 or more errors.

If the data showed gradual improvement in the probability of a correct response during the pre-solution phase, then the proportion of errors should be less than .50. In fact, the observed mean proportion of errors was .55 for original learning and .51 for transfer pre-solution trials. Since these values were numerically larger than .50, Ss may have used wrong as well as irrelevant strategies (see below). In any case, there is no indication, either numerical or statistical, for a gradual elimination of errors during the pre-solution trials.

Although the hypothesis that $p_a$ = 1/2 could not be rejected, Ss could have made disproportionately more errors early or late in the pre-solution phase. A test of stationarity, whether or not the probability of an error went up or down during the pre-solution phase, was made. If the probability of an error was constant, then error frequencies during halves of the pre-solution trials should be equal for each S. To make a statistical test, each S's pre-solution trials, $m_a - 1$, was divided in half, discarding the middle trial where $m_a - 1$ was odd, and errors per half were counted. For Ss who had at least 2 errors remaining, errors in the first half were compared with those in the second half by a sign test. Table 4.1 shows proportions of Ss making more, as many or fewer errors during the first half pre-solution trials compared with the second half.

Table 4.1. Test of Stationarity: 1st versus 2nd Half Errors in the Pre-solution Phase.
                     Proportion of Ss Making More, As Many or Fewer 1st Half Errors
Condition            More     As Many   Fewer    Number of Ss
Original learning    .27      .36       .37      158
Transfer             .38      .21       .41      72

First half errors did not differ significantly from second half errors, either during original learning (z = -1.59) or transfer (z = -0.13) at the .05 level. These results indicate that the probability of an error during the pre-solution trials did not change, and support the theory.

Whether learning is continuous or discontinuous during the pre-solution phase can be shown by construction of a "backwards learning curve" for each group. If the individual learning curve is discontinuous, the backward group learning curve shows performance at a chance level and a sharp jump to better than chance performance (Hayes, 1953). Since criterion was not reached by all Ss, such curves for each experimental group could not be constructed. However, if the probability of an error is constant and independent of how long the S is in the pre-solution phase, the proportion of errors made by Ss who are still in the pre-solution phase should be constant and near .50 over trials. The proportions were obtained by forming, for each trial k, a ratio of total errors on trial k to the number of Ss who were still in the pre-solution phase on trial k. The probability of an error, conditional on S being in the pre-solution phase, is given for original learning (A) and transfer (B) in Fig. 4.1.

Both curves in Fig. 4.1 appear to be reasonably flat over the 70 trials where errors were made, even where the S frequencies grew small. Variation of the proportion was nearly all contained within the .40 to .60 range and the fluctuation centered around .50. If gradual learning occurred during the pre-solution phase, these curves would diverge from .50 toward zero over trials.
Errors Before Solution

The second aspect of the Strategy Selection Theory is the assumption that the probability of moving from the pre-solution to the solution phase is a constant, c, on any trial the S responds erroneously.

[Fig. 4.1. The probability of an error, conditional on S being in the pre-solution phase, for original learning (A) and transfer (B). Axes: proportion of errors versus trials.]

The probability of solving with zero errors is c, with one error is c(1-c), with 2 errors is $c(1-c)^2$, and with n errors is $c(1-c)^n$, a geometric distribution. The cumulative geometric gives the cumulative proportion of Ss who solve making n or less errors, that is

$$p(n) = \sum_{i=0}^{n} c(1-c)^i = 1 - (1-c)^{n+1} \tag{4.4}$$

for a given value of c. The second part of the test of the stochastic properties of the data is to compare obtained distributions with theoretical distributions of error scores using Eq. 4.4. This, in turn, requires an estimate of the parameter c for each group.

Estimation of c, the Probability of Selecting a Correct Strategy

A maximum-likelihood estimator of c, by Eq. 2.18, which takes into account the presence of non-solvers in a group, is $\hat{c} = N_s/[T + N_s]$, where $N_s$ is the number of solvers and T is the total errors made by the group. Using Eq. 2.18, estimates of c were calculated for all groups in original learning. Six groups made zero or near-zero errors on transfer, leaving estimates for four transfer groups. Two of the transfer groups (Angle + Angle Color and Angle + Flower Color) should show transfer effects and their distributions would not be geometric. Thus, a total of ten estimates on original learning and two on transfer data were calculated. These values, their standard deviations, corresponding mean errors, and number of solvers in each group are given in Table 4.2.

Table 4.2. Maximum-likelihood Estimates of c

Group                           Mean Errors   Number of Solvers   c-hat   Standard Deviation of c-hat*
Original Learning
Angle                           19.50         14                  .035    .0092
Red Angle                       12.45         18                  .067    .0152
Green Angle                     19.40         14                  .035    .0092
Angle + Angle Color             3.45          20                  .225    .0443
Angle Color Control             4.05          19                  .190    .0392
Angle + Flower Color            2.40          20                  .294    .0552
Flower Color Control            3.40          19                  .218    .0442
Angle + Angle Color Irrelevant  14.65         17                  .055    .0130
Large Angle                     8.35          18                  .097    .0217
Angle Only                      2.60          20                  .278    .0528
Transfer
Angle Color Control             20.10         14                  .034    .0089
Flower Color Control            19.75         14                  .034    .0089

*The standard deviation of $\hat{c}$ was obtained by taking the square root of the value obtained by substituting $\hat{c}$ for each group into Eq. 2.20, which is $\operatorname{var}(\hat{c}) = \hat{c}^2(1-\hat{c})/N_s$, the variance of $\hat{c}$.

Goodness-of-fit Distribution Tests

Using the obtained estimates of c, given in Table 4.2, the theoretical cumulative proportions of Ss making n or fewer errors were computed by Eq. 4.4 for each group. Empirical cumulative proportions were also obtained. Non-solvers in some groups produced a piling up of error scores near 32. Their error scores were assumed to be binomially distributed with mean 64(1/2) = 32 and variance 64(1/2)(1/2) = 16. Thus, the theoretical distribution of Ss making n or fewer errors is a summed geometric distribution combined with this binomial distribution of non-solvers. Figures 4.2, 4.3 and 4.4 present the observed and theoretical distributions for ten original learning and two transfer groups.

The fit of theoretical to obtained distributions is very close in each of the twelve graphs. The degree of approximation was tested by the Kolmogorov-Smirnov one-sample test since expected values of low error scores were too small to justify the chi-square test (Walker and Lev, 1953, p. 443).
Maximum discrepancies between theoretical and obtained distributions are given in Table 4.3. Maximal discrepancies tended to be small, averaging .169 for the twelve comparisons. In ten cases, the p value of the observed maximum discrepancy was larger than .20 and in one case, larger than .05. Only in Angle Color Control was the theoretical distribution rejected at the .05 but not at the .01 level. This group learned rapidly and had more Ss making exactly 2 errors than was expected. The failure to reject the approximations in eleven of twelve cases agrees with the assumption that c, the probability of moving from the pre-solution to the solution phase, is a constant.

[Figs. 4.2, 4.3 and 4.4. Observed and theoretical cumulative distributions of the proportion of Ss making n or fewer errors. Theoretical distributions are computed from equations and parameter estimates given in the text. Axes: proportion of subjects versus errors.]

Table 4.3. Maximum Discrepancies Between Theoretical and Observed Error Distributions

Group                           Maximum Discrepancy
Original Learning
Angle                           .114
Red Angle                       .125
Green Angle                     .216
Angle + Angle Color Irrelevant  .105
Large Angle                     .100
Angle Only                      .278*
Angle + Angle Color             .215
Angle Color Control             .332**
Angle + Flower Color            .101
Flower Color Control            .178
Transfer
Angle Color Control             .114
Flower Color Control            .150

* .10 > P > .05
** .05 > P > .01

Estimation of w, the Proportion of Wrong Strategies

The mean proportion of errors in the pre-solution phase, $p_a$, was reported above to be .55 for original learning and .51 for transfer data. The fact that these values are larger than .50 suggests that w is larger than zero, since a high frequency of consecutive errors in the pre-solution phase indicates a large proportion of wrong strategies and chance frequency of consecutive errors is an indication of relatively few wrong strategies.

If w is small or zero, then errors should be as frequent after errors as after correct responses. To test this, frequencies of errors followed by errors and correct responses followed by errors were tabulated for each S during the pre-solution phase. Chi-square tests were performed on the resulting 1 x 2 table for each S where the expected frequency of each cell was at least 5. These values were summed over Ss within and then over groups. During original learning, 69 Ss showed a pooled chi-square of 59.85 which is not significant. During transfer, the pooled chi-square was 60.72 with 55 df, again not significant at the .05 level (z = 0.58). These tests indicate that the S's probability of making an error is independent of whether or not he made an error on the previous trial, and that w is probably small.

Comparison of individual S's proportion of errors following errors and correct responses following errors would also show any tendency to use wrong strategies. This comparison was made and tested by a sign test on the pre-solution trials of all Ss who made 2 or more errors. The proportions of Ss making more, as many or fewer errors following errors than correct responses following errors are given in Table 4.4.

Table 4.4.
Test of Consecutive Errors: Proportions of Ss Making More, as Many, or Fewer Errors Following Errors than Correct Responses Following Errors

                          Proportion of Ss
Condition             More    As Many    Fewer    Number of Ss
Original Learning      .50      .19       .31         163
Transfer               .43      .08       .49          76

Ss made significantly more errors following errors than correct responses following errors during original learning (z = 2.79, P < .01). The difference was not significant on transfer (z = -0.36, P > .05).

Estimates of w were obtained by Eq. 2.21,

w = [M1 - M0 + 1]/[M1 + M0] = [M1 - M0 + 1]/T,

a correction of Restle's Eq. 38 (1961b, p. 299), where M0 is the mean number of errors followed by correct responses and M1 is the mean number of errors followed by errors. T, the mean total error score of a group, is equal to M0 + M1. In the computation of M0 and M1, the last error in each S's sequence is taken into account.

The obtained estimates of w are reported in Table 4.5 and are compared with estimates of c. The estimates of w were larger than the estimates of c in all twelve cases. A positive relationship between w and c is expected from the theory since both correct and wrong hypotheses depend upon the relevant dimension. Inspection of the pairs of values in Table 4.5 indicates a positive relationship. The rank order correlation between the two sets of estimates was +.79 (P < .01), indicating that w is significantly related to c.

Since all estimates of w were larger than zero, there is an indication that Ss use both wrong and irrelevant strategies in solving the problems and do not merely guess.

Table 4.5. Estimates of w, the Proportion of Wrong Strategies, Compared With Estimates of c. (Groups are ranked according to the estimate of c.)

Group                               Estimate of w    Estimate of c
Original Learning
  Angle                                 .092             .035
  Green Angle                           .129             .035
  Angle + Angle Color Irrelevant        .188             .055
  Red Angle                             .100             .067
  Large Angle                           .138             .097
  Angle Color Control                   .383             .190
  Flower Color Control                  .353             .218
  Angle + Angle Color                   .275             .225
  Angle Only                            .462             .278
  Angle + Flower Color                  .542             .294
Transfer
  Angle Color Control                   .149             .034
  Flower Color Control                  .104             .034

Individual Differences

The assumption that the probability of selecting a correct strategy is a constant and is the same for all Ss implies that individual differences are random and not stable. To test this, inter-correlations were computed on error scores between the practice problem (P), original learning (OL) and transfer (T) for all groups where possible. The data for the correlations of P-OL were from the first 20 Ss and, for correlations between P-T and OL-T, from the 20 solvers in each group. The correlations are presented in Table 4.6.

Of the 18 correlations shown, only one was significant. This correlation, for Angle + Angle Color Irrelevant between P and OL, dropped from 0.52 to 0.34 (not significant) when three additional solvers of the group were added. The set of non-significant intercorrelations obtained shows the absence of any stable individual differences on problems, a result in accord with the Strategy Selection Theory.

The evidence in this chapter is consistent with the Strategy Selection Theory. An analysis of the pre-solution performance indicated that Ss distribute their errors randomly with probability of an error near .50. The probability was found to be stationary in the pre-solution phase by evidence that (a) frequencies of errors during the first and second halves of the pre-solution phase were equal and (b) the proportion of errors, conditional on S being in the pre-solution phase, was constant and near .50 over trials where errors were made. Fitted error distributions from the theory yielded good approximations in eleven of twelve cases. Evidence for Ss using wrong strategies was given, although the proportion of wrong strategies, w, was indicated by independence tests to be small.
A high positive correlation between estimates of w and c, the proportion of correct strategies, was found, indicating a relationship expected from the theory. Inter-correlations between practice, original learning and transfer problems indicated no stable individual differences, as expected from the theory.

Table 4.6. Inter-correlations Between Practice (P), Original Learning (OL), and Transfer (T) Problems. (N = 20 Ss)

Group                               P-OL       P-T       OL-T
Angle                              -0.03        --        --
Red Angle                          -0.24        --        --
Green Angle                        +0.06        --        --
Angle + Angle Color                +0.04      -0.07     -0.32
Angle Color Control                -0.34      -0.15     +0.24
Angle + Flower Color               -0.13      -0.35     +0.03
Flower Color Control               -0.31      -0.12     -0.22
Angle + Angle Color Irrelevant     +0.52**      --        --
Large Angle                        +0.12        --        --
Angle Only                         +0.23        --        --

** .05 > P > .01

CHAPTER V

EXPERIMENTAL RESULTS

To evaluate the effect of the training conditions on acquisition, the original learning performance of each experimental group is compared with that of group Angle. The effect of training on transfer is studied by comparing each experimental group's performance on the Angle problem with the original learning of group Angle. Together, these comparisons constitute examples of the transfer-of-training paradigm, X --> Y versus Y.

Discriminanda Difference Increase and Removal of Irrelevant Dimensions

Large Angle and Angle Only both learned significantly faster than group Angle. Doubling the angle size difference in Large Angle decreased mean errors by slightly more than one-half. Removal of the irrelevant dimensions by holding them constant made the angle concept considerably easier to learn in Angle Only. See Table 5.1.

Table 5.1. Comparison of Large Angle and Angle Only with Angle on Original Learning

                  Mean                    Parameter Tests
Group            Errors    S.D.*     t**      z***    -2 ln(λ)****
Angle            19.50     10.51     --       --        --
Large Angle       8.35      9.07    -3.60    2.58       8.75
Angle Only        2.60      2.06    -7.07    4.50      36.45

*    Standard Deviation
**   t.05 > 2.04 with df = 30
***  z.05 = 1.96
**** -2 ln(λ) = chi-square.05 = 3.80, 1 df

Both groups showed nearly perfect transfer to the Angle problem. In each group, only three Ss made errors on transfer. Transfer was so strongly positive that statistical tests comparing Large Angle and Angle Only with Angle were unnecessary. After Ss reached learning criterion, reduction in the relevant angle size difference or introduction of new irrelevant dimensions had a very small disruptive effect on transfer. Transfer data for Large Angle and Angle Only are summarized and compared with original learning performance on Angle in Table 5.2.

Table 5.2. Transfer to the Angle Problem by Large Angle and Angle Only Compared with Original Learning of Angle

Group            Mean Errors    S.D.
Angle               19.50       10.51
Large Angle          0.20        0.62
Angle Only           0.40        1.35

Color as a Constant Emphasizer

The role of color as a constant emphasizer on acquisition of the angle concept was studied by comparing Red Angle and Green Angle with Angle on original learning. Transfer of the constant emphasizer groups to the Angle problem tested the effect of removing a stimulus which could not be the basis for solution. Red Angle and Green Angle are compared with Angle on original learning in Table 5.3 and on transfer in Table 5.4.

The effect of color on the angles during original learning was not pronounced, but transfer to the Angle problem was perfect for Red Angle and nearly perfect for Green Angle. Since the transfer was perfect or nearly perfect, removal of color, which was not a basis for a correct strategy but an emphasizer, had no disruptive effect.

Table 5.3. Comparison of Red Angle and Green Angle with Angle on Original Learning

                  Mean                  Parameter Tests
Group            Errors    S.D.      t        z      -2 ln(λ)
Angle            19.50     10.51     --       --       --
Red Angle        12.45     11.79    -2.00    1.85      4.37
Green Angle      19.40     13.01    -0.03    0.00      0.00

Table 5.4. Transfer to the Angle Problem by Red Angle and Green Angle Compared with Original Learning of Angle

Group            Mean Errors    S.D.
Angle               19.50       10.51
Red Angle            0.00        0.00
Green Angle          0.05        0.00

On original learning Red Angle made one-third fewer errors than Angle, but the difference was of borderline significance in two of the three statistical tests. Green Angle made about the same number of errors as Angle, indicating that the green color had no emphasizing effect.

As a further control on transfer, 14 Ss in Angle who solved the original learning problem were given an additional 10 trials and made no errors on this "hard-to-hard" transfer.

Color as an Irrelevant Dimension and an Emphasizer

In Angle + Angle Color Irrelevant, color played two roles which lead to opposite effects. If the irrelevant role predominated, the group would learn more slowly than Angle. If the emphasizer role predominated, faster learning would occur. The performance of Angle + Angle Color Irrelevant is summarized and compared with Angle in Table 5.5.

Table 5.5. Comparison of Angle + Angle Color Irrelevant with Angle on Original Learning and Transfer

                                                       Mean                 Parameter Tests
Group                             Condition           Errors    S.D.      t       z     -2 ln(λ)
Angle                             Original Learning   19.50     10.51     --      --      --
Angle + Angle Color Irrelevant    Original Learning   14.65     12.99    -1.31   1.26    1.77
Angle + Angle Color Irrelevant    Transfer             0.00      0.00        unnecessary

When color was both an emphasizer and an irrelevant dimension, learning was slightly facilitated. Although Angle + Angle Color Irrelevant made fewer errors than Angle on original learning, the difference was not significant. The slight facilitation is an indication, however, that the emphasizer role predominated over the retarding effects of the added irrelevant dimension.
Since color was irrelevant and could not be the basis for a correct strategy, its removal had no effect, as Angle + Angle Color Irrelevant showed perfect transfer to the Angle problem.

Color as an Emphasizer or a Counter-emphasizer and a Redundant Relevant Dimension

Angle + Angle Color and Angle + Flower Color, which had problems involving color as a relevant and redundant cue during original learning, are compared with the original learning of Angle in Table 5.6 and on transfer in Table 5.7.

Table 5.6. Comparison of Angle + Angle Color and Angle + Flower Color with Angle on Original Learning

                          Mean                  Parameter Tests
Group                    Errors    S.D.      t        z      -2 ln(λ)
Angle                    19.50     10.51     --       --       --
Angle + Angle Color       3.45      5.31    -6.10    4.44     30.85
Angle + Flower Color      2.40      3.59    -6.90    4.52     40.20

Table 5.7. Transfer to the Angle Problem by Angle + Angle Color and Angle + Flower Color Compared with Original Learning of Angle*

                          Mean
Group                    Errors    S.D.      t
Angle                    19.50     10.51     --
Angle + Angle Color      16.45     14.45    -0.76
Angle + Flower Color     23.45     12.14     1.13

*Normal distribution and likelihood ratio parameter tests were not made, since with the possibility of transfer, error distributions would no longer be geometric.

Both groups learned faster than Angle, which had only the angle dimension relevant. All Ss in both groups solved the original learning problem and were transferred to the Angle problem. Angle + Angle Color, in which the angle was emphasized during original learning, made slightly fewer errors on transfer than Angle. Angle + Flower Color, in which the angle was counter-emphasized during original learning, made slightly more errors on transfer than Angle. Transfer for both groups did not differ significantly from the original learning by Angle.

Color Versus Angle as a Salient Cue

Color was found to be more salient as a cue than angle.
Both Angle Color Control and Flower Color Control, which had fixed 45° angles, made significantly fewer errors than Angle. See Table 5.8.

Table 5.8. Comparison of Angle Color Control and Flower Color Control with Angle on Original Learning

                          Mean                  Parameter Tests
Group                    Errors    S.D.      t        z      -2 ln(λ)
Angle                    19.50     10.51     --       --       --
Angle Color Control       4.05      8.49    -6.13    3.88     20.22
Flower Color Control      3.40      5.31    -6.12    4.07     28.94

Additivity of Strategies

Additivity of angle and color strategies is suggested, since the redundant relevant cue groups made fewer errors than their color controls, which had only color relevant, or the Angle group, which had only the angle relevant. Angle + Angle Color made 3.45 mean errors compared with 4.05 for Angle Color Control, and Angle + Flower Color averaged 2.40 errors compared with 3.40 for Flower Color Control. Angle made 19.50 mean errors.

Test for Transfer of an Observing Response

The slight positive transfer of Angle + Angle Color and the slight negative transfer of Angle + Flower Color to the Angle problem, although not significant, could be interpreted as supporting a strict interpretation of the observing response theory of Wyckoff (1952). The interpretation is that Ss learned to look at that part of the pattern which was colored during original learning. Transfer of an observing response to the angle dimension would lead to positive transfer in Angle + Angle Color, for Ss would presumably have learned to look at the angles during original learning. Transfer of an observing response to the flowers in Angle + Flower Color would lead to negative transfer on the Angle problem.

The possibility of transfer of an observing response to the place which was colored during original learning was tested by comparing transfer performance of Angle Color Control and Flower Color Control on the Angle problem with the original learning of Angle. The comparisons are given in Table 5.9.

Table 5.9.
Transfer to the Angle Problem by Angle Color Control and Flower Color Control Compared with Original Learning of Angle

                          Mean                  Parameter Tests
Group                    Errors    S.D.      t        z      -2 ln(λ)
Angle                    19.50     10.51     --       --       --
Angle Color Control      20.10     13.86     0.15    -0.01     0.00
Flower Color Control     19.75     14.01     0.06    -0.01     0.00

Transfer of Angle Color Control and Flower Color Control to the Angle problem did not differ significantly from original learning by Angle. These results indicate that there was no important transfer of an observing response to the place which was colored during original learning.

Other Evidence for an Emphasizer Effect

One way of evaluating the emphasizer and counter-emphasizer effects in Angle + Angle Color and Angle + Flower Color is to compare the number of Ss in each group who show perfect transfer to the Angle problem. Perfect transfer indicates that an S probably solved on the angle dimension during original learning. In transfer from Angle + Angle Color, five Ss made no errors, whereas from Angle + Flower Color only one S made no errors. The difference is of borderline significance (z = 1.82, P < .10) and is attributed to the role of color. In Angle + Angle Color, color was spatially contiguous to the angles. In Angle + Flower Color, the color was spatially separated from the angles. In Angle + Angle Color, the color apparently increased the probability of Ss attending to and solving on the basis of the angle dimension, but in Angle + Flower Color, color had the opposite effect.

When the five Ss who made no errors on the Angle problem are removed from Angle + Angle Color, the mean errors for the remaining fifteen Ss was 21.93, not far from the 20.10 of Angle Color Control.

Efficiency of Training

Total efficiency comparisons of all the groups cannot be adequately evaluated, since a number of Ss failed to solve within the time limit on original learning.
The existence of non-solvers on transfer also precludes a trials-to-criterion measure or a total error measure for use in efficiency comparisons.

Three groups, Angle Only, Angle + Angle Color and Angle + Flower Color, did have all Ss solve original learning problems at about the same rate. In Angle Only, training Ss on the relevant angle without irrelevant dimensions present and then transferring them to the Angle problem proved to be a highly efficient training procedure. Angle Only made a total of 3.00 mean errors on original learning and transfer, with all Ss solving the Angle problem. Compared with the performance of Angle, the difference was highly significant (t = 6.88).

The redundant cue groups, Angle + Angle Color and Angle + Flower Color, did not perform significantly better on transfer than the Angle group in original learning. Training on these redundant-cue easy problems was apparently wasted. The apparent inefficiency of this redundant-cue program is difficult to evaluate, because the program gave the Ss more opportunities to make errors. However, the redundant-cue-to-hard procedure cannot be rated more efficient than training on the hard problem alone.

Despite the presence of two non-solvers, Large Angle, which made 8.35 mean errors on original learning and only 0.20 mean errors on transfer, may be regarded as more efficient than Angle, which had six non-solvers and made 19.50 mean errors. This result of efficiency in transfer along a stimulus continuum in a concept identification task is in accord with previous findings on this training program (Lawrence, 1952; Baker and Osgood, 1954; Restle, 1955).

Comparison of Statistical Tests

In all comparisons that were made, the probability values of the t- and z-tests and -2 ln(λ) were about equal. These results suggest that the tests are of about the same power.

Summary of Experimental Findings

The major experimental findings reported in this chapter are:

1. Doubling the angle size difference during original learning led to faster acquisition and nearly perfect transfer of the angle concept.

2. Removing the irrelevant dimensions during original learning led to faster learning and nearly perfect transfer of the angle concept.

3. The constant emphasizer effects were not strong. When red appeared on the angles, acquisition was faster. When green was used, no emphasizer effect was shown. Transfer from the constant color emphasizer groups to the Angle problem was perfect or nearly perfect.

4. Adding color which was an irrelevant dimension and an emphasizer led to slightly faster learning. Transfer to the Angle problem was perfect. Removing color, when not a basis for a correct strategy, had no disruptive effect on transfer.

5. Adding color as a redundant relevant cue led to faster learning, but transfer was either (a) slightly positive when the color had been an emphasizer during original learning or (b) slightly negative when color had been a counter-emphasizer during original learning.

6. Color was found to be a more salient cue than the angle.

7. Two procedures appeared to be most efficient:
   a. training Ss on the relevant angle cue in the absence of irrelevant dimensions and then transferring them to the hard Angle problem and
   b. training Ss with a large difference between the angle sizes and then transferring them to the hard Angle problem.

CHAPTER VI

DETAILED PREDICTIONS

According to the Strategy Selection Theory, the probability of solving the problem on any trial is c, the proportion of correct strategies. Correct and wrong strategies depend upon stimulus dimensions which vary during training and are correlated with reward. Irrelevant strategies depend upon stimulus dimensions which vary but are uncorrelated with reward. Other factors such as the outcome of previous trials, incidental events, etc. also determine the number of irrelevant strategies in the problem.
The strategies in the present experiment arise from the flower patterns. Several problems had patterns where the angle dimension was varied and relevant. Some of the strategies arising from the angle dimension depend upon the contrast of the small and large angles. These strategies are relevant when that dimension is made relevant, as when the 30° angles are in class A and the 60° angles are in class B. Such strategies from the angle are called A. Strategies from the flower color are called Cf and from angle color are called Ca. All other irrelevant strategies in the problem, from all sources, are assembled into one set, I. If a dimension is fixed (does not vary), then strategies depending upon the contrast between values of the dimension do not exist in the problem, though other strategies arising from them are presumably present. Fixing a dimension by making it one-valued removes just the contrast set of strategies but does not affect the irrelevant strategies in the set I (cf. Trabasso, 1960).

The following sets of strategies recur, in various combinations, in more than one problem: Angle (A), Color on Angle (Ca), Color on Flower (Cf), and the irrelevant or background set (I). In addition, two emphasizer effects (multipliers), r and g, for red and green emphasizers respectively, are used. Theoretically, these six numbers are sufficient to calculate the probability of solving, c, for eight groups by use of Eqs. 2.6 to 2.10 of Chapter II. These same parameters, as employed in Eqs. 2.11 to 2.14 of Chapter II, yield quantitative predictions of transfer.

The measures of the sets A, Ca, and Cf will be estimated relative to the measure of I. For example, to estimate m(A), we use the Angle problem, in which the angle dimension is relevant, color does not appear, no emphasizer was used, and the set I is irrelevant.
Since evidence was found for a strong relationship between the proportions of wrong and correct strategies (Chapter IV), the assumption made in Chapter II that wrong and correct strategies from any dimension are equal in measure, i.e., that m(A) = m(A*), appears justified, and shall be made. In all problems to be considered, the dimensions of flower shape, leaf shape and leaf position are irrelevant, and strategies based on these dimensions are incorporated into the irrelevant set I. Then from Eq. 2.6,

c(Angle) = m(A)/[m(A) + m(A*) + m(I)] = m(A)/[2m(A) + m(I)].    (6.1)

In Chapter IV (Table 4.1), an estimate of c for the Angle group is given as .035. Putting this estimate in for c(Angle) in Eq. 6.1, we find

m(A) = .038m(I).    (6.2)

This gives the measure of angle strategies relative to the measure of irrelevant strategies.

Formulas for c in five groups which are used to estimate m(A), m(Ca), m(Cf), r, and g are shown in summary form in Table 6.1. The formulas are derived using equations from Chapter II, as referred to in the table. Expressions proportional to m(I), analogous to Eq. 6.2, are shown in Table 6.2.

Table 6.1. Formulas of c for Experimental Groups Used in Predictions

                                   Theoretical
Group                     c        Definition of c                Equation
Angle                   .035       m(A)/[2m(A) + m(I)]              2.6
Red Angle               .067       r.m(A)/[2r.m(A) + m(I)]          2.9
Green Angle             .035       g.m(A)/[2g.m(A) + m(I)]          2.9
Angle Color Control     .190       m(Ca)/[2m(Ca) + m(I)]            2.6
Flower Color Control    .218       m(Cf)/[2m(Cf) + m(I)]            2.6

Table 6.2. Estimates of Sets of Strategies Used in Predictions

Set of Strategies     Measure     Proportional Estimate
Angle                 m(A)            .038m(I)
Red Angle             r.m(A)          .077m(I)
Green Angle           g.m(A)          .038m(I)
Angle Color           m(Ca)           .306m(I)
Flower Color          m(Cf)           .386m(I)

Additivity of Strategies in Original Learning

Strategies are said to be additive when it is shown that Ss can learn a problem with two relevant sets of strategies (either of which can be used to solve the problem) more rapidly than a problem with only one set of relevant strategies. The Strategy Selection Theory asserts that the rate of learning (c) depends directly on the proportion of correct strategies, so that from additivity of strategies, one can predict additivity of learning rates. Provided that the sets of strategies are disjoint, these predictions constitute a test of the correctness of the theory.

In the problem of Angle + Angle Color, the emphasized angle and the color of the angles are relevant and redundant dimensions. All others are irrelevant. When the angle is red, by Eq. 2.8 and Eq. 2.9,

c(Red Angle + Angle Color) = [r.m(A) + m(Ca)]/[2r.m(A) + 2m(Ca) + m(I)]

and when the angle is green,

c(Green Angle + Angle Color) = [g.m(A) + m(Ca)]/[2g.m(A) + 2m(Ca) + m(I)].

Since red and green angles appear equally often, the mean

c(Angle + Angle Color) = (1/2)[r.m(A) + m(Ca)]/[2r.m(A) + 2m(Ca) + m(I)]
                       + (1/2)[g.m(A) + m(Ca)]/[2g.m(A) + 2m(Ca) + m(I)].    (6.3)

To predict c(Angle + Angle Color), appropriate estimates of the sets of strategies (from Table 6.2) were substituted into Eq. 6.3. The predicted c(Angle + Angle Color) = .210, which is close to the observed value of .225. This prediction is not rejected by the likelihood ratio test (-2 ln(λ) = 0.09, P > .05). This test treats the prediction as a fixed parameter, i.e., tests the hypothesis that c = c0.

Converting the prediction to mean errors by Eq. 2.2, T = (1 - c)/c = (1 - .210)/(.210) = 3.76 mean errors predicted, where 3.45 were observed. Even if this prediction is taken as a fixed value, the difference between the predicted and observed is not significant at the .05 level (t = 0.26).
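The arithmetic behind Table 6.2 and the prediction from Eq. 6.3 can be retraced with a short Python sketch. It is offered as an illustration under stated assumptions rather than as the author's computational procedure: m(I) is set to 1 so that all measures are relative to it, the relation x = c/(1 - 2c) is obtained by solving c = x/(2x + 1) for x, and the function names are hypothetical.

```python
def relative_measure(c):
    """Solve c = x / (2x + 1) for x, the measure of a relevant set
    relative to m(I); the form follows Eqs. 6.1 and 6.2."""
    return c / (1.0 - 2.0 * c)

def c_redundant(emphasized_angle, color, irrelevant=1.0):
    """Eq. 2.8 form: two disjoint relevant sets, each with an equal
    measure of wrong strategies, plus the irrelevant set."""
    relevant = emphasized_angle + color
    return relevant / (2.0 * relevant + irrelevant)

def mean_errors(c):
    """Eq. 2.2: expected total errors T = (1 - c) / c."""
    return (1.0 - c) / c

# Proportional estimates from Table 6.2, in units of m(I)
r_mA, g_mA, m_Ca = 0.077, 0.038, 0.306

# Eq. 6.3: average over red and green angles, which appear equally often
c_pred = 0.5 * c_redundant(r_mA, m_Ca) + 0.5 * c_redundant(g_mA, m_Ca)
print(round(c_pred, 3))               # predicted c(Angle + Angle Color)
print(round(mean_errors(0.210), 2))   # mean errors for c = .210
```

With the Table 6.2 values this gives a predicted c of about .210 and about 3.76 mean errors, in line with the figures reported in the text.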
In the problem of Angle + Flower Color, the angle and the color of the flowers are relevant and redundant dimensions. All others are irrelevant. From Eq. 2.8,

c(Angle + Flower Color) = [m(A) + m(Cf)]/[2m(A) + 2m(Cf) + m(I)].    (6.4)

The predicted c(Angle + Flower Color) = .229, somewhat lower than the observed .294, but the difference is not significant at the .05 level (-2 ln(λ) = 1.52). Predicted mean errors are (1 - .229)/(.229) = 3.37, compared with 2.40 observed, a result within sampling variability (t = 1.46, P > .05).(1)

(1) To follow the theoretical argument exactly, one must say that when the flower is colored, this not only adds relevant color cues but also emphasizes the shape of the flowers. Since the shape of the flower is irrelevant, the result is that one component of I has been emphasized. Unfortunately, this possibility was not foreseen at the time the experiment was designed, and no group was introduced which would permit estimating m(F), the measure of strategies associated with the shape of the flower. Hence, even if it is assumed that the emphasizer effects, r and g, would have the same effect on flowers as they have on angle, it is not possible to predict the increase in I. The calculations reported here are predicated on the assumption that the additional measure of irrelevant strategies, through emphasis of flower shape, is negligibly small. The observed discrepancy is opposite to that expected from emphasis of flower shape.

Additivity of Irrelevant Strategies

The theory states that the learning rate (c) depends upon the relative amounts of relevant and irrelevant strategies, from which it follows that increasing the number of irrelevant strategies should retard learning.

In the problem of Angle + Angle Color Irrelevant, the angle dimension was relevant and emphasized, red and green angles appearing about equally often. Angle color was an added irrelevant dimension and all others were irrelevant. When the angles are red, by Eq. 2.6,

c(Red Angle) = r.m(A)/[2r.m(A) + m(Ca) + m(I)]

and when the angles are green,

c(Green Angle) = g.m(A)/[2g.m(A) + m(Ca) + m(I)].

Since red and green angles appear equally often, the mean

c(Angle + Angle Color Irrelevant) = (1/2) r.m(A)/[2r.m(A) + m(Ca) + m(I)]
                                  + (1/2) g.m(A)/[2g.m(A) + m(Ca) + m(I)].    (6.5)

The predicted c(Angle + Angle Color Irrelevant) = .040, lower but not significantly different from the observed value of .055 (-2 ln(λ) = 1.54, P > .05). Converting to predicted mean errors, T = (1 - .040)/(.040) = 24.00, where only 14.65 were observed. Taking the prediction as a fixed value, the difference is significant at the .05 level (t = 3.22).(2)

(2) The conversion formula for predicted mean errors assumes that learning is complete. This condition was not satisfied since the group was stopped at the end of 64 trials, and 3 of the 20 Ss were still making errors. If the same conversion formula were applied to the maximum-likelihood estimate of .055, T = 17.48, which is still less than the prediction. The likelihood ratio test is not as stringent as the t-test since the t-test does not take into account the sampling variance of the parameter estimates used in the prediction.

Transfer

Ss who were transferred to the Angle problem had first met a learning criterion on the original learning problem. The probability of solution on problem 2 (transfer) after having solved problem 1 (original learning) is given in Eq. 2.11 (Chapter II).

Consider those original learning problems where the angle dimension is the only relevant dimension, namely Angle, Red Angle, Green Angle, and Angle + Angle Color Irrelevant. If color is present and either constant or irrelevant, then the S cannot come to depend upon it as a basis for a correct strategy. Since the angle dimension is relevant in both problems 1 and 2 and neither color nor any other strategy can be correct, C1 = C2. Eq. 2.11 then becomes unity and all groups should show perfect transfer.
This prediction is confirmed. Angle, Red Angle, Green Angle and Angle + Angle Color Irrelevant showed perfect or nearly perfect transfer. A total of 74 Ss made only 1 error in transfer.

It is possible for S to have reached criterion by chance and then commit errors on transfer. In original learning, the probability of ten in a row correct is c + (1 - c)(1/2)^10. Using Angle as an example, with c = .035, the guessing probability is (1 - c)(1/2)^10 ≈ .001. The proportion of Ss who guess to those who solve is (1 - c)(1/2)^10/c, or in Angle, .001/.035 = 1/35. This result means that one might expect one S in thirty-five to solve the original learning problem by chance and commit errors on transfer. This did not occur in the present study. (One S in Green Angle made one error on transfer.)

In the control problems of Angle Color Control and Flower Color Control, the C1 strategies arose from the color dimension. Since the C2 strategies were based on the angle, the set (C1 ∩ C2) = 0. Then, Eq. 2.11 is also zero. The expectation is that these groups would perform on the Angle problem much like Ss in Angle. The expectation is confirmed since no evidence for any kind of transfer was observed in these groups.

Predictions of transfer for the redundant-cue groups are made by Eq. 2.13. For Angle + Angle Color, the conditional probability of solving on the angle during original learning is

P(A/A ∪ Ca) = (1/2) r.m(A)/[r.m(A) + m(Ca)] + (1/2) g.m(A)/[g.m(A) + m(Ca)].    (6.6)

The predicted proportion of Ss who solve on the angle during original learning is .155, and the expected number is 20(.155) = 3.10. The remaining 16.90 Ss, who have theoretically solved on angle color, should show no transfer to the Angle problem since Ca is not contained in the A strategies of problem 2. Using the estimate of c from Angle Color Control, .034(16.90) = 0.57 of these Ss are expected to solve the Angle problem without any errors.
Together, 3.10 + 0.57 = 3.67 Ss are predicted to make zero errors in transfer to the Angle problem. Five Ss in Angle + Angle Color did show perfect transfer.

To predict mean errors on transfer for Angle + Angle Color, the expectation is that 3.10 Ss make no errors since they solved on the angle during original learning and that the remaining 16.90 Ss would perform like their controls in Angle Color Control. Since Angle Color Control averaged 20.10 errors, the predicted mean errors for Angle + Angle Color on transfer is [(3.10)0 + 16.90(20.10)]/20 = 16.98, very close to the 16.45 observed. Taking the prediction as a fixed value, the difference is not significant at the .05 level (t = 0.16).

For Angle + Flower Color, by Eq. 2.13, the conditional probability of solving on the angle during original learning is

\[
P(A / A \cup C_f) = \frac{m(A)}{m(A) + m(C_f)}. \tag{6.7}
\]

The predicted proportion of Ss who solve on the angle during original learning is .09, and the expected number is 20(.09) = 1.80. The remaining 18.20 Ss, who have theoretically solved on flower color, should show no transfer to the Angle problem since Cf is not contained in the A strategies of problem 2. Using the estimate of c from Flower Color Control, .034(18.20) = 0.62 of these Ss are expected to solve the Angle problem without any errors. Together, 1.80 + 0.62 = 2.42 Ss are predicted to make zero errors in transfer to the Angle problem. One S in Angle + Flower Color did show perfect transfer.

To predict mean errors on transfer for Angle + Flower Color, the expectation is that 1.80 Ss make no errors since they solved on the angle during original learning and that the remaining 18.20 Ss would perform like their controls in Flower Color Control. Since Flower Color Control averaged 19.75 errors, the predicted mean errors for Angle + Flower Color is [(1.80)0 + 18.20(19.75)]/20 = 17.97, lower than the 23.40 observed. Taking the prediction as a fixed value, the difference is of borderline significance (t = 2.02, p = .05).

Predicted Cumulative Error Distributions on Transfer

For Angle Color Control and Flower Color Control, the predicted rate of learning is .035, the value from group Angle. All Ss in these two color control groups are expected to perform like Ss in group Angle since Ca and Cf strategies are not contained in the A strategies of the Angle problem. From Eq. 4.4 (Chapter IV), the predicted cumulative distribution of the proportion of Ss making n or fewer errors, for solvers, is

\[
p(n) = 1 - (1 - .035)^{n+1}.
\]

Angle contained 6 non-solvers so that 6 non-solvers are expected on transfer from Angle Color Control and Flower Color Control, respectively. Six Ss failed to reach criterion in both these control groups on transfer.

Assuming the binomial distribution with mean = 32 and variance = 16 for the 6 non-solvers in each group, the cumulative normal approximation for the 6 non-solvers was added to the cumulative geometric, 1 - (1 - .035)^(n+1), to form the predicted distribution of error scores. Figure 6.1 (C and D) shows the predicted and obtained distributions for Angle Color Control and Flower Color Control. The maximum discrepancy for Angle Color Control was .130 and for Flower Color Control, .146, both of which were non-significant at the .05 level by the Kolmogorov-Smirnov one-sample test (maximum discrepancy allowed = .320).

For Angle + Angle Color, 3.10 Ss are expected to show perfect transfer, as they theoretically solved on the angle during original learning. The learning rate for the 16.90 remaining Ss is .034 from Angle Color Control. The expected number of non-solvers is taken as proportional to the number of Ss who failed to solve in transfer from Angle Color Control. Since 6/20 = .30, the expectation is that .30(16.90) = 5.07 Ss would not solve, whereas 6 were observed in Angle + Angle Color's transfer to the Angle problem. The predicted cumulative proportion of Ss making n or fewer errors was obtained by

1. using the cumulative geometric (Eq. 4.4), p(n) = 1 - (1 - .034)^(n+1), for the 16.90 Ss who theoretically solved their original learning problem on the angle color,
2. adding to this a cumulative normal approximation to the binomial distribution of errors for the 5.07 expected non-solvers, with mean = 32 and variance = 16, and
3. adding to this composite distribution the 3.10 Ss who were expected to make no errors on transfer since they theoretically solved on the angle during original learning.

[Figure 6.1. Predicted and observed cumulative distributions of the proportion of Ss making n or fewer errors. Predicted distributions are combined from equations and parameter estimates given in the text. Panels: (A) Angle + Angle Color, (B) Angle + Flower Color, (C) Angle Color Control, (D) Flower Color Control.]

The resulting predicted distribution is compared with the observed in Figure 6.1(A). The fit to the observed distribution for Angle + Angle Color on transfer was quite close. The maximum discrepancy was .144 and is not significant at the .05 level.

For Angle + Flower Color, 1.80 Ss are expected to show perfect transfer as they theoretically solved on the angle during original learning. The learning rate for the 18.20 remaining Ss is .034 from Flower Color Control. The expected number of non-solvers is taken as proportional to the number of Ss who failed to solve in transfer from Flower Color Control. Since 6/20 = .30, the expectation is that .30(18.20) = 5.46 Ss would not solve, whereas 10 non-solvers were observed in Angle + Flower Color's transfer to the Angle problem. The predicted cumulative proportion of Ss making n or fewer errors was obtained by

1. using the cumulative geometric (Eq. 4.4), p(n) = 1 - (1 - .034)^(n+1), for the 18.20 Ss who theoretically solved their original learning problem on the flower color,
2. adding to this a cumulative normal approximation to the binomial distribution of errors for the 5.46 non-solvers, with mean = 32 and variance = 16, and
3. adding to this composite distribution the 1.80 Ss who were expected to make no errors on transfer since they theoretically solved on the angle during original learning.

The resulting predicted distribution is compared with the observed in Figure 6.1(B). The fit to the observed distribution for Angle + Flower Color on transfer was poor. The maximum discrepancy was .357, larger than the allowed .320, and the prediction is rejected at the .05 level.

Some Further Questions on Emphasizer Effects

In the predictions of the rate of learning for Angle + Angle Color and Angle + Angle Color Irrelevant, an assumption was made that the color served as a red emphasizer on half the trials and as a green emphasizer on the other half. Since green had no emphasizer effect when it appeared alone (Chapter V), the assumption amounted to saying that emphasis occurred on only the red trials. Another possibility is to consider that red and green contrast from trial to trial has an emphasis effect. Stimulus change has been noted as a variable influencing the S's attention, so that color contrast as an emphasizer is not an unreasonable assumption (Berlyne, 1951).

Suppose that the emphasis effect is equal to that of the red emphasizer in Red Angle. For Angle + Angle Color Irrelevant, Eq. 6.5 now is

\[
c(\text{Angle + Angle Color Irrelevant}) = \frac{r\,m(A)}{2r\,m(A) + m(C_a) + m(I)}. \tag{6.8}
\]

The predicted c(Angle + Angle Color Irrelevant) is .053, a value very close to the .055 observed (-2 ln(λ) = 0.07, p > .05). Converting to mean errors, T = (1 - .053)/(.053) = 17.87, which is not significantly different from the 14.65 observed. For complete learning, T = 17.48 for Angle + Angle Color Irrelevant when the maximum-likelihood observed estimate of .055 is used. This being the case, the prediction is quite close.

If no emphasizer effect were assumed, Eq. 6.5 becomes

\[
c(\text{Angle + Angle Color Irrelevant}) = \frac{m(A)}{2m(A) + m(C_a) + m(I)}. \tag{6.9}
\]

The predicted c(Angle + Angle Color Irrelevant) is .027, significantly lower than the .055 observed (-2 ln(λ) = 7.09, p < .05). Converting to mean errors, T = (1 - .027)/(.027) = 36.04, which is rejected at the .01 level (t = 7.38). When no emphasizer effect is assumed, the prediction is for about twice as many errors as were observed.

These analyses, given additivity of irrelevant strategies, show that an emphasizer effect was operating in Angle + Angle Color Irrelevant and suggest that the effect was stronger than would arise if red and green trials had separate effects.

Applying the same reasoning to Angle + Angle Color, when the red and green contrast is assumed equal to the constant red emphasizer effect, Eq. 6.3 now becomes

\[
c(\text{Angle + Angle Color}) = \frac{r\,m(A) + m(C_a)}{2r\,m(A) + 2m(C_a) + m(I)}. \tag{6.10}
\]

The predicted c(Angle + Angle Color) is .217, very close to the .225 observed (-2 ln(λ) = 0.02), and not significantly different at the .05 level. Predicted mean errors are T = (1 - .217)/(.217) = 3.61, somewhat higher than but not significantly different from the 3.45 observed (t = 0.13, p > .05). This prediction is closer than the one where red and green were assumed to have separate effects.

If no emphasizer effect is assumed, Eq. 6.5 now becomes

\[
c(\text{Angle + Angle Color}) = \frac{m(A) + m(C_a)}{2m(A) + 2m(C_a) + m(I)}. \tag{6.11}
\]

Predicted c(Angle + Angle Color) is .204, not significantly different from the observed .225 (-2 ln(λ) = 0.23, p > .05).
Predicted mean errors are T = (1 - .204)/(.204) = 3.90, which is not significantly different from the 3.45 observed (t = 0.38, p > .05). The assumption of no emphasizer effect cannot be rejected in Angle + Angle Color.

In this chapter, the Strategy Selection Theory was tested by predictions of the rate of learning during original learning and the degree of transfer. Additivity of relevant strategies was accurately predicted for two redundant and relevant cue groups. The predictions took account of emphasizer effects. Additivity of irrelevant strategies was also predicted accurately, and in this case, it was necessary to take account of the emphasizer effect to obtain an acceptable prediction.

The degree of transfer was predicted in three ways: (1) number of Ss showing perfect transfer, (2) mean errors in transfer, and (3) cumulative distributions of error scores. All predictions were based on parameters which had been estimated from original learning data and independent groups. Predictions were accurate for seven of eight groups.

CHAPTER VII

DISCUSSION

Lawrence (1952) found that it is more efficient to train an S for n trials on an easy problem and then transfer him to a hard problem containing the same relevant dimension than to train from the beginning on the hard problem. The present experiments replicated Lawrence's result, and suggest that it depends upon stimulus emphasis. If the relevant dimension is emphasized but not changed to make a problem easy, learning is facilitated and transfer is perfect. This produces an extreme form of the Lawrence effect. In the "transfer on a continuum experiment," the relevant dimension is emphasized and somewhat changed. Learning is facilitated and there is a small disruption in transfer. The net effect is to make the easy-to-hard program somewhat more efficient than the hard program alone.
If the problem is made easy by introducing a redundant relevant cue which does not emphasize the final test cue, learning is facilitated but transfer is very slight, and no net gain in efficiency is realized. The experimental results on emphasizer groups, the difference in discriminanda (Large Angle) group and the redundant-cue groups agree with the earlier results on these three kinds of easy-to-hard experiments as reviewed in Chapter I.

In the present study, color (red versus green) was more salient as a cue than the angle (30° versus 60°) or the emphasized (red colored) angle dimension. A distinction should be drawn between the color's saliency as a cue and its saliency as an emphasizer. The weight of a cue may be reduced by reducing the discriminable differences between the values of the dimension, but the saliency of the dimension as an emphasizer may not be impaired. For example, the color dimension might consist of a light red versus a dark red. The cue value of the color might be reduced since discrimination between the values is difficult, but the color would still "stand out" on the pattern and preserve the emphasizer function. An extreme case of this is given in Red Angle, where there is no discriminable difference between the colors on the angles and an emphasizer effect was observed.

The question arises as to whether the cues are hard to distinguish because they are (1) like the background (embedded) or (2) like one another (similar). An emphasizer, like color, would seem to have its maximum effect when the cue to be learned is embedded but not similar to other cues. A reconsideration of the present problem suggests that the flower patterns, with fairly distinct but similar angles, were not optimal configurations for finding a large emphasizer effect with color. The Angle problem was difficult because the two stimuli (angles) are physically similar and other irrelevant stimuli were present.
However, the angles already stood out somewhat on the pattern and were separate from the other irrelevant dimensions, so that further emphasis was not of marked benefit. In Large Angle, where the difference between the discriminanda (angles) is doubled, the effect is more marked. Hull's (1920) embedded radicals, which were distinct from one another, apparently provided more opportunity for an emphasis effect of the red color.

The above discussion suggests that solving the problem requires

1. attending to the cue and
2. given that the cue is attended to, discriminating one cue from another.

Let the probability of attending to the cue be P(A) and the probability of discriminating, given that it is attended to, be P(D/A). Then the probability of discriminating is

P(D) = P(A).P(D/A).

If the cue is not attended to, then it cannot be discriminated, so that P(D/Ā) = 0.

The emphasizer effect is on P(A). In the case of Red Angle, the red color did not change the difference between the angles; it therefore presumably does not affect P(D/A). The multiplier effect of the emphasizer is on P(A) while P(D/A) remains constant.

One other interpretation would be that the emphasizer adds a constant amount to the cue. That is, the angle would gain a constant amount due to the color. However, in the present study, the powerful color only increased m(A) from .038 in Angle to .077 in Red Angle, whereas the angle color had a measure, m(Ca), of .306 in Angle Color Control. Addition would give a prediction of a larger and wrong order of magnitude than observed, but a multiplying effect can give a value in line with the results. This result is taken as justification of the assumption that the measure of the set of strategies which arise from the emphasized dimension was multiplied by a constant larger than one.
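The contrast between the two interpretations can be put in numbers. The sketch below uses only the three measures quoted above; reading the "additive" interpretation as adding the full measure of the color, m(Ca), to m(A) is an assumption made here for illustration, as are the function and variable names.

```python
# Sketch only: multiplicative versus additive reading of the emphasizer effect.

def p_discriminate(p_attend, p_disc_given_attend):
    """P(D) = P(A) * P(D/A); a cue that is not attended to contributes nothing."""
    return p_attend * p_disc_given_attend


M_A_PLAIN = 0.038  # m(A) in Angle
M_A_RED = 0.077    # m(A) in Red Angle (emphasized)
M_CA = 0.306       # m(Ca) in Angle Color Control

# Multiplicative reading: the emphasizer scales the angle's measure by a constant r.
r = M_A_RED / M_A_PLAIN

# ASSUMED additive reading: the color's whole measure is added to the angle's.
additive_prediction = M_A_PLAIN + M_CA

print(round(r, 2))                    # multiplier implied by the two estimates
print(round(additive_prediction, 3))  # far above the .077 actually observed
```

On this reading the implied multiplier is about 2, while simple addition would predict a measure more than four times the observed .077, which is the sense in which addition gives the wrong order of magnitude.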
The addition of elements which are "more of the same" but "indistinguishable" from existing elements in the psychological field is referred to by Restle (1961c) as a case of "homogeneous classes" of stimuli. This is what North (1959) apparently was referring to by calling the filled-in triangles "enrichment of cues."

The precise role of an emphasizer remains to be more fully investigated. In the present study, each problem was constructed to demonstrate some kind of emphasizer effect. In three groups, the role was dual: in Angle + Angle Color, color was an emphasizer and a relevant dimension; in Angle + Flower Color, color was a counter-emphasizer and a relevant dimension; and in Angle + Angle Color Irrelevant, color was an emphasizer and an irrelevant dimension. In each case, the roles were experimentally confounded. By use of the Strategy Selection Theory, estimation of the sets of strategies was made possible and predictions on the learning rate were accurate by assuming each role to be independent.

This assumption of independence may be open to question. It is possible for the roles to interact. In Angle + Angle Color, the roles might compete. The S might attend to color as a cue or to the colored angle as a cue. In Angle + Flower Color, the roles might co-operate. Color was placed on a part of the pattern away from the angle (i.e., color was a counter-emphasizer) and served as a cue. S's attention might be diverted away from the angle and he might be more likely to use color as the basis for a correct strategy. In Angle + Angle Color Irrelevant, the roles might compete, as they lead to opposite effects on the rate of learning. Since the group learned somewhat faster than Angle, the emphasizer role apparently predominated.

If the roles are not independent, as assumed, then there remains the problem of finding a way to detect an interaction.
Given the variability of the data and estimates, the predictions were not precise enough to show an interaction effect. If the color and angle cues were about equal in strength and problems constructed as in the present study, then a failure to make accurate predictions might indicate an interaction.

To study the distribution of attention over the parts of the stimulus pattern in a concept formation task, the use of a constant emphasizer seems promising. The constant emphasizer avoids some of the confounding of the roles of the emphasizer but does not lead to an evaluation of the separate roles. Suppose a problem were constructed of suitable complexity and contained two spatially separated relevant cues. The cues might be about equal in strength. A transfer-of-training design, similar to that in the present study, might be used to test effects on training. During original learning, one of the cues is emphasized by a constant and salient color. Transfer is to a problem with one of the cues and the color removed. If the color was on the cue which is removed, an example of counter-emphasis with respect to the retained relevant cue is demonstrated. If the retained relevant cue on transfer was colored during original learning, an example of emphasis is demonstrated. Weights of the sets of strategies arising from each cue, emphasized or not emphasized, could be obtained from suitable control groups and then the amount of transfer predicted. Transfer tests are used to estimate the direction of the S's attention to cues and the degree of emphasis or counter-emphasis.

CHAPTER VIII

SUMMARY

A transfer-of-training design known as "easy-to-hard" transfer was used to study: (1) the role of attention, and (2) efficiency in concept formation. The degree of efficiency was hypothesized to depend upon stimulus emphasis and relationships of the original learning (easy) and transfer (hard) problems.
The stimuli were complex flower patterns and the correct responses (two-choice) depended upon one or two aspects of the pattern. The relevant dimension of the hard problem was the angle of the leaves to the stem of the flower. Nine groups, of 20 Ss each, worked on different original learning problems and were all transferred to the same hard problem after criterion in original learning. A tenth group had the hard problem as its original learning problem and served as a control. All comparisons of acquisition and transfer were relative to this control group.

In two problems emphasis of the relevant angle dimension was achieved by either (1) doubling the difference between discriminanda, or (2) removing irrelevant cues during original learning. Both groups were highly efficient: they learned rapidly and showed nearly perfect transfer.

Color on the angle of the leaves to the stem constituted an "emphasizer." When a constant color was used on all trials, the effect was not strong; red had a detectable effect, but green did not facilitate learning at all. When color varied from trial to trial in a third problem, and was an irrelevant dimension, the net effect was slight facilitation of learning. In these three groups, color could not serve as the basis for a correct strategy and transfer was perfect.

Two problems had color added as a redundant and relevant dimension during original learning. Both problems were learned faster than problems with only one dimension relevant, an example of "additivity of strategies." In one problem, the color was also an emphasizer and transfer to the hard problem was somewhat positive. In a second problem, color was a counter-emphasizer, appearing over the flowers during original learning, and transfer to the hard problem was slightly negative.

Two control problems had color relevant and the angle dimension fixed. Color was found to be more salient as a cue than the angle.
There was no evidence for transfer of an "observing response" to the angle in these groups.

The stochastic properties of the data were consistent with the expectations of the Strategy Selection Theory. Analyses of Ss' performances before criterion indicated that errors occurred at random with probability near one-half, constant and independent of how long S was in the pre-solution phase. Fitted theoretical error distributions yielded good approximations in eleven of twelve cases. There was some evidence that Ss use "wrong" as well as "irrelevant" strategies in the pre-solution phase. Since wrong strategies depend upon the same cues as correct strategies, it was predicted that estimates of the measure of wrong strategies would be about the same as estimates of correct strategies. This quantitative prediction was verified. Wrong strategies were detected and their measure correlated with the measure of correct strategies. Intercorrelations between practice, original learning and transfer problems indicated no stable individual differences.

By taking account of stimulus emphasis, and using the Strategy Selection Theory, the additivity of relevant strategies and additivity of irrelevant strategies were accurately predicted. The degree of transfer was predicted in three ways: (1) number of Ss showing perfect transfer, (2) mean errors in transfer, and (3) cumulative distributions of error scores in transfer. All predictions were based on parameters which had been estimated from original learning data and independent groups. Seven of eight predictions on transfer were accurate.

Efficiency in concept learning was discussed in relation to the present and other findings. The question of the precise role of a stimulus emphasizer was examined and further investigations on emphasizers suggested.

REFERENCES

Archer, E. J., Bourne, L. E., and Brown, F. G. Concept identification as a function of irrelevant information and instructions. J. exp. Psychol., 1955, 49, 153-164.
Baker, R. A. and Osgood, S. W. Discrimination transfer along a pitch continuum. J. exp. Psychol., 1954, 48, 241-246.
Berlyne, D. E. Attention, perception and behavior theory. Psychol. Rev., 1951, 58, 137-146.
Berlyne, D. E. Conflict, arousal and curiosity. New York: McGraw-Hill, 1960.
Blazek, N. C. and Harlow, H. F. Persistence of performance differences on discriminations of varying difficulty. J. comp. physiol. Psychol., 1955, 48, 86-89.
Bourne, L. E., Jr. and Restle, F. Mathematical theory of concept identification. Psychol. Rev., 1959, 66, 278-296.
Bower, G. H. Properties of the one-element model as applied to paired-associate learning. Tech. Rep. 31, Contr. Nonr 225 (17), Inst. for Math. Stud. in the Soc. Sci., Stanford Univ., 1960.
Broadbent, D. E. Perception and Communication. London, New York: Pergamon Press, 1958.
Bush, R. R. and Mosteller, F. A. A model for stimulus generalization and discrimination. Psychol. Rev., 1951, 58, 413-423.
English, H. B. and English, A. C. A Comprehensive Dictionary of Psychological and Psychoanalytical Terms. New York: Longmans, Green, 1958.
Eninger, M. U. Habit summation in a selective learning problem. J. comp. physiol. Psychol., 1952, 42, 511-516.
Estes, W. K. Learning. In Ann. Rev. Psychol., 1956, 1, 1-38.
Estes, W. K. The statistical approach to learning theory. In Koch, S. (Ed.), Psychology: A study of a science. Vol. 2. New York: McGraw-Hill, 1959, 380-491.
Feller, W. Introduction to probability theory and its applications. (1st ed.). New York: Wiley, 1950.
Gibson, E. J. A systematic application of the concepts of generalization and differentiation to verbal learning. Psychol. Rev., 1940, 41, 196-229.
Guthrie, E. R. The psychology of learning. New York: Harper, 1935.
Hammer, M. The role of irrelevant stimuli in human discrimination learning. J. exp. Psychol., 1955, 50, 47-50.
Hara, K. and Warren, J. M. Stimulus additivity and dominance in discrimination performance by cats. J. comp. physiol. Psychol., 1961, 54, 86-90.
Harlow, H. F. Studies in discrimination learning by monkeys: III. Factors influencing the facility of solution of discrimination problems by rhesus monkeys. J. gen. Psychol., 1945, 23, 216-227.
Harlow, H. F. Learning set and error factor theory. In Koch, S. (Ed.), Psychology: A study of a science. Vol. 2. New York: McGraw-Hill, 1959, 492-537.
Harlow, H. F. and Hicks, L. H. Discrimination learning theory: uniprocess vs. duoprocess. Psychol. Rev., 1957, 64, 104-109.
Hayes, K. J. The backward curve: A method for the study of learning. Psychol. Rev., 1953, 60, 269-275.
Heidbreder, E. The attainment of concepts: II. The problem. J. gen. Psychol., 1946, 22, 191-223.
House, B. J. and Zeaman, D. Transfer of a discrimination from objects to patterns. J. exp. Psychol., 1960, 59, 298-302.
Hovland, C. I. A set of flower designs for experiments in concept formation. Amer. J. Psychol., 1953, 66, 140-142.
Hughes, C. L. and North, A. J. Effect of introducing a partial correlation between a critical cue and a previously irrelevant cue. J. comp. physiol. Psychol., 1959, 52, 126-128.
Hull, C. L. Simple qualitative discrimination learning. Psychol. Rev., 1950, 51, 303-313.
Hull, C. L. Quantitative aspects of the evolution of concepts. Psychol. Monogr., 1920, whole no. 123.
James, W. The principles of psychology. Vol. 1, 1890. Dover Publications, Inc. (paperback), 1950.
Kemeny, J. G., Snell, J. L. and Thompson, G. L. Introduction to finite mathematics. Englewood Cliffs, N. J.: Prentice-Hall, Inc., 1957.
Kendler, T. S. Concept formation. In Ann. Rev. Psychol., 1960, 13, 447-472.
Krechevsky, I. Hypotheses in rats. Psychol. Rev., 1932, 32, 516-532.
Krechevsky, I. A study of the continuity of the problem-solving process. Psychol. Rev., 1938, 45, 107-133.
Kurtz, K. H. Discrimination of complex stimuli: the relationship of training and test stimuli in transfer of discrimination. J. exp. Psychol., 1955, 50, 283-292.
Lashley, K. S. Brain mechanisms and intelligence. Un. Chicago Press, 1929.
Lashley, K. S. The mechanism of vision. XV. Preliminary studies of the rat's capacity for detail vision. J. gen. Psychol., 1938, 18, 123-193.
Lashley, K. S. and Wade, M. The Pavlovian theory of generalization. Psychol. Rev., 1946, 53, 72-87.
Lawrence, D. H. Acquired distinctiveness of cues: I. Transfer between discriminations on the basis of familiarity with the stimulus. J. exp. Psychol., 1949, 39, 770-784.
Lawrence, D. H. Acquired distinctiveness of cues: II. Selective association in a constant stimulus situation. J. exp. Psychol., 1950, 42, 175-188.
Lawrence, D. H. The application of generalization gradients to the transfer of a discrimination. J. gen. Psychol., 1955, 22, 37-48.
Lawrence, D. H. The transfer of a discrimination along a continuum. J. comp. physiol. Psychol., 1952, 45, 511-516.
LaBerge, D. L. and Smith, A. Selective sampling in discrimination learning. J. exp. Psychol., 1957, 54, 423-430.
Moon, L. E. and Harlow, H. F. Analysis of oddity learning by rhesus monkeys. J. comp. physiol. Psychol., 1955, 48, 188-194.
Murdock, J. The distinctiveness of stimuli. Psychol. Rev., 1960, 67, 16-31.
North, A. J. Acquired distinctiveness of form stimuli. J. comp. physiol. Psychol., 1959, 52, 339-341.
Prokasy, W. F., Jr. The acquisition of observing responses in the absence of differential external reinforcement. J. comp. physiol. Psychol., 1956, 22, 131-134.
Restle, F. Additivity of cues and transfer in discrimination of consonant clusters. J. exp. Psychol., 1959, 51, 9-14.
Restle, F. Discrimination of cues in mazes: A resolution of the "place-vs.-response" question. Psychol. Rev., 1957, 64, 217-228.
Restle, F. Psychology of judgment and choice. New York: Wiley, 1961c.
Restle, F. The selection of strategies in cue learning. Psychol. Rev., 1961a (in press).
Restle, F. Statistical methods for a theory of cue learning. Psychometrika, 1961b, 26, 291-306.
Restle, F. A theory of discrimination learning. Psychol. Rev., 1955, 62, 11-19.
Restle, F. Toward a quantitative description of learning set data. Psychol. Rev., 1958, 62, 77-91.
Schoeffler, M. S. Probability of response to compounds of discriminable stimuli. J. exp. Psychol., 1954, 48, 323-329.
Spence, K. W. The differential response in animals to stimuli varying within a single dimension. Psychol. Rev., 1937, 44, 430-444.
Spence, K. W. Continuous versus non-continuous interpretations of discrimination learning. Psychol. Rev., 1940, 41, 271-288.
Thorndike, E. L. The psychology of learning. New York: Teachers College. 113914.
Trabasso, T. R. Additivity of cues in the discrimination learning of letter patterns. J. exp. Psychol., 1960, 62, 83-88.
Walker, H. M. and Lev, J. Statistical inference. New York: Henry Holt, 1953.
Warren, J. M. Additivity of cues in visual pattern discriminations by monkeys. J. comp. physiol. Psychol., 1953, 46, 484-486.
Warren, J. M. Perceptual dominance in discrimination learning by monkeys. J. comp. physiol. Psychol., 1954, 17, 290-292.
Woodworth, R. S. and Schlosberg, H. Experimental Psychology. New York: Henry Holt, 1954.
Wyckoff, L. B., Jr. The role of observing responses in discrimination learning. Part I. Psychol. Rev., 1952, 52, 431-441.