AN EXPERIMENTAL STUDY OF THE EFFECT OF DIFFERENTIAL NON-REINFORCEMENT OF THE INCORRECT RESPONSE ON THE LEARNING OF THE CORRECT RESPONSE IN SIMPLE T-MAZE LEARNING

by
MORTON D. DUNHAM

A THESIS
Submitted to the School of Graduate Studies of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of
MASTER OF ARTS
Department of Psychology
1949

This is to certify that the thesis entitled "An Experimental Study of the Effect of Differential Non-Reinforcement of the Incorrect Response on the Learning of the Correct Response in Simple T-maze Learning" by Morton D. Dunham has been accepted towards fulfillment of the requirements for the Master of Arts degree.

ACKNOWLEDGEMENT

Grateful acknowledgement is made to Dr. M. Ray Denny for his advice and assistance throughout the course of this research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
EXPERIMENTAL PROCEDURE AND TECHNIQUE
   A. Apparatus
   B. Subjects
   C. Preliminary Training
   D. Method of the Experiment
RESULTS AND DISCUSSION
   A. Learning Measures
   B. Comparison of learning for the three groups based on the first two trials measure
   C. Comparison of learning for the three groups based on the initial trial measure
   D. Comparison of learning for the three groups with regard to strength of initial preference
   E. Comparison of learning for the three sub-groups with a strong position preference as based on the first two trials measure
   F. Comparison of learning for the three sub-groups with a strong position preference as based on the initial trial measure
   G. Comparison of learning for the three sub-groups with a weak position preference as based on the first two trials measure
   H. Comparison of learning for the three sub-groups with a weak position preference as based on the initial trial measure
ADDITIONAL THEORETICAL CONSIDERATIONS
SUMMARY AND CONCLUSION
BIBLIOGRAPHY
APPENDIX A - Original Data
APPENDIX B - Comparisons

LIST OF TABLES

TABLE
I. Comparison of the X-1 and X-4 groups and the X-1 and K-4 groups in terms of the mean number of correct responses on the first two trials per day for days two to nine
II. Comparison of the X-1 and X-4 groups and the X-1 and K-4 groups in terms of the mean number of correct responses on the initial trial per day for days two to nine
III. Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a strong position preference in terms of the mean number of correct responses on the first two trials per day for days two to nine
IV. Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a strong position preference in terms of the mean number of correct responses on the initial trial per day for days two to nine
V. Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a weak position preference in terms of the mean number of correct responses for the first two trials per day for days two to nine
VI. Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a weak position preference in terms of the mean number of correct responses on the initial trial per day for days two to nine
VII. Record of original responses for the X-1 group
VIII. Record of original responses for the X-4 group
IX. Record of original responses for the K-4 group
X. Comparison of groups in terms of the percentage of correct responses on the first two trials
XI. Comparison of groups in terms of the percentage of correct responses on the initial trial
XII. Comparison of sub-groups with strong position preference in terms of the percentage of correct responses on the first two trials
XIII. Comparison of sub-groups with strong position preference in terms of the percentage of correct responses on the initial trial
XIV. Comparison of sub-groups with weak position preference in terms of the percentage of correct responses on the first two trials
XV. Comparison of sub-groups with weak position preference in terms of the percentage of correct responses on the initial trial

LIST OF FIGURES

FIGURE
1. Plan drawing of the T-maze
2. Photograph of the T-maze
3. Learning curves for the three groups based on the percentage of correct responses for the first two trials measure per day
4. Learning curves for the three groups based on the percentage of correct responses for the initial trial per day
5. Learning curves for the three sub-groups with strong position preference based on the percentage of correct responses for the first two trials per day
6. Learning curves for the three sub-groups with strong position preference based on the percentage of correct responses for the initial trial per day
7. Learning curves for the three sub-groups with a weak position preference based on the percentage of correct responses for the first two trials per day
8. Learning curves for the three sub-groups with a weak position preference based on the percentage of correct responses for the initial trial per day

INTRODUCTION

Extensive research has been carried out in regard to the concept of reward or reinforcement¹ in learning, and for many psychologists the strength of a habit is primarily a function of the number of reinforcements. Little is known, on the other hand, and little study has been made about the effect of non-reinforcement² or the non-reward of the incorrect response upon the learning of the correct response.

¹ Reinforcement refers to the strengthening of a stimulus-response relationship by fulfillment of a need or an expectancy; e.g., food reward for a hunger drive.

² Non-reinforcement or non-reward is the opposite of reinforcement and refers to the non-fulfillment of a need or an expectancy; e.g., the absence of food when hungry and expecting food.

In recent analyses of discrimination learning and trial and error learning, Spence (4) and Hull (2) have found it necessary to incorporate the concept of non-reinforcement, assigning certain decremental or inhibitory properties to non-reinforcement. However, there exists very little experimental evidence to support the hypothesis of a cumulative, more or less permanent inhibitory effect from non-reinforcement of the incorrect response, even though it is well established that continuous non-reward of a previously reinforced response will be followed by the extinction of that response.
In fact, separate studies by Spence (5) and Denny (1), which were only indirectly concerned with the role of non-reinforcement of the incorrect response upon the learning of the correct response, led the above experimenters to question whether the non-reward of the incorrect response facilitates to any degree the learning of the correct response.

Spence (4), in the theoretical discussion of discrimination learning mentioned above, proposes that rewarding states of affairs result in the incremental strengthening of a response, and that non-reward results in the decremental weakening of a response. Each non-reinforcement leads to a decrement in the tendency of the reaction which just precedes the non-reinforcement. Spence (6) has used this same line of hypothesizing to explain transposition³ phenomena in stimulus-response terms as opposed to a Gestalt patterning analysis. In his analysis it is necessary to postulate both a stimulus generalization gradient⁴ of excitation (reward) and a gradient of inhibition (non-reward) in order to give an adequate behavioristic explanation of transposition. Moreover, it is necessary that the inhibition of the wrong response be above and beyond any extinction effect suffered by both the correct and incorrect response through response alone; that is, through the inhibitory effect of reactive inhibition.⁵

³ Transposition refers to the tendency to respond to the relation between two stimuli rather than to either one of the absolute stimuli. Thus an animal which has been trained to select the brighter of two lights will often select the dimmer of the original pair (transpose) if it is presented together with a still dimmer light. The same phenomenon has been observed in regard to patterns, colors, and tones.

In a similar discussion of trial and error learning, Hull (2) proposes that non-reinforcement results in the experimental extinction of incorrect responses.
In his theoretical analysis, Hull hypothesizes about the changes that would occur in behavior under conditions in which three mutually incompatible responses are all elicited by the same stimulus, and only one of these responses, the weakest one, is the correct response. According to Hull's analysis the incorrect responses would gradually undergo extinction because of non-reinforcement, and the correct response would gradually be strengthened because of reinforcement. These reaction tendencies would compete and oscillate until the correct response, by incremental reinforcement (strengthening), and the incorrect responses, by decremental non-reinforcement (weakening), would no longer compete, and the stronger correct response would always be elicited by the stimulus whenever it was presented.

⁴ Stimulus generalization gradient refers to the generalization of response tendencies to similar stimuli which decreases proportionately with the increase in the difference between the original stimulus and the new stimulus.

⁵ Reactive inhibition (IR) is the drive state produced by a response which tends to inhibit the repetition of that response.

Practically no experimental evidence exists in support of the above non-reinforcement hypotheses. In the learning of discrimination problems by chimpanzees, Spence (5) found that differential non-reinforcement of the incorrect responses had no effect on subsequent response strength of the correct response, unless these wrong responses had previously been reinforced.

Results similar to Spence's were obtained by Denny (1) in a partial reinforcement learning situation. Using a simple T-maze he found no differences in the learning of the following two groups: one group received 4 non-reinforcements to the incorrect side and 2 reinforcements to the correct side, the other group received 2 non-reinforced trials and also 2 reinforced trials to the correct side.
However, Denny suggests that secondary reinforcing cues⁶ in the delay boxes, which were present just prior to the occurrence of the non-reinforcement, may have offset the effect of the subsequent non-reinforcement.

⁶ Secondary reinforcing cues are those stimuli in a learning situation which have acquired a reward or sub-goal value by being associated with a reinforcing state of affairs.

At least one study with humans lends some support to the hypothesis that non-reinforcement of the incorrect response may aid in the learning of the correct response. Holsopple and Vanouse (3) conducted an experiment with eleven students of typing who were making four automatic and habitual errors in the spelling of words which outside of transcription they knew how to spell accurately. The students were given practice in which two of the words were constantly misspelled exactly as they had misspelled them in transcription, and in which two of the words were constantly practiced correctly. After equal amounts of practice the students were given dictation in which the four words appeared at least four times each. On the test no student made an error in spelling a word which he had practiced incorrectly, while ten of the eleven students made errors on words practiced correctly.

Although there is little experimental evidence to support the hypotheses of Spence and Hull, it is assumed on the basis of their analyses that non-reinforcement of the incorrect response in a learning situation may operate to aid the learning of the correct response by weakening the incorrect response and thereby increasing the relative strength of the correct response. If experimental conditions could be arranged in two groups of subjects so that there were equal reinforcement of the correct response and markedly unequal non-reinforcement of the incorrect response, it would be possible to test the above assumption.
The present study primarily attempts to do this, that is, to compare learning under conditions of a differential amount of non-reinforcement of the incorrect response and an equal amount of reinforcement of the correct response; and secondly, to compare the effects of differential non-reinforcement under conditions (1) where secondary reinforcing cues precede the non-reward end-box, and (2) where secondary reinforcing cues are eliminated as much as possible. The control of the secondary reinforcing cue aspect of the experiment was introduced because it was felt that perhaps the inconclusive or negative results obtained by other investigators were due to the camouflaging effects of preceding secondary reinforcement.

The complete hypothesis under investigation is that under conditions of controlled or minimized secondary reinforcement a definite difference in learning in favor of the greater non-reinforcement group will be found between the two groups receiving unequal amounts of non-reinforcement of the incorrect response; and, conversely, that under conditions of uncontrolled secondary reinforcement no significant difference in learning will be found between the groups receiving unequal amounts of non-reinforcement of the incorrect response.

EXPERIMENTAL PROCEDURE AND TECHNIQUE

A. Apparatus. The apparatus used was a single choice-point T-maze. The plan is shown in Fig. 1, and a photograph is shown in Fig. 2. The maze consisted of a starting box, a combination stem and constant choice-point, three interchangeable delay boxes, and two end-boxes or goal boxes. The apparatus was moveable and similar units were interchangeable. The sides and bottoms of all units were constructed of 3/4 inch, 4-ply veneering.
The roof of the starting box was made of 1/4 inch veneering, that of the stem of glazed screen, the roof of the choice-point of translucent glass, those of the delay boxes of painted window-screen, and the roofs of the goal boxes were made of 1/4 inch hardware cloth. The translucent glass arrangement at the choice-point allowed the E to follow the path of the S through the choice-point without allowing the S to receive visual cues from the external environment. The roofs of the delay boxes of window-screen were painted so that the mesh was nearly entirely covered, thus preventing S from seeing out.

Wooden doors constructed of 1/4 inch pine were placed at the exit of the starting box and at each end of the delay boxes. The door at the choice-point was T-shaped and was designed to prevent the S from retracing his path once a choice had been made (see Figures 1 and 2). The sides of the units were slotted allowing the doors to slide perpendicularly, and the doors were operated by a system of strings, pulleys, and counterweights suspended from the ceiling.

Figure 1 - Plan drawing of the T-maze. Legend: SB - starting box; S - stem; CP - choice-point; DB - delay box; GB - goal box; C - curtain; D - door; CPD - choice-point door.

The goal boxes were constructed differentially in shape, size, color, and texture. The positive goal box was trapezoidal in shape, white in color, and was floored with thin gauge tin. The negative goal box was square, black, and floored with 1/4 inch hardware cloth. Two of the delay boxes were painted grey, while the third was painted black and was floored with 1/4 inch hardware cloth to correspond to the negative goal box. Immediately in front of the entrances to the delay boxes a black curtain was suspended to prevent the S from receiving cues from the delay boxes while at the choice-point intersection. The starting box, stem and choice-point were painted grey.
Illumination was furnished by a 40 watt goose-necked desk lamp placed immediately above the stem so that it illuminated the interior of the stem and the choice-point. A mirror suspended at an angle over the choice-point allowed the E to follow the path of the S through the choice-point while E controlled the system of strings, weights and pulleys operating the doors from his position at the starting box.

B. Subjects. The subjects were albino rats from the rat colony of the Department of Psychology of Michigan State College. The ages of the animals at the beginning of training varied from 130 to 150 days. A total of 62 animals were used, of which 17 were males and 45 females.

C. Preliminary Training. All animals were placed on a strict food regimen one week prior to the experiment, in which they were fed 8 grams of Purina Dog Chow per day at the same hour of the day as they were to run in the experiment. During this period the animals were handled to reduce emotionality. Two days prior to the day the learning series was to begin each group was run in a straight alley maze consisting of the starting box, one of the grey delay boxes, and one of the two goal boxes. These preliminary trials consisted of 3 experiences to the white goal box with food and 2 trials to the black goal box with no food for each of the two days. Thus the preliminary training consisted of a total of ten trials of which 6 were rewarded and 4 were non-rewarded.

D. Method of the Experiment. Following the preliminary training the animals were placed at random in one of three groups of either the experimental or control conditions.

Under the experimental conditions there were two groups of animals termed X-1 and X-4. The X-1 group consisted of 23 animals and the X-4 group consisted of 19 animals. They were run under conditions designed to control or eliminate secondary reinforcement of the incorrect response as much as possible.
This was accomplished by using the black delay box followed by the black goal box, and the grey delay box followed by the white positive goal box. The animals in the X-1 group received a total of 3 trials per day, of which 1 was non-reinforced and 2 were reinforced. The animals in the X-4 group received a total of 6 trials per day, of which 4 were non-reinforced and 2 were reinforced. Thus each group of animals under the experimental conditions received an equal number of reinforced trials, and an unequal number (4:1 ratio) of non-reinforced trials per day.

The control group, designated K-4, consisted of 20 animals. They were run under conditions designed to produce secondary reinforcement in both delay boxes. This was accomplished by using the grey delay boxes interchangeably on both sides of the choice-point on both the reinforced and non-reinforced trials. It was assumed that both grey delay boxes, which were used in both the preliminary training and in the training series in conjunction with the white positive goal box, would acquire secondary reinforcing properties by being associated with the white box and through stimulus generalization. Thus on the non-reinforced trials in which the S was delayed in the grey delay box prior to entry in the negative black goal box, secondary reinforcement could operate to reinforce the wrong response, and thus slow down the learning of the correct response.

The details of the daily experimental routine were as follows. On the first trial of the first day of the regular learning series each S was given a free-choice trial which was always followed by entry into the negative goal box and was thus non-reinforced. This response determined the preference and each S was trained to the side opposite this first free-choice. The X-4 and K-4 groups were given free choices on the first two trials.
On the third trial, if a S had not completed a correct response it was forced to the correct side by blocking off the wrong alley at the choice-point. After the third trial each S was given free choices until it had completed either 4 non-reinforced (NR) trials or 2 reinforced (R) trials. Any remaining trials were forced. In the event that an animal had made a correct response on one of the first two free-choice trials, it was continued on free trials until the completion of either 4 NR or 2 R trials and then it was forced. When the first two trials were correct responses an animal was, of course, forced to the incorrect side on the remaining 4 trials.

The X-1 group was also given 2 free-choice trials each day. In this group, for all animals that went wrong on the first trial, the forcing technique was modified for the second trial. This was necessary in order to have only one NR trial and still have a measure using the first two free trials of each day. Instead of inserting the forcing block at the choice-point, the door to the delay box on the incorrect side was closed and S was allowed to correct a partially wrong response and retrace its path to the correct side. Partially corrected responses were, of course, recorded as incorrect. All other forced responses were handled in the same manner as in the X-4 and K-4 groups.

The criterion for having made a left or right response was the rat's touching the curtain with its nose. All animals were delayed for 15 seconds in the delay boxes, and for 30 seconds in the negative goal box. If an animal refused to enter the goal box on non-reinforced trials it was removed from the delay box 60 seconds after the end-box door was opened. An animal which refused to leave the choice-point in 60 seconds was removed. Records were kept of such incomplete responses. Food reward consisted of one medium-sized pellet of Dickinson Dog Food, about 0.35 grams.
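
The free-choice and forcing rules described above for the X-4 and K-4 groups can be restated as a short sketch. The Python function below is only an illustrative reading of that routine and is not part of the original procedure. The names `schedule_day`, `free_choice`, `CORRECT`, and `INCORRECT` are hypothetical, and the sketch assumes that the correct side has already been fixed as the side opposite the first free choice of the first day.

```python
from typing import Callable, List, Tuple

# Hypothetical labels: CORRECT is the side opposite the animal's initial
# preference (reinforced); INCORRECT is the preferred side (non-reinforced).
CORRECT = "correct"
INCORRECT = "incorrect"


def schedule_day(
    free_choice: Callable[[int], str],
    n_reinforced: int = 2,
    n_nonreinforced: int = 4,
) -> List[Tuple[str, bool]]:
    """Sketch of one day's trials for an X-4 or K-4 animal.

    free_choice(trial_number) stands in for the animal's own choice and
    must return CORRECT or INCORRECT. The result is a list of
    (side, was_forced) pairs, one per trial.
    """
    trials: List[Tuple[str, bool]] = []
    reinforced = 0       # R trials completed so far
    nonreinforced = 0    # NR trials completed so far

    for trial in range(1, n_reinforced + n_nonreinforced + 1):
        if reinforced == n_reinforced:
            # Both R trials are used up: force the remaining NR trials.
            side, forced = INCORRECT, True
        elif nonreinforced == n_nonreinforced:
            # All NR trials are used up: force the remaining R trials.
            side, forced = CORRECT, True
        elif trial == 3 and reinforced == 0:
            # No correct response on the first two free trials:
            # the third trial is forced to the correct side.
            side, forced = CORRECT, True
        else:
            side, forced = free_choice(trial), False
            if side not in (CORRECT, INCORRECT):
                raise ValueError(f"unexpected choice: {side!r}")

        if side == CORRECT:
            reinforced += 1
        else:
            nonreinforced += 1
        trials.append((side, forced))

    return trials


if __name__ == "__main__":
    # An animal that always goes to its preferred (incorrect) side.
    for side, forced in schedule_day(lambda trial: INCORRECT):
        print(side, "forced" if forced else "free")
```

Whatever the animal chooses, the sketch always yields 2 reinforced and 4 non-reinforced trials, which is the equality of reinforcement across days that the routine was designed to secure. The modified forcing used for the X-1 group on the second trial is not represented.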
At the end of a day's run the animals were fed 8 grams of Purina Dog Chow in individual cages before return to the home cage. The animals were picked at random from the home cage so that they were not run in the same order on succeeding days of experimentation.

All animals were trained for a period of 9 days.

RESULTS AND DISCUSSION

A. Learning Measures. The measures of learning used were the mean numbers of correct responses on the initial trial and on the first two trials of each day. The initial trial measure was used because it provides for equalization of the number of previous food reinforcements, and because initial trials are not affected by the tendency towards spontaneous alternation. The mean number of correct responses on the first two trials of each day is considered to be a more stable measure because it provides for twice as many responses as the initial trial measure.

B. Comparison of learning for the three groups based on the first two trials measure. The results for the three groups based on the percentage of correct responses for the first two trials per day are shown in Fig. 3 and Table I. From an examination of the learning curves for the two experimental groups, X-4 and X-1, it will be seen that the X-4 group learns considerably faster than the X-1 group. On the last day of training the X-4 group is responding at a performance level of 87% correct, whereas the X-1 group has only attained a level of 62% correct responses. From Table I it is apparent that the overall learning scores for the two groups, based on days 2 to 9,⁷ are significantly different. The mean difference of 3.21 correct responses for the first two trials gives a t of 2.66, which is significant between the one and two percent levels of confidence. A comparison of the X-1 group and the K-4 group shows that the learning curves are not as widely separated as in the X-1 and X-4 groups. However, the mean difference for these two groups for days 2 to 9 is 2.04 correct responses, giving a t of 1.81, which is significant between the five and ten percent levels of confidence.

⁷ The data for the first day are excluded because the variable of differential non-reinforcement did not operate until after the first day of experimentation.

Figure 3 - Learning curves for the three groups based on the percentage of correct responses for the first two trials per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE I
Comparison of the X-1 and X-4 groups and the X-1 and K-4 groups in terms of the mean number of correct responses on the first two trials per day for days two to nine.

Group    N     Mean     σ      σM     Diff    t       P
X-1      23     8.26    4.26   .91
X-4      19    11.47    3.47   .82    3.21    2.66    .01 - .02
X-1      23     8.26    4.26   .91
K-4      20    10.30    2.93   .67    2.04    1.81    .05 - .10

These findings largely agree with the expectations as set forth in the introduction. As predicted, the difference between the X-4 and X-1 groups is significant. It was also predicted that the difference between the K-4 and X-1 groups would not be significant, and while the results based on the first two free trials do not clearly support this hypothesis, the findings show a smaller difference between the latter two groups.

C. Comparison of learning for the three groups based on the initial trial measure. The results for the three groups based on the percentage of correct responses for the initial trial per day are shown in Fig. 4 and Table II. These results are in close agreement with the results using the first two trials measure. From Table II it will be seen that the difference between the X-4 and X-1 groups is significant between the two and five percent levels of confidence, while the difference between the K-4 and X-1 groups is significant only at the twenty percent level of confidence. These findings support the theoretical expectations as given in the introduction.

Figure 4 - Learning curves for the three groups based on the percentage of correct responses for the initial trial per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE II
Comparison of the X-1 and X-4 groups and the X-1 and K-4 groups in terms of the mean number of correct responses on the initial trial per day for days two to nine.

Group    N     Mean     σ      σM     Diff    t       P
X-1      23    3.70     3.65   .77
X-4      19    5.68     2.21   .52    1.98    2.15    .02 - .05
X-1      23    3.70     3.65   .77
K-4      20    5.05     1.85   .42    1.35    1.55    .20

An examination of the learning curves for the X-1 group in Figures 3 and 4 reveals a rapid rise in performance on the second day followed by a sudden fall on days 3 and 4 for the two trials measure and a fall on days 3, 4, and 5 for the initial trial measure. The higher score on the second day in the X-1 group may possibly be explained by the fact that all Ss in this group are consistently reinforced to the correct side on the last two trials of the first day, whereas most of the Ss of the X-4 and K-4 groups received secondary reinforcement of the wrong response by way of the delay box for the last two trials. The decrease in performance in group X-1 may possibly be explained by the fact that any rewarding property originally possessed by the negative delay box extinguishes much more slowly because there is only one non-reinforcement to that side per day.

D. Comparison of learning of the three groups with regard to strength of initial preference. It will be recalled that on the first trial of the first day of the regular training series each S was given a free-choice trial in order to establish a position preference, and that each S was trained opposite to this preference. Inasmuch as each S was also given a free-choice on the second trial it was possible to estimate the relative strength of the preference. Those Ss going left-left or right-right on the first two trials were designated as having a strong position preference, and those Ss going left-right or right-left on the first two trials were designated as having a weak position preference. In this manner each of the three groups, X-4, X-1, and K-4, was divided into two sub-groups of strong or weak preference, and a comparison of learning for each of the sub-groups with a strong or weak preference was made. These comparisons follow.

Figure 5 - Learning curves for the three sub-groups with a strong position preference as based on the percentage of correct responses for the first two trials per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE III
Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a strong position preference in terms of the mean number of correct responses on the first two trials per day for days 2 to 9.

Sub-group    N     Mean     σ      σM      Diff    t       P
X-1          12     5.73    3.59   1.14
X-4          10    11.30    2.86    .95    5.57    3.74    < .01
K-4          10     9.20    2.89    .96    3.47    2.33    .02 - .05

E. Comparison of learning for the three sub-groups with a strong position preference as based on the first two trials measure. The results for the three sub-groups with strong position preference based on the percentage of correct responses for the first two trials per day are shown in Fig. 5 and Table III. An examination of the learning curves in Fig. 5 reveals that the X-4 strong preference sub-group learns considerably faster than the X-1 group with strong preference. The X-4 group on the ninth day has attained a level of 90% correct response, whereas the X-1 group has only reached a level of 55% correct response on the ninth day. The K-4 group falls between the two, reaching a level of 70% correct response.
It may be seen in Table III that the difference between the X-4 and X-1 groups is significant at less than the one percent level of confidence, and that the difference between the X-1 and K-4 groups is significant between the two and five percent levels of confidence. These findings fully support the original hypothesis that a significant difference would be found between the X-1 and X-4 groups. The hypothesis that no significant difference would be found between the X-1 and K-4 groups is not borne out, although, as in the comparison between the complete groups, a smaller difference is found between the latter two groups. It should also be noted that the decrement in performance of the X-1 group on days 3 and 4, as observed in Fig. 3, does not occur in the sub-group composed of Ss with a strong position preference. As will be shown later, the decrement comes from the Ss with a weak position preference.

F. Comparison of learning for the three sub-groups with a strong position preference as based on the initial trial measure. The learning curves for the strong preference sub-groups based on the initial trial measure do not differ appreciably from those based on the first two trials measure. The results are shown in Fig. 6 and Table IV. The difference between the X-1 and X-4 sub-groups is again significant beyond the one percent level of confidence, and the difference between the X-1 and K-4 sub-groups is also significant between the two and five percent levels of confidence.

Figure 6 - Learning curves for the three sub-groups with a strong position preference as based on the percentage of correct responses for the initial trial per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE IV

Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a strong position preference in terms of the mean number of correct responses on the initial trial per day for days two to nine.

Sub-group    N     Mean     σ       σm      Diff    t       P
X-1          12     2.27    2.26     .72
X-4          10     5.70    1.75     .58    3.43    3.75    < .01
X-1          12     2.27    2.26     .72
K-4          10     4.50    1.85     .62    2.04    2.14    .02 - .05

One of the most obvious and significant findings in this preference analysis is that the X-1 sub-group with the strong position preference to the wrong side showed, over a period of nine days, little if any learning of the correct response after the second day. Contrariwise, the X-4 strong position preference sub-group shows a great deal of learning, even more, as will be shown later, than the X-4 weak position preference group. This seems to indicate that in order for a strong, incorrect habit to be overcome by a weak, correct habit within a limited number of trials, it is necessary that there be available a certain minimum number of trials of the wrong habit for the extinction of this response.

G. Comparison of learning for the three sub-groups with a weak position preference as based on the first two trials measure. The results for the three sub-groups with weak position preference based on the percentage of correct responses on the first two trials are shown in Fig. 7 and Table V. These learning curves show a much different pattern than has been revealed heretofore. The most outstanding characteristic of these curves is their variability and fluctuation. In all three sub-groups the learning fluctuates widely from day to day, learning is not appreciable, and smooth clear-cut learning curves are not found. Also to be noted again is the large decrement in performance in the X-1 sub-group. A slight decrement is also observed in the X-4 and K-4 sub-groups when weak position preferences are present. Table V shows that no significant difference obtains between the X-1 and X-4 sub-groups and between the X-1 and K-4 sub-groups; the Ps are between the fifty and sixty percent levels of confidence for both comparisons. These findings clearly indicate that the degree of position preference plays a significant role in determining the course of learning under the present experimental conditions.

Figure 7 - Learning curves for the three sub-groups with a weak position preference as based on the percentage of correct responses for the first two trials per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE V

Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a weak position preference in terms of the mean number of correct responses for the first two trials per day for days two to nine.

Sub-group    N     Mean     σ       σm      Diff    t       P
X-1          11    10.58    3.41    1.05
X-4           9    11.67    3.96    1.40    1.09     .63    .50 - .60
X-1          11    10.58    3.41    1.05
K-4          10    11.40    2.11     .70     .83     .66    .50 - .60

H. Comparison of learning for the three sub-groups with a weak position preference as based on the initial trial measure. Figure 8 and Table VI reveal substantially the same findings for the weak preference sub-groups when the initial trial measure is employed, except that the X-4 group shows little day to day fluctuation. There are no significant differences between the X-1 and X-4 sub-groups and between the X-1 and K-4 sub-groups. The inconsistency in performance of the weak position preference sub-groups, and the fact that the overall performance and final level of performance in the X-4 and K-4 groups are no better than in the group with the strong preference to the wrong side, pose the question as to what is the relationship between speed of learning and final level of performance and the strength of the initial response tendency as measured by the direction of the initial response in a T-maze situation. From these data the relationship does not clearly seem to be one in which training against a strong preference will take more trials than training against a weak response, that is, as current learning theory would predict. More research would seem to be warranted in this direction.

Figure 8 - Learning curves for the three sub-groups with a weak position preference as based on the percentage of correct responses for the initial trial per day. (Ordinate: percent correct responses; abscissa: days.)

TABLE VI

Comparison of the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups with a weak position preference in terms of the mean number of correct responses on the initial trial per day for days two to nine.

Sub-group    N     Mean     σ       σm      Diff    t       P
X-1          11     5.00    2.35     .71
X-4           9     5.67    2.55     .95     .67     .57    .50 - .60
X-1          11     5.00    2.35     .71
K-4          10     5.80    1.47     .49     .80     .93    .30 - .40

ADDITIONAL THEORETICAL CONSIDERATIONS

The fact that a significant difference in learning was obtained between the complete groups receiving unequal amounts of non-reinforcement to the incorrect side indicates that non-reinforcement of the wrong response in a differential response situation is an important factor in the learning of the correct response. The fact that both of the groups which received 4 non-reinforcements per day learned significantly better than the X-1 group seems to indicate that secondary reinforcement does not appreciably counteract the effects of subsequent non-reinforcement, at least under the temporal and stimulus conditions employed in the present study.
In this context it should be pointed out that in the X-4 group, the group in which the delay boxes were made differential, the differences between the boxes were not striking. Both boxes were the same size and shape and possibly not discriminably different in brightness, since one was grey and the other black, with little light entering either box. The main difference was probably in the tactual impressions from the floor, but under-the-surface floor cues were uncontrolled. In other words, some animals of the X-4 group may have received secondary reinforcement for wrong responses by way of stimulus generalization for a considerable number of trials. Also, for all groups, cues at the choice-point were uncontrolled and could have served to reinforce wrong responses secondarily. All this means, of course, is that despite the symmetrical properties of the maze, non-reinforcement to the incorrect side is a significant variable.

It also seems probable that whatever cues are distinctive in the total maze situation may acquire the property to mediate non-reinforcement. After a number of days of training any secondary reinforcing values (positive expectancy) possessed by the cues on the wrong side seem to extinguish. In turn these cues seem to acquire negative expectancy value. This hypothesis is based on the typical recoil behavior of the animals in the X-4 group on the trials to the wrong side late in the learning series. Furthermore, the X-4 subjects which were performing near the one-hundred percent correct level early in the learning series showed characteristic behavior on being forced to the incorrect side. These Ss attempted to climb out of the maze at the choice-point, ran back and forth in the stem, and upon finally entering the negative goal-box immediately tried to climb out of the box.
This frustration type of behavior and the fact that the inter-trial interval was fifteen minutes or more may be considered as evidence against interpreting the inhibitory or extinguishing properties of the non-reinforced trials as due to reactive inhibition. Rather, we postulate that a frustration drive state produced by repetitive non-reinforcement makes the animal avoid the cues on the wrong side and makes possible the reinforcement of a response which avoids these cues; that is, additionally rewards a response to the correct side. Such a state of affairs would account for the faster learning in the X-4 and K-4 groups.

The present findings support the theoretical hypotheses of Hull (2) and Spence (4) in assigning decremental or inhibitory properties to non-reinforcement. However, it should be emphasized that the relationship between non-reinforcement and the strength of the original response is some function whereby an increase in the position preference to the wrong side increases the importance of non-reinforcement of the wrong response in the learning of the correct response. This relation might be expected from current stimulus-response learning theory. Thus with a large differential between opposing response tendencies, the combined decremental and incremental process which weakens the strong incorrect tendency as well as strengthens the correct tendency will allow the correct and originally weaker response tendency to develop to a point of dominating the incorrect response in a fewer number of trials. This hypothesis now has empirical confirmation. Further experiments need to be directed towards determining the effect of the strength of the original position preference on differential response learning.

The present findings, however, do not agree with the results obtained by Denny (1) in a similar experimental set-up.
Why does the present study indicate that non-reinforcement is a relevant variable while the study by Denny gave negative evidence? The main difference between these two studies is probably the difference in the ratio and the absolute number of non-reinforcements given. In the study by Denny one group received 2 reinforced and 4 non-reinforced trials per day and the other group received 2 reinforced and 2 non-reinforced trials per day. Therefore, the ratio was 2 to 1 as compared to the 4 to 1 ratio in the present study, and the group which had the lesser number of non-reinforced trials in Denny's study received 2 non-reinforcements per day as contrasted to the one non-reinforcement per day given the X-1 group in the present study. Because the learning curve for the X-1 group is abnormally depressed (see Figs. 3 and 4), it is very likely that the crucial factor in showing that non-reinforcement was an important variable was the fact that only 1 non-reinforcement instead of 2 was given. It is also true that secondary reinforcement in the delay box was uncontrolled in Denny's study, but we see from the present results that this factor played only a minor role in negating the influences of non-reinforcement.

SUMMARY AND CONCLUSION

This experiment was designed to test the hypothesis that a significant difference in learning in favor of the greater non-reinforcement group would be found between two groups receiving equal reinforcement of the correct response and unequal non-reinforcement of the incorrect response, especially if secondary reinforcement on the wrong side was minimized as much as possible.

The apparatus was a single choice-point T-maze consisting of interchangeable parts, and designed to control secondary reinforcement and extra-maze cues as much as possible. Subjects were 62 albino rats, of which 17 were males and 45 were females.
The Ss were divided into three groups: (1) The X-4 group consisted of 19 Ss which received 2 reinforced trials to the correct side and 4 non-reinforced trials to the incorrect side per day. (2) The X-1 group consisted of 23 Ss which received 2 reinforced trials to the correct side and 1 non-reinforced trial to the wrong side per day. In both of the X groups secondary reinforcement was controlled as much as possible. (3) The K-4 group consisted of 20 Ss which received 2 reinforced trials to the correct side per day and 4 non-reinforced trials to the incorrect side per day. In the K-4 group secondary reinforcement was uncontrolled. All groups received training for 9 days.

The results in terms of the mean number of correct responses from days 2 to 9, based both on the first two trials measure and on the initial trial measure, revealed a significant difference between the groups X-4 and X-1. The difference between the X-1 and K-4 groups, though still in favor of the K-4 group, was somewhat less significant. Each of the three groups was divided into two sub-groups on the basis of the relative strength of the initial position preference. It was found that with the sub-groups with the strong position preference to the incorrect side the differences between the X-1 and X-4 sub-groups and the X-1 and K-4 sub-groups were large and even more significant than with the complete groups, while all differences between the weak position preference sub-groups were small and insignificant.

Evidence of a frustration type of behavior was observed in some Ss upon receiving forced non-rewarded responses to the wrong side, and some theoretical implications of this behavior were discussed.

From the present study the following conclusions seem warranted.

1. Differential response learning, in addition to being a function of the number of reinforcements, may also be a function of the number of non-reinforcements of the incorrect response.
This has been found to be true when a differential of four non-reinforcements to one has been used.

2. If secondary reinforcement of the incorrect response precedes non-reinforcement of the incorrect response, there seems to be a slight but noticeable slowing in the learning of the correct response. With better control of secondary reinforcement this effect might be even more noticeable.

3. In general an increase in the strength of the position preference to the wrong side increases the effect of the non-reinforcement of the wrong response on the learning of the correct response.

4. Simple T-maze learning seems to be a rather unexpected and complicated function of the strength of the original position preference.

BIBLIOGRAPHY

References

1. Denny, M. R., The role of secondary reinforcement in a partial reinforcement learning situation. J. Exp. Psych. 1946, 36, 373-389.

2. Hull, C. L., Simple trial and error learning. J. Comp. Psych. 1939, 27, 233-258.

3. Holsopple, J. Q. and Vanouse, I., A note on the beta hypothesis of learning. School and Society, 1929, 29, 15-16.

4. Spence, K. W., The nature of discrimination learning in animals. Psychol. Rev. 1936, 43, 427-449.

5. Spence, K. W., Analysis of the formation of visual discrimination habits in chimpanzees. J. Comp. Psych. 1937, 23, 77-100.

6. Spence, K. W., The differential response in animals to stimuli varying within a single dimension. Psychol. Rev. 1937, 44, 430-444.

Supplementary References

Brunswik, E., Probability as a determiner of rat behavior. J. Exp. Psych. 1939, 25, 175-197.

Hilgard, E. R., Theories of Learning. New York: Appleton-Century-Crofts, Inc. 1948.

Hilgard, E. R. and Marquis, D. G., Conditioning and Learning. New York: D. Appleton-Century. 1940.

Hull, C. L., Principles of Behavior. New York: D. Appleton-Century. 1943.

Krechevsky, I., A note concerning 'The nature of discrimination learning in animals'. Psychol. Rev. 1937, 44, 97-104.

McGeoch, J. A., The Psychology of Human Learning.
New York: Longmans, Green and Co. 1942. APPEND ICES APPENDIX A ORIGINAL DATA 43 Table VII - Record of original responses for the X-l group. Subject Number 1 DIN {OGDQCDOWIP a: tn an :4 FJ r4 F' +4 ta P‘ +4 l4 F4 no :4 c> «a cn in o: cn +9 6 a: F4 <3 23 Total Correct 0 O O O "x" correct response Trials 0 X 0 1 (fl 0 O 12 12 13 12 11 7 8 x x o x x o o x o o x x o o o o x x o o o o x x x x x o o o o o o o o o x x o o x x o o o x 9 10 "o" incorrect response 9 10 11 12 13 14 15 16 17 18 X 0 O I 0 X 0 X X X 0 I X 0 O I 8 12 10 13 10 16 11 17 13 16 Total Correct 9 9 14 13 12 15 15 12 14 14 12 10 NmHNtF 10 202 44 Table VIII - Record of original responses for the X-4 group. Subject Number 1 comqmmrP-OZN F‘ ta F' F4 Id F‘ ta F4 +4 (D q a: an i# a as +4 'o 19 Total Correct "1" correct response 3 4 5 0 0 X I X I X I X 0 I X I X 0 O 1 X 0 0 O I X X X X Z O O O 0 I. O 0 O X 0 O I O 0 I I X I O O O O O 1 0 O O O O O 6 9 12 "o" incorrect response Trials 6 7 8 9 10 ll 12 l3 l4 l5 16 17 18 x x o x o o o x x x x x x x o o z o x o z o x x x o x x o x x z x x x o x o x x o x x o o o x x x x o x x x x x x x x x x x x O O O O X X I o 1 x x 0 x x o o x x x o x o x x x x X x X I 0 X X I X x x x x x x x I. O X 0 I. 1 X x x x x x x 1 o x o x z x o O O O O I O 1 9 12 12 14 16 16 14 17 14 15 19 16 17 Total Correct 8 l7 8 13 l5 13 10 15 17 7 13 10 14 13 16 11 14 9 4 227 45 Table IX - Record of original responses for the K-4 group. Subject Number 1 (GUJQOUIIFO‘JN i4 F! F4 Id F' F4 l4 F‘ rd I4 c) (D ‘q 0: cm 0% a: no ta 10 20 Total Correct 1 2 o x o z o x o o o o o o o o o o o x o x o o o x o o o o o o o x o -o o x o z o x 0 10 "I" correct respo 3 4 5 6 7 8 nse Trials 9 10 x x o x z o o x x o x x x x o x x x x x x o x o o z x o o x x x o x x x x x 1 o ”o" incorrect response 11 12 13 14 15 16 17 18 O O I I 0 I O I I. I X I O O O I O O I. 
O X I I Z X I I 1 I Z 0 O I O I O I I I I O I O O 2 O I I O X 0 I X I I I I I I X 0 N N O H O O N H N N H Total Correct 11 14 15 12 14 15 12 4 13 15 12 6 13 10 13 10 14 14 14 12 12 16 11 16 16 18 11 216

APPENDIX B

COMPARISONS

Table X - Comparison of groups in terms of the percentage of correct responses on the first two trials.

                  Group X-1 (N 23)     Group X-4 (N 19)     Group K-4 (N 20)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                   12       26           9       24          10       25
2                   25       54          15       39          19       48
3                   20       44          21       55          23       58
4                   19       41          24       63          24       60
5                   20       44          30       79          28       70
6                   23       50          30       79          24       60
7                   26       57          31       82          27       68
8                   28       61          34       89          32       80
9                   29       63          33       87          29       73
Total Correct          202                  227                  216
Total Percent
Correct               48.79                66.37                60.00

Table XI - Comparison of groups in terms of the percentage of correct responses on the initial trial.

                  Group X-1 (N 23)     Group X-4 (N 19)     Group K-4 (N 20)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                    0       00           0       00           0       00
2                   13       57           6       32           6       30
3                   11       48          12       63          10       50
4                    9       39          12       63          10       50
5                    8       35          14       74          14       70
6                   10       43          16       84          12       60
7                   10       43          17       89          16       80
8                   11       48          15       79          16       80
9                   13       57          16       84          18       90
Total Correct           85                  108                  102
Total Percent
Correct               41.06                63.16                56.67

Table XII - Comparison of subgroups with strong position preference in terms of the percentage of correct responses on the first two trials.

                  Group X-1 (N 11)     Group K-4 (N 10)     Group X-4 (N 10)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                    0       00           0       00           0       00
2                    6       27           8       40           4       20
3                    6       27           9       45          13       65
4                    8       36          12       60          10       50
5                    8       36          12       60          16       80
6                    5       23          10       50          17       85
7                    8       36          12       60          17       85
8                   10       45          15       75          18       90
9                   12       55          14       70          18       90
Total Correct           63                   92                  113
Total Percent
Correct               31.82                51.11                62.78

Table XIII - Comparison of subgroups with strong position preference in terms of the percentage of correct responses on the initial trial.
                  Group X-1 (N 11)     Group X-4 (N 10)     Group K-4 (N 10)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                    0       00           0       00           0       00
2                    3       27           2       20           3       30
3                    3       27           6       60           2       20
4                    4       36           5       50           6       60
5                    3       27           7       70           5       50
6                    3       27           9       90           6       60
7                    2       18          10      100           7       70
8                    3       27           8       80           6       60
9                    4       36          10      100           9       90
Total Correct           25                   57                   44
Total Percent
Correct               25.25                63.33                48.89

Table XIV - Comparison of subgroups with weak position preference in terms of the percentage of correct responses on the first two trials.

                  Group X-1 (N 12)     Group X-4 (N 9)      Group K-4 (N 10)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                   12       50           9       50          10       50
2                   19       79          11       61          11       55
3                   14       58           8       44          14       70
4                   11       46          14       78          12       60
5                   12       50          14       78          16       80
6                   18       75          13       72          14       70
7                   18       75          14       78          15       75
8                   18       75          16       89          17       85
9                   17       71          15       83          15       75
Total Correct          139                  114                  124
Total Percent
Correct               64.35                70.37                68.89

Table XV - Comparison of subgroups with weak position preference in terms of the percentage of correct responses on the initial trials.

                  Group X-1 (N 12)     Group X-4 (N 9)      Group K-4 (N 10)
                  Number   Percent     Number   Percent     Number   Percent
Day               Correct  Correct     Correct  Correct     Correct  Correct
1                    0       00           0       00           0       00
2                   10       83           4       44           3       30
3                    8       67           6       67           8       80
4                    5       42           7       78           4       40
5                    5       42           7       78           9       90
6                    7       58           7       78           6       60
7                    8       67           7       78           9       90
8                    8       67           7       78          10      100
9                    9       75           6       67           9       90
Total Correct           60                   51                   58
Total Percent
Correct               55.56                62.96                64.44
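The percentages in the Appendix B tables can be recomputed from the daily counts. The Python below is a minimal sketch and not part of the original analysis. It assumes that each daily percentage is the number correct divided by N times the trials scored per day (two for the first two trials measure), rounded to the nearest whole percent with halves rounded up; that the total percentage is taken over all nine days; and that the sub-group means reported for days 2 to 9 are the counts for those days divided by N. The function and variable names are illustrative only.

```python
# Daily numbers correct copied from Table X (first two trials per day, days 1-9).
TABLE_X = {
    "X-1": {"n": 23, "correct": [12, 25, 20, 19, 20, 23, 26, 28, 29]},
    "X-4": {"n": 19, "correct": [9, 15, 21, 24, 30, 30, 31, 34, 33]},
    "K-4": {"n": 20, "correct": [10, 19, 23, 24, 28, 24, 27, 32, 29]},
}

# Table XII, X-4 sub-group with strong position preference (N = 10).
X4_STRONG = [0, 4, 13, 10, 16, 17, 17, 18, 18]


def percent_correct(n_correct, n_subjects, trials_per_day=2):
    """Whole-number percent correct for one day (assumed: halves round up)."""
    opportunities = n_subjects * trials_per_day
    # Integer form of floor(100 * n_correct / opportunities + 0.5).
    return (200 * n_correct + opportunities) // (2 * opportunities)


def overall_percent(counts, n_subjects, trials_per_day=2):
    """Percent correct over all listed days, to two decimal places."""
    opportunities = n_subjects * trials_per_day * len(counts)
    return round(100 * sum(counts) / opportunities, 2)


def mean_correct_days_2_to_9(counts, n_subjects):
    """Mean number correct per subject, skipping day 1 (the first entry)."""
    return sum(counts[1:]) / n_subjects


if __name__ == "__main__":
    for group, data in TABLE_X.items():
        daily = [percent_correct(c, data["n"]) for c in data["correct"]]
        total = overall_percent(data["correct"], data["n"])
        print(group, daily, total)
    print("X-4 strong, mean correct days 2-9:",
          mean_correct_days_2_to_9(X4_STRONG, 10))
```

Under these assumptions the K-4 column of Table X and the total percentages for all three groups are reproduced, and the X-4 strong-preference counts of Table XII give the mean of 11.30 listed in Table III.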