AN EXPERIMENTAL STUDY OF THE EFFECT OF POSITION REVERSAL AFTER ONE OR TWO REINFORCEMENTS ON SIMPLE T-MAZE LEARNING IN THE RAT

Thesis for the Degree of M. A.
Michigan State College
Dorothy Hall Moore
1949

This is to certify that the thesis entitled "An Experimental Study of the Effect of Position Reversal After One or Two Reinforcements on Simple T-Maze Learning in the Rat," presented by Dorothy Hall Moore, has been accepted towards fulfillment of the requirements for the M. A. degree in Psychology.

Major professor
August 25, 1949

A Thesis submitted to the Graduate School of Michigan State College of Agriculture and Applied Science in partial fulfilment of the requirements for the degree of MASTER OF ARTS

Department of Psychology
1949

ACKNOWLEDGMENT

Grateful acknowledgment is made to Dr. M. Ray Denny for his willing guidance and encouragement, and for his ever-present assistance in the study reported here.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
I.   INTRODUCTION
II.  THEORETICAL BACKGROUND
       Thorndike - Law of Effect
       Hull - Principle of Reinforcement
       Tolman - Principle of Expectancy
       Mowrer - Two-Factor Theory
       White - Completion Hypothesis
       Denny - Pertinent Response Hypothesis
III. STATEMENT OF THE PROBLEM
IV.  EXPERIMENTAL PROCEDURE
       Apparatus
       Subjects
       Preliminary Training
       Method
V.   RESULTS
VI.  DISCUSSION
VII. SUMMARY AND CONCLUSIONS
LITERATURE CITED
APPENDIX

LIST OF TABLES

Table
I.  Comparison of Groups C and X-2 and of Groups X-1 and X-2 in Terms of the Mean Number of Correct Responses on Both Free Trials of Days 2-5
II. Comparison of Groups C and X-2 and of Groups X-1 and X-2 on the Second and Third Days of Learning in Terms of the Number of Correct Responses on the Initial Trial

LIST OF FIGURES

Figure
1. Ground Plan of Apparatus
2. Learning Curves of Groups C, X-1, and X-2, as Based upon the Per Cent of Correct Responses on the Initial Trial of the Day
3. Learning Curves of Groups C, X-1, and X-2, as Based upon the Per Cent of Correct Responses on the First Two Free Trials of the Day
I. INTRODUCTION

The complexity of adaptive phenomena has long offered a challenge to the theoretical psychologist in the field of learning. Although the concept of reinforcement1 occupies a central position in learning theory, psychologists are in considerable disagreement as to the nature of reinforcement. The attempt to give a single description of the reinforcement process, one which will embrace the many diverse learning situations, has initiated much experimental research and much theorizing. From these attempts it appears that the crucial question is "What is the critical factor for the strengthening of an adaptive response?", or, simply, "What is the essential condition for learning?"

Of the many theories which have been designed to answer this question, Thorndike's (13) 'law of effect' and Hull's (7) principle of reinforcement have probably wielded greatest influence. For convenience, these two theories may be roughly classed together as 'drive satisfaction' theories, as they both designate reward as the essential condition for learning. In opposition to the drive satisfaction or drive reduction theories is Tolman's (14) principle of expectancy. In his system, reward or drive satisfaction does not strengthen a response tendency, but serves to keep the organism motivated or in a state of expectancy for a particular goal object or behavior consequence. Tolman's theory and others which generally adhere to this principle of anticipation or expectancy may be classed as 'expectancy theories'. Some effort has been made to integrate these two viewpoints into a single theory. These will be examined in the following section.

1 'Reinforcement', as here used, is defined as a state of affairs which incrementally strengthens a response.

II. THEORETICAL BACKGROUND

Of the current theories concerning the nature of reinforcement, those of Thorndike, Hull, Tolman, Mowrer, White and Denny are of special concern in the present study.1

1 The attempt is made to give an outline of only the concepts and principles contained in these theories which are relevant to this study; the pertinent experimental data is too extensive for reporting here.

A. Edward L. Thorndike - Law of Effect

One of the most influential principles describing the nature of reinforcement has been the 'law of effect' proposed by Thorndike (13). Essentially Thorndike holds that the satisfying outcome of a response tends automatically to strengthen the association between the stimulus and the response. Thorndike (13) presents his law as follows:

    When a modifiable connection between a situation and a response is made and is accompanied or followed by a satisfying state of affairs, that connection's strength is increased. . . . By a satisfying state of affairs is meant roughly one which the animal does nothing to avoid, often doing such things as attain and preserve it. (Thorndike, 13, p. 176)

Thus for Thorndike reward becomes the essential condition for learning. Hilgard and Marquis (4) point out that the law of effect is somewhat of a misnomer in that it does not require that the behavior sequences strengthened by reward should necessarily be instrumental in securing the reward. The effective factor in determining the selection of the correct response is its proximity in time to the reinforcement, i.e., the last response which occurs prior to the reward is the one most strongly reinforced.
There is no implication of purposive behavior or of insight contained in Thorndike's formulations. Reward or goal satisfaction acts directly on neighboring connections to strengthen them, without mediation by ideas or consciousness on the part of the organism.

Originally Thorndike held that if the stimulus-response connections were followed by an annoying state of affairs or punishment, the connection would be weakened. However, because of experimental evidence which contradicts this phase of the theory, his most recent formulation of the law of effect omits the consequences of punishment.

B. Clark L. Hull - Principle of Reinforcement

Hull's (5, 6, 7) theoretical interpretation of learning is the most systematic of the current theories. His view of primary reinforcement, although more quantitative and particularized, is basically the same as Thorndike's law of effect. In both, the reward acts directly and mechanically on cue-response connections. Thorndike (13) defines reward or reinforcement in terms of the satisfying consequences of a response, while Hull (7) thinks of reinforcement as drive reduction or the decrement in a physiological need.

In his book, Principles of Behavior, Hull (7) states that when a condition of need exists, random and variable behavior is evoked, and the following chain of events could result:

    In case one of these random responses, or a sequence of them, results in the reduction of a need dominant at the time, there follows as an indirect effect what is known as reinforcement (G). This consists in (1) a strengthening of the particular receptor-effector connections which originally mediated the reaction and (2) a tendency for all receptor discharges (s) occurring at about the same time to acquire new connections with the effectors mediating the response in learning. . . . As a result, when the same need again arises in this or in a similar situation, the stimuli will activate the same effectors more certainly, more promptly and more vigorously than on the first occasion. (Hull, 7, p. 336)

Hull also stresses the point that this increment in habit strength occurs only when the receptor and effector activities are in close temporal contiguity and are closely followed in time by a reinforcing state of affairs (drive reduction). He states:

    Whenever a reaction (R) takes place in temporal contiguity with an afferent receptor impulse (s) resulting from the impact upon the receptor of a stimulus energy (S), and this conjunction is followed closely by a diminution in the drive, D, and in the drive receptor discharge (SD), there will result an increment, delta (s -> R), in the tendency for that stimulus on subsequent occasions to evoke that reaction. (Hull, 7, p. 71)

It is postulated that habit strength grows simply as an increasing exponential function of the number of reinforcements.
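The shape of this postulated growth can be written out explicitly. The display below is an illustrative form supplied by the present writer rather than Hull's own notation: M stands for the asymptotic habit strength, N for the number of reinforcements, and k for an assumed positive growth constant.

    H_N = M\left(1 - e^{-kN}\right), \qquad \Delta H_N = H_{N+1} - H_N = M e^{-kN}\left(1 - e^{-k}\right)

On such a negatively accelerated function the earliest reinforcements contribute the largest increments and each later reinforcement adds less, a property that is appealed to again in the Results section.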
Hull (5, 6) also sets forth a concept involving fractional anticipatory responses (rG), which, he maintains, become conditioned stimuli to adaptive behavior. These fractional anticipatory responses are small parts or fractions of the more complex goal response. For example, salivating, chewing, and swallowing, which are fractionations of the complete eating process and do not interfere with most overt motor responses, constitute what Hull means by fractional anticipatory response. Such responses with their accompanying stimulus components function as behavior-directing stimuli or provide the physical basis for purposive behavior.

The role of fractional responses is elaborated by Hull (5) in the following way:

    The drive stimulus accounts very well for the random seeking reactions of a hungry organism, but alone it is not sufficient to produce the integration of complex behavior sequences such as is involved in maze learning. There must always be a reward of some kind. Once the reward has been given, however, the behavior undergoes a marked change most definitely characterized by evidences of actions anticipatory of the goal, which actions tend to appear as accompaniments to the sequence ordinarily leading to the full overt goal reaction. It is shown how these fractional anticipatory reactions could be drawn to the beginning of the behavior sequence and maintained throughout by the action of the drive stimulus (SD). The kinaesthetic stimulus resulting from this persistent anticipatory action should furnish a second stimulus (sG) which would persist very much like SD. These two persisting stimuli alike should have the capacity of forming multiple excitatory tendencies to the evocation of every reaction within the sequence. (Hull, 5, p. 504)

It is important to note, however, that Hull does not posit any causal relationship between fractional anticipatory reactions or their completion (for instance, in the complete goal response of eating) and reinforcement, as such. Within the Hullian framework need reduction remains as the only and essential principle of primary reinforcement.

C. Edward C. Tolman - Principle of Expectancy

Tolman's (14) sign-gestalt or expectancy theory offers an alternative to rewarded response learning. It postulates that the organism follows 'signs' which mark the 'behavior route' leading to the 'significate' or goal; a behavior route instead of a movement pattern is learned. The interpretation emphasizes the perceptual or cognitive capacities of the animal. Rewarding states of affairs operate mainly to specify and maintain expectancies. Expectancies or cognitive maps, rather than specific motor responses, are learned primarily through simple contiguity. Although Tolman does not provide a structural basis for expectations, he does not deny that there is one, and the attempt has been made by some to tie the principle of expectancy to an objective basis (e.g., White (15), with his fractional anticipatory response analysis).

D. O. H. Mowrer - Two-Factor Theory1

The two-factor theory proposed by Mowrer (9) was primarily developed in order to account for the phenomenon of avoidance learning which, according to Mowrer, is not adequately explained by Hull's reinforcement theory. This interpretation divides learning into two types, that for (1) skeletal muscle responses and (2) smooth muscle or autonomic responses. The former is accounted for by the principle of drive decrement, while viscero-motor activity is said to be learned through simple association or contiguity. For example, anxiety or fear, which is the visceral aspect of pain, is conditioned to the cues associated with the onset of the drive stimulus (pain).

1 A discussion of the implications which Mowrer's two-factor theory holds for psychotherapy is presented by Shoben (10).
The anxiety so established acts as a secondary drive which then allows the avoidant skeletal responses to be strengthened according to the principle of reinforcement as formulated by Hull (7). Mowrer, in other words, has ignored parsimonious considerations and postulated two classes of learning which must be accounted for by two different principles.

E. R. K. White - Completion Hypothesis

An attempt has been made by White (15) to combine Hull's theoretical constructs with those of Tolman and Thorndike by utilizing the concept of fractional anticipatory reactions. In the completion hypothesis, it is proposed that the completion of fractional anticipatory responses constitutes a reinforcement act. White states:

    The fractional anticipatory reaction is an incipient response or an 'activity in progress' in the literal sense that it is a specific physical act which has been started and not completed. The goal situation makes possible the rounding-out of a coordinated activity-pattern, or the finishing of a complex act in the same manner in which it has been finished on previous occasions. If, then, completion means the transition from an incipient reaction to the complete reaction of which it was previously a part (in Hull's symbolism, the transition from rG to RG), our translation of the satisfaction hypothesis can be expressed as follows: The completion of a fractional anticipatory reaction tends to reinforce recent and concomitant S-R connections. (White, 15, p. 399)

In this manner White, by proposing Hull's (5, 6) concept of fractional anticipatory response as the unifying principle, has encompassed Thorndike's (13) law of effect, by interpreting the completion of fractional anticipatory responses as satisfying states of affairs, and incorporated the directing influence present in Tolman's (14) expectancy principle. It is unfortunate, however, that White has not advanced his hypothesis beyond the most tentative stage. It would seem that with further development such a hypothesis might offer an approach to understanding the nature of reinforcement.

F. M. Ray Denny - Pertinent Response Hypothesis

An interpretation of reinforcement which is somewhat similar to White's (15), although independently developed from his, has been proposed by Denny (3) in a series of unpublished lectures.1 According to Denny's theoretical analysis, the so-called skeletal and autonomic types of learning can be subsumed under one principle of reinforcement. In the case of the classical or respondent conditioning of eye blink, pupillary reflex, knee jerk, leg withdrawal, galvanic skin response, etc. (not strictly in the autonomic category, it should be added), the principle of contiguity seems to account satisfactorily for the establishment of the conditioned response. Also in the case of anxiety or fear it is presumably the presentation of shock or noxious stimulation, not its cessation, that sets up the secondary drive of fear. Yet not any pairing of stimuli or of response and stimuli will bring about learning. Denny states:

1 The writer is indebted to Dr. Denny for the use of his unpublished lecture material, from which this outline of the pertinent response hypothesis is directly derived, and for his cooperation in personally clarifying the principles which he proposes.
    In instrumental learning or operant conditioning it is well known that drive satisfaction or reward must also be present. But what is this so-called drive reduction? Is it actually different from jerking one's knee when the appropriate stimulus is given to the appropriate structure? Is there actually drive reduction when a rat gets a tiny pellet of food in a maze or a Skinner box situation? The organism is so structured originally to respond in a fairly consistent and specific way, say to a blow on the patellar tendon, and to respond grossly or emotionally to a noxious stimulus. When it eats a piece of food it also makes certain original responses such as chewing, salivating, and swallowing. When an organism gives an innate or reflexive response in a neutral stimulus situation, this particular stimulus situation acquires the property to evoke this response. That is, when an organism gives the response it is supposed to make, or is so structured to make, to a prepotent stimulus, the remaining stimuli in the context acquire the property to evoke this response. (Denny, 3)

It is postulated that the animal can make the responses that it is supposed to make under two main sets of conditions, which may by no means be mutually exclusive: (1) Permanent - when its innate structure so dictates; (2) Temporary - when the momentary state of the organism, primarily its current response organization or set, predisposes the organism toward one type of response rather than another.

In other words, when an organism is set to eat food (has fractional anticipatory responses in terms of making incipient and implicit eating responses), the appropriate or pertinent act for the animal is to eat the food in the goal box, and it learns to do that faster with succeeding trials. It also learns to make the more successful instrumental responses leading up to the eating of the food. It is proposed (1) that all responses occurring in immediate temporal contiguity with the consummatory or pertinent response are also being established and strengthened and (2) that any stimulus which tends to increase or confirm the anticipatory response acts to strengthen any concurrent response.1

1 Unlike Tolman's (14) analysis, Denny's position accounts for the learning of appropriate action as well as the expectancy.

Denny's reinforcement hypothesis is then as follows: If the organism makes the response it is permanently or temporarily structured to make, then that response and others very close to it in time become hooked up and fixated to the present stimuli. In instrumental learning this amounts to saying that the instrumental response, in order to be learned, must occur concurrently with the fractional anticipatory responses. Presumably the fractional anticipatory responses which constitute the organism's set acquire some habit strength to the maze situation as soon as the first goal response is made; on each subsequent trial an increase in the anticipatory reaction occurs with the making of responses leading up to the goal response, and, in turn, responses instrumental in leading to the goal object are strengthened. After the first trial subsequent responses are learned in essentially the same way as Hull (7) proposes that responses which lead to drive reduction are learned.1

1 In Hullian terms, all learning, according to Denny, takes place by means of secondary reinforcement.

III. STATEMENT OF THE PROBLEM

Explicit in the theory of reinforcement forwarded by Denny is the principle that the organism must be 'set' to make the responses which lead to reward or consummation before learning can take place.
Thus in a completely new situation, responses removed to any degree in time from the goal response cannot receive an increment in habit strength until after the anticipatory set has been established. According to this interpretation, it could be assumed that all that is learned on the first and perhaps second trial by the hungry rat which finds food in a new maze is an anticipatory response. Instrumental responses are not strengthened on these trials because there is as yet no anticipatory set to be consummated.

The hypothesis to be tested to support this theory is as follows: If, in a simple T-maze learning situation, the goal boxes are reversed in position from left to right, and vice versa, after one and possibly two rewarded trials, we should expect no difference in the learning of these animals and animals trained consistently to one side.1

1 This position reversal technique is similar to that used by Spence (12) in a discrimination learning test of the continuity and non-continuity theories.

IV. EXPERIMENTAL PROCEDURE

A. Apparatus

The apparatus consisted of a simple T-maze, made up of a starting box, a combined stem and choice point section, a pair of arms, and two goal boxes. The ground plan of the maze is presented in Figure 1. With the exception of the goal boxes the inside alley width was 5"; the height of all parts of the maze measured 11-1/2". The sides and floor of the maze proper were made of 3/4" plywood, and the main sections of the maze were moveable. The interior of the maze was painted a uniform gray throughout. The roof of the starting box was wood; the stem was covered with a fine screening which was difficult to see through. The choice point section, the two arms and the two goal boxes had a hardware cloth roof of 1/2" mesh.

Vertical sliding doors were placed at the entrance to each goal box, in the choice point section, and at the exit of the starting box. They were painted the same gray as the maze interior and were made of 1/4" plywood. An inverted T-shaped door was placed in the choice point section as shown in the ground plan (Figure 1). This door was so constructed as to prevent the animal from retracing its steps once it had made a choice. With the exception of the starting box door, which was opened from two to five seconds after the animal was placed in the box, all doors in the maze were open at the beginning of each trial. The doors at the entrance of the goal boxes were closed immediately after the animal entered. All moveable doors worked by a system of strings and weights. A block of wood of the same size and color as the other doors was inserted flush with the stem on forced trials.

Figure 1. Ground plan of apparatus. SB - starting box; D - doors; C - curtains.

One inch from the entrance to each goal box, curtains of black material were suspended from a cross bar so as to obscure the portions of the maze lying beyond. The negative goal box was trapezoidal in shape, was painted white, and had a smooth sheet metal floor. The inner dimensions of this box were 5-1/4" at the entrance, 9-1/4" at the extreme end, and 9-3/4" long. The positive goal box was approximately square in shape, having inside dimensions of 11-1/4" x 11-3/4".
It was painted black throughout, and had a 1/4" plywood floor which was covered with hardware cloth. A round coaster-like glass food dish, approximately 2" in diameter, was placed on the floor opposite the entrance to the box.

B. Subjects

The subjects were albino rats from the animal colony of the psychology department at Michigan State College. A total of 68 animals which had no prior experimental experience were used, 33 females and 35 males, with ages ranging from 108 to 190 days. They were placed in the groups so as to approximately equalize the number of males and females in each sub-group.

C. Preliminary Training

Seven days of preliminary training was given to all animals. For the first four days, this consisted of handling and petting. On the fifth, sixth, and seventh days of the preliminary period, the animals were placed in a straight alley, which consisted of the starting box and one of the arms from the maze already described. Each S received four trials per day, making a total of 12 preliminary trials in the straight alley. The S's received no food reward on these trials. They were retained in the second (or arm) section for fifteen seconds.

Beginning on the fifth day of the preliminary training period all of the animals were placed on a food regimen of nine gms. per day. They received this at regular feeding time for the remainder of the preliminary period. At no time during these seven days did E feed the S's, and all preliminary handling and training was carried on at least three to four hours from the time of feeding.

D. Method

Upon the completion of preliminary training, S's were placed at random in one of three groups. There were 24 animals in Group X-2, 20 animals in Group X-1, and 24 animals in Group C. Each main group was subdivided into two equal sub-groups.

Group X-2 animals were goal-reversed after two rewarded trials. For half the animals the positive goal was on the right for the pre-reversal trials, and on the left following position reversal. The other half was trained to the left for the first two trials and to the right for the remaining trials. Group X-1 had one rewarded trial before goal reversal was imposed. Except for this condition, the sub-groups were set up in the same manner as in Group X-2. Group C served as a control group and received no position reversal. One sub-group was rewarded for running right, while the second sub-group was trained to the left alley throughout the entire experimental period of five days. This procedure with sub-groups was followed in order to randomize position preference in the animals.

Food reward, of one large pellet of dog food (approximately .40 gms.), was given in the black goal box at the end of the correct alley. Animals remained in the positive goal box until the food was eaten. Each S was kept twenty seconds in the negative white goal box, which was always at the end of the incorrect alley. The end boxes were changed in position in order to conform to the experimental design for each sub-group. In the event an animal refused to enter the goal box within forty-five seconds, it was removed from the maze. A record was kept of all refusals to enter.

Each S had four trials per day for a period of five days. With the exception of the trials of the first day for Group X-1, the first two trials of each day allowed the animal free choice, while the remaining two trials were forced in such a manner as to equalize the number of correct and incorrect trials each day.
The last two trials were forced in a way to make possible only four combinations of responses on the four trials. These were RRLL, LLRR, RLLR, and LRRL. This pattern was adhered to in order to discourage alternation of response. Since position reversal was carried out in Group X-1 following one rewarded trial, the first day trials for this group were alternately free and forced, in order that one correct and one incorrect trial could precede reversal. A period of 45 minutes elapsed before the second set of free and forced trials was run. On all succeeding days the pattern of running was the same for Group X-1 as for Groups C and X-2, i.e., two free trials and two forced trials per day in that order.

Animals were run in blocks of from six to ten in number. The order of running changed from day to day, but remained constant for successive trials on the same day. At the end of each day's run, the S's were fed nine gms. of Purina dog chow checkers in individual feeding cages. Thus a food deprivation of from 22 to 23 hours preceded each day's trials.
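The daily trial-sequencing rule just described can be made concrete with a short sketch. The code below is the present writer's illustration and not part of the original procedure; the function names are invented, and the rule that each forced trial takes the side opposite the corresponding free trial is an inference drawn from the four permitted sequences listed above.

    # A minimal sketch (not from the thesis) of the daily trial-sequencing rule:
    # two free trials, then two forced trials chosen so that the whole day is one
    # of the four permitted sequences (RRLL, LLRR, RLLR, LRRL), which gives every
    # animal two runs to each side, and therefore two correct and two incorrect
    # runs, per day.

    PERMITTED = {"RRLL", "LLRR", "RLLR", "LRRL"}

    def force_trials(free_choice_1: str, free_choice_2: str) -> str:
        """Return the four-trial sequence for one day.

        The forced third trial is the side opposite the first free choice and the
        forced fourth trial is the side opposite the second free choice; this is
        the unique completion that lands in the permitted set.
        """
        opposite = {"R": "L", "L": "R"}
        day = (free_choice_1 + free_choice_2
               + opposite[free_choice_1] + opposite[free_choice_2])
        assert day in PERMITTED
        return day

    def correct_count(day: str, positive_side: str) -> int:
        """Number of runs that end in the positive (baited) goal box."""
        return day.count(positive_side)

    if __name__ == "__main__":
        for free in ("RR", "LL", "RL", "LR"):
            day = force_trials(free[0], free[1])
            # Every permitted day yields two runs to each side, whichever side
            # happens to be positive for a given sub-group.
            print(free, "->", day, correct_count(day, "R"), correct_count(day, "L"))

Under this rule every animal runs twice to each side per day, so the number of correct and incorrect runs is equalized no matter what the animal chooses on its free trials.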
V. RESULTS

The results in terms of the per cent of correct responses for Groups X-1, X-2, and C are graphically presented in Figures 2 and 3.1 The curves in Figure 2 are based upon the per cent of correct responses for the initial trial of each day, while the learning curves of Figure 3 are plotted in terms of the per cent of correct responses for the first two free trials of each day.2 The initial trial data is considered superior in this study because (1) it is not affected by the tendency toward alternation which persisted despite the precautions taken to prevent it, and (2) it provides an equalization of the number of previous food reinforcements.

1 The data from which these curves are plotted is presented in Tables A and B of the Appendix.
2 The per cent of correct responses for the first day is calculated in terms of the correct position for trials following reversal in order to show the relationship between initial position preference and post-reversal results.

The results shown in Figure 2 indicate that the non-reversal group (C) and the group on which goal reversal was imposed after one rewarded trial (X-1) are well matched in performance. There is practically no difference anywhere along the learning curves. However, the X-2 group, when compared with the X-1 and C groups, shows retardation on all days. The greatest retardation is shown on the initial trials of the second and third days. There is a drop of over 40 per cent in the per cent of correct responses on the second day: Group C and Group X-1 both attained a performance level of 75 per cent on the second day, whereas Group X-2 had only 20.8 per cent correct responses.

Figure 2. Learning curves of Groups C, X-1, and X-2, as based upon the per cent of correct responses on the initial trial of the day.

Essentially the same trends are shown by the learning curves based on the two-trial data in Figure 3. The somewhat smaller difference shown between the X-2 group and the other groups in the two-trial data is expected, at least in part, for the reasons previously mentioned. The performance of the X-1 and C groups is superior to that of the X-2 group on the second and third days, and the drop in the per cent of correct responses for the X-2 group on the second day is again in evidence, with no corresponding decline in the other two groups. However, no consistent difference between X-2 and the other two groups exists on the fourth and fifth days with the two-trial data, indicating, probably, the slight effect of the variable being manipulated in a long-run learning analysis.

According to Hull's (7) description of the growth of habit strength, in which the first increments to the habit are the largest, we would expect that any difference which would appear between the 'reversal' and the control groups would be greatest on the days immediately following goal reversal, rather than on later trials or in overall performance. This is evident in the learning curves of Figures 2 and 3.

Figure 3. Learning curves of Groups C, X-1, and X-2, as based upon the per cent of correct responses on the first two free trials of the day.

Table I presents an overall comparison of Groups C and X-2 and of Groups X-1 and X-2 for the first two free trials of days 2-5.1

1 The data of the first day is excluded because the variable of position reversal did not operate for all groups until the second day.

TABLE I
Comparison of Groups C and X-2 and of Groups X-1 and X-2 in Terms of the Mean Number of Correct Responses on Both Free Trials of Days 2-5

Group   N    Mean Correct Responses   SD     Diff.   t      P
C       24   6.00                     1.19
X-2     24   5.46                     1.08   .54     1.61   .20
X-1     20   6.25                      .99
X-2     24   5.46                     1.08   .79     2.47   <.02

Here we see that the difference between the means of Group C and Group X-2 yields a t of only 1.61, which is significant only at about the 20 per cent level of confidence. The difference between the X-1 and X-2 groups, however, yields a fairly large t of 2.47, which is significant beyond the two per cent level of confidence.

TABLE II
Comparison of Groups C and X-2 and of Groups X-1 and X-2 on the Second and Third Days of Learning in Terms of the Number of Correct Responses on the Initial Trial*

                      Day 2                        Day 3
Group   N    No. Correct   Chi Square   P      No. Correct   Chi Square   P
C       24   18                                20
X-2     24    5            12.02        <.001  11             5.82        <.02
X-1     20   15                                18
X-2     24    5            10.70        <.01   11             7.04        <.01

* The Yates correction for continuity, i.e., a deduction of .5 from each of the discrepancy values, has been made in the calculation of chi square to allow for the small frequencies.

When we turn to Table II for a comparison in terms of the number of correct responses on the initial trials of the second and third days of learning, we see much more significant differences, as obtained by the chi square test.1 The X-2 group gives significantly fewer correct responses than either Group C or Group X-1 on both the second and third days, i.e., on the days when a predicted difference should show up. The null hypothesis of no difference between the X-2 group and the other two groups on the initial trial of the second and third days can therefore be rejected.

1 The chi square test cannot be legitimately employed with the two-trial data.
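The statistics of Tables I and II can be recomputed from the tabled summaries. The sketch below is the present writer's illustration and not part of the thesis; it assumes that the reported standard deviations were computed with N in the denominator (so that the standard error of a mean is the SD divided by the square root of N - 1), an assumption under which the t values of 1.61 and 2.47 are reproduced, and it applies the Yates-corrected chi square described in the footnote to Table II.

    # A minimal sketch (the writer's own, not part of the thesis) recomputing the
    # statistics of Tables I and II from the reported summaries.

    from math import sqrt

    def t_from_summary(m1, sd1, n1, m2, sd2, n2):
        """Two-sample t from means, SDs (taken with N in the denominator), and Ns."""
        se = sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))
        return (m1 - m2) / se

    def yates_chi_square(a, b, c, d):
        """2 x 2 chi square with the Yates correction for continuity.

        a, b = correct / incorrect for one group; c, d = the same for the other.
        """
        n = a + b + c + d
        num = n * (abs(a * d - b * c) - n / 2) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        return num / den

    if __name__ == "__main__":
        # Table I: mean correct responses on both free trials, days 2-5.
        print(round(t_from_summary(6.00, 1.19, 24, 5.46, 1.08, 24), 2))  # ~1.61 (C vs X-2)
        print(round(t_from_summary(6.25, 0.99, 20, 5.46, 1.08, 24), 2))  # ~2.47 (X-1 vs X-2)

        # Table II, day 2, C vs X-2: 18/24 vs 5/24 correct on the initial trial.
        print(round(yates_chi_square(18, 6, 5, 19), 2))                  # ~12.02
        # The remaining chi squares come out close to, though not identical with,
        # the hand-computed values printed in Table II.
        print(round(yates_chi_square(20, 4, 11, 13), 2))                 # day 3, C vs X-2
        print(round(yates_chi_square(15, 5, 5, 19), 2))                  # day 2, X-1 vs X-2
        print(round(yates_chi_square(18, 2, 11, 13), 2))                 # day 3, X-1 vs X-2

The first chi square reproduces the 12.02 printed for the second-day comparison of Groups C and X-2; the remaining values come out close to, though not identical with, the hand-computed entries of Table II, presumably because of rounding in the original calculations.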
VI. DISCUSSION

The results of the present study show clearly that the X-1 group, which was goal-reversed after one rewarded trial, shows no retardation on subsequent trials when compared with the control group which received no position reversal, and, presumably, there is no difference between these two groups. However, learning by the X-2 group, which had position reversal carried out following two rewarded trials, was significantly slowed up following reversal; in other words, this group had learned to some extent to turn in the direction of the first two reinforcements.

This seems definitely to suggest that, under the conditions prevailing in this study, no learning of the instrumental response took place on the first rewarded trial, while on the second rewarded trial some increment to the habit was effected. In the light of Denny's (3) hypothesis this may be interpreted as meaning (1) that until a fractional anticipatory response is set up, no instrumental learning is possible, and (2) that an anticipatory set starts to build up in one trial. A confirmation of this partial set on the second trial is effective in bringing about some increment to the instrumental response, causing a decrement in performance when the goals are reversed after two reinforcements.

However, an as yet undiscussed aspect of the results prevents such an interpretation from being conclusive enough to reject the drive-satisfaction theories of Hull (7) and Thorndike (13). Since the animal had no expectation of finding food in the maze on the first trial, random and exploratory behavior was elicited, and a lapse of approximately 30 to 120 seconds occurred between the response of turning right or left and the finding and eating of the food. According to Hull's (7) principle of the reinforcement gradient, such a long delay between response and reward would probably not allow for an increment to the habit on this trial, and a decrement in performance following goal reversal after one rewarded trial would therefore not be expected.

Whichever of these interpretations is accepted, some kind of expectancy set or anticipation would seem to be essential for instrumental response learning. Unless there is a set which acts in a behavior-directing capacity, immediate reward in the maze situation is impossible. The question then becomes, "What is the role of the expectancy set?" Does it function to bring about (1) more immediate drive reduction, or (2) a more immediate increase in the making of fractional (implicit) or complete (overt) consummatory responses? The evidence for the concept of secondary reinforcement (1, 2, 3, 11), which does not involve drive reduction, and the improbability of one small pellet of food reducing the hunger drive militate against the first alternative. Nevertheless, further research must be carried out to determine which of these interpretations, if any, is correct.

An experiment similar to this study is suggested in which the preliminary training includes having the reversal-after-one-reinforcement group eat out of the glass dish that is used in the positive goal box. This might serve to reduce the lapse in time between the instrumental response and the goal response on the initial experimental trial, making it similar to the time taken by the control group on the second rewarded trial. Precautions would still have to be taken to insure the absence of an expectancy set on the first trial, although under these conditions anticipatory reactions might be elicited upon the perception of the glass dish in the goal box. Results from additional studies such as the one suggested above will serve to make the exploratory findings of this study more complete and make possible a better analysis of the nature of reinforcement.
VII. SUMMARY AND CONCLUSIONS

This study was designed to test the hypothesis that when expectancy of reward is absent, learning, under the condition of position reversal following one and possibly two rewarded trials, will proceed in much the same manner as under conditions of non-reversal of goals.

There were two experimental groups: one, of 20 animals, which was goal-reversed following one rewarded trial, and another, of 24 animals, which was reversed following two rewarded trials. A control group (N = 24) received no reversal. The apparatus consisted of a simple T-maze. All animals received seven days of handling prior to five days of learning trials, on which days they received two food reinforcements per day and an equal number of trials to each side.

The results revealed no differences between the non-reversed group and the group which was reversed following one rewarded trial. A significant difference was obtained on the second and third days between the group which was goal-reversed following two trials and the other two groups.

On the basis of the results found in this study the following conclusions may be drawn:

1. In the absence of expectancy on the first rewarded trial, no learning of the instrumental response takes place; therefore an expectancy or anticipatory set seems essential for instrumental response learning.
2. Some expectancy seems to be built up in one trial.
3. On the basis of this study alone the specific function of the expectancy set cannot be determined.

LITERATURE CITED

1.  Bugelski, R. Extinction with and without sub-goal reinforcement. J. Comp. Psychol., 1938.
2.  Denny, M. R. The role of secondary reinforcement in a partial reinforcement learning situation. J. Exp. Psychol., 1946.
3.  Denny, M. R. Unpublished lecture series (untitled). Michigan State College, East Lansing.
4.  Hilgard, E. R., and Marquis, D. G. Conditioning and learning. New York: D. Appleton-Century Co., 1940, 429 pp.
5.  Hull, C. L. Goal attraction and directing ideas conceived as habit phenomena. Psychol. Rev., 1931, 38, 487-506.
6.  Hull, C. L. Mind, mechanism, and adaptive behavior. Psychol. Rev., 1937, 44, 1-32.
7.  Hull, C. L. Principles of behavior. New York: D. Appleton-Century Co., 1943.
8.  Mowrer, O. H. A stimulus-response analysis of anxiety and its role as a reinforcing agent. Psychol. Rev., 1939, 46, 553-565.
9.  Mowrer, O. H. On the dual nature of learning - a reinterpretation of "conditioning" and "problem-solving." Harv. Educ. Rev., 1947, 17, 102-148.
10. Shoben, E. J. A learning-theory interpretation of psychotherapy. Harv. Educ. Rev., 1948, 18, 129-145.
11. Skinner, B. F. The behavior of organisms. New York: D. Appleton-Century Co., 1938.
12. Spence, K. W. An experimental test of the continuity and non-continuity theories of discrimination learning. J. Exp. Psychol., 1945.
13. Thorndike, E. L. The fundamentals of learning. New York: Bureau of Publications, Teachers College, Columbia University, 1932.
14. Tolman, E. C. Purposive behavior in animals and men. New York: D. Appleton-Century Co., 1932, 463 pp.
15. White, R. K. The completion hypothesis and reinforcement. Psychol. Rev., 1936, 43, 396-404.
APPENDIX

TABLE A
Comparison of Groups X-1, X-2, and C in Terms of the Per Cent of Correct Responses on the Initial Trial of Each Day

               Group C                 Group X-1               Group X-2
Day       No. Correct  Per Cent   No. Correct  Per Cent   No. Correct  Per Cent
1         11            45.8      10            50.0      15            62.5*
2         18            75.0      15            75.0       5            20.8
3         20            83.3      18            90.0      11            45.8
4         22            91.7      19            95.0      20            83.3
5         24           100.0      20           100.0      23            95.8
Mean Per Cent  95       79.2      82            82.0      68            56.6

* This per cent is calculated in terms of the correct position for trials following reversal.

TABLE B
Comparison of Groups C, X-1, and X-2 in Terms of the Per Cent of Correct Responses on the Two Free Trials of Each Day

               Group C                 Group X-1               Group X-2
Day       No. Correct  Per Cent   No. Correct  Per Cent   No. Correct  Per Cent
1         25            52.1      20            50.0      23            47.9*
2         30            62.5      24            60.0      21            43.7
3         37            77.1      29            72.5      29            60.4
4         36            75.0      35            87.5      39            81.2
5         41            85.4      37            92.5      42            87.5
Mean Per Cent  169      70.4      145           72.5      156           65.0

* This per cent is calculated in terms of the correct position for trials following reversal.

Behavior Data of Animals in Group C on Ten Free Trials Given at the Rate of Two Trials Per Day. "X" Represents a Correct Response; "O" Represents an Incorrect Response.

Trial                       1    2    3    4    5    6    7    8    9   10   Total
Total Correct Each Trial   11   14   18   12   20   17   22   14   24   17   169

Behavior Data of Animals in Group X-1 on Ten Free Trials Given at the Rate of Two Trials Per Day. "X" Represents a Correct Response; "O" Represents an Incorrect Response.

Trial                       1    2    3    4    5    6    7    8    9   10   Total
Total Correct Each Trial   10   10   15    9   18   11   19   16   20   17   145
"X" Represents a Correct Resnonses "0” Represents an Incorrect Resfionse Trials Total 1 2 3 h S 6 7 8 9 10 Correct Animal #1 X I X X X 0 X 0 X 0 7 Animal f O X X 0* X X X X X X 8 Animal #3 0 X X X X X X X X X 9 Animal J 0 X 0 X X 0 X 0* X X 6 Animal f5 X 0 X 0 X X X 0 I X 7 Anima #6 0 0 0 0 X X X X X X 6 Animal :5 o 0 X o x c- X x X 0 5 Animal £3 0 X X X X X X X X 0 8 Animal #9 X X X 0 X 0 0% X X X 7 Animal {5'10 X X X 0 X X I X X 8 Animal ,Fll 0 X X 0 X X X X X X 8 An inal 5512 0 0 0 0 X 0 X X X X 5 Animal £13 I 0 X X X 0 X X X X 8 Animal flu X X X X X 0* X X X X 9 Animal £15 X 0 0 0 X X X 0 X X 6 Animal $16 X C X 0 O X X X X X 7 Animal £17 X 0 0 X 0 X X X X X 7 Animal flS 0 0 X 0 X X X X Y X 7 Animal 1:59 I: 0 x 2»: 3: 0 1 3 V X 8 Animal $20 0 X X X T: X X X X X 9 Total Correct Ea. Trial 10 10 15 9 18 ll l9 16 20 17 th % Denotes refusal to enter goal box within 37 Behavior Data of Animals in Group X—2 on Ten Free Trials Given at the Rate of Two Trials fer lav. "X” Represents a Correct Resuonse° "O" Represents an Incorrect Resoon U) \- e Trials Total l 2 3 h S 6 7 d 10 Correct Animal #1 X X Animal f2 Animal f3 Animal ju Animal f“ Animal f0 animal f7 Animal 333 Animal f Animal #10 Animal mll Animal £12 Animal $13 Animal flh :lninal f a Animal £16 k V's NNNX \O I W ONOOC? >5 A ‘ f'fi \_/ ‘><><>:><><><>4 >4>i><>c>>:>:>:w:w:w:c>>: ,OHNNNMNMNH N --%:V:C>CI {>4 C» (—3 $7 c>><~*><><><>:> >4 NONNN C) C 3 CO S‘ >: >4 \r '7 .1: 1(U ‘I - c>ni » “.1 . A ”J >w pl 7 K4 ‘4 1" k4. V‘s A b A r\ _. .2 >4 f" V N 0* NOONNONOCOC‘C)OOOOP4 p O NPQNHNNNNNNNNN'xH‘ "NNNN C\U’LCDCAJCDUIO\~\IO\CDO\\HO\C\O\C\U'L-\]CO\‘I\1-\'IO\-\l , V i lnimal fl? 0 O X k X I X Animal J18 X 0 O X G X 0* X A Animal $19 0 X X 0 3 X X ' O 0 animal £23 1 X 0 K K K K K K An'fial le C K O X X K K K I animal f22 X C X 0% X X X X X 3 an ima ,,~,- 23 X I 0‘ 7' X C X O O):- X Anfmal f2h I O C X 0* O X X X X Total Correct Ea. Trial 9 l6 5 16 ll 13 20 19 23 19 lq6 I % Denotes refusal to enter goal box within AS seconds. ay29'so ROOM "SF. 0ND? ..-. ”'TITIIlLlITILleIllIfillilfllllflifllfllflllfl