ABSTRACT

DETERMINANTS OF "PRE-MORAL" DEVELOPMENT IN PIGEONS

By Harry Jay Caplan

Pigeons were trained to peck an illuminated disk before eating from a freely available food source, thus rewarding their own performance. This self-reinforcement pattern was established during a training period in which non-contingent self-feeding was punished by food withdrawal. During the testing phase, a higher punishment probability maintained the key peck better than a lower probability. A more stringent training criterion maintained the key peck better in testing than a lenient criterion, independently of the punishment probability in testing. Absence of training with food withdrawal punishment resulted in reduced maintenance of the self-reinforced key peck in testing. A 0.75 probability of punishment in training was not as effective as a 1.00 probability in establishing the key peck response. A pretraining history of free food had little effect on the acquisition of the self-reinforced response during training, but did reduce the maintenance of the self-reinforced key peck in testing.

The results suggest that the key peck response is not reinforced when followed by access to freely available food. Rather, the key peck response had a high probability of occurring only when the alternative response of non-contingent self-feeding was punished. The results do not support the notion of a natural tendency to key-peck for food.

DETERMINANTS OF "PRE-MORAL" DEVELOPMENT IN PIGEONS

By Harry Jay Caplan

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Psychology
1974

ACKNOWLEDGEMENTS

My thanks to Dr. Mark E. Rilling, whose help and commentary were invaluable to me in the preparation of this thesis. My appreciation is also extended to my wife, Linda, for her help in the preparation of the figures.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
METHOD
RESULTS
DISCUSSION
LIST OF REFERENCES

LIST OF TABLES

Table 1. Summary of the method parameters. Column one lists the numeral of the Groups. Column two lists the conditions during pretraining, column three during training, and column four during testing.

Table 2. The mean number of responses during the blackouts. Data are presented for Groups I, II, III, IV, V and X. For each group the low and high mean number of responses during the blackouts are given for both training and testing. Paired with the mean number of responses during the blackouts are the number of sessions needed by the pigeon to meet the criterion of the phase from which the number was taken.

LIST OF FIGURES

Figure 1. The mean key peck responses in testing (left ordinate) and the mean percent self-reinforcement in testing (right ordinate), for Group I (panel A), Group II (panel B), Group III (panel C) and Group IV (panel D). Each data point represents the mean of 100 trials over a block of two consecutive sessions. The first number in the parentheses is the punishment probability in training, and the second number is the punishment probability in testing.

Figure 2. The mean key peck responses in testing (left ordinate) and the mean percent self-reinforcement in testing (right ordinate) for Group VI (panel A), Group VII (panel B), Group VIII (panel C) and Group X (panel D).
Each data point represents the mean of 100 trials over a block of two consecutive sessions. For Groups VI and VII the first number in the parentheses is the punishment probability in training, and the second number is the punishment probability in testing. For Group VIII, the number in parentheses is the probability of punishment in testing. For Group X the second and third entries in the parentheses are the probabilities of punishment in training and testing, respectively.

Figure 3. The mean total transgressions in testing for Group I (panel A), Group II (panel B), Group III (panel C) and Group IV (panel D). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The first number in the parentheses is the probability of punishment in training, and the second number is the punishment probability in testing.

Figure 4. The mean total transgressions in testing for Group V (panel A), Group VI (panel B), Group VII (panel C), Group VIII (panel D) and Group X (panel E). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The first number in the parentheses is the probability of punishment in training. The second number is the probability of punishment in testing. The single number in the parentheses for Group VIII is the probability of punishment in testing.

Figure 5. The mean total transgressions in training for Group I (panel A), Group VI (panel B), Group IX (panel C) and Group X (panel D). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The single number in the parentheses is the probability of punishment in training. The free food pretraining history of Group X is also in the parentheses.

Figure 6. Upper panel. The mean additional key pecks during the last 5 FR1 sessions (open bar) and the mean additional key pecks during the first five training sessions (closed bar) for each bird in Groups I, II, III, IV and V. Lower panel. The mean additional key pecks during the first five testing sessions (open bar) and the mean additional key pecks during the last five testing sessions (closed bar) for each bird in Groups I, II, III, IV and V.

INTRODUCTION

The study of operant behavior has generally been limited to procedures in which reinforcement is response contingent. This means that the experimental animal must emit a response (e.g., a key peck) to obtain access to a particular reinforcer (grain). In the absence of responses, access to the reinforcer is not provided, and thus reinforcement is experimenter controlled. Reinforcement is controlled by the subject when it is freely available, but is not consumed until a response has been emitted. This process is called self-reinforcement.

As discussed by Skinner (1953), the process of self-reinforcement presupposes that: (1) the individual has access to a particular reinforcer but (2) does not obtain the reinforcer until he or she has emitted a particular response. While the sequence of events approximates the operant conditioning paradigm in which a response is followed by a reinforcer, it is important to note that the subject could obtain the reinforcer without first emitting the response. The question of why he continues to respond is posed by Skinner as a major theoretical problem.
While Skinner recognized the phenomenon of self-reinforcement in humans, little experimental attention has been given to its role in establishing and maintaining behavior. Bandura (1964, 1967) and Kanfer (1963a, 1963b) have started to examine self-reinforcement processes in humans, using three basic procedures. The first is the directed learning paradigm, in which the experimenter adopts a criterion of what is required from the subject for reinforcement. Responses which meet or exceed the designated criterion level are rewarded, while responses which fail to meet this level are not. When subjects later reward their own behavior, they are likely to reward themselves using the criterion employed by the experimenter. Kanfer and Marston (1963a) have shown that the rate at which subjects reinforce themselves for self-judged correct responses is influenced by their training, miserly versus indulgent. For comparable performance, subjects who had miserly reward during training rewarded themselves less often than subjects who had lenient training.

The second procedure is the vicarious learning paradigm, in which the effects of a model's performance on self-reinforcement are examined. Bandura and Kupers (1964) used a model performing a rigged bowling task, in which either a high or a low performance criterion was established. Children who observed a model setting a high standard of self-reward rewarded themselves sparingly, and only after high scores were obtained, whereas children who had observed low-standard-setting models rewarded themselves generously after mediocre performances.

The third procedure is the temptation paradigm, in which the subject administers rewards for which a set of explicit rules and standards exists, and during which the experimenter apparently has no ability to control, or even to observe, the subject. However, the experimenter does have knowledge of the subject's performance and can keep track of earned and unearned rewards. Kanfer and Duerfeldt (1968b) found that school children who kept their own scores on a number matching task tended to cheat more frequently when the magnitude of the reward was high than similar school children who were playing for less salient rewards.

Bandura (1969, 1971) and Kanfer (1970) have reviewed much of the literature on self-reinforcement, and conclude that the results of these studies indicate that once performance standards for self-reinforcement have been established through modeling or direct training, the subjects can maintain their behavior by self-administered reinforcers as well as, or better than, they do by experimenter-controlled reinforcement.

Mahoney and Bandura (1972) developed an infrahuman analogue to self-reinforcement using pigeons. Part of their self-reinforcement procedure included the two necessary characteristics used in human studies. First, the organism has access to a freely available reinforcer. Second, the organism does not consume the reinforcer until the designated response has occurred. The self-reinforcement paradigm may include a sequence of three phases: pretraining, training, and testing. During pretraining the subject is magazine trained and shaped to peck the key. Training involves punishment for transgression, i.e., the subject is taught to peck before trying to eat. During testing, no punishment for transgression is given. Therefore only the testing phase meets the requirements of self-reinforcement.
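The contingencies that distinguish the three phases can be stated compactly. The sketch below is only an illustrative schematic of the verbal description above, not the control program of any actual apparatus; the phase names, argument names, and outcome labels are the writer's own shorthand.

    def trial_outcome(phase, pecked_first, punish_this_transgression):
        """Schematic outcome of one trial in the self-reinforcement paradigm.

        pecked_first: the bird pecked the lighted key before putting its head
        into the raised food magazine (a "self-reinforced" trial).
        punish_this_transgression: whether an unpecked magazine entry is
        scheduled to be punished (probability 1.00 in training, 0.00-1.00
        in testing, depending on the group).
        """
        if phase == "pretraining":
            # FR1: a key peck raises the hopper and produces grain access.
            return "3.5 sec grain, then 30 sec blackout" if pecked_first else "nothing scheduled"
        if phase in ("training", "testing"):
            if pecked_first:
                return "3.5 sec grain, then 30 sec blackout"
            # Transgression: head into the magazine without pecking first.
            if punish_this_transgression:
                return "hopper dropped, 30 sec blackout, no grain"
            return "3.5 sec grain, then 30 sec blackout"
        raise ValueError("unknown phase")

Only the testing column of this schematic satisfies the definition of self-reinforcement, since the grain is obtainable whether or not the peck occurs.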
The Mahoney and Bandura procedure involved three phases: pretraining, training and testing. During pretraining the pigeons were magazine trained and shaped to peck the key. The training phase provided a learning history during which punishment occurred. A pigeon was placed in an experimental box with a key, food magazine, and a house light. The training sessions were divided into trials, at the start of which the magazine was in the raised position providing access to the grain. If the pigeon pecked the key and then inserted its head into the food magazine, it received 3.5 sec of grain access followed by a 30 sec blackout. But if the pigeon inserted its head into the food magazine before pecking the key, a 30 sec blackout began without allowing any grain access time. Inserting the head into the food magazine before pecking is referred to as a transgression. During training the birds were punished for transgressing with a probability of 1.00. Both of the birds acquired the self-reinforcing response of pecking the key before inserting the head into the food magazine.

Once this response was acquired, the testing phase was begun. During this phase, transgressions were not punished, i.e., the pigeon could have grain access independently of whether it pecked the key. Thus food was freely available and the bird could peck or not before inserting its head into the food magazine. Again, a self-reinforcing response is pecking the key before inserting the head into the food magazine. During testing, the key peck was maintained for hundreds of trials, even though the response was not required to obtain the grain. In the Mahoney and Bandura study, only the testing phase is a true self-reinforcing procedure, while the training procedure provided a developmental history.

Auto-shaping (Brown and Jenkins, 1968) is a procedure in which the repeated pairing of a key light and reinforcement results in the acquisition of a key peck response by naive birds. The acquisition of the auto-shaped response is influenced by the pigeon's reinforcement history, as a study by Engberg, Hansen, Welker and Thomas (1972) demonstrated. A group of pigeons which received pretraining with a treadle press response acquired the key peck faster than a group that received no pretraining. A third group received pretraining with noncontingent reinforcement and was the slowest to acquire the auto-shaped key peck. Thus a history of response-dependent reinforcement apparently facilitated the acquisition of a new work response, while a history of "free loading" apparently retarded the acquisition of a contingent response.

While Mahoney and Bandura successfully established a self-reinforcing response in two pigeons, they did not study complete extinction of the response in either of the two birds. Rather, they continued to make other experimental manipulations. In addition, both of the birds were switched from a probability of punishment of 1.00 during training to a 0.00 probability during testing. The present study has several purposes. First, five different punishment probabilities were present during testing to study the effect of differing probabilities of punishment on the extinction of the key peck response. Second, a lenient and a stringent training criterion were set to study the effect of the severity of training on the extinction of the self-reinforcing response.
Third, free food pretraining was presented to study the effect of a response-independent food history on the acquisition and extinction of the self-reinforcing response. Fourth, the effect of punishment training was examined by omitting the training procedure. Fifth, one group received a 0.75 probability of punishment during training to determine if the self-reinforced response could be acquired with a probability of less than 1.00. Studies of this type may provide information analogous to human situations such as inconsistency in parental training or learning to free-load in a welfare society.

METHOD

Subjects

Thirty experimentally naive, white Carneaux pigeons were randomly assigned to ten groups of three pigeons each. The birds were maintained at approximately 80 percent of their free-feeding weights, and were run at approximately 80 percent plus or minus 15 grams. The pigeons were housed in individual home cages, where they had free access to water and grit. Wing and tail feathers were clipped prior to the start of the study.

Apparatus

Two Lehigh Valley Electronics pigeon chambers were used, with the left key covered throughout the study. The right key was illuminated by a green light during each trial, except during a peck, when a white light pulsed on. A house light was at the front of the chamber and a light was present in the food magazine. The food magazine was equipped with a photocell to monitor the bird's feeding behavior. A Grason-Stadler Model 9013 noise generator, set on white noise, provided a masking sound. Electro-mechanical programming equipment was located in an adjacent room.

Procedure

Pretraining--Groups I-V.--Following magazine training the pigeons were shaped to peck the key, and reinforced for each of 30 key pecks. Reinforcement for each key peck is referred to as fixed ratio 1 (FR1). A peck raised the grain hopper from the lowered position. Timing for the 3.5 sec access to the mixed grain reinforcement began when the photocell beam in the food magazine had been broken. Ten sessions of FR1 were then given during which a 30 sec blackout followed each reinforcement. During the blackout, all lights were off. Thirty and 50 FR1 reinforcements were given during each of the initial and final five of these sessions, respectively.

Training--Groups I-V.--Following the 10 sessions of FR1 pretraining in which a peck lifted the grain hopper to the raised position, punishment for transgression training was started. During this phase the grain hopper was in the raised position at the start of each trial. If the pigeon pecked the key prior to placing its head into the food magazine, it received 3.5 sec access to the mixed grain. If the pigeon placed its head into the food magazine without pecking the key, thus transgressing, the grain hopper was dropped without allowing the pigeon access to the grain. The probability of punishment for transgression was 1.00. A 30 sec blackout followed each trial independently of access to the mixed grain. Each session terminated after 50 reinforcements. Training continued until the pigeon transgressed a mean of three or less times over two consecutive sessions. Thus if a pigeon transgressed four times in session x and two times in session x+1, this pigeon reached the criterion to end training.
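For concreteness, the lenient training criterion can be written as a simple check on the running record of transgressions per session. This is a sketch of the stated rule only (the function and variable names are the writer's), not the electro-mechanical program actually used to run the sessions.

    def met_lenient_criterion(transgressions_per_session):
        """True once the mean of the last two sessions is three or less."""
        t = transgressions_per_session
        return len(t) >= 2 and (t[-1] + t[-2]) / 2.0 <= 3.0

    def met_stringent_criterion(transgressions_per_session):
        """Groups VI and VII: zero transgressions on two consecutive sessions."""
        t = transgressions_per_session
        return len(t) >= 2 and t[-1] == 0 and t[-2] == 0

    # Example from the text: four transgressions in session x and two in
    # session x+1 give a two-session mean of 3.0, so training would end.
    assert met_lenient_criterion([12, 7, 4, 2])
    assert not met_stringent_criterion([12, 7, 4, 2])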
Testing--Groups I-V.--During testing, Group I was switched from a 1.00 probability of punishment for transgression to a 0.00 probability; Group II to a 0.25; Group III to 0.50; Group IV to 0.75; and Group V continued at a 1.00 probability of punishment for transgression. All other parameters remained the same as in training. Testing continued for each pigeon until one of three conditions was met: failure to peck the key during a session, and thus obtaining 50 reinforcements through transgression; failure to peck the key more than once in two consecutive sessions; or failure to meet either of the first two conditions after 40 sessions. The purpose of Groups I-V was to assess the effect of differing punishment probabilities during testing on the extinction of the key peck response.

Pretraining--Groups VI and VII.--Same as for Groups I-V.

Training--Groups VI and VII.--Same as for Groups I-V, except that training continued until the pigeon transgressed zero times on two consecutive sessions.

Testing--Groups VI and VII.--During testing, Group VI was switched from a 1.00 probability of punishment for transgression to a 0.00 probability, Group VII from a 1.00 to a 0.75 probability. All other parameters were as presented for Groups I and IV, respectively. Groups VI and VII were run to assess the effect of differing punishment probabilities during testing on the extinction of pecking, and to assess the effect of a more stringent training criterion than that of Groups I and IV on the extinction of the key peck.

Pretraining--Group VIII.--Same as for Groups I-V.

Training--Group VIII.--No punishment for transgression training was given; rather, each pigeon received 15 sessions of 50 FR1 reinforcements.

Testing--Group VIII.--Same as for Group I. In particular, a zero probability of punishment for transgression was given. Group VIII was run to determine whether or not punishment for transgression training was needed to establish the self-reinforcing key peck response.

Pretraining--Group IX.--Same as for Groups I-V.

Training--Group IX.--Training for Group IX was identical to the testing procedure for Group IV. In particular, the probability of punishment for transgression was 0.75.

Testing--Group IX.--No testing was given to Group IX. Group IX was run to determine if the self-reinforced key peck would be acquired at a punishment probability of less than 1.00.

Pretraining--Group X.--Following magazine training, the pigeons were given 10 sessions of 50 free 3.5 sec access periods to the mixed grain. A 30 sec blackout followed each food presentation. Following the 10 free grain sessions, the pigeons were shaped and continued pretraining as described for Groups I-V.

Training--Group X.--Same as for Groups I-V.

Testing--Group X.--Same as for Group I. Group X was run to assess the effect of a history of non-contingent reinforcement on the acquisition and extinction of the key peck response.

Table 1 summarizes the procedural conditions for each group during pretraining, training and testing. At the end of each session, the pigeons were returned to their home cages and fed to 80 percent of their free-feeding weights. The following data were collected daily: the number of first pecks to the green key; the number of first pecks to the green key plus additional pecks to the green key; total pecks to the key; the number of reinforcements; the number of transgressions; the number of blackouts; the duration of the green key presentations; and the duration of the blackouts.

TABLE 1.--Summary of the method parameters.
Column one lists the numeral of the Groups. Column two lists the conditions during pretraining, column three during training, and column four during testing.

Group   Pretraining                               Training                                  Testing
I       Magazine training; shape; 10 days FR1     1.00 prob.; 2 days of mean of 3 or less   0.00 prob.
II      "                                         "                                         0.25 prob.
III     "                                         "                                         0.50 prob.
IV      "                                         "                                         0.75 prob.
V       "                                         "                                         1.00 prob.
VI      "                                         1.00 prob.; 2 days at 0                   0.00 prob.
VII     "                                         "                                         0.75 prob.
VIII    "                                         FR1 for 15 days                           0.00 prob.
IX      "                                         0.75 prob.                                ----
X       Magazine training; 10 days free food;     1.00 prob.; 2 days of mean of 3 or less   0.00 prob.
        shape; 10 days FR1

RESULTS

Percent Self-Reinforcement in Testing

Figure 1 presents test data for each pigeon in Groups I, II, III and IV. Each data point represents the mean percent self-reinforcement for 100 trials. Percent self-reinforcement is obtained by multiplying the number of first pecks to the green key by two, for each session. Since each session was 50 trials, the mean percent self-reinforcement for 100 trials is the mean of a block of two sessions.

Many of the statistical differences were obtained by using the Mann-Whitney U test. When a bird met a testing criterion in the study, it was no longer run, so that data were not collected for many situations where it was desirable to show an effect. An assumption was therefore adopted concerning the percent self-reinforcement: once a pigeon dropped below 40 percent self-reinforcement, it never exceeded 40 percent self-reinforcement again. This assumption is supported by 18 birds and 2,178 bird data points. It is contradicted by one bird and one bird data point.

Figure 1 shows the relationship between the probability of punishment for transgression during testing and the percent self-reinforcement during testing.

Figure 1.--The mean key peck responses in testing (left ordinate) and the mean percent self-reinforcement in testing (right ordinate), for Group I (panel A), Group II (panel B), Group III (panel C) and Group IV (panel D). Each data point represents the mean of 100 trials over a block of two consecutive sessions. The first number in the parentheses is the punishment probability in training, and the second number is the punishment probability in testing.

[Figure 1: mean key peck responses plotted against blocks of 2 sessions in testing for Groups I-IV.]

The main information in Figure 1 is the initially high maintenance level of the key peck response for the group (IV) which had a 0.75 probability of punishment for transgression during testing. This maintenance level was higher than that for the groups which had 0.00 (I), 0.25 (II), and 0.50 (III) probabilities of punishment during testing for many of the first 500 trials, and then became indistinguishable from the other groups. Specifically, the percent self-reinforcement of the 0.75 group was higher (U=0, p less than 0.05) than the 0.00 group during trials 100-500; than the 0.25 group during trials 100-200 and 400-500; and than the 0.50 group during trials 200-500. No effects were found following the initial 500 trials. Thus a 0.75 probability of punishment for transgression was more effective in maintaining the key peck response than the lower probabilities during testing.

The higher punishment probabilities of the 0.25 and 0.50 groups, relative to the 0.00 group, apparently had little effect on the key peck response during testing.
At no time were the percent key peck responses in these groups statistically distinguishable from each other.

Panels A and B of Figure 2 present test data for each pigeon in two groups which had identical pretraining and the stringent training criterion, with a 0.00 (VI) and a 0.75 (VII) probability of punishment during testing.

Figure 2.--The mean key peck responses in testing (left ordinate) and the mean percent self-reinforcement in testing (right ordinate) for Group VI (panel A), Group VII (panel B), Group VIII (panel C) and Group X (panel D). Each data point represents the mean of 100 trials over a block of two consecutive sessions. For Groups VI and VII the first number in the parentheses is the punishment probability in training, and the second number is the punishment probability in testing. For Group VIII, the number in parentheses is the probability of punishment in testing. For Group X the second and third entries in the parentheses are the probabilities of punishment in training and testing, respectively.

[Figure 2: mean key peck responses and percent self-reinforcement plotted against blocks of 2 sessions in testing.]

While there was no initial difference in percent self-reinforcement during testing, the 0.75 probability group maintained a higher (U=0, p less than 0.05) level of self-reinforcement than the 0.00 probability group during the final 900 test trials. Again this suggests that a higher probability of punishment was more effective in maintaining the key peck response than a lower probability of punishment.

The top panels of Figures 1 and 2 present the percent self-reinforcement during testing for each pigeon in two groups which received a 0.00 probability of punishment during testing, Groups I and VI. Both groups received identical pretraining and testing, differing only in the training criterion. Group I was trained to a criterion of a mean of three or less transgressions, and Group VI to a mean of zero transgressions, on two consecutive days. There is an initial effect in that during trials 100-500 the group with the stringent training criterion showed a higher (U=0, p less than 0.05) percent self-reinforcement than the group with the lenient training criterion. Thus a more stringent training criterion was more effective in maintaining the self-reinforcement response than a lenient training criterion. After the first 500 trials the effect was no longer present.

Panel D of Figure 1 and panel B of Figure 2 present the percent self-reinforcement during testing for each pigeon in two groups which received a 0.75 probability of punishment during testing, Groups IV and VII. Both groups received identical pretraining, training and testing conditions, except that Group IV was trained to a criterion of three or less transgressions, and Group VII to a criterion of zero transgressions, on two consecutive days. While there was initially no effect, the group with the stringent criterion showed a higher (U=0, p less than 0.05) percent self-reinforcement during the second half of testing than the group with the lenient training. Group VII showed a higher percent self-reinforcement than Group IV during trials 900-1000, 1400-1600 and 1700-2000, a total of 600 trials.
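The percent self-reinforcement measure reported throughout this section reduces to simple arithmetic on the daily records: with 50 trials per session, it is twice the number of first key pecks in a session, and each plotted point is the mean of a two-session (100-trial) block. The short sketch below restates that computation; the function names are the writer's own and are given only for illustration.

    def percent_self_reinforcement(first_pecks_per_session, trials_per_session=50):
        """Percent of trials begun with a key peck, for each session."""
        return [100.0 * p / trials_per_session for p in first_pecks_per_session]

    def two_session_blocks(values):
        """Mean of consecutive pairs of sessions, i.e., 100-trial blocks."""
        return [(a + b) / 2.0 for a, b in zip(values[0::2], values[1::2])]

    # Example: 50 and 40 first pecks in two consecutive 50-trial sessions give
    # 100 and 80 percent self-reinforcement, and a plotted block value of 90.
    print(two_session_blocks(percent_self_reinforcement([50, 40])))   # [90.0]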
Group VIII, which received no punishment for transgression training, demonstrated that when punishment for transgression does not occur during training, self-reinforcement is reduced during testing. Panel C in Figure 2 presents the individual test data for each bird in Group VIII. Initial effects are seen when the percent self-reinforcement during testing for Group VIII is compared with Groups I and VI. Group I received 1.00 probability of punishment for transgression training until the pigeon transgressed a mean of three or less times, while Group VI received the more stringent criterion of zero times, on two consecutive days. Group I showed a higher (U=0, p less than 0.05) percent self-reinforcement than Group VIII during trials 0-200, and Group VI showed this same effect during trials 0-800. Thus punishment for transgression training facilitates the self-reinforced key peck response in testing when compared to similar testing conditions with an absence of training. The more stringent training criterion further facilitated the maintenance of the key peck response in testing.

Ten days of 50 free reinforcements each, given prior to shaping of the key peck response during pretraining, resulted in an initial tendency to reduce the percent self-reinforcement in testing. Panel D of Figure 2 presents individual test data for each pigeon in Group X, which received training and testing identical to Group I. The probability of punishment during training was 1.00 and during testing was 0.00. The only difference is that Group X received free food during pretraining. When comparing Group X and Group I (panel A of Figure 1), no significant effects are seen, though there is an initial tendency for Group I to have a higher percent self-reinforcement.

The free food pretraining group may also be compared with Groups I and VI concurrently. While Group VI differs from Group X in two ways (Group VI had no free food pretraining and had a more stringent training criterion), Group VI differs from Group I on a single dimension (the more stringent training criterion). Thus the effects of the free food pretraining can be singled out. Comparison of Groups I and VI showed an initial effect of a higher percent self-reinforcement for Group VI, apparently due to the more stringent training criterion. This effect was present for 400 trials at the start of testing. The stringent training group showed a higher percent self-reinforcement than the free food group for 800 trials at the start of testing. Thus the free food pretraining in Group X further reduced (by an additional 400 trials) the effectiveness of the less stringent training criterion on the maintenance of the self-reinforcing response during the initial part of testing.

Panels C and D of Figure 2 show the individual test data for the no punishment group (VIII) and the free food pretraining group (X). At no time are the groups statistically different, i.e., the data do not show a higher percent self-reinforcement for the group with the free food pretraining and the 1.00 probability of punishment for transgression training (X) than for the group with no training at all (VIII). Thus it appears that the free food pretraining weakens the effectiveness of the punishment for transgression training on the maintenance of the key peck response in testing.

Transgressions in Testing

Figure 3 shows the transgressions during testing for Groups I, II, III, and IV. Each data point represents the mean number of transgressions for 100 trials.
Since each session was 50 trials, each data point represents the mean number of transgressions for a block of two consecutive sessions. Groups which received a testing probability of punishment of 0.00 (I), 0.25 (II), 0.50 (III) and 0.75 (IV) are shown in the four panels of Figure 3, respectively.

Figure 3.--The mean total transgressions in testing for Group I (panel A), Group II (panel B), Group III (panel C) and Group IV (panel D). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The first number in the parentheses is the probability of punishment in training, and the second number is the punishment probability in testing.

[Figure 3: mean transgressions plotted against blocks of 2 sessions in testing.]

As the probability of punishment increased, more transgressions were required to achieve the same level of self-reinforcement during testing. Thus at a punishment probability of 0.00, 50 transgressions indicated zero percent self-reinforcement, while at a punishment probability of 0.50, 100 transgressions were required to reach the same zero percent self-reinforcement. This effect is shown in Figure 3. Group IV, which had a 0.75 punishment probability, transgressed more than Group III, which had a 0.50 punishment probability, to reach the same level of self-reinforcement. When comparing the transgressions for the 0.00, 0.25, 0.50 and 0.75 probability of punishment groups, it can be seen that the higher the punishment probability, the greater the number of transgressions obtained. Figure 3 shows that the number of transgressions increased as the probability of punishment increased.

Panel A of Figure 4 shows that even though each transgression was punished, the transgression response still occurred. Group V presented a unique situation, for each transgression was punished during testing, so that 100 percent self-reinforcement was required by definition. Thus a pigeon could transgress zero, 300, or more times during a session; however, since access to grain was given only after a key peck, the 50 reinforcements of the session were obtained by pecking before placing the head into the magazine.

Figure 4.--The mean total transgressions in testing for Group V (panel A), Group VI (panel B), Group VII (panel C), Group VIII (panel D) and Group X (panel E). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The first number in the parentheses is the probability of punishment in training. The second number is the probability of punishment in testing. The single number in the parentheses for Group VIII is the probability of punishment in testing.

[Figure 4: mean transgressions plotted against blocks of 2 sessions in testing.]

When compared to the groups with the 0.00, 0.25, 0.50, and 0.75 probabilities of punishment for transgression in Figure 3, it is seen that transgressions frequently occurred more often in the group with the 1.00 probability of punishment (V) than in the groups with lower probabilities of punishment for transgressing.

Transgressions during testing for Groups I and VI are presented in panel A of Figure 3 and panel B of Figure 4, respectively.
Both groups received identical pretraining, 1.00 probability of punishment for transgression training, and a 0.00 probability of punishment during testing. The only difference between the groups was the more rigid training criterion of Group VI. Four hundred of the first 500 trials showed a higher (U=0, p less than 0.05) number of transgressions in Group I than in Group VI. Thus a more stringent criterion helped to reduce transgressions in testing.

In order to carry out the Mann-Whitney U test on transgression data, an assumption was made that once a bird that was tested at a 0.00 or 0.25 probability of punishment transgressed more than 20 times, it never again transgressed less than 20 times. This is supported by 14 birds and 1,246 bird data points. No birds or bird data points contradict this assumption.

Panels B and C of Figure 4 show transgressions during testing for Groups VI and VII, which received identical pretraining and had a 1.00 probability of punishment for transgression training. Group VI was tested at a punishment probability of 0.00 and Group VII at a 0.75 probability. While both groups initially showed a low level of transgression, Group VI pigeons transgressed in the second half of testing at a higher level than Group VII. Two of the three pigeons in Group VI reached a level of 50 transgressions, while the third pigeon exceeded a mean of 30 transgressions and never again went below this level. All of the pigeons in Group VII remained in testing for 40 sessions, and never exceeded 20 transgressions. Group VI transgressions were higher (U=0, p less than 0.05) than Group VII transgressions during the final 900 testing trials. Again, a higher probability of punishment for transgressing was more effective in maintaining the key peck in testing.

Panel D of Figure 3 and panel C of Figure 4 show the transgressions during testing for Groups IV and VII. Both groups received identical pretraining, 1.00 probability of punishment for transgression training, and a 0.75 probability of punishment during testing. Group IV was trained to a mean of three or less transgressions on two consecutive days, and Group VII was trained to a two-consecutive-day criterion of zero transgressions. While there was no initial difference, the lenient criterion birds began to transgress up to a 100-trial mean of 180, while the stringent criterion birds did not exceed a 100-trial mean of 20 transgressions. This further demonstrates the effectiveness of a more stringent criterion over a less stringent one in reducing transgressions in testing.

Panel D of Figure 4 shows the transgressions during testing for Group VIII, which received no punishment for transgression training. All three pigeons began testing at a mean level greater than 25 and reached 50 transgressions within 4 blocks of 100 trials. When compared to Groups I and VI, which both had punishment for transgression training, it is seen that the punishment for transgression training initially reduced the number of transgressions in testing. The absence of punishment for transgression training resulted in a rapid increase in the number of transgressions in testing.

Panel E of Figure 4 shows the transgressions during testing for Group X, which received free food pretraining. Comparing Group X with Groups I and VI (which had no free food pretraining), there was an initial tendency for the pigeons with the free food pretraining to transgress more than the birds with no free food pretraining (U=0, p less than 0.05) during trials 100-200.
This same effect was seen during trials 0-100 when comparing Groups X and VI.

Panels D and E of Figure 4 show the transgressions during testing for Groups VIII and X, respectively. Group VIII had no punishment for transgression training, and Group X had punishment for transgression training preceded by free food exposure. Group X showed no differences from Group VIII during the start of testing. Therefore punishment for transgression training was reduced in effectiveness when preceded by free food.

Transgressions During Acquisition

Figure 5 shows the transgressions during acquisition for Groups I, VI, IX and X. These groups are presented for the following reasons: Group I received the same pretraining and training as Groups II, III, IV and V; these groups showed no differences, and a representative group is shown. Groups VI and VII received identical pretraining and training; since no differences were observed between these two groups, Group VI is presented as representative. Group VIII received no training and is therefore not shown. Group IX received 0.75 punishment for transgression training and Group X received free food pretraining; since Groups IX and X are unique, they are both shown.

Figure 5.--The mean total transgressions in training for Group I (panel A), Group VI (panel B), Group IX (panel C) and Group X (panel D). Each data point represents the mean total number of transgressions for 100 trials over a block of two consecutive sessions. The single number in the parentheses is the probability of punishment in training. The free food pretraining history of Group X is also in the parentheses.

[Figure 5: mean transgressions plotted against blocks of 2 sessions in training.]

Groups I and VI received identical pretraining, but were trained to differing criteria. Group I was trained to a mean of three or less transgressions on two consecutive days, and Group VI to zero transgressions on two consecutive days. No differences in the number of transgressions or in the number of days to the training criterion were found to result from the difference in criterion stringency.

Group IX (0.75 punishment for transgression during training) data are presented in panel C, and may be compared to the transgressions during acquisition of Group I (1.00 probability of punishment for transgression). While the overall pattern found in Group I was a decrease in the number of transgressions, the pattern for Group IX showed an increase in transgressions. To quantify this result, a number was obtained by subtracting the mean of the number of transgressions on the first three days of training from the mean of the number of transgressions on the last three days of training. A positive number indicates an increase in transgressions, and a negative number indicates a decrease; the larger the obtained number, the larger the change. The transgression decrease in Group I is shown by the obtained numbers of -4.2, -8.0, and -19.0. The transgression increase in Group IX is shown by obtained numbers of +76.1, +22.7 and +148.3. Thus a punishment probability of 0.75 during training was not as effective as the 1.00 probability in reducing transgressions.

Group X (panel D) showed a pattern of transgressing during acquisition similar to Group I. The difference between the two groups was the presence of the free food pretraining for Group X.
Apparently the exposure to the free food had little or no effect on transgressions during acquisition.

Additional Pecks

Additional key pecks are those key pecks emitted after the initial key peck in each trial. The upper panel in Figure 6 presents additional key pecks during the last five days of FR1 pretraining and the first five days of punishment for transgression training for Groups I to V. These groups were chosen because they had identical pretraining and training. The other groups were excluded so as not to confound this result with procedural differences.

Figure 6.--Upper panel. The mean additional key pecks during the last 5 FR1 sessions (open bar) and the mean additional key pecks during the first five training sessions (closed bar) for each bird in Groups I, II, III, IV and V. Lower panel. The mean additional key pecks during the first five testing sessions (open bar) and the mean additional key pecks during the last five testing sessions (closed bar) for each bird in Groups I, II, III, IV and V.

[Figure 6: mean additional key pecks to the green key, by bird, for Groups I-V.]

The open bars represent the mean number of additional pecks during the last five days of FR1 pretraining, and the closed bars represent the mean number of additional key pecks during the first five days of punishment for transgression training. An increase (U=52, p less than 0.05) was seen in the number of additional pecks during the start of training. This increase occurred for 11 of the 15 birds. Thus more additional key pecks occurred at the start of training than at the end of pretraining.

The lower panel of Figure 6 shows additional key pecks during the first and the last five days of testing. The open bars represent the mean number of additional key pecks during the first five days of testing and the closed bars the mean of the additional key pecks during the last five days of testing. The data indicate a decrease (U=57, p less than 0.05) in additional key pecks during testing. Fourteen of the 15 birds showed this decrease in the number of additional key pecks from the start to the end of testing.

Training Criterion Days versus Testing Criterion Days

The distribution of the number of training days versus the number of testing days for Groups I, II, III, IV, VI, VII and X was compared. All of these groups had no limit on the number of training days and a 40-day upper limit on the number of testing days. For each bird in the groups, the number of training days and the number of testing days determined the point for that bird. The correlation between these two measures is .102 (p greater than .10). This suggests that there was no overall relationship between the number of days to the training criterion and the number of days to the testing criterion, regardless of the training criterion or the probability of punishment during testing.

Responses during the Blackout

Table 2 presents the high and the low mean number of responses during the blackout during testing and training for the groups that had 1.00 punishment for transgression training to a criterion of a mean of 3 or less transgressions on two consecutive days (Groups I, II, III, IV, V and X).
In addition, the number of days to the testing and training criteria are paired with each other for the low and high numbers of blackout responses. The general pattern is that the lower the number of responses during the blackout, the shorter the number of days to the respective criteria (r=.518, p less than 0.05). The pigeons that frequently pecked the key during the blackout generally took longer to meet training and testing criteria than the pigeons that pecked infrequently during the blackouts.

TABLE 2.--The mean number of responses during the blackouts. Data are presented for Groups I, II, III, IV, V and X. For each group the low and high mean number of responses during the blackouts (RBO) are given for both training and testing. Paired with the mean number of responses during the blackouts are the number of sessions needed by the pigeon to meet the criterion of the phase from which the number was taken.

                        Training              Testing
                        Low       High        Low       High
Group I      RBO        0.75      6.8         0.8       162.9
             Days       12        22          19        40
Group II     RBO        0.8       2.1         0         3.3
             Days       13        18          6         40
Group III    RBO        0         0.9         0.5       0.4
             Days       5         20          40        19
Group IV     RBO        0         0.6         0.1       157.2
             Days       8         21          40        40
Group V      RBO        0         0.3         --        --
             Days       11        6           --        --
Group X      RBO        0         93.7        0         421.1
             Days       2         25          2         40

DISCUSSION

In his discussion of self-reinforcement and behavior, Skinner (1953) notes the somewhat dubious status of the self-reinforced response. The organism can at any time obtain the reinforcer without first emitting the particular response. Skinner states that: "The ultimate question is whether the consequence has any strengthening effect upon the behavior which precedes it." The results of the present study suggest that it does not; i.e., access to the freely available grain does not strengthen the key peck response in this paradigm.

Groups I, VI, VIII and X were similar in that transgressions had a 0.00 probability of punishment during testing. Thus the two requirements for self-reinforcement were present: the grain was freely available, and the organism could eat the grain independently of pecking. Panel A of Figure 1 and panels A, C and D of Figure 2 show the test data for these groups. Clearly, the data closely resemble extinction curves. If the key peck response were being strengthened by grain access, pecking should be maintained at approximately 100 percent self-reinforcement. Both Hull (1943) and Spence (1937) state that the reinforcement of a response increases its strength. The observed extinction of the key peck response precludes an interpretation of increased key peck response strength.

The response which is reinforced in the testing phase is that of transgressing, i.e., inserting the head into the magazine without pecking first. Transgressions during testing for Groups I, VI, VIII and X are shown in panel A of Figure 3 and panels B, D and E of Figure 4. The transgression data are acquisition curves. Thus approaching the magazine to eat without first pecking is being reinforced.

Premack (1962, 1965, 1971) discusses the relationships between responses and their probabilities. In this framework, a response of low occurrence can be increased in probability by making a high probability response contingent on the low occurrence response. Thus, pressing a lever (low probability response) can be increased in occurrence if eating food (high probability response) is made contingent upon pressing the lever. Furthermore, the relationship is reversible if the probabilities have been altered.
Thus, eating food (if it has been made a low probability response) can be increased in occurrence if pressing a lever (now a high probability response) is made contingent upon eating the food.

The present study also involves the relationship between two responses: pecking the key before placing the head into the food magazine (pecking), and placing the head into the food magazine without first pecking (transgressing). Initially, transgressing is a more probable response than pecking; however, punishment for transgression training alters these probabilities. The decrease in transgression probability, when punishment has a 1.00 probability, is shown in Figure 5. Panels A, B, and D show the extinction of the transgression response. The key peck data are not shown, and are at 50 pecks per session for each bird; the contingency permits no other result. When punishment for transgression training is discontinued, the two responses return to their initial probability relationship. This is shown by the extinction of the self-reinforcing key peck and the reacquisition of the transgression response, as described above and shown in Figures 1, 2, 3 and 4.

The reacquisition of transgressing and the extinction of the key peck response can be impeded by several processes. The first is the continued punishment of transgressions during testing. The pecking response data in panels B, C and D of Figure 1 and the transgression data in panels B, C and D of Figure 2 suggest that a high probability of punishment is necessary to slow down the return of the probabilities to their initial state. A probability of 0.75 was more effective for several hundred trials than the lower probabilities of 0.00, 0.25 and 0.50. This same effect is shown in panels A and B of Figure 3 and panels B and C of Figure 4.

The second procedure for retarding the change of the altered probabilities to their original state is to increase the severity of the training criterion. Groups VI and I, and VII and IV, differed only in the severity of the training criterion. Group VI was trained to a criterion of two consecutive days of zero transgressions, while Group I was trained to two consecutive days of a mean of three or less transgressions. Group VI continued to show a higher level of pecks to the key for the initial 500 testing trials than Group I. This effect occurred during the initial part of testing, where the 0.00 probability of punishment had little maintenance effect on the altered response probabilities. Groups VII and IV also had identical conditions, save the training criterion of two consecutive days of zero transgressions for Group VII and two consecutive days of a mean of three or less transgressions for Group IV. Group VII maintained a higher level of key peck responses for the final 900 trials than Group IV. This effect occurred during the later part of testing, when the influence of the 0.75 probability of punishment during testing had diminished.

It appears that a probability of 1.00 is necessary during training, at least to reach the more stringent criterion. All of the pigeons in all of the groups that had training, except Group IX, had a 1.00 training probability of punishment. During training, all of the pigeons reached the training criteria of their particular groups. Group IX was trained under a 0.75 probability of punishment for transgression, and the results are shown in panel C of Figure 5.
While two of the three pigeons were low transgressors initially, the data curves resemble acquisition curves rather than the extinction curves of panels A, B and D. The birds that started at a low transgression level performed well enough to meet the training criterion of a mean of three or less transgressions on two consecutive days, but not well enough to meet the criterion of zero transgressions on two consecutive days. This suggests that a probability of punishment of 1.00 is more effective in altering the response probabilities than a punishment probability of 0.75.

Group VIII received no punishment for transgression training, but rather 15 extra days of FR1 training. Panel C of Figure 2 and panel D of Figure 4 show the key peck and transgression data during testing for this group. Group VIII showed a more rapid return to the initial response probabilities than Groups I and VI, which received punishment for transgression training and were also tested at a 0.00 probability of punishment. This further demonstrates that the self-reinforced key peck is not strengthened by the reinforcement which follows it. Since all three pigeons started testing at a percent self-reinforcement of greater than zero, if the pecking response were being strengthened it would show an increase rather than a decrease in occurrence.

The free food pretraining prior to acquisition and testing reduced the effectiveness of the altered probabilities in testing, but did not influence the actual altering of the probabilities during training. The group with pretraining (Group X) showed the extinction of the transgression response and the acquisition of the key peck response in training as well as any of the other groups. By comparing panel D of Figure 5 with panels A and B of the same figure, it can be seen that the transgression response followed the same extinction pattern as in the other groups. Again, since every reinforcement was obtained by pecking the key, the key peck data (not shown) are at 50 pecks per session. During testing, the probabilities were quicker to return to their initial state for two of the three pigeons of Group X than for Group I, which differed only in pretraining. The third pigeon continued to peck the key throughout testing. While no explanation is offered for this, it will be related to the responses during the blackouts at a later time.

Additional key pecks to the green illuminated key showed two systematic variations during the course of the study. The first change was an increase in additional key pecks during the first five days of punishment for transgression training over the level shown during the last five days of FR1 pretraining, for 11 of the 15 birds. While the cause of the increase has not been examined, it is quite possible that the uncertainty of the new training situation was related to the increase. During FR1 pretraining a key peck to the green key resulted in the raising of the food hopper and access to the reinforcer. During training, however, the hopper was in the raised position at the start of the trial and a choice had to be made: to peck the key or to approach the hopper. Initially the pigeon had to discriminate what was occurring in the new situation. Perhaps an added degree of certainty was achieved by pecking the key additional times.

The second change involved additional key pecks to the green key during the first five and during the last five days of testing.
The high level of pecking at the start of testing may have been a carry-over from the training phase. During training the key peck was essential, for reinforcement could not be obtained without first pecking. The importance of the peck changed for Groups I, II, III and IV during testing, for it was then possible to obtain reinforcement in the absence of pecking. In addition, all groups tended to transgress more at the end of testing than at its start (t=3.406, df=14, p less than 0.05). As the transgression response gained in strength, the two responses shifted toward their initial probability relationship, and pecking in general decreased. This general decrease in pecking is reflected in the decrease in additional key pecks. Group V stands out, for 100 percent self-reinforcement was required by definition, so there could be no decrease in first pecks. The transgression increase was, however, present, and might account for the decrease in additional pecks in this group.

Table 2 contains high and low key peck responses during blackouts for each group, for both training and testing separately. Generally, birds that pecked less during the blackouts reached criteria faster than birds that pecked more during the blackouts, for each group. This is true for both training and testing criteria. While no attempt was made to examine this variable, a possible explanation is as follows. Birds that pecked during the blackouts were pecking when the trials were not in effect. Thus blackout pecks were not really related to what was required during the trials, and were therefore not under the control of the conditions prevailing during the trials. The birds that pecked more during the blackouts may have partly "missed the point" about what was going on during the trials. It is not surprising that these birds took longer to meet any criterion that was based upon pecking during trials.

Elicitation Interpretation

Elicitation theory (Denny, 1971) nicely accounts for many of the results of the present study which a traditional reinforcement theory fails to handle. The analysis will be presented in the order in which the pigeon experienced the conditions of the experiment.

During magazine training, the food-deprived pigeons were placed in the experimental chamber, in which a grain-filled food hopper was in the raised position. Initially the grain elicited an eating response in the pigeon. As the magazine was contiguous to the food (both temporally and physically), the magazine also came to elicit an approach response. During the course of magazine training and key peck shaping, the click of the magazine came to elicit approach to the magazine and the illuminated key came to elicit approach to the key. The situation was as follows: the illuminated key elicited approach to the key (a peck), which was followed by the click of the food hopper (as it was raised), which elicited approach to the magazine, which elicited approach to the grain supply, which elicited an eating response. This sequence continued throughout pretraining.

During training, both the raised magazine and the illuminated key were present at the start of a trial. Two competing responses were present. The magazine elicited approach, but so did the illuminated key. If the eliciting value of the magazine was greater than that of the illuminated key, the pigeon approached the magazine. If the reverse were true, the pigeon approached the key. During training, approach to the magazine was punished but approach to the key was followed by grain access.
Table 2 contains high and low key peck responses during blackouts for each group, for both training and testing separately. Generally, birds that pecked less during the blackouts reached criteria faster than birds that pecked more during the blackouts, for each group. This is true for both training and testing criteria. While no attempt was made to examine this variable, a possible explanation is as follows: Birds that pecked during the blackouts were pecking when the trials were not in effect. Thus blackout pecks were not really related to what was required during the trials, and were therefore not under the control of the conditions prevailing during the trials. The birds that pecked more during the blackouts may have partly "missed the point" about what was going on during the trials. It is not surprising that these birds took longer to meet any criterion that was based upon pecking during trials.

Elicitation Interpretation

Elicitation theory (Denny, 1971) nicely accounts for many of the results of the present study, which a traditional reinforcement theory fails to handle. The analysis will be presented in the order in which the pigeon experienced the conditions of the experiment.

During magazine training, the food-deprived pigeons were placed in the experimental chamber in which a grain-filled food hopper was in the raised position. Initially the grain elicited an eating response in the pigeon. As the magazine was contiguous to the food (both temporally and physically), the magazine also came to elicit an approach response. During the course of magazine training and key peck shaping, the click of the magazine came to elicit approach to the magazine and the illuminated key came to elicit approach to the key. The situation was as follows: The illuminated key elicited approach to the key (peck), which was followed by the click of the food hopper (as it was raised), which elicited approach to the magazine, which elicited approach to the grain supply, which elicited an eating response. This sequence continued throughout pretraining.

During training, both the raised magazine and the illuminated key were present at the start of a trial. Two competing responses were present. The magazine elicited approach, but so did the illuminated key. If the eliciting value of the magazine was greater than that of the illuminated key, the pigeon approached the magazine. If the reverse were true, the pigeon approached the key. During training, approach to the magazine was punished but approach to the key was followed by grain access. Since the only way to get to the grain was through pecking the key, the eliciting value of the key increased to the point at which the pigeon approached the key rather than the food magazine.

During training, a punishment probability of 1.00 for approach to the magazine was required in order to extinguish the magazine approach response and to increase the eliciting value of the key enough so that the pigeon approached the key rather than the magazine. Figure 5 shows extinction of the magazine approach response. With a 0.75 probability of punishment, approach to the magazine was still followed by grain access on a partial schedule, so the eliciting value of the key was not increased sufficiently to make the key peck more probable than the magazine approach.

During testing, the punishment for approach to the magazine was removed or reduced. When the punishment was removed, the pigeon either approached the magazine or pecked the key in order to have grain access. When the punishment was merely reduced, the pigeon approached the magazine and was punished at the probability assigned to its group, or pecked the key and then had access to the grain. For the groups that had punishment for the magazine response removed, the key peck response extinguished (panel A of Figure 1 and panels A, C and D of Figure 2). This extinction occurred more rapidly for the group trained to the lenient criterion (magazine approach still occurred) than for the stringent criterion (magazine approach no longer occurred). This is shown in panel A of Figure 1 compared to panel A of Figure 2, and panel D of Figure 4 compared to panel B of Figure 2. The probable reason for this was that the birds with the stringent criterion approached the magazine less at the start of testing than the lenient criterion group. Since the response of approach to the magazine was now followed by grain, the magazine again began to elicit the approach response. As the magazine increasingly elicited the approach response and the response was followed by grain, the key became a less effective elicitor and the key peck gradually extinguished. Thus the eliciting values of the magazine and the key returned to their initial levels, as did the probabilities of the responses they elicited.

The greater the probability of punishment for approach to the magazine during testing, the slower the return of the eliciting value of the magazine to its original level. This is seen when comparing panel D of Figure 1 with panel A of Figure 1, and when comparing panel B of Figure 2 with panel A of Figure 2. The continued punishment lessened the eliciting value of the food magazine so that the key continued to elicit pecking. The group with no punishment training pecked the key very little in testing, because the magazine maintained the high eliciting value it originally held and the key had a low eliciting value for an approach response. This can be seen in panel D of Figure 4. Also, the group with the history of free feeding had more experience with the eliciting of the magazine followed by grain and, as expected, the magazine increased in eliciting value during testing faster than for the groups without the free food history (panel B of Figure 4).

This elicitation analysis handles the data and does not leave the mystery of why the key peck response was not "reinforced" by the freely available grain. The results can be explained in terms of increasing and decreasing eliciting values of stimuli, as these stimuli change in value due to their relationship to obtaining access to the grain.
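The qualitative logic of this account can be expressed as a small numerical sketch. The Python fragment below is only an illustration under assumed values: the starting eliciting values, the learning rate, the matching-law-style choice rule, and the update rules are inventions for the example and are taken neither from Denny's (1971) theory nor from the present data. It is meant to show the direction of the effects described above: punishing approach to the magazine shifts responding toward the key, and removing or reducing that punishment allows magazine approach (transgression) to return.

import random

def simulate(train_p, test_p, trials=300, lr=0.05, seed=0):
    """Toy model of two competing eliciting values: magazine vs. key.

    All numerical values (starting eliciting values, learning rate, trial
    counts) and the choice rule are illustrative assumptions, not parameters
    taken from the experiment.
    """
    rng = random.Random(seed)
    v_mag, v_key = 0.9, 0.2          # the magazine begins as the stronger elicitor
    summary = []
    for phase, punish_p in (("training", train_p), ("testing", test_p)):
        transgressions = 0
        for _ in range(trials):
            # Approach the magazine first with a probability that grows with
            # its relative eliciting value (a matching-law-style choice rule).
            if rng.random() < v_mag / (v_mag + v_key):
                transgressions += 1
                if rng.random() < punish_p:
                    v_mag -= lr * v_mag          # food withdrawal: value decays
                else:
                    v_mag += lr * (1 - v_mag)    # unpunished eating: value grows
            else:
                v_key += lr * (1 - v_key)        # key peck followed by grain access
        summary.append((phase, transgressions, round(v_mag, 2), round(v_key, 2)))
    return summary

# Certain punishment in training, none in testing: transgressions become rare
# during training and then return once the punishment is withdrawn.
print(simulate(train_p=1.0, test_p=0.0))
# Continued partial punishment in testing slows that return.
print(simulate(train_p=1.0, test_p=0.5))

The choice rule is simply one convenient way to let two eliciting values compete; any rule under which the stronger elicitor is approached more often would illustrate the same point, and the sketch is not offered as a fit to the obtained curves.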
Human Implications

An important question that can be asked of the present results concerns their relationship to the data from human studies of self-reinforcement. As in the present study, Kanfer, Bradley and Marston (1962) and Kanfer and Marston (1963b) have found that a more stringent training criterion results in a higher number of self-reinforced responses than a less stringent one. In addition, Kanfer and Duerfeldt (1967a) found that a subject's pretraining can affect the administration of self-reinforcement, with a lenient history reducing the performance output before taking the reward. As mentioned in the introduction, Bandura (1969, 1971) and Kanfer (1970) conclude from the literature that self-administered reinforcers maintain behavior as well as, or better than, experimenter-controlled reinforcers.

None of the human studies, however, reported a total extinction of a self-reinforced response as in the pigeon data, with a complete abandonment to transgression, cheating, and so on. Possibly, the reason for this is the presence of "moral development" in humans. In his discussion of moral development, Kohlberg (1963a, 1963b, 1964) distinguishes between six types of moral judgement, existing in three progressive levels: pre-moral, morality of conventional role conformity, and morality of self-accepted moral principles.

Kohlberg's two types of moral judgement at the pre-moral level are: (1) punishment and obedience orientation and (2) naive instrumental hedonism. Both of these types are most likely appropriately applied to the behavior of pigeons. However, the more sophisticated levels, such as authority-maintaining morality and morality of individual rights, lie outside the realm of the food-deprived pigeon within an experimental chamber. As the pre-moral stage is present in young children, perhaps the use of animal analogue studies of this type is warranted. It is the young child that is being intensively socialized and is likely to be exposed to straightforward contingencies regarding the appropriateness of certain behaviors. A knowledge of the effects of punishment on behavior would therefore be valuable, as would a knowledge of the principles of self-reinforcement. While too lengthy to discuss here, the literature regarding the moral development of children is reviewed by Hoffman (1970). It is noteworthy that punishment was ineffective in preventing transgressions unless continued beyond a stringent criterion point. This argues strongly in favor of the behavior modification emphasis on building positive behavior rather than punishing inappropriate behavior.

While the present study suggests that for pigeons the key peck response was not reinforced by a noncontingent reward, the same might not be true in humans. Hinde and Stevenson-Hinde (1973) argue that there is little reason to believe that the specifics of learning can be completely generalized across species, let alone phyla. Also, while the key peck response in the pigeon was reinforced by the specific stimulus of grain, in humans the generalized reinforcer of social approval is also likely to accompany any specific reinforcing stimulus. Both the present study and Engberg et al. (1972) suggest that an early history of noncontingent food affects future performance.
Engberg et al. showed that a history of free food increased the number of trials to learn a work task, and the present study demonstrated that a free food history is related to a faster increase in transgressions and a concomitant decrease in the key peck response. The generalizations that might be made to human behavior are, of course, quite limited. However, both studies do indicate that an early exposure to the relationship between a work task and reward does facilitate work behavior.

Free Food Studies

Neuringer (1969, 1970) found that pigeons will peck a key for grain even though identical grain is freely present in the chamber. This finding has been replicated by Alferink, Crossman and Cheeney (1973) and Sawisch and Denny (1973). Neuringer interprets this result to mean that "Responding for food . . . appears to be a natural part of the behavior of animals and does not necessarily depend upon any prior motivating operation." The results of the present study, however, do not support Neuringer's conclusion. While some birds do continue to peck for freely available grain, others quickly extinguish the key peck response. If it were a natural part of the behavior of the pigeon to peck for food, the birds without punishment-for-transgression training should peck about as much as birds with such training. This, however, is not the case. Neuringer's results are apparently limited to a particular set of conditions, and his conclusion about the naturalness of pecking for food may be in need of qualification. A major difference between the Neuringer study and the present study involves the consequences of the key peck. Unlike the Neuringer study, a key peck in the present study did not actually result in food presentation. The key peck may need to produce food (or an environmental cue change) in order to be acquired or maintained.

In conclusion, the present study suggests that the self-reinforcing response in the pigeon does not behave as if it were a typical operant response. It appears that the freely available reward following the response does not strengthen the response. Rather, the punishment for transgression results in an alteration of the initial probabilities of the two responses of pecking and approaching the magazine. An alternative way to look at this is that the eliciting values of the magazine and the key are altered. In the absence of punishment for transgression, the two responses return to their initial probability relationship. Contiguity between a response and a reinforcer is apparently not sufficient to maintain a key peck response; rather, a contingency may be required. Increased training stringency and continued punishment for transgressing can impede the return of the responses to their former probabilities. Generalizations from this type of study to human situations must be quite guarded due to the presence of internalized development of moral judgement in humans, and due to species differences. The results do not support the notion of a natural tendency to respond for food.

LIST OF REFERENCES

Alferink, L. A., Crossman, E. K. and Cheeney, C. D. Control of responding by a conditioned reinforcer in the presence of free food. Animal Learning and Behavior, 1973, 1, 38-40.
Bandura, A. Principles of Behavior Modification. New York: Holt, Rinehart and Winston, Inc., 1969.
Bandura, A. Vicarious and self-reinforcement processes. In R. Glaser (Ed.), The Nature of Reinforcement. New York: Academic Press, 1971, pp. 228-278.
Bandura, A. and Kupers, C. J. Transmission of patterns of self-reinforcement through modeling. Journal of Abnormal and Social Psychology, 1964, 69, 1-9.
Bandura, A. and Perloff, B. Relative efficacy of self-reinforced and externally imposed reinforcement systems. Journal of Personality and Social Psychology, 1967, 7, 111-116.
Brown, P. L. and Jenkins, H. M. Auto-shaping of pigeon's key peck. Journal of the Experimental Analysis of Behavior, 1968, 11, 1-8.
Denny, M. R. A theory of experimental extinction and its relation to a general theory. In H. H. Kendler and J. T. Spence (Eds.), Essays in neobehaviorism: A memorial volume to Kenneth W. Spence. New York: Appleton-Century-Crofts, 1971, pp. 43-67.
Engberg, L. A., Hansen, G., Welker, R. L., and Thomas, D. R. Acquisition of key-pecking via auto-shaping as a function of prior experience: Learned laziness? Science, 1972, 178, 1002-1004.
Hinde, R. A. and Stevenson-Hinde, J. Constraints on Learning. London: Academic Press, 1973.
Hoffman, M. L. Moral development. In P. H. Mussen (Ed.), Carmichael's Manual of Child Psychology. New York: John Wiley and Sons, Inc., 1970, pp. 1-359.
Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts, 1943, p. 71.
Kanfer, F. H. Self-regulation: Research, issues and speculations. In C. Neuringer and S. L. Michaels (Eds.), Behavior Modification in Clinical Psychology. New York: Appleton-Century-Crofts, 1970, pp. 178-220.
Kanfer, F. H., Bradley, M. M. and Marston, A. R. Self-regulation as a function of degree of learning. Psychological Reports, 1962, 10, 885-886.
Kanfer, F. H. and Duerfeldt, P. H. The effects of pretraining on self-evaluation and self-reinforcement. Journal of Personality and Social Psychology, 1967, 7, 164-167. (a)
Kanfer, F. H. and Marston, A. R. Determinants of self-reinforcement in human learning. Journal of Experimental Psychology, 1963, 66, 245-254. (a)
Kanfer, F. H. and Marston, A. R. Conditioning of self-reinforcing responses: An analogue to self-confidence training. Psychological Reports, 1963, 13, 63-70. (b)
Kohlberg, L. Moral development and identification. In H. W. Stevenson (Ed.), Child Psychology. 62nd Yearbook of the National Society for the Study of Education. Chicago: University of Chicago Press, 1963. (a)
Kohlberg, L. The development of children's orientations toward a moral order: I. Sequence in the development of moral thought. Vita Humana, 1963, 6, 11-33. (b)
Kohlberg, L. Development of moral character and moral ideology. In M. L. Hoffman and L. W. Hoffman (Eds.), Review of Child Development Research, Vol. I. New York: Russell Sage Foundation, 1964, pp. 383-432.
Mahoney, M. J. and Bandura, A. Self-reinforcement in pigeons. Learning and Motivation, 1972, 3, 293-303.
Neuringer, A. J. Animals respond for food in the presence of free food. Science, 1969, 166, 399-401.
Neuringer, A. J. Many responses per food reward with free food present. Science, 1970, 169, 503-504.
Premack, D. Reversibility of the reinforcement relation. Science, 1962, 136, 255-257.
Premack, D. Reinforcement theory. In D. Levine (Ed.), Nebraska Symposium on Motivation. Lincoln: University of Nebraska Press, 1965, pp. 123-180.
Premack, D. Catching up with common sense or two sides of generalization: Reinforcement and punishment. In R. Glaser (Ed.), The Nature of Reinforcement. New York: Academic Press, 1971, pp. 121-150.
Sawisch, L. P. and Denny, M. R. Reversing the reinforcement contingencies of eating and key pecking behaviors. Animal Learning and Behavior, 1973, 1, 189-192.
Skinner, B. F. Science and Human Behavior. New York: The Free Press, 1953.
Spence, K. W. Experimental studies of learning and the higher mental processes in infra-human primates. Psychological Bulletin, 1937, 34, 806-850.