AN INVESTIGATION OF EXAMINER INFLUENCE ON WECHSLER INTELLIGENCE SCALE FOR CHILDREN SCORES

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
William Harvey Gillingham
1970

This is to certify that the thesis entitled An Investigation of Examiner Influence on Wechsler Intelligence Scale for Children Scores presented by William Harvey Gillingham has been accepted towards fulfillment of the requirements for Ph.D. degree in Counseling, Personnel Services and Educational Psychology

Major professor
Date 1/19/70

ABSTRACT

AN INVESTIGATION OF EXAMINER INFLUENCE ON WECHSLER INTELLIGENCE SCALE FOR CHILDREN SCORES

By William Harvey Gillingham

A. The Problem

Discussion of the possibility of observer influence has recently become controversial largely as a result of the work of Robert Rosenthal (1966; 1967a). Observer influence has been examined by studying different types of observers such as interviewers, research experimenters, teachers, and projective and intelligence test examiners. This thesis was a study of examiner influence in that it looked at the person who administered the Wechsler Intelligence Scale for Children (WISC) as a possible source of undesired variation of intelligence test scores.

B. Design

The sample consisted of four male and four female WISC examiners who had just completed WISC training. Each examiner tested eight junior high school students (four boys and four girls) who were randomly selected from a population of students who were "average" in intelligence (that is, they scored from ninety to one hundred-ten on the California Test of Mental Maturity-Short Form, 1963 Revision).
Eight examinees were randomly assigned to each examiner and then they were randomly designated "above average" or "below average" so that each examiner tested two "above average" boys, two "below average" boys, two "above average" girls, and two "below average" girls. The order in which an examiner tested his examinees was also randomized.

Parental approval for testing was secured and confidentiality was assured. Neither parents nor examiners were told that they were participating in a study or an experiment. All WISC's were administered in private offices during a one week period.

A rating sheet was used by the grand experimenter to transmit an expectancy condition to examiners. On each examinee's rating sheet a discrepancy between California Test of Mental Maturity-Short Form score and school achievement was fictitiously indicated, and a "predicted WISC score" was advanced.

Analysis of the data was by a mixed model, four-way analysis of variance having three fixed variables (expectancy, sex of examinee, and sex of examiner), and two random variables (examiners nested in sex of examiner, and replications nested in all other variables). The five per cent level of confidence was arbitrarily chosen for significance tests.

C. Analysis of Results

Both the sex of examiner and sex of examinee effects were significant (p < .05). Female examiners obtained higher mean WISC scores and male examinees achieved higher mean WISC scores.

The expectancy effect was not statistically significant. It was feared that WISC examiners did not retain the expectancy condition given to them by the grand experimenter and, therefore, examiners' expectancy was neither transmitted to nor received by examinees. It was also felt that the WISC was a structured and factual experimental task and that the WISC examiners were relatively well-trained and experienced.
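The assignment scheme described under Design can be illustrated with a short sketch. It is offered only as an illustration and is not part of the original 1970 procedure: the function name, the examiner and examinee labels, and the use of Python's random module are all assumptions introduced here. The sketch gives each of eight examiners four boys and four girls, designates two of each sex "above average" and two "below average," and randomizes the testing order.

```python
import random
from collections import Counter

def assign_examinees(boys, girls, examiners, seed=None):
    """Hypothetical helper: build a randomized testing schedule.

    Each examiner receives 4 boys and 4 girls; within each sex, 2 are
    labeled "above average" and 2 "below average"; order is shuffled.
    """
    rng = random.Random(seed)
    boys, girls = list(boys), list(girls)
    if len(boys) != 4 * len(examiners) or len(girls) != 4 * len(examiners):
        raise ValueError("need exactly 4 boys and 4 girls per examiner")
    # Random assignment of examinees to examiners.
    rng.shuffle(boys)
    rng.shuffle(girls)
    schedule = {}
    for i, examiner in enumerate(examiners):
        session = []
        for sex, pool in (("boy", boys), ("girl", girls)):
            group = pool[4 * i: 4 * i + 4]
            # Random designation of the expectancy condition.
            labels = ["above average"] * 2 + ["below average"] * 2
            rng.shuffle(labels)
            session += [(child, sex, label) for child, label in zip(group, labels)]
        rng.shuffle(session)  # random testing order
        schedule[examiner] = session
    return schedule

# Illustrative labels only: 4 male and 4 female examiners, 32 boys, 32 girls.
examiners = [f"M{i}" for i in range(1, 5)] + [f"F{i}" for i in range(1, 5)]
schedule = assign_examinees(
    [f"boy_{i}" for i in range(32)],
    [f"girl_{i}" for i in range(32)],
    examiners,
    seed=0,
)
```

Because the examinees are shuffled before being dealt out, shuffling the labels as well is strictly redundant; it is kept so that the sketch mirrors the two separate randomization steps named in the text.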
Neither the examiner effect nor any of the interaction effects was significant.

The results of this study pointed to the conclusion that examiner influence was not a great problem in intelligence testing and that observer influence was difficult to demonstrate when experienced observers administered structured tasks.

AN INVESTIGATION OF EXAMINER INFLUENCE ON WECHSLER INTELLIGENCE SCALE FOR CHILDREN SCORES

By William Harvey Gillingham

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services and Educational Psychology

1970

ACKNOWLEDGEMENTS

The writer is especially indebted to the late Buford Stefflre under whose direction this thesis originated. Dr. Stefflre's impact on the writer's life is continually felt.

Special gratitude is also felt for Dr. Robert Craig who kindly and patiently assumed an advisory role, and to other committee members, Dr. Donald Grumman and Dr. Fred Vescolani.

The writer would also like to acknowledge his wife, Judy, for her support, understanding, and unselfishness throughout his doctoral program.

TABLE OF CONTENTS

CHAPTER
I. THE PROBLEM
    Need for the Study
    Purpose of the Study
    Hypotheses
    Definition of Terms
    Overview of the Study
II. REVIEW OF RELATED RESEARCH
    Examiner Influence
    Experimenter Influence
    Discussion of Previous Research
III. DESIGN
    The Sample
    Instrumentation
    Procedure
    Design and Analysis
    Summary
IV. ANALYSIS OF RESULTS
    Hypotheses and Results
    Discussion of Results
    Summary
V.
SUMMARY, CONCLUSIONS, AND IMPLICATIONS FOR FURTHER RESEARCH
    Summary
    Conclusions
    Implications for Further Research
BIBLIOGRAPHY
APPENDIX A. TABLE 6. CTMM MEANS AND STANDARD DEVIATIONS FOR ALL SUBGROUPS
APPENDIX B. Rating Sheet
APPENDIX C. TABLE 7. WISC TOTAL SCORES

LIST OF TABLES

TABLE
1. SOURCES OF VARIATION AND DEGREES OF FREEDOM
2. SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIOS FOR MAIN EFFECTS
3. SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIOS FOR TWO-WAY INTERACTION EFFECTS
4. SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIO FOR THREE-WAY INTERACTION EFFECT
5. SUMMARY OF MEANS, STANDARD DEVIATIONS, AND F-RATIOS FOR SIGNIFICANT EFFECTS
6. APPENDIX A. CTMM MEANS AND STANDARD DEVIATIONS FOR ALL SUBGROUPS
7. APPENDIX C. WISC TOTAL SCORES

LIST OF FIGURES

FIGURE
1. Research Design

CHAPTER I

THE PROBLEM

In this chapter the need for the study is developed and then the purpose of the study is specifically stated. The research hypotheses are stated and certain terms are defined. The chapter is concluded with an overview of the study.

A. Need for the Study

The possibility that observers may influence what they observe or may be influenced in how they record what they observe has been discussed for years (Kintz, Delprato, Mettee, Persons, & Schappe, 1965). Discussion of the possibility of observer influence has recently become controversial largely as a result of the work of Robert Rosenthal (1966, 1967a).

Observer influence has been examined by studying different types of observers.
Thus, Rosenthal (1966, 1967a) studied the research experimenter to determine the existence of "experimenter influence." Rosenthal and Jacobson (1966, 1968) examined the possibility of "teacher influence" by attempting to see if teachers influenced pupils' scores on intelligence tests. Sociologists (Suchman, 1962; and Wilkie, 1965) looked at the sociological interviewer as an observer to determine the possibility of an "interviewer influence" effect. The possibility of "examiner influence" has been checked by studying the examiners of individual intelligence tests (Cieutat, 1965; Cohen; and Larrabee and Kleinsasser, 1967), and by studying the examiners of projective tests (Simmons and Christy, 1962; Turner and Coleman, 1962; and Rosenthal, 1965b).

Rosenthal (1966; 1967a) is convinced that observer influence (particularly experimenter influence and teacher influence) exists. He claims to have repeatedly demonstrated experimenter influence in experimental studies. His data and their interpretation have been sharply criticized (Ingraham and Harrington, 1966, 1967; Barber and Silver, 1968a, 1968b; Thorndike, 1968; Barber, Forgione, Chaves, Calverley, McPeake, & Bowen, 1969; and Claiborn, 1969). Some critics concede the possibility of experimenter influence but question the pervasiveness of observer influence in more structured situations such as individual intelligence testing. Other criticisms center around shortcomings in Rosenthal's research designs and his statistical treatment of data.

The existence of observer influence needs to be examined in other than experimenter influence studies. An area of concern for the writer has been the use of individual intelligence tests to make important and sometimes irreversible decisions about people. If this practice continues it seems imperative that the possibility of examiner influence be investigated.
Furthermore, future studies of observer influence of all kinds need to be statistically well-designed so that they do not meet with the same criticisms leveled at Rosenthal.

In short, there seems to be a need for a well-designed study to investigate the existence of examiner influence on individual intelligence test scores.

B. Purpose of the Study

The purpose of this study was to provide a well-designed field testing of experimental research results. Specifically, this study looked at the person who administered the Wechsler Intelligence Scale for Children as a possible source of undesired variation of intelligence test scores. The research design also provided for an examination of variables that had been rarely isolated in previous research. In this design, test examiners were given both expectancy conditions in random order; that is, examiners were led to believe that half of their examinees were "above average" in intelligence and half were "below average" in intelligence. The sex of examiner and sex of examinee variables were crossed so that male examiners tested male and female examinees and female examiners tested male and female examinees. The examinees were junior high school students and, therefore, were much younger than the examiners, who were graduate students at Michigan State University.

C. Hypotheses

Eight hypotheses were generated from the design of this study.

Hypothesis 1

Female examiners will obtain a higher mean intelligence test score than male examiners. Some research (Cieutat, 1965) suggests that female examiners of individual intelligence tests obtain higher scores from examinees.

Hypothesis 2

There will be no difference between mean intelligence test scores achieved by boy and girl examinees. No research was found that would suggest that male examinees score higher or lower than female examinees on the Wechsler Intelligence Scale for Children.
Hypothesis 3

Examinees who are expected by examiners to be above average will achieve a higher mean intelligence test score than examinees who are expected to be below average. This hypothesis is a test of the expectancy effect which has been experimentally demonstrated (Larrabee and Kleinsasser, 1967; Rosenthal, Fode, Friedman, and Vikan-Kline, 1960; Rosenthal, Persinger, Vikan-Kline, and Mulry, 1963; Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe, 1964a, 1964b; and Friedman, Kurland, and Rosenthal, 1965). This hypothesis will be directionally tested.

Hypothesis 4

There will be no reliable difference between mean intelligence test scores obtained by the examiners. Since examinees were randomly assigned, there is no reason to expect that examiners will obtain different mean intelligence test scores. If this test is statistically significant, it probably indicates that some examiners regularly assign scores that are higher than scores assigned by other examiners.

Hypothesis 5

Opposite sex combinations of examiners and examinees will obtain higher mean intelligence test scores than same sex combinations. There is sufficient research (Stevenson, 1961; Stevenson and Allen, 1964; Cieutat, 1965; Hill and Stevenson, 1965; and Rosenthal, 1967a) to support a directional hypothesis concerning the sex of examiner interacting with the sex of examinee variable.

Hypothesis 6

There will be no interaction effect between the sex of the examiner and the extent to which the obtained scores are in the direction suggested by the expectancy set. It cannot be determined whether male or female experimenters are more susceptible to an expectancy effect. Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe (1964a) found that male experimenters are "better biasers," but a directional hypothesis is not advanced.

Hypothesis 7
There will be no interaction effect between the sex of the examinee and the extent to which the obtained scores are in the direction suggested by the expectancy set. Again, a directional hypothesis is not advanced because of insufficient research to support the conceptualization that one sex of examinees is more influenced by the direction of the expectancy than the other sex of examinees. In two studies (Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe, 1964a; and Stevenson and Allen, 1964), it was found that female subjects were easier to influence than male subjects. As in Hypothesis 6, a directional hypothesis is not formulated because it is felt that the subjects' age may have accounted for the results found in the studies cited. That is, a young male examiner may be better able to influence a college coed than an eleven year old junior high school girl.

Hypothesis 8

There will be no interaction effect among the sex of examiner, sex of examinee, and direction of expectancy variables. This hypothesis is advanced to determine the existence of a three-way interaction effect.

D. Definition of Terms

The following terms need to be defined to assure common meaning.

Expectancy

Experimenters may have a preconceived idea concerning the results of their research. A test examiner may have a preconceived idea concerning the test score of the subject he is testing. That is, he has an "expectation" of what the outcome will be. For the purpose of the study, an "expectancy" was believed to exist when the grand experimenter (the writer) gave test examiners an estimate of the ability of each subject tested.

Experimenter or Examiner Influence

"Experimenter or examiner influence" involves a conscious or unconscious effect on the outcome of research or test scores.

E. Overview of the Study

In the next chapter, a review of recent (1960 to present) research on experimenter and examiner influence is presented.
The reader will be introduced to the research that has been done on the experimenter and examiner variables. He will also be shown that the present study represents a necessary extension of the previous research.

The research design (sample, instruments, procedure, and analysis) is discussed in Chapter III and results are presented in Chapter IV. Chapter V includes a summary, conclusions, and implications for further research.

CHAPTER II

REVIEW OF RELATED RESEARCH

Chapter II contains a review of recent (1960 to present) literature concerning examiner influence on individual intelligence test scores and on projective test scores, research on experimenter influence, and a discussion of previous research. Studies of examiner influence on individual intelligence and projective test scores were included because of their relevance to this study's design in which an individual intelligence test was used. These studies were conducted to determine whether physical or personality characteristics of test examiners influenced the test scores they obtained. The rest of the studies in Chapter II were concerned with Robert Rosenthal's (1967a) contention that experimenters influence the results of their research.

A. Examiner Influence

Examiner Influence on Individual Intelligence Test Scores

Very little research was found concerning examiner influence on intelligence test scores. Cieutat (1965) also noted this lack of research. In his study, seven male and six female examiners tested 24[?] boys and girls with the Stanford-Binet. Cieutat found that female examiners obtained significantly higher (p ≤ .001) intelligence test scores from subjects. Analysis of variance revealed a significant interaction effect (p ≤ .05) between sex of examiners and sex of subjects. Highest performances were obtained when examiners tested opposite sex subjects.

Cohen (1965) also found a significant (p ≤ .005) examiner effect using the Wechsler-Bellevue.
Cohen believed that examiner influence reduced the subtest validity of the Wechsler-Bellevue.

Exner (1966) studied the effect of "rapport building" on Stanford-Binet intelligence test scores. Twenty-five subjects took the test after substantial attempts were made by examiners to "build rapport." The other twenty-five examinees took the test without preliminary "rapport building." The experimental group scored significantly higher (p < .001) than the control group.

Wartenberg-Ekren (1962) found no significant difference on Block Design scores of the Wechsler Adult Intelligence Scale when eight examiners gave that subtest to two examinees who were allegedly earning higher grades than the other two examinees. However, Larrabee and Kleinsasser (1967) found a significant difference when five examiners administered the Wechsler Intelligence Scale for Children to twelve sixth graders. Each examinee was tested by two examiners: one examiner administered the even items; the other examiner administered the odd items. One examiner was told that the examinee was "above average"; the other examiner was told that the examinee was "below average." On the Verbal part of the test the difference was over ten IQ points (p ≤ .05).

Friedman (1967) reviewed literature concerning examiner differences in testing and found that the examiner's and subject's race made a difference in testing. Same race combinations obtained best scores. He noted, however, the lack of research on the examiner variable in testing. He pointed out that most subjects were still tested by only one examiner. He also suspected an examiner expectancy effect in testing because most examiners review the examinee's records before testing.

Examiner Influence on Projective Test Scores

Turner and Coleman (1962) and Simmons and Christy (1962) studied examiner influence on Thematic Apperception Test (TAT) responses.
Turner and Coleman designed their study to maximize the probability of significant differences but failed to obtain many significant results. They did find that examiners who exhibited warmth elicited significantly more hostile responses from subjects. They did not find examiner competency, experience, or preference for the TAT significantly related to subjects' responses.

Rosenthal (1965b) reported two studies conducted in the 1950's in which subjects' scores on Rorschach cards correlated significantly with examiners' scores. Furthermore, analysis of subjects' pretest and posttest scores revealed that their scores became more like the examiners' scores as a result of the examiner-subject interaction.

Examiner influence on individual intelligence test scores or on projective test scores was investigated in a limited number of research studies. In these studies, the examiner's personality (warmth, rapport) and physical characteristics (sex, race) had an effect on the dependent variable. Specifically, opposite sex and same race combinations obtained the best scores; "warm" examiners elicited more hostile responses; and examiners who built rapport obtained significantly higher Stanford-Binet intelligence test scores. Examiner influence on intelligence and projective test scores prevents accurate assessment of examinees and, therefore, needs to be more fully understood and controlled.

B. Experimenter Influence

There were many studies of experimenter characteristics and their influence on experimental research. These included studies of the sex of experimenter, prestige of experimenter, experience of experimenter, personality of experimenter, and race of the experimenter. Several studies were reviewed in which the cues which mediate experimenter influence were examined.

Sex of Experimenter

None of the twelve studies reviewed in this section demonstrated a significant sex of experimenter effect.
Two of seven studies showed a significant sex of subject effect: in one study male subjects performed better and in the other study female subjects performed better. When the sex of experimenter variable was examined in interaction with another variable, significant results were usually obtained. The interaction of the sex of experimenter and sex of subject variables was sometimes significant, especially when experimenters worked with opposite sex subjects.

Stevenson and Allen (1964) found significant sex of subject and interaction effects when adult subjects were verbally reinforced in performing a marble-sorting task. Female subjects performed significantly better than males and opposite sex interactions resulted in the best performances. The authors (Stevenson and Allen, 1967) repeated this study later, however, and found no significant interaction between the experimenter and sex of subject variables.

Miller and Solkoff (1965) did not find a significant sex of experimenter effect in a verbal conditioning experiment. Stevenson (1961) studied children with the marble-sorting task and found sex of experimenter related to the age and sex of subjects. Opposite sex combinations of experimenters and subjects resulted in the best performances. In another study (Hill and Stevenson, 1965) using marble-sorting as the dependent variable, no significant sex effects were found until the interaction of sex and verbal reinforcement was examined. Best performances were obtained by male experimenters in the reinforcement condition.

A verbal conditioning study (Sarason and Minard, 1965) revealed that male subjects were conditioned more easily than female subjects, but sex of experimenter was not a significant variable. The interaction effect of the sex of experimenter and hostility of experimenter variables was significant. Low hostility male experimenters and high hostility female experimenters conditioned their subjects best.
In another verbal conditioning experiment (Ogawa and Oakes, 1965) it was found that female experimenters conditioned low anxiety male subjects significantly better than high anxiety male subjects. The authors reasoned that high anxiety males became more anxious, which reduced the quality of their performances.

A summary of these verbal conditioning studies indicated that the sex of an experimenter alone did not influence the results of the studies. Only when another independent variable interacted with the sex of experimenter variable were significant results obtained.

Prestige of Experimenter

Four studies were reviewed which dealt specifically with the prestige of experimenters as a variable. All of these studies involved verbal conditioning. Two of the studies revealed significant prestige effects; the other two did not. Of the former two studies, one showed that low prestige experimenters obtained better performances; the other study reported that high prestige experimenters obtained better results. These studies are discussed below.

Prince (1962) found that more prestigious experimenters conditioned the verbal behavior of children significantly better than less prestigious experimenters. In another verbal conditioning experiment (Katkin, Risk, and Spielberger, 1966) an undergraduate experimenter (low prestige) obtained a significantly greater performance increment than a professor (high prestige). With only two experimenters it was possible that other variables influenced the subjects' responses in this experiment.

Sarason and Minard (1965) manipulated the experimenters' prestige level by the way they dressed and contacted subjects. They found no significant prestige of experimenter effect using sixteen experimenters in a verbal conditioning experiment. Blaufarb (1960) found no significant prestige of experimenter effect using ten experimenters.
In a social science survey research study, Jones (1965) discussed a "courtesy bias" found in Southeast Asia. She referred to the experimenter effect of prestige on people who traditionally treated visitors courteously, and stressed the importance of constructing questionnaire items which were not susceptible to influence.

Experience of Experimenter

The effect of an experimenter's experience on research results was studied by several authors. Rosenthal (1964) once suggested that use of unsophisticated and less ego-involved experimenters might reduce an experimenter's influence. But he found that naive experimenters influenced results also and retracted his former suggestion. In an animal conditioning experiment, Brogden (1962) found that naive experimenters did not condition rabbits as well as experienced experimenters. However, the difference diminished with practice.

Cordaro and Ison (1963) reported an interesting study of planaria. Experimenters who expected high planarian activity reported significantly (p ≤ .001) more activity than those experimenters expecting little planarian activity. They attributed this extremely significant difference to the naiveté of the experimenters. Another study (Ingraham and Harrington, 1966) appeared to support this notion. Some of the experimenters worked with rats which they were led to believe were "bright"; others worked with rats which were supposedly "dull." They found no significant difference and even the insignificant difference disappeared by the fifth day of the experiment. The authors conceptualized that naive experimenters must rely on cues given by the chief experimenter because of the ambiguity of the situation. As experimenters gained experience in the experimental situation, they increasingly responded to factual cues emitted by the rats. The authors concluded that experimenter influence was not a problem if experimenters were experienced in working with subjects.
In a sociological study, Kish (1962) also found that experienced interviewers influenced their data significantly less than naive interviewers. He suggested the use of structured questionnaire items to reduce interviewer influence.

In three of four studies, naive experimenters influenced their results more than experienced experimenters and this influence was reduced as experimenters practiced or became more experienced.

Personality Characteristics of Experimenter

Many studies were reviewed which examined the effect of experimenters' personality in experiments. Variables included hostility, anxiety, need for social approval, and warmth of the experimenter.

Hostility. Two studies (Sarason, 1962; and Sarason and Minard, 1965) demonstrated a significant experimenter hostility effect on the results of verbal conditioning experiments. Hostile experimenters of both sexes elicited more hostile verbs in one study (Sarason, 1962). In the other study, low hostility male experimenters and high hostility female experimenters were able to condition subjects best.

Anxiety. In discussing the effect of experimenters' anxiety on results, researchers (Rosenthal, Persinger, and Fode, 1962; Winkel and Sarason, 1964) felt that a curvilinear relationship existed. That is, experimenters exhibiting medium anxiety influenced subjects more than experimenters showing high or low anxiety. The results of their studies and others' did not clearly confirm this hypothesis. Winkel and Sarason found that female subjects performed better for low anxiety experimenters. Rosenthal's study demonstrated that medium anxiety experimenters were the least effective influencers. In another study (Rosenthal, Persinger, Vikan-Kline, and Mulry, 1963) it was found that high anxiety experimenters were most influential on subjects' responses and that high anxiety subjects were most susceptible to being influenced.
Finally, Rosenthal, Kohn, Greenfield, and Carota (1965) reported a study in which "less anxious" experimenters influenced subjects significantly better than "more anxious" experimenters.

Thus, while anxiety of the experimenter appeared to be an important variable, its direction was unpredictable. The different results may reflect some difficulty in correctly identifying "high," "medium," and "low" anxiety experimenters. It is possible that experimenters affected their data in some unexamined way rather than by their anxiety level.

Need for social approval. Rosenthal conceptualized that experimenters who had a high need for social approval would be better influencers in order to be approved by the grand experimenter. This conceptualization was supported in his review of the literature (1967a). Need for social approval was not found to be related to the degree of experimenter influence, however, in a study by Rosenthal, Persinger, Vikan-Kline, and Mulry (1963).

Warmth. The warmth of experimenters seemed to be a significant variable. Reece and Whitman (1962) found that "warm" experimenters conditioned subjects significantly better (p ≤ .001) than "cold" experimenters. Rosenthal, Kohn, Greenfield, and Carota (1965) reported that more friendly experimenters influenced subjects better. Authors of both studies felt that subjects were more likely to want to please a warm, friendly experimenter.

Race of Experimenter

The influence of an experimenter's race has not been extensively studied. Two studies were reviewed which indicated that an experimenter's race alone does not affect data. Williams (1964) found that the experimenter's race was significant only in interaction with a social distance (between interviewer and subject) variable, and when interview questions were potentially threatening. Katz, Robinson, Epps, and Waly (1964) administered a disguised verbal test of aggression to Negro high school males.
When the test was neutrally described, Negro and white experimenters obtained the same results. When described as an intelligence test, Negro experimenters obtained significantly higher aggression scores than those elicited by white experimenters.

Cues Which Mediate Experimenter Influence

If experimenters do influence their research, the question must be asked, "How?" One factor analytic study (Rosenthal, Fode, Friedman, and Vikan-Kline, 1960) revealed that experimenters who read directions more slowly and used more hand gestures influenced their data the most. Thus, these are examples of "cues" which mediate some kind of experimenter influence. In more recent studies (Rosenthal and Fode, 1965a, 1965b; Rosenthal, Fode, Vikan-Kline, and Persinger, 1964; Friedman, Kurland, and Rosenthal, 1965; Rosenthal, Friedman, and Kurland, 1966; Friedman, 1967; and Rosenthal, 1967a) the following cues have been found significant in the transmission of experimenter influence: frequency of glances, rate and accuracy of reading directions, body activity, and touch. Rosenthal concluded that verbal communication was the most significant mediator of experimenter influence.

That verbal cues were important mediators was confirmed by Sarason (1962). He found verbal reinforcement significantly more influential on subjects than a visual reinforcer (flashing light). Contrarily, Reece and Whitman (1962) reported a verbal conditioning study in which visual cues (body movement, smiling, glancing by experimenter) reinforced subjects more than verbal cues. Similarly, Carlsmith and Aronson (1965) found visual cues to be effective mediators of experimenter influence.

A majority of the authors of the studies reviewed in this section suggested that experimenters do influence their data. The results of these studies were too conflicting and inconsistent to warrant many definite conclusions regarding how experimenters influence their data.
The sex and race of experimenters seemed to be significant variables, especially when studied in interaction with other variables. The warmth of an experimenter appeared to have a consistent effect on data; that is, "warm" experimenters influenced their research more than experimenters perceived as being less warm.

The research on other experimenter personality characteristics and also on the prestige of the experimenter revealed little consistent evidence to support an assessment of their importance. Similarly, the effect of the experience of the experimenter was discussed in an unresolved debate by Rosenthal, Harrington, and Ingraham. While visual and verbal cues were found to mediate experimenter influence, more research needs to be done to discover how experimenters influence their data.

In several studies the grand experimenter deliberately gave experimenters an expectancy concerning the results of the study. There seemed to be two phenomena that created experimenter influence in these studies (Rosenthal, 1964): "expectancy effects" and "effects of early data returns."

Expectancy Effects

Rosenthal (1964) described "expectancy effects" when he reasoned that researchers usually studied variables in which they were interested and, therefore, often had certain "expectations" regarding that variable. If one's expectations in any way distorted the research data, experimenter influence existed.

To test the expectancy effect on data, Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe (1964b) presented a sequence of twenty photographs of people to a large norm group and asked them to rate each picture from -10 to +10 according to perceived failure (-10) or success (+10) exhibited by the faces. The mean rating was zero. In subsequent studies, experimenters were asked to administer the photographs to subjects and were told that the purpose was to develop an empathy test.
Then experimenters were given differential expectancies concerning how their subjects would rate the photographs. That is, some experimenters were told to expect mean ratings of +5 (moderate success), and the other experimenters were led to expect -5 (moderate failure) mean ratings. Subjects were assigned randomly, so if significantly different mean ratings were obtained by +5 and -5 experimenters, a significant expectancy effect was demonstrated.

Using this basic design, Rosenthal and his students reported five studies (Rosenthal, Fode, Friedman, and Vikan-Kline, 1960; Rosenthal, Persinger, Vikan-Kline, and Mulry, 1965; Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe, 1964a, 1964b; and Friedman, Kurland, and Rosenthal, 1965) in which a significant experimenter expectancy effect was found. One study (Rosenthal, Fode, Vikan-Kline, and Persinger, 1964) revealed no significant effect.

Rosenthal and Halas (1962) and Cordaro and Ison (1963) found significant expectancy effects using planaria as subjects. Ingraham and Harrington (1966) reported no significant expectancy effect when experimenters were led to believe they had either "maze-bright" or "maze-dull" rats for subjects. With the same design, Rosenthal and Fode (1965a) had earlier found a significant difference; the supposedly "maze-bright" rats learned a discrimination task significantly faster.

Cooper, Eisenberg, Robert, and Dohrenwend (1967) gave opposite expectancy conditions to ten experimenters, each of whom had ten subjects. A significant expectancy effect was demonstrated. Using the photo-rating task described above, Friedman (1967) also found a significant expectancy effect.

Rosenthal and Jacobson (1966; 1968) examined the expectancy effect hypothesis in the classroom. They described their study at great length in Pygmalion in the Classroom (1968). The authors explained what they meant by an "interpersonal self-fulfilling prophecy" (1968, p. vii): ". . .
how one person's expectation for another person's behavior can quite unwittingly become a more accurate prediction simply for its having been made." Specifically, they wondered if a teacher's expectation of a pupil's ability somehow actually helped determine the pupil's ability.

Working in an elementary school (Grades 1-6), Rosenthal and Jacobson administered a pretest (Flanagan Tests of General Ability) to all pupils in May of 1964, and told the teachers that it was the "Harvard Test of Inflected Acquisition." Then they randomly selected twenty percent of the pupils and told the teachers in September of 1964 that the test indicated that these pupils were about to take an "intellectual spurt" or were about to "bloom." Retesting occurred in January, 1965, the basic posttest was administered in May, 1965, and a follow-up posttest was administered in May, 1966.

Rosenthal and Jacobson hypothesized that younger "bloomers" would show greater gains when compared to control pupils than older "bloomers." This hypothesis was confirmed in that significant results were found only at the first and second grade levels. Significant results were not obtained for an ability effect (fast, medium, slow tracks) or for a minority group status effect (Mexican versus American). Analysis did indicate a significant sex of pupil effect: boy "bloomers" spurted more than girl "bloomers."

Rosenthal and Jacobson concluded that teachers apparently did communicate an expectation of performance to the "bloomers" which accounted for their gains on the intelligence test posttest. They conceptualized that the "quality of interaction" between teachers and "bloomers" probably made the difference.

In view of recent criticism (Thorndike, 1968; and Claiborn, 1969) of Rosenthal and Jacobson's study, one must continue to question the existence of teacher influence. Thorndike (1968, p. 711) concluded that the basic data were ". . .
so untrustworthy that any conclusions based upon them must be suspect." Claiborn (1969) replicated parts of the Rosenthal and Jacobson study and found no significant differences on the hypotheses tested in both studies. He discussed the "failure to replicate" and concluded that the question of teacher influence remained "equivocal."

Taking the strength of Rosenthal's convictions and the assuredness of his critics together, the writer was left rather confused. It seemed, however, that Rosenthal may be guilty of that which he has warned us about: observer influence.

Effects of Early Data Returns

"Early data return effect" referred to the tendency for experimenters to be influenced by the hypotheses suggested by data collected early in the life of an experiment. Rosenthal directed two ingenious studies of this effect. In the first study (Rosenthal, Persinger, Vikan-Kline, and Fode, 1965a) all experimenters were led to expect +5 mean photograph ratings. The researchers arranged for four of the experimenters to obtain "good" scores from their first two subjects by the use of coached accomplices and arranged for four experimenters to obtain "bad" scores by the same procedure. Four experimenters experienced only naive subjects who were not accomplices. The difference between the two experimental groups was significant and, as hypothesized, the control experimenters obtained a mean rating between the two experimental groups. Furthermore, there was some evidence of a sequence effect; that is, "good" data got better and "bad" data got worse.

In a similar but more complex experiment (Rosenthal, Kohn, Greenfield, and Carota, 1965), the authors found a significant (p < .05) early data return effect. In this study, the early data return effect was strong enough to change an opposite, initial experimenter expectancy. The effect was strongest, however, when it confirmed an initial expectancy.
The authors concluded that when experimenters experienced "good" data they became warmer and more friendly and exercised greater influence on their subjects.

C. Discussion of Previous Research

In this chapter studies of examiner influence on test scores were reviewed. Only one of the studies reviewed (Larrabee and Kleinsasser, 1967) used the Wechsler Intelligence Scale for Children as an instrument.

Several studies concerning experimenter influence on experimental research were reviewed. The sex of the experimenter sometimes had an influence on experimental results when combined with other variables. Four studies were discussed in which the prestige of the experimenter was examined, and that variable did not appear to have a consistent influence on research data. In a review of literature concerning the influence on research results of an experimenter's experience, it was concluded by this writer that experienced experimenters were less likely to influence their data. It was also found that certain personality characteristics (hostility, anxiety, need for social approval, and warmth) of experimenters sometimes influenced results of experiments.

If experimenter influence exists, it is important to understand how it is mediated. Studies were reported in which it was demonstrated that both verbal and non-verbal cues mediated experimenter influence.

Several studies were reviewed in which the grand experimenter deliberately gave experimenters an expectancy regarding the experiment. In these studies an experimenter influence effect was consistently demonstrated.

This review of the literature revealed that experimenter influence has been extensively examined in experimental research. Rosenthal and Jacobson (1968) applied experimental research findings to a practical setting. They are vulnerable to criticism in regard to their research design and the interpretation of their data.
The review of literature revealed weaknesses in the research design of other studies as well. In many studies experimenters were given only one expectancy condition. This procedure was seen as a weakness because an expectancy "set" could easily develop which might account for the influencing effect. Much of the research involved male experimenters and female subjects of similar age. The sex of experimenter and sex of subject variables need to be crossed and younger subjects need to be tested. Most of the studies employed meaningless rating instruments which required little or no training to administer. This prevented an examination of the experimenter's experience or training as a variable in research on experimenter influence.

There seemed to be confusion concerning just how experimenter influence operated. Rosenthal (1967a) believed that experienced experimenters were more likely to influence their research because they were more ego-involved with their research and were "better biasers." That is, they were better at communicating their expectancy to subjects and better at reinforcing "correct" responses. Thus, for Rosenthal, experimenter influence increased with the experimenter's experience and "snowballed" as an experiment progressed. Friedman (1967), one of Rosenthal's associates, agreed with this hypothesis.

This hypothesis has been continually criticized by Ingraham and Harrington (1966, 1967) in a running debate in Psychological Reports. They found (Ingraham and Harrington, 1966) that experimenters influenced less as an experiment progressed and as experimenters gained experience in the experimental task. They also found that training of experimenters reduced the experimenter influence effect and that experimenters who were given both expectancy conditions influenced their results less.
They concluded (Harrington and Ingraham, 1967) that experimenter influence existed when an inexperienced and untrained experimenter began an experiment in which he was given an expectancy condition. As the experiment progressed, the experimenter responded more and more to factual cues presented by the subjects and less and less to the experimentally induced expectancy effect. Rosenthal (1967b, 1967c) retorted by manipulating Ingraham and Harrington's statistical methods in such a way that their research confirmed his hypothesis.

From this discussion it can be seen that the operation of experimenter influence is difficult to understand. For example, does the subject in an experimenter influence study actually perform better or does the experimenter just think the subject does better?

In Chapter III, the design of this study is presented. The sample, instrumentation, procedure, and the design and analysis are described. The research design was chosen to correct some of the weaknesses mentioned in the discussion of previous research.

CHAPTER III

DESIGN

Chapter III includes an explanation of the samples, instrumentation, procedure, and the design and analysis.

A. The Sample

Examiners

Eight trained Wechsler Intelligence Scale for Children (WISC) examiners participated in this study. They were volunteers from two Michigan State University individual testing classes (Education 866A) totalling approximately thirty-five people. Random selection of examiners would have been statistically desirable; however, due to insufficient funds, the writer could not make the research attractive enough to insure a large pool of examiners from which to randomly draw eight. Four men and five women volunteered to test eight examinees each for $2.50 per test. A scheduling conflict eliminated one woman examiner, which left four men and four women examiners.

Examinees

There were sixty-four examinees: thirty-two boys and thirty-two girls.
Examinees were sixth, seventh, and eighth grade students at DeWitt Junior High School. DeWitt, Michigan is a middle-class suburb of Lansing, Michigan. The writer identified fifty-eight boys and sixty-nine girls who had previously scored from ninety to one hundred-ten on the California Test of Mental Maturity-Short Form. From this population, thirty-two boys and thirty-two girls were randomly selected for the sample. The means and standard deviations obtained by examinees on the California Test of Mental Maturity are presented in Appendix A. The examinees ranged in age from ten years, ten months to fourteen years, six months, with a median age of twelve years, seven and one-half months.

B. Instrumentation

Wechsler Intelligence Scale for Children

The Wechsler Intelligence Scale for Children (WISC) was used in this study for two reasons: first, a pool of trained WISC examiners was more accessible than Stanford-Binet or other individual intelligence test examiners; and second, although the WISC had high reliability, scoring was somewhat subjective. The nature of this study demanded that some degree of subjectivity in the scoring of protocols be permitted.

The WISC is an individual intelligence test for children between age five and age fifteen (Wechsler, 1949). There are five Verbal Scale subtests: Information, Comprehension, Arithmetic, Similarities, and Vocabulary. There are five Performance Scale subtests: Picture Completion, Picture Arrangement, Block Design, Object Assembly, and Coding. In addition, there are two optional subtests: Digit Span and Mazes. Verbal, Performance, and Total I.Q.'s are computed by a standard score formula with the mean set at 100 and the standard deviation at fifteen. The WISC was standardized on a stratified random sampling of 2200 White American boys and girls. The test was published in 1949, and the sampling was based on 1940 Census Bureau data.

Scoring of the WISC was made as objective as possible. However, Cronbach (1960, p.
194) pointed out that, "The skill of the examiner may influence the score greatly. In some of the verbal tests, the examiner must make rather sensitive judgments as to the correctness of an answer since it may be necessary to request the subject to elaborate his meaning. Answers that seem wrong may be correct when the subject explains himself. Subjectivity in scoring borderline answers is also a potential problem."

Littell (1960) noted that the predictive validity of the WISC had not been demonstrated. Though Burstein (Buros, 1965) did not mention this limitation, no recent studies were reviewed in which an attempt was made to demonstrate the WISC's predictive validity. Concurrent validity has been successfully demonstrated. In various studies, the WISC correlated with the Stanford-Binet from .49 to .94 with a median correlation of about .80. Correlations between the WISC and the Wechsler-Bellevue ranged from .72 to .87. With the California Test of Mental Maturity, correlations ranged from .77 to .81. When the WISC was correlated with achievement tests, coefficients ranged from .14 to .81 with a median correlation of .66.

WISC reliability is very high. Split-half reliability coefficients vary from .86 to .96 depending on the age level of examinees (Littell, 1960). Cronbach (1960, p. 198) felt that the Wechsler Performance Scale was the most reliable performance scale ever developed. Coefficients of internal consistency ranged from .59 to .91 with standard error of measurement ranging from 3.00 to 5.61. Only one coefficient of stability was reported (Littell, 1960) and it was .77 over a four year period. Reliability coefficients were greatest from age ten and one-half to age thirteen and one-half. The subjects in this study were between the ages of ten and fourteen. Fraser (Buros, 1959) felt that the WISC seemed most valid for normal range subjects. All the subjects in the present study were drawn from the "normal" range.
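As an illustration of how reliability and standard error of measurement are related, the following minimal sketch applies the classical test theory formula SEM = SD x sqrt(1 - r). The formula is not stated in the sources cited above, and applying it to the split-half coefficients with the I.Q. standard deviation of fifteen is an assumption made here for illustration only; the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


IQ_SD = 15.0  # standard deviation of WISC I.Q. scores, as stated in the text

# Assumption: use the lowest and highest split-half coefficients quoted above.
for r in (0.86, 0.96):
    sem = standard_error_of_measurement(IQ_SD, r)
    print(f"reliability {r:.2f} -> SEM of about {sem:.2f} I.Q. points")
```

Under these assumptions, a lower reliability coefficient corresponds to a larger band of uncertainty around an obtained I.Q., which is why score variation from sources other than intelligence is of interest.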
In the Sixth Mental Measurement Yearbook (Buros, 1965), Burstein supported the WISC as having good reliability. He pointed out that much of the WISC research since 1960 concerned "psychopathological applications." That is, the WISC was used to measure the intelligence of special groups such as the retarded, the gifted, brain-damaged children, and the disadvantaged. A review of WISC literature since 1960 brought the writer to the same conclusion made earlier by Littell (1960): Studies need to be made concerning possible sources of score variation other than intelligence.

Rating Sheet

An expectancy influence was experimentally induced by giving test examiners a rating sheet for each examinee (see Appendix B). The rating sheet included the examinee's name and fictitious ratings of his California Test of Mental Maturity-Short Form I.Q. and school achievement. The fictitious ratings were marked on continua from "bottom one-fourth" to "top one-fourth." In each case a discrepancy between intelligence test score and school achievement was indicated. For example, an "above average" examinee was rated at the top one-fourth on the CTMM, and second one-fourth on school achievement. The last item on the rating sheet was a "predicted WISC score" which was above average or below average depending on whether the examinee had previously been designated as "above average" or "below average." That is, examinees who were supposedly "above average" were given predicted WISC scores above one hundred. Examinees who were supposedly "below average" were assigned predicted WISC scores below one hundred. In this way it was hoped that examiners would have an "expectancy" regarding the outcome of an examinee's WISC score.

C. Procedure

After the thirty-two boys and thirty-two girls were selected, they were randomly assigned to one of the eight examiners so that each examiner had four boys and four girls.
Then "above average" or "below average" designations were randomly assigned to examinees so that each examiner would test two "above average" boys, two "below average" boys, two "above average" girls, and two "below average" girls. Finally, the order in which examinees were tested was randomized for each examiner.

It was necessary to call seventy parents to secure approval for testing sixty-four subjects. When an originally selected subject could not participate, a replacement was randomly drawn from the remaining population. Testing was offered as a service to parents and confidentiality was assured. Neither the subjects nor their parents were told that the testing was part of a "study" or an "experiment." Test scores were interpreted to parents at a later date.

Examiners were given the following information before testing began:

"These students are being tested because there is a discrepancy between their I.Q. (CTMM) and school grades. Be sure to read the information for each student before testing because we are interested in knowing your opinion with regard to whether he has been over-achieving or under-achieving. When turning in test scores please let us know the student's I.Q. and any special circumstances concerning the Verbal and Performance scores."

Examiners were led to believe that the writer was merely an agent of the DeWitt Public Schools. In no case did an examiner indicate that he "guessed" that he was taking part in an experiment or was indeed a subject in an experiment.

All testing was completed within a one week period starting the day after examiners completed their WISC course. Most examiners tested four examinees in each of two sessions. Two examiners tested eight examinees in one day. Scheduling conflicts prevented a more uniform testing schedule, which would have been desirable in terms of research design. Examinees were scheduled at seventy-five or ninety minute intervals.
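The three randomization steps described at the start of this section (assignment to examiners, assignment of designations, and testing order) can be sketched as follows. This is a minimal illustration and not the procedure actually used by the writer, which is not documented beyond the description above; the function name, the placeholder examinee labels, and the seed are all hypothetical.

```python
import random
from collections import Counter


def assign_examinees(boys, girls, examiners, seed=None):
    """Return {examiner: [(examinee, sex, designation), ...]} in testing order."""
    rng = random.Random(seed)
    pools = {"boy": list(boys), "girl": list(girls)}
    for pool in pools.values():
        rng.shuffle(pool)  # step 1: random assignment of examinees to examiners
    per_sex = len(pools["boy"]) // len(examiners)  # four boys and four girls each
    schedule = {}
    for i, examiner in enumerate(examiners):
        cases = []
        for sex, pool in pools.items():
            group = pool[i * per_sex:(i + 1) * per_sex]
            # step 2: half "above average", half "below average", at random
            labels = ["above average", "below average"] * (per_sex // 2)
            rng.shuffle(labels)
            cases.extend((name, sex, label) for name, label in zip(group, labels))
        rng.shuffle(cases)  # step 3: random testing order for this examiner
        schedule[examiner] = cases
    return schedule


# Hypothetical placeholder identifiers, not the actual examinees.
boys = [f"boy_{i:02d}" for i in range(32)]
girls = [f"girl_{i:02d}" for i in range(32)]
schedule = assign_examinees(boys, girls, examiners=list(range(1, 9)), seed=1970)

for examiner, cases in schedule.items():
    cells = Counter((sex, label) for _, sex, label in cases)
    print(examiner, dict(cells))  # every examiner has two examinees per cell
```

Shuffling within each sex before slicing keeps the four-boys, four-girls constraint per examiner while leaving every other aspect of the assignment to chance.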
Each examiner had a private office in which to administer the WISC. No more than three examiners were testing at one time and conversation among examiners was infrequent. It was feared that too much discussion among examiners might accidentally lead examiners to the realization that they were subjects in the research design. All tests were scored by the examiners and returned to the writer.

D. Design and Analysis

Analysis of the data was by a mixed model, four-way analysis of variance (Hays, pp. 439-447). A "mixed model" was used because the design had both "fixed" and "random" variables. A variable is "fixed" when there are qualitatively distinct levels of the variable. A "random" variable is one in which the levels chosen represent only a sample of the population of possible levels.

The design of the study (see Figure 1) consisted of three fixed variables (expectancy, sex of examinee, and sex of examiner), and two random variables (examiners nested in sex of examiner, and replications nested in all other variables). A variable is nested in a second variable when each level of the nested variable does not appear in all levels of the second variable. For example, from Figure 1 it can be seen that Examiner 2 can only be in one level of the sex of examiner variable because he is a male.

                  Expectancy of Examiner by Examinee Sex

                       Male Examinee               Female Examinee
Sex of               "Above       "Below        "Above       "Below
Examiner  Examiner   Average"     Average"      Average"     Average"

Male      1          R1, R2       R1, R2        R1, R2       R1, R2
          2
          3
          4
Female    5
          6
          7
          8

Fig. 1 Research Design

As seen in Figure 1, the sex of examiner variable is crossed with the sex of examinee and expectancy variables. Variables are "crossed" when each level of one variable occurs with each level of the other variable.
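The definitions of "crossed" and "nested" given above can be checked mechanically against a list of observations. The sketch below is illustrative only; it assumes, following Figure 1, that Examiners 1 through 4 are male and Examiners 5 through 8 are female, and the function and field names are hypothetical.

```python
from itertools import product


def is_crossed(observations, factor_a, factor_b):
    """True when every level of factor_a occurs with every level of factor_b."""
    levels_a = {obs[factor_a] for obs in observations}
    levels_b = {obs[factor_b] for obs in observations}
    seen = {(obs[factor_a], obs[factor_b]) for obs in observations}
    return seen == set(product(levels_a, levels_b))


def is_nested(observations, inner, outer):
    """True when each level of `inner` appears under exactly one level of `outer`."""
    outer_levels = {}
    for obs in observations:
        outer_levels.setdefault(obs[inner], set()).add(obs[outer])
    return all(len(levels) == 1 for levels in outer_levels.values())


# Assumption read from Figure 1: Examiners 1-4 are male, 5-8 are female.
examiner_sex = {e: ("male" if e <= 4 else "female") for e in range(1, 9)}

observations = [
    {"examiner": e, "examiner_sex": examiner_sex[e],
     "examinee_sex": s, "expectancy": x, "replication": r}
    for e, s, x, r in product(range(1, 9), ("male", "female"),
                              ("above average", "below average"), (1, 2))
]

print(is_crossed(observations, "examiner_sex", "examinee_sex"))  # True
print(is_crossed(observations, "examiner_sex", "expectancy"))    # True
print(is_nested(observations, "examiner", "examiner_sex"))       # True
print(is_crossed(observations, "examiner", "examiner_sex"))      # False
```

The last line shows why examiners cannot be treated as crossed with sex of examiner: no examiner appears at both levels of that variable.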
From the design diagrammed in Figure 1 it can be seen that four male examiners and four female examiners each tested two "above average" male examinees, two "below average" males, two "above average" females, and two "below average" females.

The five per cent level of confidence was arbitrarily chosen for significance tests.

Analysis of variance with fixed and random variables involves the usual assumptions of independence of observations, equality of variance, and normality of distributions. In less complex designs, at least, the violation of these assumptions in designs with equal numbers in the subgroups has been shown to have negligible effects on the significance tests. See, for example, Scheffe (1959, p. 554), Norton (1952), Young and Veldman (1965), and Boneau (1969).

Analysis of variance with fixed and random variables, as conducted here, involves the additional assumption (Hays, p. 465) that the degree of relationship, if any, among the different observations for the same fixed variable (expectancy, sex of examiner, and sex of examinee) is the same for all levels of that variable. There is no reason to believe this assumption is violated with the present data.

The use of analysis of variance in complex designs requires the assumption, accepted here and by most investigators, that the technique continues to be robust and that it is insensitive to violations of more basic assumptions in these applications.

The sources of variation and their degrees of freedom are presented in Table 1.
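As a rough check on how degrees of freedom follow from a design of this kind, the sketch below derives them from the number of levels of each variable. It uses the standard rules for crossed and nested factors, which are an assumption here rather than a reproduction of the writer's computations; the function name and the row labels are hypothetical.

```python
def degrees_of_freedom(a=2, e=4, b=2, c=2, r=2):
    """Degrees of freedom for a design with examiners nested in sex of examiner.

    a: sexes of examiner, e: examiners per sex, b: sexes of examinee,
    c: expectancy conditions, r: replications per cell.
    """
    return {
        "Sex of Examiner": a - 1,
        "Sex of Examinee": b - 1,
        "Expectancy": c - 1,
        "Examiner within Sex of Examiner": a * (e - 1),
        "Sex of Examiner x Sex of Examinee": (a - 1) * (b - 1),
        "Sex of Examiner x Expectancy": (a - 1) * (c - 1),
        "Sex of Examinee x Expectancy": (b - 1) * (c - 1),
        "Sex of Examiner x Sex of Examinee x Expectancy": (a - 1) * (b - 1) * (c - 1),
        "Examiner x Sex of Examinee within Sex of Examiner": a * (e - 1) * (b - 1),
        "Examiner x Expectancy within Sex of Examiner": a * (e - 1) * (c - 1),
        "Examiner x Sex of Examinee x Expectancy within Sex of Examiner":
            a * (e - 1) * (b - 1) * (c - 1),
        "Replications within cells": a * e * b * c * (r - 1),
    }


df = degrees_of_freedom()
for source, value in df.items():
    print(f"{source}: {value}")

total_observations = 2 * 4 * 2 * 2 * 2  # sixty-four WISC administrations
print("Total:", sum(df.values()), "=", total_observations - 1)
```

The component degrees of freedom sum to one less than the number of observations, which is a quick way to confirm that no source of variation has been omitted or counted twice.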
TABLE 1

SOURCES OF VARIATION AND DEGREES OF FREEDOM

Sources of Variation                                                    Degrees of Freedom

Sex of Examiner                                                          1
Sex of Examinee                                                          1
Expectancy                                                               1
Examiner Nested in Sex of Examiner                                       6
Replications                                                            32
Sex of Examiner x Sex of Examinee                                        1
Sex of Examiner x Expectancy                                             1
Sex of Examinee x Expectancy                                             1
Examiner x Expectancy Nested in Sex of Examiner                          6
Examiner x Sex of Examinee Nested in Expectancy                          6
Examiner x Expectancy x Sex of Examinee Nested in Sex of Examiner        6
Sex of Examiner x Expectancy x Sex of Examinee                           1

Total                                                                   63

E. Summary

Chapter III contained an explanation of the samples, instrumentation, and analysis of the research design.

The sample consisted of four male and four female Wechsler Intelligence Scale for Children (WISC) examiners who had just completed WISC training. Each examiner tested eight junior high school students (four boys and four girls) who were randomly selected from a population of students who were "average" in intelligence; that is, they scored from ninety to one hundred-ten on the California Test of Mental Maturity-Short Form, 1963 Revision. Examinees were randomly assigned to examiners.

The WISC is an individual intelligence test for children between ages five and fifteen. It has good reliability, with split-half reliability coefficients ranging from .86 to .96. A rating sheet (see Appendix B) was used by the grand experimenter to transmit an expectancy condition to examiners. On each examinee's rating sheet a discrepancy between California Test of Mental Maturity-Short Form score and school achievement was fictitiously indicated, and a "predicted WISC score" was advanced.

Eight examinees were randomly assigned to each examiner and then they were randomly designated "above average" or "below average" so that each examiner tested two "above average" boys, two "below average" boys, two "above average" girls, and two "below average" girls. The order in which an examiner tested his examinees was also randomized.
Parental approval for testing was secured and confidentiality was assured. Neither parents nor examiners were told that they were participating in a study or an experiment. All testing was done in private offices during a one week period.

Analysis of the data was by four-way analysis of variance.

In Chapter IV the hypotheses will be restated and the results will be presented.

CHAPTER IV

ANALYSIS OF RESULTS

In Chapter IV each hypothesis will be presented with a statement concerning whether the hypothesis in question will be rejected or not rejected. The significance of the obtained F-ratios will be determined by using Table IV in Hays (1964, pp. 677-688). Following this section, the results of the hypotheses will be discussed. Wechsler Intelligence Scale for Children (WISC) Total scores are presented in Appendix C.

A. Hypotheses and Results

Hypothesis 1

Female examiners will obtain a significantly higher mean Total WISC score than male examiners.

For a one-tailed F-test with one and six degrees of freedom, the required F was 5.99. Since the obtained F-ratio of 8.17 exceeded 5.99, the sex of examiner hypothesis was not rejected. The sex of examiner effect was significant (p < .05) in that female examiners obtained a higher mean Total WISC score than male examiners (see Table 2).

TABLE 2

SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIOS FOR MAIN EFFECTS

                                        Standard
Main Effect                 Means       Deviations    F-Ratio

Sex of Examiner
  Males                      97.55        8.75
  Females                   105.12        8.06         8.17*

Sex of Examinee
  Males                     101.66        9.05
  Females                    99.00        9.07         9.50*

Expectancy
  "Above Average"           101.17        9.89
  "Below Average"            99.50        8.25          .84

Examiner
  1                          97.75        7.49
  2                         101.50        9.52
  3                          94.00        8.60
  4                          96.88       11.51
  5                         104.75        7.85
  6                         105.50        6.65
  7                         104.62        9.98
  8                          99.62        8.79          .92

*Sig. at .05 level of confidence

Hypothesis 2

There will be no significant difference between mean Total WISC scores achieved by boy and girl examinees.
For a two-tailed F-test with one and six degrees of freedom, the required F was 8.81. Since the obtained F-ratio was 9.50, the null hypothesis was rejected. The sex of examinee effect was significant (p < .05) in that male examinees achieved a higher mean Total WISC score than female examinees (see Table 2).

Hypothesis 3

The mean Total WISC score achieved by "above average" examinees will be significantly higher than the mean Total WISC score achieved by "below average" examinees ("above average" examinees were those whose predicted WISC score was above one hundred; "below average" examinees were those whose predicted WISC score was below one hundred).

For a one-tailed F-test with one and six degrees of freedom, the required F was 5.99. Since the obtained F-ratio was .84, the directional hypothesis was rejected. The expectancy effect was not significant (see Table 2).

Hypothesis 4

There will be no significant difference between mean Total WISC scores obtained by the eight examiners.

For a two-tailed F-test with six and thirty-two degrees of freedom, the required F was 2.87. Since the obtained F-ratio was .92, the null hypothesis was not rejected. The examiner effect was not significant (see Table 2).

Hypothesis 5

The mean Total WISC score obtained by opposite sex combinations of examiners and examinees will be significantly higher than the mean Total WISC score obtained by same sex combinations.

For a one-tailed F-test with one and thirty-two degrees of freedom, the required F was 4.17. Since the obtained F-ratio was .28, the directional hypothesis was rejected. The interaction effect between the sex of examiner and sex of examinee variables was not significant (see Table 3).
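The decision rule applied to Hypotheses 1 through 5 can be summarized in a short sketch: an effect is called significant when the obtained F-ratio exceeds the required F. The obtained and required values below are copied from the text above; the function name and data layout are hypothetical, and the sketch does not compute the F-ratios or look up the tabled values itself.

```python
def is_significant(obtained_f: float, required_f: float) -> bool:
    """An effect is significant when the obtained F exceeds the required F."""
    return obtained_f > required_f


# (effect, obtained F, required F), as reported for Hypotheses 1 through 5
reported_tests = [
    ("Sex of Examiner", 8.17, 5.99),
    ("Sex of Examinee", 9.50, 8.81),
    ("Expectancy", 0.84, 5.99),
    ("Examiner", 0.92, 2.87),
    ("Sex of Examiner x Sex of Examinee", 0.28, 4.17),
]

for effect, obtained, required in reported_tests:
    verdict = "significant" if is_significant(obtained, required) else "not significant"
    print(f"{effect}: F = {obtained:.2f}, required {required:.2f} -> {verdict}")
```

Only the two sex effects clear their required values; the expectancy effect, which was the central interest of the study, falls well short.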
TABLE 3

SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIOS FOR TWO-WAY INTERACTION EFFECTS

                                                             Standard
Interaction Effect                               Means       Deviations    F

Sex of Examiner x Sex of Examinee
  Male Examiner-Male Examinee                     98.51        9.52
  Male Examiner-Female Examinee                   96.75        9.28
  Female Examiner-Male Examinee                  105.00        7.54
  Female Examiner-Female Examinee                100.75       12.76         .28

Sex of Examiner x Expectancy
  Male Examiner-"Above Average" Examinees         99.00       11.25
  Male Examiner-"Below Average" Examinees         96.06        6.86
  Female Examiner-"Above Average" Examinees      102.81       15.26
  Female Examiner-"Below Average" Examinees      102.94        8.27         .59

Sex of Examinee x Expectancy
  Male Examinee-"Above Average"                  102.69        9.95
  Male Examinee-"Below Average"                  100.62        8.19
  Female Examinee-"Above Average"                 99.62        9.92
  Female Examinee-"Below Average"                 98.58        8.41         .01

Hypothesis 6

There will be no significant interaction effect on mean Total WISC scores between the sex of the examiner and the extent to which the obtained Total WISC scores are in the direction suggested by the expectancy set.

For a two-tailed F-test with one and thirty-two degrees of freedom, the required F was 5.57. Since the obtained F was .59, the null hypothesis was not rejected. The interaction effect between the sex of examiner and expectancy variables was not significant (see Table 3).

Hypothesis 7

There will be no significant interaction effect on mean Total WISC scores between the sex of the examinee and the extent to which the obtained Total WISC scores are in the direction suggested by the expectancy set.

For a two-tailed F-test with one and six degrees of freedom, the required F was 8.81. Since the obtained F-ratio was .01, the null hypothesis was not rejected. The interaction effect between the sex of examinee and expectancy variables was not significant (see Table 3).

Hypothesis 8

There will be no significant interaction effect among the sex of examiner, sex of examinee, and direction of expectancy variables on mean Total WISC scores.
For a two-tailed F-test with one and thirty-two degrees of freedom, the required F was 5.57. Since the obtained F was .01, the null hypothesis was not rejected. The three-way interaction effect between the sex of examiner, sex of examinee, and expectancy variables was not significant (see Table 4).

TABLE 4

SUMMARY OF MEAN TOTAL WISC SCORES, STANDARD DEVIATIONS, AND F-RATIO FOR THREE-WAY INTERACTION EFFECT

                                                               Standard
Interaction Effect                                    Means    Deviations     F

Sex of Examiner x Sex of Examinee x Expectancy
  Male Examiner-Male Examinee-"Above Average"        100.12      12.12
  Male Examiner-Male Examinee-"Below Average"         96.50       6.50
  Male Examiner-Female Examinee-"Above Average"       97.88      11.02
  Male Examiner-Female Examinee-"Below Average"       95.62       7.79
  Female Examiner-Male Examinee-"Above Average"      105.25       7.09
  Female Examiner-Male Examinee-"Below Average"      107.75       8.07
  Female Examiner-Female Examinee-"Above Average"    100.28      17.19
  Female Examiner-Female Examinee-"Below Average"    101.12       8.59       .01

B. Discussion of Results

Female WISC examiners obtained significantly (p < .05) higher WISC scores from examinees than male examiners obtained. This result supported Cieutat's (1965) research with the Stanford-Binet. As Rosenthal (1966, p. 47) concluded, sex of experimenter seemed to be an "active" rather than a "passive" variable but not a very predictable one. The design of this study did not provide for determining whether significant effects were brought about by examinees or by examiners' perceptions of examinees. It was felt, however, that the WISC testing was a novel and, perhaps, an anxiety-producing event for most examinees. Junior high school students were asked to go to the nearly empty high school at a designated time and be tested by a complete stranger. A possible explanation is that examinees felt more comfortable in the presence of female examiners since most of their teachers were females. There were only three male teachers in Grades K through 8 of the school district.
Three of four female examiners obtained mean WISC scores higher than the highest mean WISC score obtained by any male examiner.

The significant (p < .05) sex of examinee effect was unexpected. Cieutat (1965) did not find a significant sex of examinee effect. It was conjectured that boys were more aggressive and competitive during the testing situation, which accounted for their higher mean WISC score. No further attempt was made to explain this effect.

The central hypothesis of this study was that examinees who were allegedly "above average" would obtain a mean WISC score significantly higher than supposedly "below average" examinees. That is, did the grand experimenter's estimate of an examinee's WISC score create an expectancy on the part of the examiner which influenced the obtained WISC score? Though the results were in the desired direction, the expectancy effect did not approach significance. Several explanations for this failure to demonstrate examiner influence were made.

First, it must be considered possible that the WISC examiners did not receive and/or retain an expectancy regarding the outcome of an examinee's WISC score based on the grand experimenter's estimate of that WISC score. This possibility was believed very plausible. In criticizing Rosenthal's research, Barber and Silver (1968a, p. 25) outlined an "eight-step transmission process" involved in inducing experimenter influence.

"(a) The student experimenter attended to the expectancy communication from the principal investigator. (b) The experimenter comprehended the expectancy communication. (c) The experimenter retained the communication. (d) The experimenter (intentionally or unintentionally) attempted to transmit the expectancy to the subject. (e) The subject (consciously or unconsciously) attended to the expectancy communication from the experimenter. (f) The subject (consciously or unconsciously) comprehended the experimenter's expectancy.
(g) The subject (consciously or unconsciously) retained the experimenter's expectancy. (h) The subject (wittingly or unwittingly) acted upon (gave responses in harmony with) the experimenter's expectancy."

It was believed that WISC examiners "attended to the expectancy communication" and comprehended it. However, a break in the transmission process may have come when examiners failed to retain the expectancy communication. Therefore, little attempt was made to intentionally or unintentionally transmit the expectancy to examinees. The grand experimenter gave an expectancy for each examinee to the examiner at the very beginning of testing. It was believed that examiners listened and understood that each examinee would be either "above average" or "below average." Furthermore, the grand experimenter reinforced the expectancy condition for each examinee immediately preceding each test administration. However, once testing began, it was felt that examiners may have forgotten the expectancy regarding the examinee being tested. Examiners did not appear to be ego-involved with the expectancy for each examinee. The possible failure to control and measure the expectancy condition must be regarded as a limitation of this study.

In addition to a breakdown in the transmission process, the failure to show a significant expectancy effect was possibly due to the nature of the experimental task and the experience of examiners. Ingraham and Harrington (1966, 1967), and Barber and Silver (1968a) suggested that expectancy effect was difficult to demonstrate in relatively structured tasks. Barber and Silver (1968a, p. 26) concluded, "Several studies in this area used relatively structured or factual tasks, such as the Wechsler Adult Intelligence Scale, the Taylor Manifest Anxiety Scale, and a number-estimation task; none of these studies showed an experimenter bias effect." Compared to Rosenthal's photo-rating task, the WISC was a very structured experimental task.
Examiners did not have to respond to the ambiguous cues given them by the grand experimenter; rather, they very quickly responded to factual cues presented to them by examinees. Since these factual cues represented continuing information concerning an examinee's intelligence, the "expected" WISC score for an examinee may have become increasingly less important and been disregarded or forgotten.

Ingraham and Harrington (1966, 1967) also believed that experienced and well-trained experimenters did not allow their expectancies to influence their results as much as less experienced experimenters. Though the WISC examiners in this study were inexperienced when compared to experienced WISC examiners, they were well-trained when compared to the experimenters employed in most of Rosenthal's research. Each examiner had just completed a five-week course in WISC-WAIS test administration. Presumably, they were trained to follow rigorously the directions for administering the test and recording examinees' responses. They were trained not to intentionally influence or distort an examinee's responses. Examiners did not need to remember a "predicted" WISC score for an examinee because they would soon determine an "actual" WISC score for that examinee.

Barber and Silver (1968a, p. 26) postulated that an expectancy effect was easier to demonstrate when a subordinate-superordinate relationship existed between experimenters and the grand experimenter. The WISC examiners in this study were responsible for eight test administrations but in no other way were they subordinate to the grand experimenter. Furthermore, the amount of compensation they received for testing was fixed and not dependent on the test results. In some studies experimenters received more compensation if their results confirmed the expectancy condition.
Finally, it was believed that the design of this study minimized the possibility of Type I error (rejecting the null hypothesis when it should not be rejected) with regard to the expectancy effect. As mentioned above, the WISC was a relatively structured and factual experimental task, and examiners were relatively well-trained and experienced. In addition, each examiner tested both male and female examinees, and worked under both expectancy conditions, which were randomly assigned. It was impossible for an examiner to think that all his examinees were "above average" or that they were all "below average."

Considering the failure to demonstrate a significant expectancy effect, the results of this study seemed to support the conclusions reached by Ingraham and Harrington (1966, 1967), and Barber and Silver (1968a, 1968b): experimenter influence was not as easy to demonstrate as Rosenthal claimed, especially in structured tasks using experienced experimenters.

The examiner effect was examined in the study to obtain evidence on Rosenthal's contention that this variable should always be tested in research, and also because adding this variable strengthened the research design. Though some variation existed among the mean Total WISC scores obtained by examiners, the F-ratio (.92) did not approach significance. This insignificant result contraindicated the possibility that examiners influenced test scores in idiosyncratic ways. The result may also be taken as further evidence of the WISC's reliability.

Turning to a discussion of interaction effects, it was interesting to note that none of the four interaction effects tested was significant. The review of literature for this study abounded with significant interaction effects using marble sorting ability or photo ratings as the dependent variable.
Again, the failure to demonstrate significant interaction effects was possibly due to the structured nature of the experimental task and the experience of the WISC examiners. Since no interaction effects were significant, no t-tests were computed between means.

There was considerable evidence in related literature to support the directional hypothesis that opposite sex combinations of examiners and examinees would obtain higher mean WISC scores than same sex combinations. This hypothesis was not supported in this study (F = .28). Though the female examiner-male examinee combination resulted in the highest mean Total WISC score (105.00), the male examiner-female examinee combination resulted in the lowest mean Total WISC score (96.75).

The interaction of the sex of examiner and expectancy variables was investigated to determine whether males or females influenced their data more. Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe (1964) found that males were better "biasers." With an F-ratio of .39, that conclusion was not supported in the present study. Examination of mean WISC scores (see Table 3) showed that male examiners did obtain a mean WISC score for "above average" examinees that was almost three points above that obtained for "below average" examinees. There was practically no difference in mean WISC scores obtained for "above average" and "below average" examinees by female examiners. While the results were in the expected direction, the obtained F-ratio was far from significant and this discussion should not be interpreted as support for the hypothesis that males are better "biasers" than females. In Barber and Silver's (1968a, p. 25) terms, examiners apparently did not receive and/or retain an expectancy for examinees and therefore did not transmit an expectancy communication to examinees.

There was virtually no (F = .01) interaction effect between the sex of examinee and expectancy variables.
Though a directional hypothesis was not advanced, it was postulated that female examinees' WISC performance would be more influenced by examiners than male examinees' WISC performance. Non-statistical examination of mean scores (see Table 3) revealed insignificant differences in the opposite direction. That is, the difference between "above average" and "below average" male examinees' mean WISC scores was greater than the difference between "above average" and "below average" mean WISC scores for female examinees. As suggested earlier (p. 6), this result possibly contradicted previous research (Rosenthal, Mulry, Persinger, Vikan-Kline, and Grothe, 1964a; and Stevenson and Allen, 1964) because of the greater age difference between examiners and examinees in this study. Specifically, male college experimenters may have more cues by which to influence female college subjects than to influence a junior high school girl. Of course, one could counter with the hypothesis that the greater age difference between examiner and examinee in this study could increase the subordinate-superordinate relationship, which should increase examiner influence. A more plausible explanation for the insignificant interaction effect was believed to be that the WISC was a structured task and that the expectancy effect demonstrated in Hypothesis 3 was just too weak. Apparently the expectancy communication was not received by the examinees of this study.

The three-way interaction effect was investigated to determine if the three variables (sex of examiner, sex of examinee, and expectancy) were working in some complex way. No significant interaction effect (F = .01) was found. The hypothesis was advanced for exploratory reasons and no attempt was made to explain the insignificant effect.

C. Summary

In Chapter IV the statistical hypotheses were restated and a decision was made to reject or not reject each hypothesis.
Then the results of each hypothesis were discussed in reference to previous research and possible explanations for the results.

Both the sex of examiner and sex of examinee effects were significant (p < .05). Female examiners obtained higher mean WISC scores and male examinees achieved higher mean WISC scores. While the significant sex of examiner effect confirmed previous research (Cieutat, 1965), the significant sex of examinee effect was unexpected. It was conjectured that male examinees were less anxious and more competitive when placed in a novel situation. The interaction effect between sex of examiner and sex of examinee was not significant. The main sex effects seemed to overshadow the interaction effect.

The major finding in this study was that the expectancy effect was not significant. Two principal explanations were offered. First, the eight-step transmission process delineated by Barber and Silver (1968a, p. 25) broke down in this study. That is, it was feared that WISC examiners did not retain the expectancy condition given to them by the grand experimenter and, therefore, examiners' expectancy was neither transmitted to nor received by examinees. Secondly, it was felt that the WISC represented a structured and factual experimental task, and that the WISC examiners were relatively well-trained and experienced. Thus, the results of the study seemed to confirm Ingraham and Harrington's (1966, 1967) and Barber and Silver's (1968a, 1968b) criticism of Rosenthal's research: experimenter influence has not been consistently demonstrated when experienced experimenters are administering structured tasks.

The fact that no significant examiner effect was demonstrated in this study added to the argument that examiner influence was not a great problem in intelligence testing, at least not in this study.
The expectancy variable was examined in interaction with both the sex of examiner and sex of examinee variables and neither interaction effect was significant. Thus, the results of this study were not construed as evidence that male experimenters were better influencers or that female examinees were more easily influenced. These two insignificant interaction effects represented further evidence of a lack of an expectancy effect in situations having this degree of examination structure and examiner experience. Analysis of the three-way interaction effect among the sex of examiner, sex of examinee, and expectancy variables revealed no significant complex interaction.

Chapter V will include a summary of this study and conclusions made from it.

CHAPTER V

SUMMARY, CONCLUSIONS, AND IMPLICATIONS FOR FURTHER RESEARCH

A. Summary

The present study was made to examine the claim of Robert Rosenthal (1967a) and others that experimenters frequently influenced (intentionally or unintentionally) the results of their research. Cieutat (1965), Cohen (1965), and Larrabee and Kleinsasser (1967) investigated the possibility that examiners of individual intelligence tests affected the scores of examinees.

The purpose of this study was to provide a well-designed field testing of experimental research results. Specifically, the writer sought to determine if Wechsler Intelligence Scale for Children (WISC) examiners influenced their examinees' scores when the examiners were given an expectancy or estimate of each examinee's score.

Eight hypotheses were advanced:

Hypothesis 1

Female examiners will obtain a higher mean intelligence test score than male examiners.

Hypothesis 2

There will be no difference between mean intelligence test scores achieved by boy and girl examinees.

Hypothesis 3

Examinees who are expected by examiners to be above average will achieve a higher mean intelligence test score than examinees who are expected to be below average.
Hypothesis 4

There will be no difference between mean intelligence test scores obtained by the examiners.

Hypothesis 5

Opposite sex combinations of examiners and examinees will obtain higher mean intelligence test scores than same sex combinations.

Hypothesis 6

There will be no interaction effect between the sex of the examiner and the extent to which the obtained scores are in the direction suggested by the expectancy set.

Hypothesis 7

There will be no interaction effect between the sex of the examinee and the extent to which the obtained scores are in the direction suggested by the expectancy set.

Hypothesis 8

There will be no interaction effect among the sex of examiner, sex of examinee, and direction of expectancy variables.

Chapter II contained a review of recent (1960 to present) literature concerning examiner influence on individual intelligence test scores and on projective test scores; research on experimenter influence; and a discussion of previous research.

Examiner influence on individual intelligence test scores and on projective test scores was investigated in a limited number of research studies. In these studies the examiner's personality (warmth and rapport) or physical characteristics (sex and race) had an effect on the dependent variable. Specifically, opposite sex combinations and same race combinations produced the highest scores.

A review of studies of experimenter influence revealed that experimenters frequently appear to influence the results of their studies. However, the results were too conflicting and inconsistent to warrant many definite conclusions regarding how experimenters influence their data. The sex and race of experimenters seemed to be significant variables, especially when studied in interaction with other variables. The warmth of an experimenter appeared to have a consistent effect on data; that is, "warm" experimenters influenced their research more than experimenters perceived as being less warm.
The research on other experimenter personality characteristics and also on the prestige of the experimenter revealed little consistent evidence to support an assessment of their importance. Similarly, the effect of the experience of the experimenter was discussed in an unresolved debate by Rosenthal, Harrington, and Ingraham. While visual and verbal cues were found to mediate experimenter influence, more research needs to be done to discover how experimenters influence their data.

The review of studies in which a grand experimenter deliberately gave experimenters an expectancy concerning the results of the study consistently revealed a significant expectancy effect.

Rosenthal and Jacobson's (1968) Oak School experiment was discussed as a study in which the authors attempted to apply experimental results to a practical setting. Rosenthal and Jacobson concluded that the teachers in the experiment had communicated an expectation of ability to "bloomers" which accounted for their intellectual "spurt."

In a discussion of previous research some weaknesses in research design were mentioned. In most studies experimenters were given only one expectancy condition. This procedure was seen as a weakness because an expectancy "set" could develop which might account for the influence effect. Much of the research involved male experimenters and female subjects of similar age. The sex of experimenter and sex of subject variables need to be crossed and younger subjects need to be tested. Most of the studies employed meaningless rating instruments which required little or no training to administer. This prevented an examination of the experimenter's experience or training as a variable in research on experimenter influence.

There seemed to be confusion concerning just how experimenter influence operated.
Rosenthal (1967a) believed that experienced experimenters were likely to influence their research because they were more ego-involved with their research and were "better biasers." That is, they were better at communicating their expectancy to subjects and better at reinforcing "correct" responses. Thus, for Rosenthal, experimenter influence increased with the experimenter's experience and "snowballed" as an experiment progressed. Friedman (1967), one of Rosenthal's associates, agreed with this hypothesis.

This hypothesis was sharply criticized by Ingraham and Harrington (1966, 1967) and by Barber and Silver (1968a, 1968b). These authors felt that experimenters who were experienced and well-trained relied more on factual cues given them during an experiment and less on ambiguous cues given them by grand experimenters.

Chapter III included a discussion of the samples, instrumentation, procedure, and the design and analysis of the study.

The sample consisted of four male and four female Wechsler Intelligence Scale for Children (WISC) volunteer examiners who had just completed WISC training. They were paid $2.50 per test. Each examiner tested eight junior high school students (four boys and four girls) who were randomly selected from a population of students who were "average" in intelligence (that is, they scored from ninety to one hundred-ten on the California Test of Mental Maturity-Short Form, 1963 Revision). Examinees were randomly assigned to examiners.

The WISC is an individual intelligence test for children between ages five and fifteen. It was selected for use in this study because WISC examiners were available and because the WISC had good reliability (split-half reliability coefficients ranged from .86 to .96). A rating sheet (see Appendix B) was used by the grand experimenter to transmit an expectancy condition to examiners.
On each examinee's rating sheet a discrepancy between California Test of Mental Maturity-Short Form score and school achievement was fictitiously indicated, and a "predicted WISC score" was advanced.

After eight examinees were randomly assigned to an examiner, they were randomly designated "above average" or "below average" so that each examiner tested two "above average" boys, two "below average" boys, two "above average" girls, and two "below average" girls. The order in which an examiner tested his examinees was also randomized.

Parental approval for testing was secured and confidentiality was assured. Neither parents nor examiners were told that the testing was part of a research study or experiment. In no case did an examiner indicate that he "guessed" that he was taking part in an experiment. All testing was done in private offices at the high school during a one week period.

Results were analyzed by four-way analysis of variance. The variables were sex of examiner, sex of examinee, expectancy ("above average" or "below average"), examiners nested in sex of examiner, and replications nested in all other variables. The five per cent level of significance was arbitrarily chosen.

In Chapter IV the eight hypotheses were restated in statistical form and rejected or not rejected.

Both the sex of examiner and sex of examinee effects were significant (p < .05). As seen in Table 5, female examiners obtained higher mean WISC scores and male examinees achieved higher mean WISC scores. While the significant sex of examiner effect confirmed previous research (Cieutat, 1965), the significant sex of examinee effect was unexpected. It was conjectured that male examinees were less anxious and more competitive when placed in a novel situation. The interaction effect between sex of examiner and sex of examinee was not significant. The main sex effects seemed to overshadow the interaction effect.
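The assignment procedure summarized earlier in this section (eight examinees per examiner, two per sex-by-expectancy cell, tested in random order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name `assign_examinees`, the examiner and examinee labels, and the use of a seeded pseudo-random generator are all hypothetical.

```python
import random


def assign_examinees(examiners, boys, girls, seed=None):
    """Return {examiner: [(examinee, sex, expectancy), ...]} in testing order.

    Each examiner receives four boys and four girls; within each sex,
    two are labeled "above average" and two "below average" at random.
    """
    rng = random.Random(seed)
    needed = 4 * len(examiners)
    if len(boys) < needed or len(girls) < needed:
        raise ValueError("not enough examinees in the eligible pool")

    # Random selection from the pool; the sampled order is itself random,
    # so consecutive slices of four amount to random assignment.
    pools = {"boy": rng.sample(list(boys), needed),
             "girl": rng.sample(list(girls), needed)}

    schedule = {}
    for i, examiner in enumerate(examiners):
        session = []
        for sex, pool in pools.items():
            group = pool[4 * i: 4 * i + 4]
            labels = ["above average"] * 2 + ["below average"] * 2
            rng.shuffle(labels)          # random expectancy designation
            session += [(child, sex, label)
                        for child, label in zip(group, labels)]
        rng.shuffle(session)             # random order of testing
        schedule[examiner] = session
    return schedule
```

Because every examiner works under both expectancy conditions with both sexes, no examiner can form a single expectancy "set" for all of his examinees, which is the design property the chapter emphasizes.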
TABLE 5

SUMMARY OF MEANS, STANDARD DEVIATIONS, AND F-RATIOS FOR SIGNIFICANT EFFECTS

                                  Standard
Effect                  Means     Deviations    F-Ratio

Sex of Examiner
  Male                  97.53       8.75
  Female               103.12       8.06         8.17*

Sex of Examinee
  Male                 101.66       9.0?
  Female                99.00       9.07         9.50*

*Sig. at .05 level of confidence

The major finding in this study was that the expectancy effect was not significant. Two principal explanations were offered. First, the eight-step transmission process delineated by Barber and Silver (1968a, p. 25) broke down in this study. That is, it was feared that WISC examiners did not retain the expectancy condition given to them by the grand experimenter and, therefore, examiners' expectancy was neither transmitted to nor received by examinees. Secondly, it was felt that the WISC examiners were relatively well-trained and experienced. Thus, the results of the study seemed to confirm Ingraham and Harrington's (1966, 1967) and Barber and Silver's (1968a, 1968b) criticism of Rosenthal's research: experimenter influence has not been consistently demonstrated when experienced experimenters are administering structured tasks.

The fact that no significant examiner effect was demonstrated in this study added to the argument that examiner influence was not a great problem in intelligence testing, at least not in this study.

The expectancy variable was examined in interaction with both the sex of examiner and sex of examinee variables and neither interaction was significant. Thus, the results of this study were not construed as evidence that male experimenters were better "influencers" or that female examinees were more easily influenced. These two insignificant interaction effects represented further evidence that the transmission of an expectancy condition did not take place, and that the WISC was a structured task administered by experienced examiners.
Analysis of the three-way interaction effect among the sex of examiner, sex of examinee, and expectancy variables revealed no significant complex interaction.

B. Conclusions

The significant (p < .05) sex of examiner effect supported the feeling that the sex of experimenter or examiner was an active, though usually unpredictable, variable. The mean Total WISC score obtained by the female examiners in this study was 5.59 IQ points higher than the mean Total WISC score obtained by male examiners. If important and somewhat irreversible decisions were going to be made for an individual on the basis of a WISC score, it would seem wise to have that individual tested by both a male and a female examiner. The added expense, effort, and time would be justified in relation to the importance of the decision to be made.

The fact that a significant expectancy effect was not demonstrated in this study confirmed Barber and Silver's (1968a, p. 26) conclusion that expectancy influence was not present when the experimental task was structured and factual. The insignificant expectancy effect also supported Ingraham and Harrington's (1966, 1967) contention that experienced experimenters were not as likely to influence their results as inexperienced experimenters. The WISC examiners in this study did not retain the expectancy for an examinee when they started the testing situation. Instead, it was hypothesized that they increasingly responded to the factual stimuli presented them by examinees.

It was concluded that a well-designed study would minimize the probability of a significant expectancy effect. Specifically, it was considered important to design a study employing both male and female examiners, and both male and female examinees. The sex of experimenter and sex of subject variables were infrequently crossed in previous research.
In agreement with Ingraham and Harrington (1966) was the feeling that it was necessary to give examiners both expectancy conditions in random order to prevent development of an expectancy "set."

It was also felt that many of the studies which demonstrated a significant expectancy effect lacked rigorous statistical methodology. Barber and Silver (1968a, p. 24) stated that "(1) the variables to be studied and the statistics to be used should be specified in advance; (2) the level of significance should be stated in advance; (3) the data should be analyzed by some 'overall' test such as multivariate analysis of variance; and (4) conclusions should not be made from the results of post hoc tests performed upon the data after an overall test has failed to reject the null hypothesis." In the present study results were analyzed by an overall test (four-way analysis of variance), and no "post hoc tests" were performed when the overall test failed to reject the null hypotheses.

Rosenthal (1967a) and other experimenters apparently assumed that if experimenter influence could be demonstrated in laboratory tasks, the results could be generalized or applied to practical and more meaningful situations such as individual intelligence testing. On the basis of this study and criticisms leveled by others (Barber and Silver, 1968a, 1968b; Barber, et al., 1969; and Claiborn, 1969), this assumption appeared unwarranted. Rosenthal (1967a) often maximized the probability of obtaining a significant expectancy effect by (1) designing ambiguous experimental tasks; (2) using opposite sex combinations of experimenters and subjects; (3) making experimenters subordinate to the grand experimenter; (4) giving experimenters only one expectancy condition; or even by (5) paying experimenters more if their results were in the desired direction. Under a combination of those conditions, it was not surprising that a significant expectancy effect was found.
However, in practical settings (such as individual intelligence testing) it was believed that these conditions usually did not exist. The generalizability of Rosenthal's experimental research was considered questionable and was not supported in this study. In fact, in light of the well-reasoned criticisms of Rosenthal's basic experimenter influence research (Ingraham and Harrington, 1966, 1967; Barber and Silver, 1968a, 1968b; and Barber, et al., 1969), one must consider the possibility that observer influence and expectancy effect simply may not exist.

C. Implications for Further Research

Any replication of the present study should include a more adequate method of communicating the expectancy condition to WISC examiners and checking its presence during actual testing. Examiners' ego-involvement in the expectancy condition given them must be increased and sustained. Controls should be included which derive from Barber and Silver's (1968a, p. 25) "eight-step transmission process." Before the existence of examiner influence, if any, can be demonstrated, future researchers must be certain that the expectancy communication was "attended to," "comprehended," and "retained" by test examiners and "transmitted" to examinees.

More importantly, further research concerning the sex of the examiner as a variable influencing individual intelligence test scores should be conducted. A research design employing male and female examiners testing the same examinees with alternate forms of the same test might be profitable.

From the results of their study, Rosenthal and Jacobson (1968) implied that more research should be conducted in teacher training programs to explore how teachers' expectations of pupils' ability affect actual pupil performance. The results of this study, however, did not imply a great need for further research on the problem of an expectancy effect on individual intelligence testing.
It was not claimed that this study settled the question "once and for all"; however, the nonsignificant expectancy effect did support the conclusions of other researchers (Ingraham and Harrington, 1966, 1967; and Barber and Silver, 1968a, 1968b). Experimenter influence was not a problem in structured, factual tasks administered by experienced and well-trained experimenters.

BIBLIOGRAPHY

Barber, T.X. Invalid arguments, postmortem analyses, and the experimenter bias effect. Journal of Consulting and Clinical Psychology, 1969, 33, 11-14.

Barber, T.X., Forgione, A., Chaves, J.F., Calverley, D.S., McPeake, J.D., & Bowen, B. Five attempts to replicate the experimenter bias effect. Journal of Consulting and Clinical Psychology, 1969, 33, 1-6.

Barber, T.X., & Silver, M.J. Fact, fiction, and the experimenter bias effect. Psychological Bulletin Monograph, 1968a, 70, 1-29.

Barber, T.X., & Silver, M.J. Pitfalls in data analysis and interpretation: a reply to Rosenthal. Psychological Bulletin Monograph, 1968b, 70, 48-62.

Bass, B.M. Measures of average influence and change in agreement of rankings by a group of judges. Sociometry, 1960, 23, 195-202.

Blaufarb, H. The relation of experimenter status and achievement imagery to the conditioning of verbal behavior. Unpublished Ph.D. dissertation, Univ. of Illinois, 1960.

Boneau, C.A. The effects of violations of assumptions underlying the t-test. Psychological Bulletin, 1960, 57, 49-64.

Buros, O.K. (ed.). The Fifth Mental Measurements Yearbook. Highland Park, N.J.: The Gryphon Press, 1959.

Buros, O.K. (ed.). The Sixth Mental Measurements Yearbook. Highland Park, N.J.: The Gryphon Press, 1965.

Brogden, W.J. The experimenter as a factor in animal conditioning. Psychological Reports, 1962, 11, 239-242.

Carlsmith, J.M., & Aronson, E. Some hedonic consequences of the confirmation and disconfirmation of expectancies. Journal of Abnormal and Social Psychology, 1963, 66, 151-156.

Cieutat, V.J.
Examiner differences with individual intelligence tests. Perceptual and Motor Skills, 1965, 20, 1317-1318.

Claiborn, W.L. Expectancy effects in the classroom: a failure to replicate. Journal of Educational Psychology, 1969, 60, 377-383.

Cohen, E. Examiner differences with individual intelligence tests. Perceptual and Motor Skills, 1965, 20, 1324.

Cooper, J., Eisenberg, L., Robert, J., & Dohrenwend, B. The effect of experimenter expectancy and preparatory effort on belief in the probable occurrence of future events. Journal of Social Psychology, 1967, 71, 221-226.

Cordaro, L., & Ison, J.R. Psychology of the scientist: I. Observer bias in classical conditioning of the planarian. Psychological Reports, 1963, 13, 787-789.

Cronbach, L.J. Essentials of Psychological Testing. Second edition. New York: Harper and Brothers, 1960.

Exner, J.E., Jr. Variations in WISC performances as influenced by differences in pretest rapport. Journal of General Psychology, 1966, 74, 299-306.

Friedman, N. The Social Nature of Psychological Research. New York: Basic Books, Inc., 1967.

Friedman, N., Kurland, D., & Rosenthal, R. Experimenter behavior as an unintended determinant of experimental results. Journal of Projective Techniques and Personality Assessment, 1965, 29, 479-490.

Griffith, R.M. Rorschach water percepts: a study in conflicting results. American Psychologist, 1961, 16, 307-311.

Harrington, G.M., & Ingraham, L.H. Psychology of the scientist: XXV. Experimenter bias and tails of Pascal. Psychological Reports, 1967, 21, 513-516.

Hays, W.L. Statistics for Psychologists. New York: Holt, Rinehart, & Winston, 1963.

Hill, K.T., & Stevenson, H.W. The effects of social reinforcement vs. non-reinforcement and sex of E on the performance of adolescent girls. Journal of Personality, 1965, 33, 30-36.

Ingraham, L.H., & Harrington, G.M. Psychology of the scientist: XVI. Experience of E as a variable in reducing experimenter bias. Psychological Reports, 1966, 19, 455-461.

Jones, E.L.
The courtesy bias in South-East Asian surveys. International Social Science Journal, 1963, 15, 70-76.

Katkin, E.S., Risk, R.T., & Spielberger, C.D. The effects of experimenter status and subject awareness on verbal conditioning. Journal of Experimental Research in Personality, 1966, 1, 153-160.

Katz, I., Robinson, J., Epps, E., & Waly, P. Race of experimenter and instructions in the expression of hostility by Negro boys. Journal of Social Issues, 1964, 20, 54-60.

Kintz, B.L., Delprato, D.J., Mettee, D.R., Persons, C.E., & Schappe, R.H. The experimenter effect. Psychological Bulletin, 1965.

Kish, L. Studies of interviewer variance for attitudinal variables. Journal of the American Statistical Association, 1962, 57, 92-115.

Larrabee, L.L., & Kleinsasser, L.D. The effect of experimenter bias on WISC performance. Unpublished paper, St. Louis, Mo.: Psychological Associates, 1967.

Levy, L.H. Reflections on replications and the experimenter bias effect. Journal of Consulting and Clinical Psychology, 1969, 33, 15-17.

Littell, W.A. The Wechsler Scale for Children: review of a decade of research. Psychological Bulletin, 1960, 57, 132-156.

Masling, J. The influence of situational and interpersonal variables in projective testing. Psychological Bulletin, 1960, 57, 65-85.

McFall, R.M. Unintentional communication: the effect of congruence and incongruence between subject and experimenter constructions. Unpublished Ph.D. dissertation, Ohio State Univ., 1966.

McGuigan, F.J. The experimenter: a neglected stimulus object. Psychological Bulletin, 1963, 60, 421-428.

Miller, M.E., & Solkoff, N. Effects of mode of response and sex of E upon recognition thresholds of taboo words. Perceptual and Motor Skills, 1965, 20, 575-578.

Mulry, R.C. The effects of the experimenter's perception of his own performance on subject performance in a pursuit rotor task. Unpublished master's thesis, Univ. of North Dakota, 1962.

Norton, D.W.
An empirical investigation of the effects of non-normality and heterogeneity upon the F-test of analysis of variance. Ph.D. thesis, State Univ. of Iowa, 1952.

Ogawa, J., & Oakes, W.F. Sex of experimenter and manifest anxiety as related to verbal conditioning. Journal of Personality, 1965, 33, 553-569.

Prince, A.I. Relative prestige and the verbal conditioning of children. Paper read at the Seventieth Annual Convention of the American Psychological Association, St. Louis, Missouri, 1962.

Reece, M.M., & Whitman, R.N. Expressive movements, warmth, and verbal reinforcement. Journal of Abnormal and Social Psychology, 1962, 64, 234-236.

Rosenthal, R. Covert communication in the psychological experiment. Psychological Bulletin, 1967a, 67, 356-367.

Rosenthal, R. Experimenter attributes as determinants of subjects' responses. Journal of Projective Techniques and Personality Assessment, 1963a, 27, 324-331.

Rosenthal, R. Experimenter Effects in Behavioral Research. New York: Appleton-Century-Crofts, 1966.

Rosenthal, R. Experimenter expectancy and the reassuring nature of the null hypothesis decision procedures. Psychological Bulletin Monograph, 1968, 70, 30-47.

Rosenthal, R. Experimenter modeling effects as determinants of subjects' responses. Journal of Projective Techniques and Personality Assessment, 1963b, 27, 467-471.

Rosenthal, R. Experimenter outcome-orientation and the results of the psychological experiment. Psychological Bulletin, 1964, 61.

Rosenthal, R. On not so replicated experiments and not so null results. Journal of Consulting and Clinical Psychology, 1969, 33, 7-10.

Rosenthal, R. Psychology of the scientist: XXIII. Experimenter expectancy, experimenter experience, and Pascal's wager. Psychological Reports, 1967b, 20, 619-622.

Rosenthal, R. Psychology of the scientist: XXVI. Experimenter expectancy, one tail of Pascal, and the distribution of three tails. Psychological Reports, 1967c, 21, 517-520.

Rosenthal, R., & Fode, K.L.
The effect of experimenter bias on the performance of the albino rat. Behavioral Science, 1963a, 8, 183-189.

Rosenthal, R., & Fode, K.L. Psychology of the scientist: V. Three experiments in experimenter bias. Psychological Reports, 1963b.

Rosenthal, R., Fode, K.L., Friedman, C.J., & Vikan-Kline, L.L. Subjects' perception of their experimenter under conditions of experimenter bias. Perceptual and Motor Skills, 1960, 11, 325-331.

Rosenthal, R., Fode, K.L., Vikan-Kline, L.L., & Persinger, G.W. Verbal conditioning: mediator of experimenter expectancy effects. Psychological Reports, 1964, 14, 71-74.

Rosenthal, R., Friedman, C.J., Johnson, C.J., Fode, K.L., Schill, T., White, R.C., & Vikan-Kline, L.L. Variables affecting experimenter bias in a group situation. Genetic Psychology Monographs, 1964, 70, 271-296.

Rosenthal, R., Friedman, N., & Kurland, D. Instruction-reading behavior of the experimenter as an unintended determinant of experimental results. Journal of Experimental Research in Personality, 1966, 1, 221-226.

Rosenthal, R., & Halas, E.S. Experimenter effect in the study of invertebrate behavior. Psychological Reports, 1962, 11, 251-256.

Rosenthal, R., & Jacobson, L. Pygmalion in the Classroom: Teacher Expectation and Pupils' Intellectual Development. New York: Holt, Rinehart, and Winston, 1968.

Rosenthal, R., & Jacobson, L. Teachers' expectancies: determinants of pupils' IQ gains. Psychological Reports, 1966, 19, 115-118.

Rosenthal, R., Kohn, P., Greenfield, P.M., & Carota, N. Data desirability, experimenter expectancy, and the results of psychological research. Journal of Personality and Social Psychology, 1966, 3, 20-27.

Rosenthal, R., Kohn, P., Greenfield, P.M., & Carota, N. Psychology of the scientist: XIV. Experimenters' hypothesis-confirmation and mood as determinants of experimental results. Perceptual and Motor Skills, 1965, 20, 1237-1252.

Rosenthal, R., Mulry, R.C., Persinger, G.W., Vikan-Kline, L.L., & Grothe, M.
Changes in experimental hypotheses as determinants of experimental results. Journal of Projective Techniques and Personality Assessment, 1964a, 28, 465-469.

Rosenthal, R., Mulry, R.C., Persinger, G.W., Vikan-Kline, L.L., & Grothe, M. Emphasis on experimental procedure, sex of subjects and the biasing effects of experimental hypotheses. Journal of Projective Techniques and Personality Assessment, 1964b, 28, 470-473.

Rosenthal, R., & Persinger, G.W. Let's pretend: subjects' perception of imaginary experimenters. Perceptual and Motor Skills, 1962, 14, 407-409.

Rosenthal, R., Persinger, G.W., & Fode, K.L. Experimenter bias, anxiety, and social desirability. Perceptual and Motor Skills, 1962, 15, 73-74.

Rosenthal, R., Persinger, G.W., Vikan-Kline, L.L., & Fode, K.L. The effect of early data returns on data subsequently obtained by outcome-biased experimenters. Sociometry, 1963a, 26, 487-498.

Rosenthal, R., Persinger, G.W., Vikan-Kline, L.L., & Fode, K.L. The effect of experimenter outcome-bias and subject set on awareness in verbal conditioning experiments. Journal of Verbal Learning and Verbal Behavior, 1963b, 2, 275-283.

Rosenthal, R., Persinger, G.W., Vikan-Kline, L.L., & Mulry, R.C. The role of the research assistant in the mediation of experimenter bias. Journal of Personality, 1963, 31, 313-335.

Sarason, I.G. Individual differences, situational variables, and personality research. Journal of Abnormal and Social Psychology, 1962, 65, 376-380.

Sarason, I.G., & Minard, J. Interrelationships among subject, experimenter, and situational variables. Journal of Abnormal and Social Psychology, 1963, 67, 87-91.

Scheffe, H. The Analysis of Variance. New York: Wiley, 1959.

Simmons, W.L., & Christy, E.G. Verbal reinforcement of a TAT theme. Journal of Projective Techniques and Personality Assessment, 1962, 26, 337-341.

Stevenson, H.W. Social reinforcement with children as a function of CA, sex of E, and sex of S.
Journal of Abnormal and Social Psychology, 1961, 63, 147-154.

Stevenson, H.W., & Allen, S. Adult performance as a function of sex of experimenter and sex of subject. Journal of Abnormal and Social Psychology, 1964, 68, 214-216.

Stevenson, H.W., & Allen, S. Variables associated with adults' effectiveness as reinforcing agents. Journal of Personality, 1967, 35, 246-264.

Suchman, E.A. An analysis of "bias" in survey research. Public Opinion Quarterly, 1962, 26, 102-111.

Thorndike, R.L. Review of Rosenthal, R. & Jacobson, L. Pygmalion in the Classroom. American Educational Research Journal, 1968, 5, 708-711.

Turner, G.C., & Coleman, J.C. Examiner influence on Thematic Apperception Test responses. Journal of Projective Techniques and Personality Assessment, 1962, 26, 478-486.

Wartenberg-Ekren, U. The effect of experimenter knowledge of a subject's scholastic standing on the performance of a reasoning task. Unpublished master's thesis, Marquette Univ., 1962.

Wechsler, D. WISC Manual. New York: The Psychological Corporation, 1949.

Wilkie, C.H. A study of distortions in recording interviews. Social Work, 1963, 8, 31-36.

Williams, J.A. Interviewer-respondent interaction: a study of bias in the information interview. Sociometry, 1964, 27, 338-352.

Winkel, G.H., & Sarason, I.G. Subject, experimenter, and situational variables in research on anxiety. Journal of Abnormal and Social Psychology, 1964, 68, 601-608.

Young, R.K., & Veldman, D.J. Heterogeneity and skewness in analysis of variance. Perceptual and Motor Skills, 1963, 16, 588.

APPENDICES

APPENDIX A

TABLE 6

CTMM MEANS AND STANDARD DEVIATIONS FOR ALL SUBGROUPS

Effect                                                        Mean    Standard Deviation

Sex of Examiner
  Males                                                      101.78   5.81
  Females                                                    102.00   5.61

Sex of Examinee
  Males                                                      100.97   6.05
  Females                                                    102.81   5.21

Expectancy
  "Above Average"                                            102.88   5.29
  "Below Average"                                            100.91   5.94

Examiner
  1                                                          103.38   6.19
  2                                                           99.62   5.55
  3                                                          101.25   5.96
  4                                                          102.88   7.22
  5                                                          103.75   5.86
  6                                                          101.75   5.42
  7                                                          101.63   5.50
  8                                                          100.88   7.72

Sex of Examiner X Sex of Examinee
  Male Examiner-Male Examinee                                100.44   6.46
  Male Examiner-Female Examinee                              103.12   4.92
  Female Examiner-Male Examinee                              101.50   5.75
  Female Examiner-Female Examinee                            102.50   5.65

Sex of Examiner X Expectancy
  Male Examiner-"Above Average" Examinees                    102.94   5.52
  Male Examiner-"Below Average" Examinees                    100.62   6.22
  Female Examiner-"Above Average" Examinees                  102.81   6.02
  Female Examiner-"Below Average" Examinees                  101.19   5.25

Sex of Examinee X Expectancy
  Male Examinee-"Above Average"                              102.44   5.75
  Male Examinee-"Below Average"                               99.50   6.07
  Female Examinee-"Above Average"                            103.31   4.88
  Female Examinee-"Below Average"                            102.31   5.64

Sex of Examiner X Sex of Examinee X Expectancy
  Male Examiner-Male Examinee-"Above Average"                102.00   6.48
  Male Examiner-Male Examinee-"Below Average"                 98.88   6.47
  Male Examiner-Female Examinee-"Above Average"              103.88   4.09
  Male Examiner-Female Examinee-"Below Average"              102.38   5.85
  Female Examiner-Male Examinee-"Above Average"              102.88   5.46
  Female Examiner-Male Examinee-"Below Average"              100.12   6.01
  Female Examiner-Female Examinee-"Above Average"            102.75   5.80
  Female Examiner-Female Examinee-"Below Average"            102.25   5.90

APPENDIX B

Rating Sheet

NAME

Group I.Q. Score (CTMM-S Form)
    Bottom      2nd      3rd      Top

School Achievement Average
    Bottom      2nd      3rd      Top

Predicted WISC Score

APPENDIX C

TABLE 7

WISC TOTAL SCORES

Expectancy of Examiner by Examinee Sex

                              Male Examinee                       Female Examinee
Sex of                  "Above Average"  "Below Average"    "Above Average"  "Below Average"
Examiner    Examiner    Expectancy       Expectancy         Expectancy       Expectancy

Male        1           107              106                95               101
                        88               101                [illegible]      87
            2           116              101                101              90
                        107              99                 88               110
            3           111              88                 85               101
                        89               [illegible]        [illegible]      95
            4           100              89                 109              89
                        85               94                 117              94

Female      5           98               106                107              91
                        104              118                108              106
            6           [illegible]      105                [illegible]      101
                        109              102                111              101
            7           112              109                85               107
                        115              98                 99               114
            8           110              91                 99               95
                        [illegible]      109                98               [illegible]
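
Subgroup means and standard deviations of the kind tabulated in Appendix A can be obtained by grouping scores on one or more design factors. The following is a minimal sketch in Python and not the procedure used in the study: the scores are hypothetical, the names `records`, `FACTORS`, and `subgroup_stats` are illustrative, and the use of the sample (n - 1) standard deviation is an assumption, since the thesis does not state which form was reported.

```python
from statistics import mean, stdev

# Hypothetical scores, NOT the study's data.
# Each record: (sex of examiner, sex of examinee, expectancy, score)
records = [
    ("male", "male", "above", 104),
    ("male", "male", "below", 98),
    ("male", "female", "above", 106),
    ("male", "female", "below", 100),
    ("female", "male", "above", 102),
    ("female", "male", "below", 96),
    ("female", "female", "above", 108),
    ("female", "female", "below", 102),
]

# Position of each design factor within a record (illustrative names).
FACTORS = {"examiner_sex": 0, "examinee_sex": 1, "expectancy": 2}


def subgroup_stats(rows, factors):
    """Return {level tuple: (mean, sample SD)} for the named factors.

    Every subgroup must contain at least two scores, because the
    sample standard deviation is undefined for a single score.
    """
    groups = {}
    for row in rows:
        key = tuple(row[FACTORS[name]] for name in factors)
        groups.setdefault(key, []).append(row[3])
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


by_examiner_sex = subgroup_stats(records, ["examiner_sex"])
by_expectancy = subgroup_stats(records, ["expectancy"])

for level, (m, sd) in sorted(by_expectancy.items()):
    print(level[0], round(m, 2), round(sd, 2))
```

Passing more than one factor name, for example `["examiner_sex", "expectancy"]`, gives the interaction cells, provided the data set is large enough that each cell holds at least two scores.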